Authors
Guillaume Marçais, David Pellow, Daniel Bork, Yaron Orenstein, Ron Shamir, Carl Kingsford
Publication date
2017/7/12
Journal
Bioinformatics
Volume
33
Issue
14
Pages
i110-i117
Publisher
Oxford University Press
Description
Motivation
The minimizers scheme is a method for selecting k-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g. too many k-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, and the natural lexicographic ordering of k-mers used by minimizers was recognized as their origin. Many software tools using minimizers employ ad hoc variations of the lexicographic order to alleviate those issues.
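As an illustration of the selection procedure described above, here is a minimal sketch. It assumes the common window formulation: in every window of w consecutive k-mers, the smallest k-mer under a chosen order is selected, with the leftmost taken on ties. The names `minimizers` and `density` and the `key` parameter are hypothetical, chosen for this example, and are not taken from the paper or from any particular tool.

```python
def minimizers(seq, k, w, key=None):
    """Return sorted (position, k-mer) pairs selected as minimizers.

    Assumption: in each window of w consecutive k-mers, the smallest
    k-mer under `key` is selected; ties go to the leftmost position.
    """
    if key is None:
        # Default: plain lexicographic order on the k-mer string.
        key = lambda kmer: kmer
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Compare by (order value, position) so ties pick the leftmost k-mer.
        best = min(window, key=lambda i: (key(kmers[i]), i))
        selected.add(best)
    return [(i, kmers[i]) for i in sorted(selected)]


def density(seq, k, w, key=None):
    """Fraction of k-mer positions in seq that are selected."""
    n_kmers = len(seq) - k + 1
    return len(minimizers(seq, k, w, key)) / n_kmers


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=3))
    print(density("ACGTACGT", k=3, w=3))
```

Passing a different `key` function swaps the k-mer order without changing the selection logic, which is one way to experiment with alternatives to the lexicographic order.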
Results
We provide an in-depth analysis of the effect of k-mer ordering on the performance of the minimizers …
Total citations
Citations per year, 2018-2024 (bar chart)
Scholar articles
G Marçais, D Pellow, D Bork, Y Orenstein, R Shamir… - Bioinformatics, 2017