Authors
Guillaume Marçais, Carl Kingsford
Publication date
2011/3/15
Journal
Bioinformatics
Volume
27
Issue
6
Pages
764-770
Publisher
Oxford University Press
Description
Motivation: Counting the number of occurrences of every k-mer (substring of length k) in a long string is a central subproblem in many applications, including genome assembly, error correction of sequencing reads, fast multiple sequence alignment and repeat detection. Recently, the deep sequence coverage generated by next-generation sequencing technologies has caused the amount of sequence to be processed during a genome project to grow rapidly, and has rendered current k-mer counting tools too slow and memory intensive. At the same time, large multicore computers have become commonplace in research facilities allowing for a new parallel computational paradigm.
Results: We propose a new k-mer counting algorithm and associated implementation, called Jellyfish, which is fast and memory efficient. It is based on a multithreaded, lock-free hash table optimized for counting k …
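To make the problem statement concrete, here is a minimal single-threaded sketch of k-mer counting as defined above (counting every substring of length k). It is not Jellyfish's algorithm: it uses an ordinary Python dictionary rather than the multithreaded, lock-free hash table the abstract describes, and the function name `count_kmers` and the sample sequence are hypothetical, chosen only for illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count occurrences of every length-k substring (naive baseline, not Jellyfish)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k over the sequence; a sequence of length n
    # has n - k + 1 windows, and none when n < k.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical example input: a short DNA string.
counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

A baseline like this stores every distinct k-mer as a separate string key, which is one reason memory use becomes a concern at the data volumes the abstract mentions.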
Total citations
[Bar chart: citations per year, 2011 to 2024]