Authors
M Bidartondo, TD Bruns, M Blackwell, I Edwards, AFS Taylor, T Horton, N Zhang, U Koljalg, G May, TW Kuyper, and others
Publication date
2008
Journal
Science
Volume
319
Pages
1666
Description
GenBank, the public repository for nucleotide and protein sequences, is a critical resource for molecular biology, evolutionary biology, and ecology. While some attention has been drawn to sequence errors (1), common annotation errors also reduce the value of this database. In fact, for organisms such as fungi, which are notoriously difficult to identify, up to 20% of DNA sequence records may have erroneous lineage designations in GenBank (2). Gene function annotation in protein sequence databases is similarly error-prone (3, 4). Because identity and function of new sequences are often determined by bioinformatic analyses, both types of errors are propagated into new accessions, leading to long-term degradation of the quality of the database.
Currently, primary sequence data are annotated by the authors of those data, and can only be reannotated by the same authors. This is inefficient and unsustainable over …