Authors
Michael Gribskov, Nina L Robinson
Publication date
1996/3/1
Journal
Computers & Chemistry
Volume
20
Issue
1
Pages
25-33
Publisher
Pergamon
Description
In this paper, we borrow the idea of the receiver operating characteristic (ROC) from clinical medicine and demonstrate its application to sequence comparison. The ROC includes elements of both sensitivity and specificity, and is a quantitative measure of the usefulness of a diagnostic. The ROC is used in this work to investigate the effects of scoring table and gap penalties on database searches. Studies on three families of proteins (4Fe-4S ferredoxins, lysR bacterial regulatory proteins, and bacterial RNA polymerase σ-factors) lead to the following conclusions: sequence families are quite idiosyncratic, but the best PAM distance for database searches using the Smith-Waterman method is somewhat larger than predicted by theoretical methods, about 200 PAM. The length-independent gap penalty (gap initiation penalty) is quite important, but shows a broad peak at values of about 20–24. The length-dependent gap …
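As a rough illustration of how an ROC can summarize a database search, the sketch below ranks hits by score, traces sensitivity (true positive rate) against 1 − specificity (false positive rate), and integrates the curve. It is a minimal sketch and not the authors' implementation: the function names `roc_points` and `roc_area`, the toy scores, and the family-membership labels are all hypothetical, and tied scores are not given any special handling.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points for ranked hits.

    scores: search scores, higher meaning a better match (hypothetical input).
    labels: True if the hit is a known family member, False otherwise.
    """
    # Rank hits from best to worst score; ties keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, is_member in ranked if is_member)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one member and one non-member")

    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_member in ranked:
        if is_member:
            tp += 1  # one more true family member found
        else:
            fp += 1  # one more unrelated sequence accepted
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_area(points):
    """Area under the ROC curve by the trapezoidal rule (1.0 is a perfect ranking)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example (invented data): two family members, two unrelated sequences.
toy_scores = [95.0, 80.0, 62.0, 40.0]
toy_labels = [True, False, True, False]
toy_area = roc_area(roc_points(toy_scores, toy_labels))
```

Under this sketch, comparing `roc_area` across scoring tables or gap penalties for the same labeled database would give one number per parameter setting, which is the kind of comparison the abstract describes.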
Total citations
[Chart: citations per year, 1997–2024]