Balancing training data for automated annotation of keywords: a case study

Authors

GEAPA Batista, ALC Bazzan, MC Monard

Publication date

2003/12

Conference

Proceedings of the Second Brazilian Workshop on Bioinformatics

Pages

35-43

Description

There has been increasing interest in tools for automating the annotation of databases. Machine learning techniques are promising candidates to help curators, at least by guiding an annotation process that is still mostly done manually. Following previous work on automated annotation using symbolic machine learning techniques, the present work deals with a common problem in machine learning: classes usually have skewed prior probabilities, i.e., there is a large number of examples of one class compared with just a few examples of the other class. This happens because a large number of proteins are not annotated for every feature. Thus, we analyze and employ some techniques aimed at balancing the training data. Our experiments show that the classifiers induced from balanced data sampled with our method are more accurate than those induced from the original data.
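The description does not say which balancing techniques the authors used, so the following is only a minimal sketch of one generic approach, random over-sampling, in which minority-class examples are duplicated at random until every class has as many examples as the largest one. The function name `random_oversample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes
    until every class has as many examples as the largest class.

    Illustrative only: this is a generic technique, not necessarily
    the sampling method evaluated in the paper.
    """
    rng = random.Random(seed)

    # Group the examples by class label.
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is grown to the size of the largest class.
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Sample with replacement to fill the gap (empty for the largest class).
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: five examples without the keyword, one with it.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = ["neg", "neg", "neg", "neg", "neg", "pos"]

bal_X, bal_y = random_oversample(X, y)
print(Counter(bal_y))
```

Over-sampling keeps all of the original examples but repeats minority ones, which can encourage overfitting. Under-sampling the majority class is the usual alternative and discards data instead.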

Total citations

Cited by 494

[Bar chart: citations per year, 2009 to 2024]

Scholar articles

Balancing training data for automated annotation of keywords: a case study.

GE Batista, ALC Bazzan, MC Monard - Wob, 2003