Authors
GEAPA Batista, AL Bazan, Maria Carolina Monard
Publication date
2003/12
Conference
Proceedings of the Second Brazilian Workshop on Bioinformatics
Pages
35-43
Description
There has been increasing interest in tools for automating the annotation of databases. Machine learning techniques are promising candidates to help curators at least guide the annotation process, which is mostly done manually. Following previous work on automated annotation using symbolic machine learning techniques, the present work deals with a common problem in machine learning: classes usually have skewed prior probabilities, i.e., there is a large number of examples of one class compared with only a few examples of the other class. This happens because a large number of proteins are not annotated for every feature. Thus, we analyze and employ some techniques aimed at balancing the training data. Our experiments show that the classifiers induced from balanced data sampled with our method are more accurate than those induced from the original data.