Authors
Shantanu Godbole
Publication date
2002/1
Journal
Annual Progress Report, Indian Institute of Technology–Bombay, India
Description
A common way to evaluate a multi-way classifier is a confusion matrix that tabulates, for each of the learned concepts, the true class of test instances against the predicted class. Aggregate accuracy figures for the classifier are obtained by summing the diagonal entries of the confusion matrix and dividing by the number of test instances. However, the valuable information that the off-diagonal entries carry about relationships amongst classes is often ignored. In this report we show various ways in which the notion of similarity amongst subsets of classes, derived from the confusion matrix, can be exploited.
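As a minimal sketch of the evaluation described above, the following builds a confusion matrix and derives aggregate accuracy from its diagonal. The function names, class labels, and toy predictions are illustrative assumptions and do not come from the report.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count test instances; rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Sum of the diagonal entries divided by the total number of test instances."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical toy data: three classes, six test instances.
classes = ["sports", "politics", "finance"]
y_true = ["sports", "sports", "politics", "politics", "finance", "finance"]
y_pred = ["sports", "sports", "politics", "finance", "finance", "politics"]

cm = confusion_matrix(y_true, y_pred, classes)
acc = accuracy(cm)
```

In this toy example the off-diagonal counts fall only between "politics" and "finance", which is the kind of class relationship that a single accuracy figure hides.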
First, we provide a mechanism for generating more meaningful intermediate levels of hierarchy in large flat sets of classes. This provides a valuable navigational aid for browsing large text collections such as Internet directories. Second, we show how large multi-class classification tasks can be scaled up with the number of classes. This angle on text classification has been ignored in much existing work. Newer methods such as Support Vector Machines (SVMs) have high accuracy but are expensive to run, do not scale to a large number of classes, and are not inherently designed for multi-class tasks. We propose a two-stage scheme in which a confusion matrix from a fast, mediocre-accuracy classifier such as naive Bayes is used to derive a graph whose classes are linked to each other according to their degree of mutual confusion. For each class we then identify a subgraph of the classes that are confused with it. This breaks the initial large multi-class problem into smaller subtasks where, for each class, only its relevant subgraph needs to be considered. We use high-accuracy, expensive classifiers such as SVMs for these subtasks. The …
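The graph construction in the two-stage scheme can be sketched roughly as follows. The abstract does not say how the degree of confusion is measured or thresholded, so the symmetric normalisation, the `threshold` parameter, and the function names below are assumptions made purely for illustration.

```python
def confusion_graph(matrix, classes, threshold=0.1):
    """Link two classes when their mutual confusion rate reaches `threshold`.

    Assumption: confusion between classes i and j is measured as
    (M[i][j] + M[j][i]) / (instances of i + instances of j).
    """
    n = len(classes)
    row_totals = [sum(row) for row in matrix]
    graph = {c: set() for c in classes}
    for i in range(n):
        for j in range(i + 1, n):
            denom = row_totals[i] + row_totals[j]
            if denom == 0:
                continue  # neither class appears in the test set
            confusion = (matrix[i][j] + matrix[j][i]) / denom
            if confusion >= threshold:
                graph[classes[i]].add(classes[j])
                graph[classes[j]].add(classes[i])
    return graph


def candidate_classes(graph, predicted_class):
    """Subgraph for a class: the class itself plus the classes confused with it."""
    return {predicted_class} | graph[predicted_class]


# Hypothetical confusion matrix from a fast first-stage classifier
# (rows are true classes, columns are predicted classes).
classes = ["sports", "politics", "finance"]
matrix = [
    [2, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]

graph = confusion_graph(matrix, classes)
subtask = candidate_classes(graph, "politics")
```

Under these assumptions, a document that the fast classifier labels "politics" would be passed to an expensive second-stage classifier trained only on the classes in `subtask`, rather than on the full class set.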