Authors
Konstantinos Sechidis, Borja Calvo, Gavin Brown
Publication date
2014/9/15
Conference
ECML/PKDD (3)
Pages
66-81
Description
We propose a set of novel methodologies that enable valid statistical hypothesis testing when only positive and unlabelled (PU) examples are available. This type of problem, a special case of semi-supervised data, is common in text mining, bioinformatics, and computer vision. Focusing on a generalised likelihood ratio test, we make three key contributions: (1) a proof that assuming all unlabelled examples are negative is sufficient for independence testing, but not for power analysis; (2) a new methodology that compensates for this and enables power analysis, allowing sample size determination for observing an effect with a desired power; and (3) a new capability, supervision determination, which determines a priori the number of labelled examples the user must collect before being able to observe a desired statistical effect. Beyond general hypothesis testing, we suggest the tools will …
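A minimal sketch of the two ingredients named in contributions (1) and (2), assuming a binary feature X and a surrogate label S (1 = labelled positive, 0 = unlabelled, treated as negative). It uses the textbook 1-degree-of-freedom G-test, where G = 2N·Î(X;S) with mutual information in nats, and the standard noncentral chi-squared approximation (noncentrality 2N·I) for power and sample size. The function names are hypothetical and only the Python standard library is used. The paper's PU-specific correction for power analysis and its supervision-determination procedure are not reproduced here; applying the power function directly to S is exactly what contribution (1) says is insufficient.

```python
import math
from statistics import NormalDist

# Hypothetical helper names; this is not the authors' code.


def g_test_2x2(table):
    """G-test of independence for a 2x2 table of counts.

    table[x][s] counts examples with binary feature value x and surrogate
    label s (1 = labelled positive, 0 = unlabelled, treated as negative).
    Returns (G, p_value); G equals 2 * N * I_hat(X;S) with MI in nats.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # empty cells contribute 0 (limit of o*log(o/e) as o -> 0)
                expected = row_tot[i] * col_tot[j] / n
                g += obs * math.log(obs / expected)
    g *= 2.0
    # Under H0, G ~ chi-squared with 1 df, and P(chi2_1 > g) = erfc(sqrt(g / 2)).
    return g, math.erfc(math.sqrt(g / 2.0))


def power_g_test_df1(n, mi_nats, alpha=0.05):
    """Approximate power of the 1-df G-test at sample size n when the
    population mutual information is mi_nats (noncentral chi-squared
    approximation with noncentrality 2 * n * mi_nats)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)  # chi2_1 critical value is z_crit**2
    mu = math.sqrt(2.0 * n * mi_nats)
    # A noncentral chi2 with 1 df is (Z + mu)**2 with Z standard normal.
    return (1.0 - nd.cdf(z_crit - mu)) + nd.cdf(-z_crit - mu)


def min_sample_size(mi_nats, target_power=0.8, alpha=0.05):
    """Smallest n whose approximate power reaches target_power."""
    if mi_nats <= 0:
        raise ValueError("effect size (mutual information) must be positive")
    if not alpha < target_power < 1.0:
        raise ValueError("need alpha < target_power < 1")
    # Power increases with n, so double until the target is met, then bisect.
    hi = 1
    while power_g_test_df1(hi, mi_nats, alpha) < target_power:
        hi *= 2
    lo = hi // 2  # lo == 0 or power(lo) < target_power
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_g_test_df1(mid, mi_nats, alpha) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Surrogate-label table: rows are X = 0/1, columns are S = 0/1.
    print(g_test_2x2([[50, 10], [10, 50]]))
    print(min_sample_size(0.01))
```

With the defaults (α = 0.05, power 0.8), `min_sample_size(0.01)` gives 393, in line with the familiar closed form (z_{1-α/2} + z_{1-β})² / (2I) ≈ 392.4. The doubling-then-bisection search relies only on power being monotone in n, so it needs no closed-form inverse.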
Total citations
[Figure: citations per year, 2015 to 2024]
Scholar articles
K Sechidis, B Calvo, G Brown - Machine Learning and Knowledge Discovery in …, 2014