A statistical method for system evaluation using incomplete judgments

Authors

Javed A Aslam, Virgil Pavlu, Emine Yilmaz

Publication date

2006/8/6

Book

Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages

541-548

Description

We consider the problem of large-scale retrieval evaluation, and we propose a statistical method for evaluating retrieval systems using incomplete judgments. Unlike existing techniques that (1) rely on effectively complete, and thus prohibitively expensive, relevance judgment sets, (2) produce biased estimates of standard performance measures, or (3) produce estimates of non-standard measures thought to be correlated with these standard measures, our proposed statistical technique produces unbiased estimates of the standard measures themselves. Our proposed technique is based on random sampling. While our estimates are unbiased by statistical design, their variance is dependent on the sampling distribution employed; as such, we derive a sampling distribution likely to yield low variance estimates. We test our proposed technique using benchmark TREC data, demonstrating that a sampling pool derived …
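The idea that an estimate can be unbiased under any valid sampling distribution, while its variance depends on which distribution is chosen, can be illustrated with generic inverse-probability weighting. The sketch below is not the estimator from the paper: the function name `estimate_precision`, the toy relevance vector, and the top-weighted sampling distribution are all assumptions made for illustration. It samples documents with replacement, judges only the sampled ones, and reweights each judgment by the inverse of its sampling probability.

```python
import random


def estimate_precision(relevance, probs, n_samples, rng):
    """Estimate the fraction of relevant documents from a judged sample.

    relevance : list of 0/1 judgments (only sampled entries are "looked at")
    probs     : sampling probability for each document; must sum to 1 and be
                positive wherever a document could be relevant
    n_samples : number of draws, made with replacement
    rng       : a random.Random instance

    Hypothetical illustration of inverse-probability weighting, not the
    paper's estimator.
    """
    n = len(relevance)
    drawn = rng.choices(range(n), weights=probs, k=n_samples)
    # Each draw contributes rel_i / (n * p_i). Its expectation is
    # sum_i p_i * rel_i / (n * p_i) = mean(relevance), so the average
    # over draws is unbiased for any valid choice of probs.
    total = sum(relevance[i] / (n * probs[i]) for i in drawn)
    return total / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    relevance = [1, 1, 0, 1, 0, 0, 0, 0]  # toy judgments, true precision 3/8
    top_weighted = [0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.05]
    uniform = [1 / len(relevance)] * len(relevance)
    print("top-weighted:", estimate_precision(relevance, top_weighted, 50, rng))
    print("uniform:     ", estimate_precision(relevance, uniform, 50, rng))
```

Both sampling distributions target the same true value; they differ only in how widely individual estimates scatter around it, which is why the choice of distribution matters for variance rather than for bias.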

Total citations

Cited by 210

[Chart: citations per year, 2006 to 2024]
