Authors
Myriam C Traub, Thaer Samar, Jacco Van Ossenbruggen, Jiyin He, Arjen de Vries, Lynda Hardman
Publication date
2016/6/19
Book
Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
Pages
7-16
Description
Bias in the retrieval of documents can directly influence the information access of a digital library. In the worst case, systematic favoritism for a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the retrievability for all documents in a collection. Previous evaluations have been performed on TREC collections using simulated query sets. The question remains, however, how representative this approach is of more realistic settings.
To address this question, we investigate the effectiveness of the retrievability measure using a large digitized newspaper corpus, featuring two characteristics that distinguish our experiments from previous studies: (1) compared to TREC collections, our collection contains noise originating from OCR processing, historical spelling and use of language; and (2) instead of simulated queries, the collection comes …
Total citations
[Chart: citations per year, 2017 to 2024]
Scholar articles
MC Traub, T Samar, J Van Ossenbruggen, J He… - Proceedings of the 16th ACM/IEEE-CS on Joint …, 2016