Authors
Thaer Samar, Myriam C Traub, Jacco van Ossenbruggen, Lynda Hardman, Arjen P de Vries
Publication date
2018/3
Journal
International Journal on Digital Libraries
Volume
19
Pages
57-75
Publisher
Springer Berlin Heidelberg
Description
A Web archive usually contains multiple versions of documents crawled from the Web at different points in time. One possible way for users to access a Web archive is through full-text search systems. However, previous studies have shown that these systems can induce a bias, known as the retrievability bias, on the accessibility of documents in community-collected collections (such as TREC collections). This bias can be measured by analyzing the distribution of retrievability scores across the documents in a collection, where each document's score quantifies the likelihood of its retrieval. We investigate the suitability of retrievability scores in retrieval systems that consider every version of a document in a Web archive as an independent document. We show that the retrievability of documents can vary for different versions of the same document and that retrieval systems induce biases to different extents. We quantify this bias …
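To make the measurement described above concrete, here is a minimal sketch of one common formulation from the retrievability literature. It is not the authors' implementation. A document's retrievability score is the number of queries for which it appears within the top `cutoff` results, with every query weighted equally (an assumption). The skew of the score distribution is summarized with the Gini coefficient, a common choice for quantifying retrievability bias. The ranked lists, the `cutoff` value, and the version IDs such as `A@2010` and `A@2012` are hypothetical. Following the setting in the description, each archived version is treated as an independent document.

```python
def retrievability(ranked_lists, all_docs, cutoff):
    """Cumulative retrievability: r(d) = sum over queries of [rank of d <= cutoff].

    Assumes uniform query weights (each query counts 1). Documents never
    retrieved within the cutoff keep a score of 0.
    """
    scores = {doc: 0 for doc in all_docs}
    for ranking in ranked_lists:
        for doc in ranking[:cutoff]:
            scores[doc] += 1
    return scores


def gini(values):
    """Gini coefficient of a list of non-negative scores (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over ascending-sorted values with 1-based rank i:
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical collection: two versions of document A indexed separately.
docs = ["A@2010", "A@2012", "B@2011", "C@2013"]
# Hypothetical ranked result lists, one per query.
runs = [
    ["A@2012", "B@2011", "A@2010"],
    ["A@2012", "C@2013", "B@2011"],
    ["B@2011", "A@2012", "A@2010"],
]

scores = retrievability(runs, docs, cutoff=2)
bias = gini(scores.values())
print(scores)  # {'A@2010': 0, 'A@2012': 3, 'B@2011': 2, 'C@2013': 1}
print(round(bias, 3))  # 0.417
```

In this toy example the two versions of document A end up with very different scores (0 and 3). This is the kind of version-level variation the description reports. A higher cutoff or a different retrieval model would generally change both the scores and the resulting Gini value.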
Total citations
2017: 1, 2018: 3, 2019: 3, 2020: 2, 2021: 4, 2022: 4, 2023: 3, 2024: 1