Authors
Myriam C Traub, Jacco Van Ossenbruggen, Lynda Hardman
Publication date
2015
Conference
Research and Advanced Technology for Digital Libraries: 19th International Conference on Theory and Practice of Digital Libraries, TPDL 2015, Poznań, Poland, September 14-18, 2015, Proceedings 19
Pages
252-263
Publisher
Springer International Publishing
Description
Humanities scholars increasingly rely on digital archives for their research instead of time-consuming visits to physical archives. This shift in research method has the hidden cost of working with digitally processed historical documents: how much trust can a scholar place in noisy representations of source texts? In a series of interviews with historians about their use of digital archives, we found that scholars are aware that optical character recognition (OCR) errors may bias their results. They were, however, unable to quantify this bias or to indicate what information they would need to estimate it. This, however, would be important to assess whether the results are publishable. Based on the interviews and a literature study, we provide a classification of scholarly research tasks that gives account of their susceptibility to specific OCR-induced biases and the data required for uncertainty estimations. We …
Total citations
[Chart: citations per year, 2015-2024]
Scholar articles
MC Traub, J Van Ossenbruggen, L Hardman - Research and Advanced Technology for Digital …, 2015