Authors
Yuerong Hu, Ming Jiang, Ted Underwood, J Stephen Downie
Publication date
2020/8/1
Book
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020
Pages
405-408
Description
This paper investigates the limitations and challenges of the curated datasets that digital libraries provide in support of digital humanities (DH) research. We present a use case based on an English literature dataset of 178,381 volumes, curated by the HathiTrust Research Center (HTRC), for measuring change in three literary genres. These volumes were selected from over 17 million digitized items in the HathiTrust Digital Library. We demonstrate our methods and workflow for improving the representativeness and scholarly usability of existing datasets. We analyzed and effectively overcame three common limitations: duplicate volumes, uneven distribution of data, and OCR errors. We suggest that digital library stakeholders flag and address these limitations to improve the usability of the data they provide for digital humanities research.
Total citations
Citations per year: 2021: 2; 2022: 1; 2023: 1; 2024: 1
Scholar articles
Y Hu, M Jiang, T Underwood, JS Downie - Proceedings of the ACM/IEEE Joint Conference on …, 2020