Authors
Federica Mandreoli, Riccardo Martoglia, Paolo Tiberio
Publication date
2004/11
Journal
International Journal on Digital Libraries
Volume
4
Pages
223-244
Publisher
Springer Berlin Heidelberg
Description
The ever-growing volumes of textual information from various sources have fostered the development of digital libraries, making digital content readily accessible but also easy for malicious users to plagiarize, thus giving rise to security problems. In this paper, we introduce a duplicate detection scheme that is able to determine, with particularly high accuracy, the degree to which one document is similar to another. Our pairwise document comparison scheme detects the resemblance between the content of documents by considering document chunks, which represent contexts of words selected from the text. The resulting duplicate detection technique provides a good level of security in the protection of intellectual property while improving the availability of the data stored in the digital library and the correctness of the search results. Finally, the paper addresses efficiency and scalability issues by introducing …
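The description does not give the details of the chunking or scoring used in the paper, so the following is only a minimal sketch of the general idea of chunk-based pairwise comparison. It swaps in a generic, well-known technique (overlapping word windows compared with Jaccard overlap), not the authors' scheme. The names `chunks` and `resemblance` and the window size of 3 are assumptions made for illustration.

```python
import re


def chunks(text, size=3):
    """Return the set of overlapping word windows of `size` words.

    Assumption: a "chunk" is a window of consecutive words; the paper's
    own chunk definition is not given in the description.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < size:
        # Text shorter than one window: treat the whole text as one chunk.
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(doc_a, doc_b, size=3):
    """Jaccard overlap of the two chunk sets, a value in [0, 1]."""
    a, b = chunks(doc_a, size), chunks(doc_b, size)
    if not a and not b:
        # Nothing to compare; report no resemblance by convention.
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    original = "the quick brown fox jumps"
    suspect = "the quick brown fox sleeps"
    print(resemblance(original, suspect))
```

A score near 1 suggests that most chunks are shared and a score near 0 suggests little overlap. Where to set a threshold for flagging a duplicate depends on the collection and is not specified here.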
Total citations
Citations-per-year chart covering 2006 to 2021 (per-year counts not recoverable from the extracted text)
Scholar articles
F Mandreoli, R Martoglia, P Tiberio - International Journal on Digital Libraries, 2004