Authors
Timothy C Hoad, Justin Zobel
Publication date
2003/2/1
Journal
Journal of the American Society for Information Science and Technology
Volume
54
Issue
3
Pages
203-215
Publisher
Wiley Subscription Services, Inc., A Wiley Company
Description
The widespread use of on‐line publishing of text promotes storage of multiple versions of documents and mirroring of documents in multiple locations, and greatly simplifies the task of plagiarizing the work of others. We evaluate two families of methods for searching a collection to find documents that are coderivative, that is, are versions or plagiarisms of each other. The first, the ranking family, uses information retrieval techniques; extending this family, we propose the identity measure, which is specifically designed for identification of coderivative documents. The second, the fingerprinting family, uses hashing to generate a compact document description, which can then be compared to the fingerprints of the documents in the collection. We introduce a new method for evaluating the effectiveness of these techniques, and demonstrate it in practice. Using experiments on two collections, we demonstrate that the …
Total citations
[Chart: citations per year, 2003 to 2024]
Scholar articles
TC Hoad, J Zobel - Journal of the American society for information science …, 2003