An evaluation framework for plagiarism detection

Authors

Martin Potthast, Benno Stein, Alberto Barrón-Cedeño, Paolo Rosso

Publication date

2010/8/23

Conference

Proceedings of the 23rd International Conference on Computational Linguistics: Posters

Pages

997-1005

Publisher

Association for Computational Linguistics

Description

We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4000 simulated plagiarism cases, the latter generated via Amazon’s Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.

Total citations

Cited by 428

Citations per year: 2010: 7, 2011: 24, 2012: 28, 2013: 48, 2014: 35, 2015: 33, 2016: 49, 2017: 45, 2018: 28, 2019: 29, 2020: 23, 2021: 20, 2022: 27, 2023: 14, 2024: 10

Scholar articles

An evaluation framework for plagiarism detection

M Potthast, B Stein, A Barrón-Cedeño, P Rosso - Coling 2010: Posters, 2010