Authors
Martin Potthast, Benno Stein, Alberto Barrón-Cedeño, Paolo Rosso
Publication date
2010/8/23
Conference
Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Pages
997-1005
Publisher
Association for Computational Linguistics
Description
We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4000 simulated plagiarism cases, the latter generated via Amazon’s Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.
Total citations
[Chart: citations per year, 2010 to 2024]
Scholar articles
M Potthast, B Stein, A Barrón-Cedeño, P Rosso - Coling 2010: Posters, 2010