• KSII Transactions on Internet and Information Systems
    Monthly Online Journal (eISSN: 1976-7277)

The Sequence Labeling Approach for Text Alignment of Plagiarism Detection

Vol. 13, No. 9, September 29, 2019
DOI: 10.3837/tiis.2019.09.026

Abstract

Plagiarism detection increasingly relies on text alignment, the task of extracting the plagiarized passages from a pair consisting of a suspicious document and its source document. Heuristic methods have achieved excellent performance in text alignment. However, further improvement of these heuristics depends mainly on expert experience, which leaves them without a capacity for continuous improvement. Machine learning may be a suitable way to address this problem. Considering the positional relations and the context of text segment pairs, we formalize text alignment as a sequence labeling problem, improving on current methods at the model level. Specifically, this paper proposes using a probabilistic graphical model to tag the observed sequence of text segment pairs. We therefore present a sequence labeling approach for text alignment in plagiarism detection based on Conditional Random Fields. The proposed approach is evaluated on the PAN@CLEF 2012 artificial high obfuscation plagiarism corpus and the simulated paraphrase plagiarism corpus, and is compared with the methods that achieved the best performance in PAN@CLEF 2012, 2013 and 2014. Experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods.
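
To make the sequence labeling formulation concrete, the following is a minimal sketch, not the authors' model. It assumes each candidate pair of text segments is reduced to a single word-overlap (Jaccard) similarity, uses two hypothetical labels (`P` for plagiarized, `O` for other), and replaces learned Conditional Random Field weights with hand-set emission and transition scores. Only Viterbi decoding over a linear chain is shown; training is omitted.

```python
from typing import Dict, List, Tuple

LABELS = ("O", "P")  # hypothetical tag set: O = not plagiarized, P = plagiarized


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two text segments (an assumed feature)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def emission(sim: float, label: str) -> float:
    """Score of a label given one pair's similarity (hand-set, not learned)."""
    score = 4.0 * (sim - 0.5)
    return score if label == "P" else -score


def transition(prev: str, cur: str) -> float:
    """Reward neighbouring pairs that share a label (hand-set, not learned)."""
    return 1.0 if prev == cur else -1.0


def viterbi(sims: List[float]) -> List[str]:
    """Return the highest-scoring label sequence for a chain of similarities."""
    if not sims:
        return []
    best: Dict[str, float] = {y: emission(sims[0], y) for y in LABELS}
    back: List[Dict[str, str]] = []
    for sim in sims[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[p] + transition(p, cur))
            new_best[cur] = best[prev] + transition(prev, cur) + emission(sim, cur)
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def tag_pairs(pairs: List[Tuple[str, str]]) -> List[str]:
    """Label an ordered sequence of (suspicious, source) segment pairs."""
    return viterbi([jaccard(s, t) for s, t in pairs])
```

With similarities `[0.1, 0.9, 0.45, 0.9, 0.1]`, classifying each pair independently at a 0.5 threshold would split the plagiarized run in two, whereas `viterbi` labels the middle pair `P` because its neighbours are. That use of position and context is what the sequence labeling view adds over per-pair decisions.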




Cite this article

[IEEE Style]
L. Kong, Z. Han, H. Qi, "The Sequence Labeling Approach for Text Alignment of Plagiarism Detection," KSII Transactions on Internet and Information Systems, vol. 13, no. 9, pp. 4814-4832, 2019. DOI: 10.3837/tiis.2019.09.026.

[ACM Style]
Leilei Kong, Zhongyuan Han, and Haoliang Qi. 2019. The Sequence Labeling Approach for Text Alignment of Plagiarism Detection. KSII Transactions on Internet and Information Systems, 13, 9, (2019), 4814-4832. DOI: 10.3837/tiis.2019.09.026.

[BibTeX Style]
@article{tiis:22224, title="The Sequence Labeling Approach for Text Alignment of Plagiarism Detection", author="Leilei Kong and Zhongyuan Han and Haoliang Qi", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2019.09.026}, volume={13}, number={9}, year="2019", month={September}, pages={4814-4832}}