KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

A New Fine-grain SMS Corpus and Its Corresponding Classifier Using Probabilistic Topic Model

Vol. 12, No. 2, February 27, 2018
DOI: 10.3837/tiis.2018.02.004

Abstract

SMS spam is now widespread in many countries, and the standards for filtering it differ from country to country. However, current technologies and research on SMS spam filtering all focus on dividing SMS messages into two classes: legitimate and illegitimate. This binary division does not match actual situations and needs. Furthermore, existing approaches face several difficulties: (1) High-quality, large-scale SMS spam corpora are very scarce, and finely categorized SMS spam corpora do not exist at all, which seriously handicaps research. (2) The limited length of SMS messages leads to a lack of sufficient features. These factors seriously degrade the performance of traditional classifiers (such as SVM, K-NN, and Bayes). In this paper, we present a new finely categorized SMS spam corpus which, as far as we know, is unique and the largest of its kind. In addition, we propose a classifier based on a probabilistic topic model. The classifier can alleviate the feature sparsity problem in SMS spam filtering. Moreover, we compare the approach with three typical classifiers on the new SMS spam corpus. The experimental results show that the proposed approach is more effective for the task of SMS spam filtering.



Cite this article

[IEEE Style]
J. Ma, Y. Zhang, Z. Wang, B. Chen, "A New Fine-grain SMS Corpus and Its Corresponding Classifier Using Probabilistic Topic Model," KSII Transactions on Internet and Information Systems, vol. 12, no. 2, pp. 604-625, 2018. DOI: 10.3837/tiis.2018.02.004.

[ACM Style]
Jialin Ma, Yongjun Zhang, Zhijian Wang, and Bolun Chen. 2018. A New Fine-grain SMS Corpus and Its Corresponding Classifier Using Probabilistic Topic Model. KSII Transactions on Internet and Information Systems, 12, 2, (2018), 604-625. DOI: 10.3837/tiis.2018.02.004.

[BibTeX Style]
@article{tiis:21673, title="A New Fine-grain SMS Corpus and Its Corresponding Classifier Using Probabilistic Topic Model", author="Jialin Ma and Yongjun Zhang and Zhijian Wang and Bolun Chen", journal="KSII Transactions on Internet and Information Systems", DOI={10.3837/tiis.2018.02.004}, volume={12}, number={2}, year="2018", month={February}, pages={604-625}}