KSII Transactions on Internet and Information Systems
Monthly Online Journal (eISSN: 1976-7277)

A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data

Vol. 12, No.8, August 31, 2018
DOI: 10.3837/tiis.2018.08.010

Abstract

Text data is often imbalanced across classes. Class imbalance is one of the challenges in text classification because it degrades classifier performance. Many studies have addressed this problem, and the proposed solutions fall into several general categories, including sampling-based and algorithm-based methods. Recent studies have also considered feature selection as a solution to the imbalance problem. In this paper, a novel one-sided feature selection method, probabilistic feature selection (PFS), is presented for imbalanced text classification. PFS is a probabilistic method computed from the feature distribution. Compared to similar methods, PFS has more parameters. To evaluate the proposed method, the feature selection methods Gini, MI, FAST and DFS were also implemented, and the C4.5 decision tree and Naive Bayes classifiers were used. Results on the Reuters-21578 and WebKB datasets, measured by F-measure, suggest that the proposed feature selection method significantly improves classifier performance.



Cite this article

[IEEE Style]
Jafar Pouramini, Behrouze Minaei-Bidgoli and Mahdi Esmaeili, "A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data," KSII Transactions on Internet and Information Systems, vol. 12, no. 8, pp. 3725-3748, 2018. DOI: 10.3837/tiis.2018.08.010

[ACM Style]
Pouramini, J., Minaei-Bidgoli, B., and Esmaeili, M. 2018. A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data. KSII Transactions on Internet and Information Systems, 12, 8, (2018), 3725-3748. DOI: 10.3837/tiis.2018.08.010