Optimal Feature Subset Selection Based on Combining Document Frequency and Term Frequency for Text Classification

Authors

  • Thirumoorthy Karpagalingam, Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India
  • Muneeswaran Karuppaiah, Department of Computer Science and Engineering, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India

DOI:

https://doi.org/10.31577/cai_2020_5_881

Keywords:

Feature selection, text classification, document frequency, term frequency

Abstract

Feature selection plays a vital role in reducing the high dimensionality of the feature space in text document classification. Reducing the dimension of the feature space lowers the computational cost and improves the accuracy of the text classification system. Hence, a proper subset of the significant features of the text corpus must be identified in order to classify the data in less computational time and with higher accuracy. In this work, a novel feature selection method that combines document frequency and term frequency (FS-DFTF) is used to measure the significance of a term. The optimal feature subset selected by the proposed method is evaluated using Naive Bayes and Support Vector Machine classifiers on several popular benchmark text corpora. The experimental results confirm that the proposed method achieves better classification accuracy than other feature selection techniques.
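
The abstract does not state the FS-DFTF formula, so the following is only an illustrative sketch of the general idea of ranking terms by a score that uses both document frequency and term frequency. The scoring rule (document-frequency ratio multiplied by relative term frequency) and the names `df_tf_scores` and `select_top_k` are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def df_tf_scores(docs):
    """Score each term by (fraction of documents containing it)
    times (its share of all term occurrences in the corpus).

    Hypothetical scoring rule for illustration only; it is NOT the
    FS-DFTF measure defined in the paper.
    """
    n_docs = len(docs)
    df = Counter()  # number of documents in which each term appears
    tf = Counter()  # total occurrences of each term in the corpus
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    total_terms = sum(tf.values())
    return {t: (df[t] / n_docs) * (tf[t] / total_terms) for t in tf}


def select_top_k(docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = df_tf_scores(docs)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy corpus of pre-tokenized documents (made up for this example).
docs = [
    "cheap pills buy now".split(),
    "buy cheap watches now".split(),
    "meeting agenda for monday".split(),
]
top = select_top_k(docs, 3)
print(top)
```

In practice the selected terms would define the reduced vocabulary passed to a classifier such as Naive Bayes or a Support Vector Machine. A class-aware score would normally be preferred for classification, and this sketch deliberately omits that.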

Published

2021-03-25

How to Cite

Karpagalingam, T., & Karuppaiah, M. (2021). Optimal Feature Subset Selection Based on Combining Document Frequency and Term Frequency for Text Classification. Computing and Informatics, 39(5), 881–906. https://doi.org/10.31577/cai_2020_5_881