Active Multi-Field Learning for Spam Filtering

Authors

  • Wuying Liu School of Foreign Language, Linyi University, 276005 Linyi, Shandong
  • Lin Wang School of Foreign Language, Linyi University, 276005 Linyi, Shandong
  • Mianzhu Yi School of Foreign Language, Linyi University, 276005 Linyi, Shandong
  • Nan Xie School of Foreign Language, Linyi University, 276005 Linyi, Shandong

Keywords:

Spam filtering, active multi-field learning, email spam, short message service spam, TREC spam track

Abstract

Ubiquitous spam messages waste considerable time and resources. This paper addresses the practical spam filtering problem and proposes a universal approach for fighting various kinds of spam messages. The proposed active multi-field learning approach rests on two observations: 1) obtaining a label is costly for a real-world spam filter, which suggests active learning; and 2) different messages often share a similar multi-field text structure, which suggests multi-field learning. The multi-field learning framework combines the results predicted by the field classifiers using a novel compound weight, and each field classifier calculates the arithmetic average of multiple conditional probabilities predicted from feature strings according to a string-frequency index data structure. By comparing the current variance of the field classifying results with the historical variance, the active learner evaluates the classifying confidence and treats a more uncertain message as a more informative sample for which to request a label. The experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements in both email spam filtering and short text spam filtering. Measured by the standard (1-ROCA)% metric, our active multi-field learning performance even exceeds the full-feedback performance of some advanced individual classifying algorithms.
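
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (`FieldClassifier`, `ActiveMultiFieldFilter`), the character bigram features, the add-one smoothing, and the rule "request a label when the current variance is at least the mean of past variances" are all assumptions made for illustration. In particular, the abstract does not define the compound weight, so a plain unweighted mean stands in for it here.

```python
from collections import defaultdict
from statistics import mean, pvariance


class FieldClassifier:
    """Toy classifier for one message field, backed by a string-frequency index."""

    def __init__(self, n=2):
        self.n = n
        # feature string -> [ham count, spam count]
        self.index = defaultdict(lambda: [0, 0])

    def features(self, text):
        # Overlapping character n-grams (an assumed feature choice).
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def train(self, text, is_spam):
        for f in self.features(text):
            self.index[f][int(is_spam)] += 1

    def predict(self, text):
        # Arithmetic average of per-feature P(spam | feature), add-one smoothed.
        probs = []
        for f in self.features(text):
            ham, spam = self.index.get(f, (0, 0))
            probs.append((spam + 1) / (ham + spam + 2))
        return mean(probs) if probs else 0.5


class ActiveMultiFieldFilter:
    """Combines field classifiers and decides when to request a label."""

    def __init__(self, fields):
        self.classifiers = {name: FieldClassifier() for name in fields}
        self.variance_history = []

    def score(self, message):
        scores = [clf.predict(message.get(name, ""))
                  for name, clf in self.classifiers.items()]
        # Unweighted mean is a stand-in for the paper's compound weight.
        return mean(scores), pvariance(scores)

    def wants_label(self, message):
        # High disagreement between fields is treated as low confidence.
        _, current = self.score(message)
        past = mean(self.variance_history) if self.variance_history else 0.0
        self.variance_history.append(current)
        return current >= past

    def train(self, message, is_spam):
        for name, clf in self.classifiers.items():
            clf.train(message.get(name, ""), is_spam)
```

In an online setting, a filter built this way would call `score` on each incoming message, call `wants_label` to decide whether to ask the user for feedback, and call `train` only on the messages for which a label was actually obtained.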

How to Cite

Liu, W., Wang, L., Yi, M., & Xie, N. (2015). Active Multi-Field Learning for Spam Filtering. Computing and Informatics, 33(6), 1400–1427. Retrieved from http://147.213.75.17/ojs/index.php/cai/article/view/2819