Logo Goletty

Index-based Online Text Classification for SMS Spam Filtering
Journal Title Journal of Computers
Journal Abbreviation jcp
Publisher Group Academy Publisher
Website http://ojs.academypublisher.com
PDF (1,153 kb)
   
Title Index-based Online Text Classification for SMS Spam Filtering
Authors Wang, Ting; Liu, Wuying
Abstract We proposed a novel index-based online text classification method, investigated two index models, and compared the performances of various index granularities for English and Chinese SMS message. Based on the proposed method, six individual classifiers were implemented according to various text features of Chinese message, which were further combined to form an ensemble classifier. The experimental results from English corpus show that the relevant feature among words can increase the classification confidence and the trigram co-occurrence feature of words is an appropriate relevant feature. The experimental results from real Chinese corpus show that the performance of classifier applying word-level index model is better than the one applying document-level index model. The trigram segment outperforms the exact segment in indexing, so it is not necessary to segment Chinese text exactly when indexing by our proposed method. Applying parallel multi-thread ensemble learning, our proposed method has constant time complexity, which is critical to large scale data and online filtering.
Publisher ACADEMY PUBLISHER
Date 2010-06-01
Source Journal of Computers Vol 5, No 6 (2010): Special Issue: Recent Advances in Computer Science and Engineering - Track on C
Rights Copyright © ACADEMY PUBLISHER - All Rights Reserved.To request permission, please check out URL: http://www.academypublisher.com/copyrightpermission.html.

 

See other article in the same Issue


Goletty © 2024