Utilizing High-quality Feature Extension Mode to Classify Chinese Short-text
Title | Utilizing High-quality Feature Extension Mode to Classify Chinese Short-text |
Authors | |
Abstract | This paper presents a method for classifying Chinese short-texts that carry a weak concept signal, in which high-quality feature extension modes are extracted and used effectively. In the method, a feature extension mode is a set of terms that co-occur in the training data, and three measures are presented for deciding whether a mode is high-quality: confidence, category homoplasy, and relevancy strength. An algorithm that extracts high-quality feature extension modes from the training data is then designed. Next, a Chinese short-text classification algorithm utilizing feature extension modes is presented, in which a short-text is extended by adding new features or modifying the weights of its initial features, according to the relationship between non-feature terms and feature extension modes. The experiments show that (1) high-quality feature extension modes help improve Chinese short-text classification; and (2) the proposed method achieves higher classification performance than conventional text classification methods. |
Publisher | ACADEMY PUBLISHER |
Date | 2010-12-01 |
Source | Journal of Networks Vol 5, No 12 (2010) |
Rights | Copyright © ACADEMY PUBLISHER - All Rights Reserved. To request permission, please see: http://www.academypublisher.com/copyrightpermission.html. |