Key Information Expansion Applied in Spoken Document Classification based on Lattice
Journal Title Journal of Computers
Journal Abbreviation jcp
Publisher Group Academy Publisher
Website http://ojs.academypublisher.com
Title Key Information Expansion Applied in Spoken Document Classification based on Lattice
Authors Xiang, Xue-Zhi; Zhang, Zhuo; Zhang, Lei
Abstract Traditionally, query words or key words in spoken document classification are generated manually. In this paper, a new hybrid feature for key information extraction is proposed, based on CHI-square, TFIDF and maximum posterior probability (MPP) features. It combines the advantages of these three features, and the weight of each word under the hybrid feature can be further integrated into the classification system. Here, the weights of the key words reveal, to some extent, the relationship between words and topic. Furthermore, when there are not enough query words or key words, a key information expansion step based on a focus score can be added to uncover latent information about the topic. In this expansion step, not only the documents in which key words occur but also the documents containing no key word take part in the expansion procedure. Additionally, the classification system uses document length as prior information when no query word is found. The whole classification system is based on the lattice, which carries more information than the 1-best result of a speech recognition system. Among CHI-square, TFIDF and MPP, MPP performs slightly worse than the other two, and CHI-square performs slightly better than TFIDF as the number of key words increases. Among all these features, the hybrid feature obtains the best performance in almost all cases under the same conditions. Combining document length information further improves the classification system, especially when little key information is available. Experiments show that when the system combines word weights and document length information, the hybrid feature obtains the best performance, with a MAP of 0.7817 using 50 key words. When key information is insufficient, key information expansion improves system performance; this is observed with only 1, 5 or 10 key words.
In the proposed key information expansion approach, because a focus factor is introduced to adjust the effect of documents with no key words, some function words can be avoided to some extent, and the number of expansion words can be kept under control.
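The abstract does not give the formula used to combine the individual features, so the following is only a minimal sketch of the general idea of ranking key words by a combined score. It assumes text transcripts rather than lattices, leaves out the MPP feature (which would need lattice posteriors), and uses a hypothetical combination rule: min-max normalisation of each score followed by a weighted average with an assumed parameter `alpha`. The function names and the toy data are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels, topic):
    """Chi-square statistic of each word against a topic / non-topic split."""
    n = len(docs)
    n_topic = sum(1 for y in labels if y == topic)
    doc_sets = [set(doc) for doc in docs]
    vocab = set().union(*doc_sets)
    scores = {}
    for word in vocab:
        # 2x2 contingency table: word present/absent vs. topic/other.
        a = sum(1 for s, y in zip(doc_sets, labels) if word in s and y == topic)
        b = sum(1 for s, y in zip(doc_sets, labels) if word in s and y != topic)
        c = n_topic - a
        d = (n - n_topic) - b
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def tfidf_scores(docs, labels, topic):
    """Term frequency inside the topic times inverse document frequency."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    tf = Counter(w for doc, y in zip(docs, labels) if y == topic for w in doc)
    return {w: tf[w] * math.log(n / df[w]) for w in df}


def min_max(scores):
    """Rescale scores to [0, 1] so the two features are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {w: (s - lo) / span if span else 0.0 for w, s in scores.items()}


def hybrid_weights(docs, labels, topic, alpha=0.5):
    """ASSUMED combination rule: weighted average of normalised scores."""
    chi = min_max(chi_square_scores(docs, labels, topic))
    tfi = min_max(tfidf_scores(docs, labels, topic))
    return {w: alpha * chi[w] + (1 - alpha) * tfi[w] for w in chi}


# Toy transcripts (hypothetical data, already tokenised).
docs = [
    ["the", "goal", "match"],
    ["the", "team", "goal"],
    ["the", "stock", "market"],
    ["the", "market", "price"],
]
labels = ["sport", "sport", "finance", "finance"]

weights = hybrid_weights(docs, labels, "sport")
for word, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s}{weight:.3f}")
```

In this toy run the function word "the" occurs in every document and receives zero weight from both features. The chi-square statistic is symmetric, so a word that is strongly associated with the other topic ("market") still scores highly on that feature alone; combining it with TF-IDF pulls such words below genuine topic words like "goal", which is one plausible reason to combine features.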
Publisher ACADEMY PUBLISHER
Date 2011-05-03
Source Journal of Computers Vol 6, No 5 (2011): Special Issue: Selected Best Papers of the International Workshop on CSEEE 2011
Rights Copyright © ACADEMY PUBLISHER - All Rights Reserved. To request permission, please check out URL: http://www.academypublisher.com/copyrightpermission.html.

 
