Analysis and Improved Recognition of Protein Names Using Transductive SVM
Title | Analysis and Improved Recognition of Protein Names Using Transductive SVM |
Authors | |
Abstract | We first analyzed protein names using various dictionaries and databases and identified five problems: the treatment of special characters, the treatment of homonyms, cases where one protein-name string is a substring of a different protein-name string, cases where the same protein exists in different organisms, and the treatment of modifiers. We confirmed that a machine-learning approach to protein-name recognition can address these problems, and machine-learning methods have accordingly been used in recent research to recognize protein names. However, a classifier trained in a specific domain can overfit and become so inflexible that it can be used only in that domain. We therefore developed a new corpus on breast cancer and investigated the flexibility of classifiers trained on the GENIA [1] corpus or the breast-cancer corpus. To avoid overfitting, we used a transductive support vector machine (SVM) and evaluated the effect of transductive learning. In our experiments, the transductive SVM prevented overfitting and yielded higher accuracy than the conventional SVM. Under the “Sub” criterion defined in this paper, the transductive SVM raised the F-score from 70.46 to 79.64 and from 70.63 to 74.61 in our two experiments. |
Publisher | ACADEMY PUBLISHER |
Date | 2008-01-01 |
Source | Journal of Computers Vol 3, No 1 (2008) |
Rights | Copyright © ACADEMY PUBLISHER - All Rights Reserved. To request permission, see http://www.academypublisher.com/copyrightpermission.html. |