
Domain-aware Evaluation of Named Entity Recognition Systems for Croatian
Journal Title CIT. Journal of Computing and Information Technology
Journal Abbreviation CIT
Publisher Group University of Zagreb
Website http://cit.srce.unizg.hr/index.php/CIT
   
Title Domain-aware Evaluation of Named Entity Recognition Systems for Croatian
Authors Agic, Zeljko; Bekavac, Bozo
Abstract We provide an evaluation of the currently available named entity recognition systems for Croatian. The evaluation puts special emphasis on domain dependence. To this end, we manually annotated a dataset of approximately 1 million tokens of Croatian text from various domains within the newspaper text genre. The dataset was annotated using a three-class named entity tagset denoting personal names, locations and organizations. We give insight into feature selection, domain sensitivity and the effects of increasing training set size for statistical named entity recognition using the state-of-the-art Stanford NER system. We also sketch a comparison of publicly available named entity recognition systems for Croatian with respect to domain dependence, regardless of their underlying paradigms. Our top-performing system achieved an F1-score of 0.884 in a mixed-domain testing scenario, scoring 0.925 and 0.843 in the two domains separated for the experiment. The system shows consistent state-of-the-art scores for detecting names of persons, locations and organizations.
Publisher University of Zagreb, University Computing Centre - SRCE
Date 2013-10-31
Source Journal of Computing and Information Technology Vol 21, No 3 (2013)
Rights CIT. Journal of Computing and Information Technology is an open access journal. Authors who publish with this journal agree to the following terms: Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).

 


