
On Separation of English Numerals from Multilingual Document Images
Journal Title Journal of Multimedia
Journal Abbreviation jmm
Publisher Group Academy Publisher
Website http://ojs.academypublisher.com
Title On Separation of English Numerals from Multilingual Document Images
Authors Dhandra, Basanna V.; Hangarge, Mallikarjun
Abstract For Optical Character Recognition (OCR) of bilingual or multilingual documents containing text words in a regional language and numerals in English, it is necessary to identify the different script forms before running a script-specific OCR. In this paper, an attempt is made to separate English numerals at the word level from bilingual and trilingual documents in the Kannada, Devnagari, Tamil, Odiya and Malayalam scripts, using discriminating features such as aspect ratio, stroke densities and eccentricity. The k-nearest neighbour algorithm is used to classify new word images, and the algorithm is tested on 6000 sample words with five-fold cross-validation. The algorithm is robust with respect to font styles, sizes and noise. The results obtained are quite encouraging.
Publisher ACADEMY PUBLISHER
Date 2007-11-01
Source Journal of Multimedia Vol 2, No 6 (2007)
Rights Copyright © ACADEMY PUBLISHER - All Rights Reserved. To request permission, see http://www.academypublisher.com/copyrightpermission.html.
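
The classification scheme summarised in the abstract (shape features per word image, a k-nearest neighbour vote, five-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state the value of k, the distance metric, the full feature set, or how folds are formed, so everything below is an assumption: the function names are hypothetical, only two simple features are computed (aspect ratio and ink density of the bounding box, standing in for the paper's features), distance is Euclidean, and sample i is assigned to fold i mod 5.

```python
import math
from collections import Counter


def word_features(img):
    """Return (aspect_ratio, ink_density) for a binary word image.

    `img` is a list of rows of 0/1 values; both features are measured on
    the bounding box of the foreground (1) pixels. Illustrative only.
    """
    coords = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("image has no foreground pixels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return (width / height, len(coords) / (width * height))


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to `query`.

    `train` is a list of (feature_tuple, label) pairs; distance is
    Euclidean (an assumption, the abstract does not name a metric).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(samples, folds=5, k=3):
    """Mean accuracy over `folds` folds; sample i is held out in fold i % folds."""
    accuracies = []
    for f in range(folds):
        test = [s for i, s in enumerate(samples) if i % folds == f]
        train = [s for i, s in enumerate(samples) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds


if __name__ == "__main__":
    # Made-up feature vectors, not data from the paper: short, sparse
    # "numeral" words versus long, dense "text" words.
    samples = []
    for i in range(10):
        samples.append(((1.0 + 0.05 * i, 0.30), "numeral"))
        samples.append(((3.0 + 0.05 * i, 0.50), "text"))
    print("mean accuracy:", cross_validate(samples))
```

In practice the features would need to be scaled to comparable ranges before computing distances, since an unscaled aspect ratio can dominate a density value that lies between 0 and 1.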

 
