WORD CANDIDATE GENERATION IN CYRILLIC OCR BASED ON ALN CLASSIFIERS

Dejan Gorgevik, Dragan Mihajlov, and Ljupco Josifovski

Abstract: The process of recognizing characters in a scanned image can be divided into three main operational steps: document layout analysis, character recognition, and contextual postprocessing. This paper addresses contextual postprocessing, using the extended information received from the pattern classifiers together with the preclassification of each pattern according to its shape. Every pattern is examined by several classifiers, and their decisions are combined into a list of character candidates, each with a confidence level. Combining the character candidates at the different letter positions generates a list of word candidates, which are checked against the lexicon. Word candidates are generated one at a time, in descending order of word confidence, and the first candidate found in the lexicon is accepted.
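The candidate-generation loop described above can be sketched as a best-first enumeration over the per-position candidate lists. This is a minimal sketch under stated assumptions, not the authors' implementation: the input format, the names `word_candidates` and `recognize_word`, the `max_candidates` cutoff, and the use of the product of character confidences as the word confidence are all hypothetical. Because that product never increases when one position is demoted to a less confident character, a priority queue that starts from the best character at every position yields words in non-increasing confidence order without building the full Cartesian product.

```python
import heapq
from typing import Iterator, Optional

# Hypothetical input format: one list per letter position, each entry a
# (character, confidence) pair with confidence in (0, 1].
Candidates = list[list[tuple[str, float]]]


def word_candidates(positions: Candidates) -> Iterator[tuple[str, float]]:
    """Yield (word, confidence) pairs in non-increasing order of confidence.

    Assumption: word confidence = product of the character confidences.
    """
    # Sort each position so that index 0 holds its most confident character.
    ranked = [sorted(p, key=lambda c: c[1], reverse=True) for p in positions]

    def score(idx: tuple[int, ...]) -> float:
        s = 1.0
        for pos, i in enumerate(idx):
            s *= ranked[pos][i][1]
        return s

    start = tuple(0 for _ in ranked)
    heap = [(-score(start), start)]  # max-heap via negated scores
    seen = {start}
    while heap:
        neg, idx = heapq.heappop(heap)
        yield "".join(ranked[pos][i][0] for pos, i in enumerate(idx)), -neg
        # Successors: demote exactly one position to its next-best character.
        for pos in range(len(idx)):
            if idx[pos] + 1 < len(ranked[pos]):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))


def recognize_word(positions: Candidates, lexicon: set[str],
                   max_candidates: int = 1000) -> Optional[str]:
    """Return the first generated word found in the lexicon, or None.

    max_candidates is a hypothetical cutoff, not taken from the paper.
    """
    for n, (word, _conf) in enumerate(word_candidates(positions)):
        if n >= max_candidates:
            break
        if word in lexicon:
            return word
    return None


if __name__ == "__main__":
    positions = [
        [("к", 0.9), ("н", 0.6)],
        [("о", 0.8), ("а", 0.7)],
        [("т", 0.9), ("ш", 0.3)],
    ]
    print(recognize_word(positions, {"наш", "кош"}))  # prints кош
```

With these made-up confidences the generator produces "кот", "кат", "нот", "нат", "кош", ... in that order, so "кош" is accepted before the lower-scoring "наш". The `seen` set prevents the same index combination from being queued twice, since most combinations are reachable from more than one predecessor.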
