Authors
Anurag Bhardwaj, Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju
Publication date
2008/1/28
Conference
Document Recognition and Retrieval XV
Volume
6815
Pages
68150O
Publisher
International Society for Optics and Photonics
Description
This paper describes an OCR-based technique for word spotting in Devanagari printed documents. The system accepts a Devanagari word as input and returns a sequence of word images ranked by their similarity to the input query. The methodology involves line and word separation, pre-processing of document words, word recognition using OCR, and similarity matching. We demonstrate a Block Adjacency Graph (BAG) based document cleanup in the pre-processing phase. During word recognition, multiple recognition hypotheses are generated for each document word using a font-independent Devanagari OCR. The similarity matching phase uses a cost-based model to match the word input by the user against the OCR results. Experiments are conducted on document images from the publicly available ILT and Million Book Project datasets. The technique achieves an average precision of 80% for …
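The similarity matching step described above can be illustrated with a minimal sketch. The abstract does not specify the paper's cost model, so this example swaps in a plain Levenshtein edit distance as an assumed stand-in; the names `edit_cost` and `rank_word_images`, the `sub` and `indel` weights, and the image identifiers are all hypothetical. Each word image carries several OCR hypotheses, and the image is scored by its best-matching hypothesis.

```python
from typing import Dict, List, Tuple


def edit_cost(a: str, b: str, sub: float = 1.0, indel: float = 1.0) -> float:
    """Levenshtein-style cost between two strings.

    Assumption: a stand-in for the paper's cost model, which the abstract
    does not specify. Operates on Unicode code points, not on aksharas.
    """
    prev = [j * indel for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,                           # delete ca
                curr[j - 1] + indel,                       # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def rank_word_images(
    query: str, hypotheses: Dict[str, List[str]]
) -> List[Tuple[str, float]]:
    """Rank word images by the lowest cost over their OCR hypotheses."""
    scored = [
        (image_id, min(edit_cost(query, h) for h in hyps))
        for image_id, hyps in hypotheses.items()
        if hyps  # skip images with no recognition hypotheses
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical OCR output: image id -> list of recognition hypotheses.
ocr_hypotheses = {
    "img_a": ["कमला", "कमल"],
    "img_b": ["नमक"],
    "img_c": ["कमला"],
}
ranking = rank_word_images("कमल", ocr_hypotheses)
```

Taking the minimum over hypotheses lets a single correct alternative recover a word whose top OCR result was wrong, which is one plausible reason to generate multiple hypotheses in the first place. A real system would likely weight substitutions by how visually confusable two Devanagari characters are, rather than using uniform costs.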
Total citations
[Chart: citations per year, 2009-2023]
Scholar articles
A Bhardwaj, S Kompalli, S Setlur, V Govindaraju - Document Recognition and Retrieval XV, 2008