
An OCR based approach for word spotting in Devanagari documents

Authors

Anurag Bhardwaj, Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju

Publication date

2008/1/28

Conference

Document Recognition and Retrieval XV

Volume

6815

Pages

68150O

Publisher

International Society for Optics and Photonics

Description

This paper describes an OCR-based technique for word spotting in printed Devanagari documents. The system accepts a Devanagari word as input and returns a sequence of word images ranked by their similarity to the input query. The methodology involves line and word separation, pre-processing of document words, OCR-based word recognition, and similarity matching. We demonstrate a Block Adjacency Graph (BAG) based document cleanup in the pre-processing phase. During word recognition, multiple recognition hypotheses are generated for each document word using a font-independent Devanagari OCR. The similarity matching phase uses a cost-based model to match the word entered by the user against the OCR results. Experiments are conducted on document images from the publicly available ILT and Million Book Project datasets. The technique achieves an average precision of 80% for …
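
The similarity matching step described above can be illustrated with a minimal sketch. The abstract does not specify the paper's cost model, so this example assumes a plain unit-cost edit distance as a stand-in, and it assumes the OCR hypotheses are already available as strings. The names `edit_cost`, `rank_word_images`, and the image identifiers are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple


def edit_cost(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between two strings.

    Assumption: this stands in for the paper's cost-based model,
    whose actual costs are not given in the abstract.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def rank_word_images(
    query: str, hypotheses: Dict[str, List[str]]
) -> List[Tuple[str, int]]:
    """Rank word images by the cheapest match among their OCR hypotheses.

    `hypotheses` maps a (hypothetical) word-image id to the list of
    recognition hypotheses produced for that image.
    """
    scored = [
        (image_id, min(edit_cost(query, h) for h in hyps))
        for image_id, hyps in hypotheses.items()
        if hyps  # skip images with no recognition output
    ]
    # Lowest cost first; ties are broken by image id for a stable order.
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


# Usage with made-up Latin transliterations in place of Devanagari text.
ranking = rank_word_images(
    "kamal",
    {
        "img1": ["karma", "kama"],
        "img2": ["dharma"],
        "img3": ["kamal", "kamala"],
    },
)
print(ranking)
```

Taking the minimum over all hypotheses for an image means that one correct hypothesis is enough for the image to rank well, which is one plausible reason to generate several hypotheses per word. A real system for Devanagari would likely compare at the level of characters or glyph components rather than raw code points; that choice is left open here.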

Total citations

Cited by 14

Citations per year, 2009-2023 (chart)
