Authors
Venu Govindaraju, Swapnil Khedekar, Suryaprakash Kompalli, Faisal Farooq, Srirangaraj Setlur, Ramanaprasad Vemulapati
Publication date
2004/1/23
Conference
First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings.
Pages
122-133
Publisher
IEEE
Description
We present methodologies for three important tasks that will eventually enable digital access to multilingual Indian document images. First, we describe several document image analysis techniques necessary to prepare Devanagari document images for OCR. The second task is OCR for machine-printed Devanagari words without the help of a lexicon. We describe the OCR methodology and show how it is being extended to other Indian languages. Finally, we describe a versatile platform that facilitates automatic segmentation of document images in multiple Indian languages and an interface to capture the ground truth corresponding to the text. We use transliterated English text and virtual keyboards in a range of Indian languages for this purpose. The multilingual data entry capabilities of the tool and its underlying Unicode data representation within a structured XML document also allow users to annotate …
Total citations
[Chart: citations per year, 2004 to 2021]
Scholar articles
V Govindaraju, S Khedekar, S Kompalli, F Farooq… - First International Workshop on Document Image …, 2004