IV Ramakrishnan, Saikat Mukherjee, Guizhen Yang, Hasan Davulcu
(57) ABSTRACT A method for extracting an attribute occurrence from a template-generated semi-structured document comprising multi-attribute data records comprises identifying a first set of attribute occurrences in the template-generated semi-structured document using an ontology. The method further comprises determining a boundary of each multi-attribute data record in the template-generated semi-structured document, learning a pattern for an attribute corresponding to an identified attribute occurrence of the first set in the template-generated semi-structured document, and applying the pattern within the boundary of each multi-attribute data record in the template-generated semi-structured document to extract a second set of attribute occurrences.
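
The four steps named in the abstract (ontology-based seeding, record boundary detection, pattern learning, pattern application) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the sample document, and the toy ontology are all hypothetical, record boundaries are simply assumed to be `<tr>` elements rather than discovered, and the "pattern" is assumed to be nothing more than the most common pair of tags surrounding the seed occurrences.

```python
import re
from collections import Counter

def find_seeds(doc, ontology):
    """Step 1: locate ontology terms; return (attribute, start, end) triples."""
    seeds = []
    for attr, terms in ontology.items():
        for term in terms:
            for m in re.finditer(re.escape(term), doc):
                seeds.append((attr, m.start(), m.end()))
    return seeds

def split_records(doc, record_tag="tr"):
    """Step 2 (assumption): each <record_tag>...</record_tag> is one record."""
    return re.findall(rf"<{record_tag}\b[^>]*>.*?</{record_tag}>", doc, flags=re.S)

def learn_pattern(doc, seeds, attr):
    """Step 3 (assumption): use the most common (left tag, right tag) context."""
    contexts = Counter()
    for seed_attr, start, end in seeds:
        if seed_attr != attr:
            continue
        left = re.search(r"<[^<>]+>\s*$", doc[:start])   # tag just before the seed
        right = re.match(r"\s*<[^<>]+>", doc[end:])      # tag just after the seed
        if left and right:
            contexts[(left.group().strip(), right.group().strip())] += 1
    if not contexts:
        return None
    (left_tag, right_tag), _ = contexts.most_common(1)[0]
    return re.compile(re.escape(left_tag) + r"\s*([^<>]+?)\s*" + re.escape(right_tag))

def extract(doc, ontology, attr, record_tag="tr"):
    """Step 4: apply the learned pattern inside each record boundary."""
    pattern = learn_pattern(doc, find_seeds(doc, ontology), attr)
    if pattern is None:
        return []
    values = []
    for record in split_records(doc, record_tag):
        m = pattern.search(record)
        values.append(m.group(1) if m else None)
    return values

# Hypothetical template-generated page and ontology, for illustration only.
DOC = """<table>
<tr><td class="brand">Canon</td><td class="price">$499</td></tr>
<tr><td class="brand">Nikon</td><td class="price">$549</td></tr>
<tr><td class="brand">Pentax</td><td class="price">$379</td></tr>
</table>"""
ONTOLOGY = {"brand": {"Canon", "Nikon"}}

brands = extract(DOC, ONTOLOGY, "brand")
```

In this toy example the ontology knows only "Canon" and "Nikon" (the first set of occurrences), yet the learned tag context also picks up "Pentax" in the third record, which corresponds to the second set of attribute occurrences described in the abstract.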