Inventors
IV Ramakrishnan, Saikat Mukherjee, Guizhen Yang, Hasan Davulcu
Publication date
2005/3/10
Patent office
US
Application number
10658312
Description
(57) Abstract: A method for extracting an attribute occurrence from a template-generated semi-structured document comprising multi-attribute data records comprises identifying a first set of attribute occurrences in the template-generated semi-structured document using an ontology. The method further comprises determining a boundary of each multi-attribute data record in the template-generated semi-structured document, learning a pattern for an attribute corresponding to an identified attribute occurrence of the first set in the template-generated semi-structured document, and applying the pattern within the boundary of each multi-attribute data record in the template-generated semi-structured document to extract a second set of attribute occurrences.
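The four steps in the abstract (ontology-based seeding, record-boundary detection, pattern learning, and pattern application within each record) can be illustrated with a minimal Python sketch. It is not the claimed method. The `Node` encoding (a tag path plus text), the regex-based `ONTOLOGY`, the boundary heuristic (a new record starts wherever the first node's tag path recurs), and the path-based pattern learner are all hypothetical simplifications chosen only for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    path: str  # hypothetical encoding: tag path from the root, e.g. "ul/li/span"
    text: str


# Hypothetical ontology: attribute name -> regex that recognises a value.
ONTOLOGY = {"price": re.compile(r"^\$\d+(\.\d{2})?$")}


def identify_with_ontology(nodes, ontology):
    """Step 1: first set of attribute occurrences, found by ontology matching."""
    return {(attr, i)
            for i, node in enumerate(nodes)
            for attr, rx in ontology.items()
            if rx.match(node.text)}


def record_boundaries(nodes):
    """Step 2: split the node sequence into (start, end) record spans.
    Assumption: the template repeats, so each record starts at a node whose
    path equals the path of the very first node."""
    if not nodes:
        return []
    start_path = nodes[0].path
    starts = [i for i, n in enumerate(nodes) if n.path == start_path]
    ends = starts[1:] + [len(nodes)]
    return list(zip(starts, ends))


def learn_patterns(nodes, occurrences):
    """Step 3: learn one pattern per attribute, here simply the most common
    tag path among its seed occurrences (an assumed pattern language)."""
    counts = {}
    for attr, i in occurrences:
        counts.setdefault(attr, Counter())[nodes[i].path] += 1
    return {attr: c.most_common(1)[0][0] for attr, c in counts.items()}


def apply_patterns(nodes, boundaries, patterns, seeds):
    """Step 4: apply each pattern inside every record boundary and return
    the occurrences the ontology alone missed (the second set)."""
    second = set()
    for start, end in boundaries:
        for i in range(start, end):
            for attr, path in patterns.items():
                if nodes[i].path == path and (attr, i) not in seeds:
                    second.add((attr, i))
    return second


# Hypothetical template-generated listing with three records.
EXAMPLE_DOC = [
    Node("ul/li/b", "Camera A"), Node("ul/li/span", "$199.99"),
    Node("ul/li/b", "Camera B"), Node("ul/li/span", "$1,299.00"),
    Node("ul/li/b", "Camera C"), Node("ul/li/span", "$249.00"),
]

if __name__ == "__main__":
    seeds = identify_with_ontology(EXAMPLE_DOC, ONTOLOGY)
    bounds = record_boundaries(EXAMPLE_DOC)
    patterns = learn_patterns(EXAMPLE_DOC, seeds)
    print(sorted(apply_patterns(EXAMPLE_DOC, bounds, patterns, seeds)))
```

In this toy listing the regex misses `$1,299.00` because of the thousands separator, but the path learned from the two seed prices recovers it inside the second record. Restricting pattern application to record boundaries is what keeps a learned pattern from matching unrelated parts of the page.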
Scholar articles
IV Ramakrishnan, S Mukherjee, G Yang, H Davulcu - US Patent App. 10/658,312, 2005