Apparatus and methods for concept-centric information extraction

Inventors

Daniel Kifer, Srujana Merugu, Ankur Jain, Sathiya Keerthi Selvaraj, Alok S Kirpal, Philip L Bohannon, Raghu Ramakrishnan

Publication date

2010/9/23

Patent office

US

Application number

12408450

Description

Disclosed are methods and apparatus for extracting (or annotating) structured information from web content. Web content of interest from a particular domain is represented as one or more tree instances having a plurality of branching nodes that each correspond to a web object such that the tree instances correspond to one or more structured data instances. The particular domain is associated with domain knowledge that includes one or more presentation rulesets that each specifies a particular structure for a set of data instances, a domain specific concept labeler, one or more specified properties of the web objects in the tree instances, and a concept schema that specifies a representation of the data to be extracted from the web content. A structured data instance that conforms to the concept schema is extracted from the one or more tree instances based on the domain knowledge for the particular domain …
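
The flow described above can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the `Node` class, the `SCHEMA` tuple, the `label_concept` heuristics, and the sample page are all hypothetical, and presentation rulesets are left out entirely. It only shows a tree of web objects being walked, a toy domain-specific labeler tagging nodes with concepts, and a record being returned only when it conforms to the schema.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A web object in a tree instance (hypothetical representation)."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


# Hypothetical concept schema: the fields a structured data instance must have.
SCHEMA = ("name", "phone")


def label_concept(node):
    """Toy domain-specific concept labeler (assumed heuristics, not from the source)."""
    if re.fullmatch(r"\(?\d{3}\)?[ -]?\d{3}-\d{4}", node.text.strip()):
        return "phone"
    if node.tag in ("h1", "h2"):
        return "name"
    return None


def walk(node):
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def extract(tree):
    """Return one record conforming to SCHEMA, or None if a field is missing."""
    record = {}
    for node in walk(tree):
        concept = label_concept(node)
        # Keep the first node found for each schema concept.
        if concept in SCHEMA and concept not in record:
            record[concept] = node.text.strip()
    return record if all(f in record for f in SCHEMA) else None


page = Node("div", children=[
    Node("h1", "Cafe Example"),
    Node("span", "(555) 123-4567"),
    Node("p", "Open daily"),
])
result = extract(page)
```

In this sketch the domain knowledge is reduced to the labeler and the schema; a fuller treatment would also consult presentation rules and node properties when deciding which nodes belong to the same data instance.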

Total citations

Cited by 68

[Bar chart: citations per year, 2009-2024]

Scholar articles

Apparatus and methods for concept-centric information extraction

D Kifer, S Merugu, A Jain, SK Selvaraj, AS Kirpal… - US Patent App. 12/408,450, 2010