Patent 9323731 was granted by the United States Patent and Trademark Office in April 2016 and assigned to Google.
Systems and techniques for extracting data from unstructured documents are described. One such method involves assigning one or more labels to one or more nodes in a first object model of a first web page; comparing a second object model of a second web page to the first object model; if the first object model matches the second object model to a determined degree, extracting from the second web page data associated with nodes in the second object model that match labeled nodes in the first object model; and providing the extracted data for storage in a structured database in a manner associated with the labels.
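The method in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation. It makes several assumptions: pages are well-formed markup parsed with Python's standard `xml.etree.ElementTree`, a node is identified by a structural path such as `html/body[0]/h1[0]`, the "determined degree" of match is a Jaccard overlap of path sets against a hypothetical `threshold` of 0.8, and the names `node_paths`, `similarity`, `extract`, `PAGE_A`, `PAGE_B`, and `LABELS` are invented for illustration.

```python
import xml.etree.ElementTree as ET


def node_paths(root):
    """Map a structural path such as 'html/body[0]/h1[0]' to each element."""
    paths = {}

    def walk(elem, prefix):
        paths[prefix] = elem
        counts = {}  # per-tag sibling counters, so repeated tags get distinct paths
        for child in elem:
            index = counts.get(child.tag, 0)
            counts[child.tag] = index + 1
            walk(child, f"{prefix}/{child.tag}[{index}]")

    walk(root, root.tag)
    return paths


def similarity(paths_a, paths_b):
    """Jaccard overlap of two path sets (an assumed measure, not from the patent)."""
    a, b = set(paths_a), set(paths_b)
    return len(a & b) / len(a | b) if a | b else 1.0


def extract(labels, template_html, target_html, threshold=0.8):
    """Return {label: text} from the target page, or None if the pages differ too much."""
    template = node_paths(ET.fromstring(template_html))
    target = node_paths(ET.fromstring(target_html))
    if similarity(template, target) < threshold:
        return None
    record = {}
    for path, label in labels.items():
        node = target.get(path)
        # Only extract where the labeled node exists in both object models.
        if path in template and node is not None:
            record[label] = (node.text or "").strip()
    return record


# Hypothetical sample pages: the first is labeled, the second is extracted from.
PAGE_A = "<html><body><h1>Blue Widget</h1><p>$10</p></body></html>"
PAGE_B = "<html><body><h1>Red Widget</h1><p>$12</p></body></html>"
LABELS = {"html/body[0]/h1[0]": "name", "html/body[0]/p[0]": "price"}

if __name__ == "__main__":
    print(extract(LABELS, PAGE_A, PAGE_B))
```

The returned dictionary is keyed by label, so each entry could be written to a column of the same name in a structured database. A page whose structure falls below the threshold yields `None` rather than a partially filled record.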