Patent attributes
The system is configured to provide a generalized document automation framework that captures relevant data from documents by replicating historical human actions associated with a document. The system may use machine vision and natural language processing to match a new document to a document in an existing corpus from which data has already been extracted by a human. The match is made by comparing both visual elements and textual elements, and it can be verified statistically by comparing the match metrics across multiple documents. Once a match has been found and verified, the system takes the extractions recorded for the historical document and maps them to similar regions in the new document, again based on both visual and textual commonalities between the documents. Data is then extracted from these regions of interest in the new document, checked for integrity against historical data, and passed downstream for processing.
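The matching and verification steps can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the source: the `Document` type, the use of word-set overlap as a stand-in for natural language processing, the use of cosine similarity over a small layout vector as a stand-in for machine vision, the equal weighting of the two scores, and the z-score test as one possible statistical check of the best match against the other documents in the corpus.

```python
import math
import statistics
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Document:
    """Hypothetical record: raw text plus a small layout descriptor."""
    name: str
    text: str
    visual: List[float]  # assumed feature vector, e.g. ink density per page zone


def text_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (stand-in for an NLP comparison)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def visual_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_score(new: Document, old: Document, text_weight: float = 0.5) -> float:
    """Weighted blend of textual and visual similarity (weighting is assumed)."""
    return (text_weight * text_similarity(new.text, old.text)
            + (1.0 - text_weight) * visual_similarity(new.visual, old.visual))


def find_verified_match(new: Document, corpus: List[Document],
                        min_z: float = 1.0) -> Tuple[Optional[Document], float]:
    """Pick the best-scoring corpus document, then verify it by checking how far
    its score stands above the scores of the remaining documents (z-score)."""
    if len(corpus) < 3:
        raise ValueError("need at least three corpus documents to compare scores")
    scores = [match_score(new, doc) for doc in corpus]
    best = max(range(len(scores)), key=scores.__getitem__)
    others = scores[:best] + scores[best + 1:]
    mean, spread = statistics.mean(others), statistics.pstdev(others)
    if spread:
        z = (scores[best] - mean) / spread
    else:
        # All other scores are identical: the best either stands out or it does not.
        z = math.inf if scores[best] > mean else 0.0
    return (corpus[best] if z >= min_z else None), z


corpus = [
    Document("invoice_a", "invoice number date total amount due", [0.9, 0.1, 0.8, 0.2]),
    Document("receipt_b", "receipt store items cash change", [0.2, 0.7, 0.1, 0.9]),
    Document("letter_c", "dear sir regards sincerely yours", [0.5, 0.5, 0.5, 0.5]),
]
incoming = Document("incoming", "invoice number date total amount payable",
                    [0.85, 0.15, 0.75, 0.25])

matched, z_score = find_verified_match(incoming, corpus)
print(matched.name if matched else "no verified match", round(z_score, 2))
```

In this sketch a match is accepted only when its combined score clearly exceeds the scores of the other corpus documents, so a new document that resembles everything equally is rejected instead of being mapped to an arbitrary historical extraction. With a corpus this small the spread estimate is fragile, and a real system would presumably compare against many more documents.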