Patent attributes
A computer program product includes program instructions configured to cause a processor to: perform optical character recognition (OCR) on an image of a document; extract an identifier of the document from the image based at least in part on the OCR; compare at least portions of the identifier with content from one or more reference data sources; and determine whether the identifier is valid based at least in part on the comparison. The content comprises global address information and is derived from geographic information. Deriving the content from the geographic information includes obtaining the geographic information and parsing it according to a set of predefined heuristic rules, where the heuristic rules are configured to normalize the global address information obtained from the one or more reference data sources according to a single convention for representing address information.
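The normalization and comparison steps can be illustrated with a minimal sketch. The abstract does not disclose the actual heuristic rules, so everything here is an assumption for illustration only: the rule table `_PHRASE_RULES`, the helper names `normalize_address`, `build_reference` and `is_valid_identifier`, and the token-overlap threshold `min_overlap` are all hypothetical. The sketch normalizes both the raw geographic records and the OCR-extracted identifier to one convention, then compares their token sets.

```python
import re

# Hypothetical heuristic rules mapping variant spellings to a single convention.
# A production rule set would be far larger and locale-aware.
_PHRASE_RULES = {
    "UNITED STATES OF AMERICA": "US",
    "UNITED STATES": "US",
    "USA": "US",
    "STREET": "ST",
    "AVENUE": "AVE",
    "ROAD": "RD",
}


def normalize_address(raw: str) -> str:
    """Normalize an address string to one (hypothetical) convention."""
    text = raw.upper()
    text = text.replace(".", "")               # "U.S.A." -> "USA", "St." -> "ST"
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)   # commas, hyphens, etc. become spaces
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    # Apply longer phrases first so "UNITED STATES OF AMERICA" wins over "UNITED STATES".
    for phrase in sorted(_PHRASE_RULES, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(phrase)}\b", _PHRASE_RULES[phrase], text)
    return text


def build_reference(geo_records: list[str]) -> set[str]:
    """Derive normalized reference content from raw geographic records."""
    return {normalize_address(record) for record in geo_records}


def is_valid_identifier(extracted: str, reference: set[str], min_overlap: float = 1.0) -> bool:
    """Compare portions (tokens) of the extracted identifier with each reference entry.

    Returns True if the Jaccard overlap with some entry reaches min_overlap;
    the default of 1.0 requires identical token sets.
    """
    tokens = set(normalize_address(extracted).split())
    if not tokens:
        return False
    for entry in reference:
        entry_tokens = set(entry.split())
        overlap = len(tokens & entry_tokens) / len(tokens | entry_tokens)
        if overlap >= min_overlap:
            return True
    return False


if __name__ == "__main__":
    ref = build_reference(["10 Main Street, Springfield, United States of America"])
    print(is_valid_identifier("10 Main St., Springfield, U.S.A.", ref))
```

Running the script prints `True`, because both spellings normalize to `10 MAIN ST SPRINGFIELD US`. Lowering `min_overlap` (for example to 0.6) is one hypothetical way to tolerate OCR misreads such as `Sprlngfield`, at the cost of admitting more false matches.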