Patent 10445569 was granted by the United States Patent and Trademark Office and assigned to A9.com in October 2019.
Approaches provide for recognizing and locating text represented in image data. For example, image data that includes representations of text can be obtained. A width-focused recognition engine can be configured to analyze the image data to determine a base-set of words. The base-set of words can be associated with logical structure information that describes the geometric relationships between words in the base-set. A set of bounding boxes, each containing one or more base words, can be determined, along with a confidence value for each base word. A depth-focused recognition engine can be configured to analyze the image data to determine a focused-set of words, which is likewise associated with a set of bounding boxes and a confidence value for each word. A set of merged words can be determined from the bounding boxes that overlap by at least a threshold amount. The set of merged words can include at least a portion of the base-set of words and/or the focused-set of words, and is selected based at least in part on the respective confidence values of the words in the overlapping bounding boxes. Thereafter, a final set of words that includes the merged words and appended words can be determined.
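
The merge step described above can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how overlap is measured, what the threshold is, or how appended words are chosen, so the sketch makes assumptions: overlap is measured as intersection-over-union (IoU), the threshold defaults to 0.5, the higher-confidence word wins within an overlapping group, and focused words that overlap no base word are treated as the appended words. The names `Word`, `iou`, and `merge_words` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); this layout is an assumption.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Word:
    text: str
    box: Box
    confidence: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (an assumed overlap measure)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_words(base: List[Word], focused: List[Word],
                threshold: float = 0.5) -> List[Word]:
    """Combine base and focused words into a final word list.

    For each base word, any focused word whose box overlaps it by at least
    `threshold` competes with it, and the highest-confidence word is kept.
    Focused words that matched no base word are appended at the end.
    """
    final: List[Word] = []
    matched = set()  # indices of focused words that overlapped a base word
    for base_word in base:
        best = base_word
        for i, focused_word in enumerate(focused):
            if iou(base_word.box, focused_word.box) >= threshold:
                matched.add(i)
                if focused_word.confidence > best.confidence:
                    best = focused_word
        final.append(best)
    # Appended words: focused results with no overlapping base word (assumption).
    final.extend(w for i, w in enumerate(focused) if i not in matched)
    return final


# Illustrative data only; these values do not come from the patent.
base_words = [
    Word("helo", (0, 0, 10, 5), 0.60),
    Word("world", (12, 0, 22, 5), 0.90),
]
focused_words = [
    Word("hello", (0, 0, 10, 5), 0.95),
    Word("wor1d", (12, 0, 22, 5), 0.40),
    Word("!", (30, 0, 32, 5), 0.80),
]
print([w.text for w in merge_words(base_words, focused_words)])
```

In this sketch, iterating over the base words first preserves their order, which is one simple way to retain the logical structure that the base-set carries. A real system would likely need a more careful strategy for boxes that overlap several words at once.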