Patent attributes
Various embodiments enable a computing device to capture multiple images (or video) of text and provide at least a portion of those images to a recognizer, which recognizes text from each image separately. The recognized output for each image typically includes one or more text strings. Substrings common to the text strings are computed and compared against each text string from each image to determine an alignment consensus for each substring within the text. A template string is generated that includes each common substring at the position corresponding to its determined alignment. A character frequency vote is then applied to the unresolved portions, and the final text string is determined by filling each unresolved position with the character that occurs most often at that position.
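The flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names (`longest_common_substring`, `vote`, `consensus`), the `min_len` threshold, the recursive splitting around the longest common substring, and the use of the most common segment length for each unresolved span are all hypothetical choices made for the example. The recognizer outputs are assumed to be plain strings, one per image.

```python
from collections import Counter


def longest_common_substring(strings, min_len=2):
    """Return the longest substring present in every string, or '' if none
    reaches min_len characters (min_len is an assumed threshold)."""
    shortest = min(strings, key=len)
    for length in range(len(shortest), min_len - 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""


def vote(segments):
    """Fill an unresolved span by per-position character frequency vote."""
    # Assumption: the span is as long as the most common segment length.
    length = Counter(len(s) for s in segments).most_common(1)[0][0]
    chars = []
    for i in range(length):
        column = Counter(s[i] for s in segments if i < len(s))
        chars.append(column.most_common(1)[0][0])
    return "".join(chars)


def consensus(strings, min_len=2):
    """Anchor on common substrings, then vote on whatever lies between them."""
    if not any(strings):
        return ""
    anchor = longest_common_substring(strings, min_len)
    if not anchor:
        # No shared anchor left: this span is unresolved, so vote on it.
        return vote(strings)
    # Assumption: align each reading on the first occurrence of the anchor.
    parts = [s.partition(anchor) for s in strings]
    left = consensus([p[0] for p in parts], min_len)
    right = consensus([p[2] for p in parts], min_len)
    return left + anchor + right


# Three hypothetical recognizer outputs for the same line of text.
readings = ["he1lo world", "hello w0rld", "hello world"]
print(consensus(readings))  # prints "hello world"
```

In this sketch each anchor is located at its first occurrence in every reading. A fuller implementation would need to handle repeated substrings, insertions and deletions inside unresolved spans, and recognizer outputs that contain several strings per image.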