Provided herein are systems and methods for extracting text from a document. Different optical character recognition (OCR) tools are used to extract different versions of the text in the document. Metrics evaluating the quality of the extracted text are compared to identify and select higher quality extracted text. A selected portion of text is compared to a threshold to ensure minimal quality. The selected portion of text is then saved. Error correction can be applied to the selected portion of text based on errors specific to the OCR tools or the document contents.