Patent attributes
Techniques described herein relate to a method for predicting field values of documents. The method may include identifying a field prediction model generation request; obtaining training documents from a document manager; selecting a first training document; making a first determination that the first training document is a text-based document; performing text-based data extraction to identify first words and first boxes included in the first training document; identifying first keywords and first candidate words included in the first training document based on the first words and the first boxes; and generating a first annotated training document using the first keywords and the first candidate words, wherein the first annotated training document comprises color-based representation masks for the first keywords, the first candidate words, and first general words included in the first training document.
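
The annotation step can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `WordBox` type, the `classify` and `annotate` functions, the specific colors, and the rule that any word containing a digit is a candidate word are all hypothetical choices made for illustration. The sketch assumes words and their boxes have already been extracted, labels each word as a keyword, candidate word, or general word, and paints each box into a color mask.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Color = Tuple[int, int, int]

# Hypothetical color assignments; the source does not specify which colors are used.
BACKGROUND: Color = (255, 255, 255)
COLORS = {
    "keyword": (255, 0, 0),
    "candidate": (0, 255, 0),
    "general": (0, 0, 255),
}


@dataclass(frozen=True)
class WordBox:
    """A word plus its bounding box (x1 and y1 are exclusive)."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def classify(word: WordBox, keywords: Set[str]) -> str:
    """Label a word as 'keyword', 'candidate', or 'general'.

    Assumption: a keyword is a case-insensitive match against a known set,
    and a candidate is any other word containing a digit.
    """
    token = word.text.strip(":").lower()
    if token in keywords:
        return "keyword"
    if any(ch.isdigit() for ch in word.text):
        return "candidate"
    return "general"


def annotate(words: List[WordBox], keywords: Set[str],
             width: int, height: int) -> List[List[Color]]:
    """Build a height x width color mask with one filled box per word."""
    mask = [[BACKGROUND] * width for _ in range(height)]
    for word in words:
        color = COLORS[classify(word, keywords)]
        # Clip each box to the page so out-of-range boxes cannot raise.
        for y in range(max(0, word.y0), min(height, word.y1)):
            for x in range(max(0, word.x0), min(width, word.x1)):
                mask[y][x] = color
    return mask
```

Under these assumptions, a page containing the words "Total:" and "42.00" with the keyword set `{"total"}` would yield a mask in which the first box carries the keyword color and the second the candidate color, with all remaining words painted in the general color.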