Patent attributes
Methods, systems, and computer programs are presented for providing access to a cloud data platform that includes a machine learning model. At least one hardware processor performs a plurality of iterations to generate a Natural Language Processing (NLP) model. In each iteration, the cloud data platform receives real-world documents and enables information retrieval from those documents without annotated training data. Each iteration includes receiving data comprising text data, layout data, and image data, and analyzing the text data, the layout data, and the image data. Based at least in part on that analysis, the cloud data platform generates one or more outputs from the machine learning model by applying the iterative training to new data.
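
The abstract does not disclose an implementation, so the following is only an illustrative sketch of what receiving and analyzing text data, layout data, and image data in each iteration might look like. Every name in it (DocumentSample, analyze, run_iterations), the bounding-box convention, and the "ink" feature are hypothetical assumptions introduced for illustration; none of them comes from the patent. No model is trained here; the loop only marks where an unlabeled batch would be consumed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical convention: (x0, y0, x1, y1) in pixels, end-exclusive.
Box = Tuple[int, int, int, int]


@dataclass
class DocumentSample:
    """One unlabeled document, split into the three modalities the abstract names."""
    tokens: List[str]         # text data
    boxes: List[Box]          # layout data: one bounding box per token
    image: List[List[float]]  # image data: grayscale page, pixel values in [0, 1]


def mean_intensity(image: List[List[float]], box: Box) -> float:
    """Average pixel value inside a box; an empty region yields 0.0."""
    x0, y0, x1, y1 = box
    pixels = [v for row in image[y0:y1] for v in row[x0:x1]]
    return sum(pixels) / len(pixels) if pixels else 0.0


def analyze(sample: DocumentSample) -> List[Dict[str, object]]:
    """Combine text, layout, and image signals into one record per token."""
    if not sample.image or not sample.image[0]:
        raise ValueError("page image must be non-empty")
    if len(sample.tokens) != len(sample.boxes):
        raise ValueError("each token needs exactly one bounding box")
    height, width = len(sample.image), len(sample.image[0])
    features = []
    for token, box in zip(sample.tokens, sample.boxes):
        x0, y0, x1, y1 = box
        features.append({
            "token": token,
            # Normalized coordinates make pages of different sizes comparable.
            "layout": (x0 / width, y0 / height, x1 / width, y1 / height),
            # Assumed image feature: mean pixel value under the token's box.
            "ink": mean_intensity(sample.image, box),
        })
    return features


def run_iterations(batches: List[List[DocumentSample]]):
    """Analyze one batch of unlabeled documents per iteration."""
    history = []
    for iteration, batch in enumerate(batches, start=1):
        analyzed = [analyze(doc) for doc in batch]
        # Placeholder: a real system would update model parameters here.
        history.append((iteration, analyzed))
    return history


if __name__ == "__main__":
    page = [[0.0, 1.0], [0.0, 1.0]]  # tiny 2x2 stand-in for a scanned page
    doc = DocumentSample(["Total", "42"], [(0, 0, 1, 2), (1, 0, 2, 2)], page)
    for iteration, analyzed in run_iterations([[doc], [doc, doc]]):
        print(iteration, len(analyzed), analyzed[0][1]["layout"])
```

Keeping the three modalities as separate fields until analyze runs mirrors the abstract's wording, which lists text, layout, and image data as distinct inputs that are analyzed together.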