Patent attributes
A method and system for generating a synthetic form image involve obtaining a multitude of field value data and associated field labels for a chosen type of form document from an electronic data source, classifying the multitude of field value data into a multitude of data categories, where the multitude of data categories includes categorical and numerical data types, learning statistical data distributions for the categorical and numerical data types using the classified categorical and numerical data, and randomly sampling data elements from the learned data distributions to generate synthetic data for the categorical and numerical data types. The method also involves assembling the synthetic data for the multitude of data categories with the associated field labels to generate a labeled synthetic textual data set, rendering the labeled synthetic textual data set over a structured form layout image to produce a synthetic form image, and storing the synthetic form image and the labeled synthetic textual data set.
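
The classification, distribution-learning, sampling, and assembly steps described above can be illustrated with a minimal Python sketch. It is not the patented implementation. The field labels (`filing_status`, `wages`), the helper names, the use of empirical frequencies for categorical fields, and the normal-distribution fit for numerical fields are all assumptions chosen for illustration. The rendering step is omitted because it would require an imaging library and a form layout image.

```python
import random
import statistics
from collections import Counter


def is_numeric(value):
    """Return True if the string value parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def learn_distributions(records):
    """Classify each field as numerical or categorical and fit a simple model.

    records: list of dicts mapping a field label to a string field value.
    Assumption: numerical fields are modeled as normal (mean, std);
    categorical fields are modeled by their observed frequencies.
    """
    columns = {}
    for record in records:
        for label, value in record.items():
            columns.setdefault(label, []).append(value)

    models = {}
    for label, values in columns.items():
        if all(is_numeric(v) for v in values):
            nums = [float(v) for v in values]
            models[label] = ("numerical", statistics.mean(nums), statistics.pstdev(nums))
        else:
            counts = Counter(values)
            models[label] = ("categorical", list(counts.keys()), list(counts.values()))
    return models


def sample_record(models, rng):
    """Draw one labeled synthetic record from the learned models."""
    record = {}
    for label, model in models.items():
        if model[0] == "numerical":
            _, mean, std = model
            record[label] = round(rng.gauss(mean, std), 2)
        else:
            _, choices, weights = model
            record[label] = rng.choices(choices, weights=weights, k=1)[0]
    return record


# Hypothetical source data standing in for the electronic data source.
source = [
    {"filing_status": "single", "wages": "52000"},
    {"filing_status": "married", "wages": "87500"},
    {"filing_status": "single", "wages": "43250"},
]

models = learn_distributions(source)
rng = random.Random(0)  # seeded so repeated runs give the same samples
synthetic = [sample_record(models, rng) for _ in range(5)]
```

Each entry in `synthetic` pairs a field label with a sampled value, which corresponds to the labeled synthetic textual data set that the method would then render over a form layout image and store.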