Patent attributes
The method includes generating a base model by training a pretrained model on a base training dataset. The base training dataset includes first training datapoints that identify tables in historical document images containing both tables and text, and the resulting base model is configured to extract the tables as objects. The method further includes generating a table extraction model by training the base model on an enhanced training dataset. The enhanced training dataset includes second training datapoints, different from the first training datapoints, that identify the plurality of cells arranged in each table along a row direction and a column direction. The table extraction model is trained to output the content of the tables, together with table information, in XML format. The table information includes cell-level information for the plurality of cells, and this information is searchable via a query configured to return target content corresponding to one or more of the cells.
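To show what "cell-level information searchable via a query" could look like in practice, here is a minimal sketch. The abstract does not specify an XML schema, so the `<document>`, `<table id=…>`, and `<cell row=… col=…>` elements and attributes below are hypothetical. The queries use the limited XPath subset supported by Python's standard `xml.etree.ElementTree`. This only illustrates querying such output. It is not the patented model or its actual output format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML output from a table extraction model. The element and
# attribute names (document, table, cell, row, col) are assumptions.
SAMPLE_XML = """
<document>
  <table id="t1">
    <cell row="0" col="0">Item</cell>
    <cell row="0" col="1">Price</cell>
    <cell row="1" col="0">Widget</cell>
    <cell row="1" col="1">4.50</cell>
    <cell row="2" col="0">Gadget</cell>
    <cell row="2" col="1">7.25</cell>
  </table>
</document>
"""


def query_cells(xml_text, table_id, row=None, col=None):
    """Return the text of cells in one table, optionally filtered by row and/or column."""
    root = ET.fromstring(xml_text.strip())
    path = f".//table[@id='{table_id}']/cell"
    # ElementTree applies chained [@attr='value'] predicates one after another.
    if row is not None:
        path += f"[@row='{row}']"
    if col is not None:
        path += f"[@col='{col}']"
    return [cell.text for cell in root.findall(path)]


def lookup_by_row_key(xml_text, table_id, key, target_col):
    """Find the row whose first-column cell equals `key` and return the
    text of that row's cell in `target_col`, or None if no match exists."""
    root = ET.fromstring(xml_text.strip())
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return None
    for cell in table.findall("cell[@col='0']"):
        if cell.text == key:
            hit = table.find(f"cell[@row='{cell.get('row')}'][@col='{target_col}']")
            return hit.text if hit is not None else None
    return None


if __name__ == "__main__":
    print(query_cells(SAMPLE_XML, "t1", col=1))
    print(lookup_by_row_key(SAMPLE_XML, "t1", "Gadget", 1))
```

Because every cell carries its own row and column coordinates, a query can target a single cell, a whole column, or a value keyed by a row label, without re-parsing the original document image.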