Patent attributes
Computerized systems and methods for identifying a table in a document include: removing, from a document, content other than text characters and their associated size, position and format information; converting each text character into a block covering the corresponding text character; converting each page of the document into a corresponding image file; drawing a set of horizontal lines spanning a width of the document, such that each block is super-scored and under-scored by at least one of the horizontal lines; drawing a set of vertical lines spanning all or a portion of a length of the document; removing a subset of redundant vertical lines; and determining, based on the set of horizontal lines and the remaining vertical lines, (i) a set of table coordinates corresponding to a table in the document, and (ii) one or more sets of cell coordinates corresponding to one or more cells in the table.
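The line-drawing steps can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes the text characters have already been reduced to integer bounding boxes (the "blocks"), skips the content-removal and page-to-image steps, places lines only in whitespace that no block covers, and treats vertical lines as redundant when they are adjacent to one another or fall in a gap narrower than a hypothetical `min_gap` parameter (for example, the space between two characters of one word). The names `free_positions`, `collapse_runs` and `find_table` are illustrative only.

```python
from typing import List, Optional, Tuple

# A block is (x0, y0, x1, y1) with half-open extents: x0 <= x < x1, y0 <= y < y1.
Box = Tuple[int, int, int, int]


def free_positions(boxes: List[Box], axis: int, extent: int) -> List[int]:
    """Integer positions in [0, extent) along the axis that no block covers.

    axis=0 scans x (candidate vertical lines); axis=1 scans y (horizontal lines).
    """
    covered = set()
    for b in boxes:
        covered.update(range(b[axis], b[axis + 2]))
    return [p for p in range(extent) if p not in covered]


def collapse_runs(positions: List[int], min_len: int) -> List[int]:
    """Keep one representative per run of consecutive positions.

    Runs shorter than min_len are dropped entirely (assumption: such narrow
    gaps are inter-character spacing, not column boundaries).
    """
    kept: List[int] = []
    run: List[int] = []
    for p in sorted(positions):
        if run and p != run[-1] + 1:
            if len(run) >= min_len:
                kept.append(run[len(run) // 2])
            run = []
        run.append(p)
    if run and len(run) >= min_len:
        kept.append(run[len(run) // 2])
    return kept


def find_table(
    boxes: List[Box], page_w: int, page_h: int, min_gap: int = 2
) -> Tuple[Optional[Box], List[Box]]:
    """Return (table coordinates, cell coordinates in row-major order)."""
    # One horizontal line per blank band, so every block has a line above and below
    # (assumes blocks do not touch the top or bottom page edge).
    h_lines = collapse_runs(free_positions(boxes, 1, page_h), 1)
    # Candidate vertical lines at every blank x, then prune the redundant ones.
    v_candidates = free_positions(boxes, 0, page_w)
    v_lines = collapse_runs(v_candidates, min_gap)
    if len(h_lines) < 2 or len(v_lines) < 2:
        return None, []
    table = (v_lines[0], h_lines[0], v_lines[-1], h_lines[-1])
    cells = [
        (x0, y0, x1, y1)
        for y0, y1 in zip(h_lines, h_lines[1:])
        for x0, x1 in zip(v_lines, v_lines[1:])
    ]
    return table, cells


# Hypothetical 40x20 page holding a 2x2 table with two characters per cell.
row = [(2, 5), (6, 9), (20, 23), (24, 27)]
boxes = [(x0, y0, x1, y0 + 4) for y0 in (2, 10) for (x0, x1) in row]
table, cells = find_table(boxes, page_w=40, page_h=20)
print(table)
print(cells)
```

In this toy layout the one-unit gap between characters within a cell is discarded as redundant, while the wide gap between the two columns yields a single column boundary. A fuller treatment would also let vertical lines span only a portion of the page, as the abstract allows.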