Patent attributes
Methods, systems, and computer program products for table recognition in PDF documents are provided herein. A computer-implemented method includes discretizing one or more contiguous areas of a PDF document; identifying one or more white-space separator lines within the one or more discretized contiguous areas of the PDF document; detecting one or more candidate table regions within the one or more discretized contiguous areas of the PDF document by clustering the one or more white-space separator lines into one or more grids; and outputting at least one of the candidate table regions as a finalized table in accordance with scores assigned to each of the one or more candidate table regions based on (i) border information and (ii) cell structure information.