Patent attributes
Disclosed are techniques and systems for detecting the layout of a source document. A process may include receiving content from a first page and a second page of the source document, designating sections in each page along a first direction of the page, and assigning similar sections to a group. For that group, the process may then divide the grouped sections of each page into discrete portions, each associated with a 2D coordinate area, and identify the sets of 2D coordinate areas whose portions contain content. The number of times each portion contains content may then be compared to a threshold to determine the layout of the group of sections.
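The counting-and-threshold step can be illustrated with a minimal sketch. The abstract does not specify the grid size, the threshold, how "similar" sections are grouped, or what form the resulting layout takes. The sketch therefore assumes the grouping step is already done, with one section per page in the group. It also assumes content arrives as bounding boxes normalized to [0, 1] within each section, that every section is divided into a fixed `rows` × `cols` grid, and that the threshold is a minimum count. All names (`occupied_cells`, `group_layout`, `count_columns`) are hypothetical. Reading the final mask as a number of text columns is only one possible interpretation of "layout".

```python
import math
from typing import List, Set, Tuple

# Hypothetical representation: a content box in section-local coordinates
# normalized to [0, 1], given as (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def occupied_cells(boxes: List[Box], rows: int, cols: int) -> Set[Tuple[int, int]]:
    """Return the (row, col) grid cells of one section that overlap any content box."""
    cells: Set[Tuple[int, int]] = set()
    for x0, y0, x1, y1 in boxes:
        # Half-open mapping: a box ending exactly on a cell edge does not
        # spill into the next cell.
        c0 = max(0, math.floor(x0 * cols))
        c1 = min(cols - 1, math.ceil(x1 * cols) - 1)
        r0 = max(0, math.floor(y0 * rows))
        r1 = min(rows - 1, math.ceil(y1 * rows) - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                cells.add((r, c))
    return cells


def group_layout(sections: List[List[Box]], rows: int, cols: int, threshold: int):
    """Count, per grid cell, how many sections in the group have content there,
    and keep only the cells whose count reaches the threshold."""
    counts = [[0] * cols for _ in range(rows)]
    for boxes in sections:
        for r, c in occupied_cells(boxes, rows, cols):
            counts[r][c] += 1
    layout = [[counts[r][c] >= threshold for c in range(cols)] for r in range(rows)]
    return counts, layout


def count_columns(layout: List[List[bool]]) -> int:
    """Interpret the layout mask as text columns: maximal runs of grid
    columns that are occupied in at least one row."""
    cols = len(layout[0])
    occupied = [any(row[c] for row in layout) for c in range(cols)]
    return sum(1 for c in range(cols) if occupied[c] and (c == 0 or not occupied[c - 1]))


# Example group: the body section of three pages set in two columns; page 3
# also has a small callout straddling the gutter near the top.
two_columns = [(0.05, 0.0, 0.38, 1.0), (0.62, 0.0, 0.95, 1.0)]
body_sections = [
    two_columns,
    two_columns,
    two_columns + [(0.42, 0.0, 0.58, 0.4)],
]

counts, layout = group_layout(body_sections, rows=2, cols=10, threshold=2)
print(count_columns(layout))  # 2: the one-off callout falls below the threshold
_, loose = group_layout(body_sections, rows=2, cols=10, threshold=1)
print(count_columns(loose))   # 1: without filtering, the callout bridges the gutter
```

The threshold is what makes the per-group count useful. Content that recurs across most pages in the group (the two text columns) survives. Content that appears on a single page (the callout) is treated as noise rather than as part of the layout.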