Patent attributes
Methods and apparatus for generating layout-preserved text output from portable document format (PDF) input are described. A layout-preserved text generation method may generate, from PDF input, text output that includes the text together with indentations, spaces, newlines, and paging, and that thus preserves the global document layout view of the original PDF input document. The layout-preserved text generation method may transform the PDF (X, Y) document space into a text file grid space while preserving a similar global view of the text and layout. This transformation may include determining a base size per grid that produces an accurate layout in the text output.
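
The transformation from the PDF (X, Y) document space to a text grid space can be illustrated with a minimal sketch. It is not the described method itself. It assumes that each text fragment has already been extracted as a hypothetical tuple `(x, y_top, width, text)`, with `y_top` measured downward from the top of the page, and it assumes one possible heuristic for the base size: the median width per character as the grid width and the smallest gap between distinct line positions as the grid height. The name `to_text_grid` and these heuristics are illustrative assumptions; paging is not handled.

```python
from statistics import median


def to_text_grid(fragments, cell_w=None, cell_h=None):
    """Place text fragments on a character grid (illustrative sketch only).

    fragments: list of (x, y_top, width, text) tuples in page units.
    cell_w, cell_h: base grid size; estimated from the fragments if omitted.
    """
    frags = [f for f in fragments if f[3]]  # ignore fragments with no text
    if not frags:
        return ""

    if cell_w is None:
        # Assumed heuristic: median width of one character.
        cell_w = median(w / len(t) for _, _, w, t in frags)
    if cell_h is None:
        # Assumed heuristic: smallest gap between distinct line positions.
        ys = sorted({y for _, y, _, _ in frags})
        gaps = [b - a for a, b in zip(ys, ys[1:])]
        cell_h = min(gaps) if gaps else 1.0

    # Map each fragment to a (row, column) cell of the text grid.
    rows = {}
    for x, y, _, t in frags:
        rows.setdefault(round(y / cell_h), []).append((round(x / cell_w), t))

    lines = []
    for r in range(max(rows) + 1):  # empty rows become blank lines
        line = ""
        for col, t in sorted(rows.get(r, [])):
            if len(line) < col:
                line += " " * (col - len(line))  # pad out to the target column
            elif line:
                line += " "  # fragments collide: keep them one space apart
            line += t
        lines.append(line)
    return "\n".join(lines)
```

For example, `to_text_grid([(0, 0, 30, "Title"), (24, 12, 24, "item"), (0, 24, 18, "end")])` places `item` four columns in on the second line, so the indentation of the source positions carries over to the text output. A base size that is too large merges distinct columns or lines, and one that is too small inflates the spacing, which is why the choice of base size matters for layout accuracy.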