Patent attributes
The technology disclosed relates to systems and methods for device-dependent display of an article from a PDF file. The article can have multiple columns. The system can use a library to render the article from the PDF file. The rendering can include bounding boxes, positioned at on-page coordinates, that can include one or more images and multiple text blocks of glyphs. The system can partition the text blocks and images into two or more columns using dynamically adjusted valleys between columns. The system can set a reading order of the article after rendering. The system can merge and split text blocks to form paragraphs of text. The system includes logic to infer semantic information about typographic roles of the paragraphs from at least the font information. The system can cause display of the article in a device-dependent format using the semantic information and the reading order.
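
The column partitioning, reading order, and font-based role inference described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation. All names (`Box`, `find_valleys`, `assign_columns`, `reading_order`, `infer_role`) are hypothetical, the coordinate system is assumed to have its origin at the top left with y increasing downward, and a fixed `min_gap` threshold stands in for the dynamically adjusted valleys. The 1.2 font-size ratio used to flag headings is likewise an arbitrary illustrative choice.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Box:
    """A bounding box at on-page coordinates (hypothetical structure)."""
    x0: float
    y0: float
    x1: float
    y1: float
    font_size: float = 10.0
    text: str = ""


def find_valleys(boxes, min_gap=10.0):
    """Return the x-midpoints of horizontal gaps that no box covers.

    Simplification: a fixed min_gap replaces the dynamic adjustment,
    and a full-width box (such as a title) would hide every valley.
    """
    intervals = sorted((b.x0, b.x1) for b in boxes)
    valleys = []
    if not intervals:
        return valleys
    covered_until = intervals[0][1]
    for x0, x1 in intervals[1:]:
        if x0 - covered_until >= min_gap:
            valleys.append((covered_until + x0) / 2)
        covered_until = max(covered_until, x1)
    return valleys


def assign_columns(boxes, valleys):
    """Group boxes into columns by where each box's center falls."""
    columns = [[] for _ in range(len(valleys) + 1)]
    for box in boxes:
        center = (box.x0 + box.x1) / 2
        columns[bisect.bisect(valleys, center)].append(box)
    return columns


def reading_order(boxes, min_gap=10.0):
    """Order boxes column by column (left to right), top to bottom."""
    ordered = []
    for column in assign_columns(boxes, find_valleys(boxes, min_gap)):
        ordered.extend(sorted(column, key=lambda b: b.y0))
    return ordered


def infer_role(box, body_size):
    """Guess a typographic role from font size alone (assumed heuristic)."""
    return "heading" if box.font_size > body_size * 1.2 else "body"


# Two columns of two boxes each, supplied out of order.
page = [
    Box(120, 0, 220, 20, 10, "b1"),
    Box(0, 30, 100, 50, 10, "a2"),
    Box(120, 30, 220, 50, 14, "b2"),
    Box(0, 0, 100, 20, 10, "a1"),
]
print([b.text for b in reading_order(page)])
```

In this sketch a single valley is found at x = 110, so the left column is read before the right one. A fuller treatment would also merge and split text blocks into paragraphs and use more font information than size alone.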