Patent attributes
The present disclosure relates to a method for generating a structure of a PDF-document, wherein the PDF-document comprises elements. The method comprises detecting document cells of the PDF-document dependent on commands of a page description language for printing the elements of the PDF-document. The method further comprises determining, by a machine learning module, parts of the PDF-document dependent on the PDF-document. Determining a respective part comprises associating a respective portion of the elements of the PDF-document with that part. Furthermore, a respective label may be assigned to the respective part. The method may further comprise using a symbolic artificial intelligence (AI) module, wherein rules of the symbolic AI-module may be applied for reconciling the document cells with the parts. The elements of the structure of the PDF-document may be generated and labelled dependent on a result of the reconciling and dependent on the respective label assigned to the respective part.
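
The reconciling step can be pictured with a small sketch. The abstract does not state the rules of the symbolic AI-module, so everything below is a hypothetical illustration: the names `Box`, `Cell`, `Part` and `reconcile`, the single coverage rule, and the `min_cover` threshold of 0.5 are assumptions, not part of the disclosure. The sketch assumes the document cells have already been detected from the page description commands and the labelled parts have already been produced by the machine learning module, both as axis-aligned bounding boxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def overlap(self, other: "Box") -> float:
        # Area of the intersection, or 0.0 if the boxes do not intersect.
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(0.0, w) * max(0.0, h)


@dataclass(frozen=True)
class Cell:
    """Document cell detected from page description commands."""
    box: Box
    text: str


@dataclass(frozen=True)
class Part:
    """Part predicted by the machine learning module, with its label."""
    box: Box
    label: str


def reconcile(cells, parts, min_cover=0.5):
    """Assumed rule: a cell belongs to the part covering most of its area,
    provided that coverage reaches min_cover; otherwise it stays unlabelled."""
    structure = [{"label": p.label, "cells": []} for p in parts]
    for cell in cells:
        area = cell.box.area()
        best_index, best_cover = None, 0.0
        for i, part in enumerate(parts):
            cover = cell.box.overlap(part.box) / area if area > 0 else 0.0
            if cover > best_cover:
                best_index, best_cover = i, cover
        if best_index is not None and best_cover >= min_cover:
            structure[best_index]["cells"].append(cell.text)
        else:
            structure.append({"label": "unlabelled", "cells": [cell.text]})
    return structure


# Made-up example data: three cells fall inside predicted parts, one does not.
cells = [
    Cell(Box(0, 0, 10, 2), "Title"),
    Cell(Box(0, 5, 10, 7), "Body line 1"),
    Cell(Box(0, 7, 10, 9), "Body line 2"),
    Cell(Box(50, 50, 60, 52), "Stray"),
]
parts = [
    Part(Box(0, 0, 10, 3), "title"),
    Part(Box(0, 4, 10, 10), "paragraph"),
]
structure = reconcile(cells, parts)
```

In this sketch each structure element inherits the label of its part, and a cell that no part covers sufficiently is kept as a separate unlabelled element rather than dropped. A real rule set would likely contain more than one rule.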