Patent attributes
A structured document is translated into an initial hierarchical data structure in accordance with syntactic elements defined in the structured document. The initial hierarchical data structure includes a plurality of nodes, and each node corresponds to one of the syntactic elements. The method then annotates a node with a set of attributes including geometric parameters of semantic elements in the structured document that are associated with the node in accordance with a pseudo-rendering of the structured document. Finally, the method merges the nodes in the initial hierarchical data structure into a tree of merged nodes in accordance with their respective attributes and a set of predefined rules such that each merged node is associated with a semantically distinct region of the pseudo-rendered document. The predefined rules include rules for merging nodes associated with semantic elements that have nearby positions and/or compatible attributes in the pseudo-rendered document.