A method for extracting a document structure is disclosed. The method may include determining a position of reference information in a layout file, and extracting items related to the reference information from the determined position in the layout file. An apparatus for extracting a document structure is also disclosed. The apparatus may include a processor configured to determine a position of reference information in a layout file and to extract items related to the reference information from the determined position in the layout file. The apparatus may further include a storage device configured to store the extracted items.
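
The two steps of the method (locating the reference information, then extracting the related items) can be illustrated with a minimal sketch. It is not the disclosed implementation. It assumes, purely for illustration, that the layout file has already been read into a list of text lines, that the reference information is marked by a heading line such as "Contents", and that the related items are the non-empty lines that follow it up to the next blank line. The function names `find_reference_position`, `extract_items`, and `extract_structure` are hypothetical.

```python
from typing import List, Optional


def find_reference_position(lines: List[str], marker: str) -> Optional[int]:
    """Return the index of the first line matching `marker`, or None.

    Assumption: the reference information is introduced by a heading line
    whose text equals `marker`, ignoring case and surrounding whitespace.
    """
    target = marker.strip().lower()
    for index, line in enumerate(lines):
        if line.strip().lower() == target:
            return index
    return None


def extract_items(lines: List[str], position: int) -> List[str]:
    """Collect the non-empty lines that follow `position`.

    Assumption: blank lines directly after the heading are skipped, and the
    first blank line after at least one item ends the block of items.
    """
    items: List[str] = []
    for line in lines[position + 1:]:
        text = line.strip()
        if not text:
            if items:
                break
            continue
        items.append(text)
    return items


def extract_structure(lines: List[str], marker: str) -> List[str]:
    """Determine the position of the marker, then extract the items there."""
    position = find_reference_position(lines, marker)
    if position is None:
        return []
    return extract_items(lines, position)


if __name__ == "__main__":
    # Hypothetical layout content, already split into lines.
    layout_lines = [
        "Sample Document",
        "",
        "Contents",
        "",
        "1 Introduction",
        "2 Method",
        "",
        "1 Introduction",
        "Body text ...",
    ]
    # An in-memory list stands in for the storage device.
    stored_items = extract_structure(layout_lines, "Contents")
    print(stored_items)
```

In this sketch the returned list plays the role of the stored items; an actual apparatus could persist them to any storage device.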