Patent attributes
A method of detection of numbered captions in a document includes receiving a document including a sequence of document pages and identifying illustrations on pages of the document. For each identified illustration, associated text is identified. An imitation page is generated for each of the identified illustrations, each imitation page comprising a single illustration and its associated text. For a sequence of the imitation pages, a sequence of terms is identified. Each term is derived from a text fragment of the associate text of a respective imitation page. The terms of a sequence complying with at least one predefined numbering scheme which defines a form and an incremental state of the terms in a sequence. The terms of the identified sequence of terms are construed as being at least a part of a numbered caption for a respective illustration in the document.