US Patent 11769072 Document structure extraction using machine learning

Patent 11769072 was granted by the United States Patent and Trademark Office and assigned to Adobe Inc. in September 2023.


All edits

Edits on 17 Oct, 2024

"update inverses"

Golden AI

edited on 17 Oct, 2024

Edits made to:

Infobox (+1 properties)

Infobox

Patent Citations Received

US Patent 12118055 Accessibility profile customization

Edits on 27 Sep, 2023

"Created via: Entity Importer"

Golden AI

created this topic on 27 Sep, 2023

Edits made to:

Infobox (+16 properties)

Article (+988 characters)


Article

Patent abstract

The structure of an untagged document can be derived using a predictive model that is trained in a supervised learning framework based on a corpus of tagged training documents. Analyzing the training documents results in a plurality of document part feature vectors, each of which correlates a category defining a document part (for example, “title” or “body paragraph”) with one or more feature-value pairs (for example, “font=Arial” or “alignment=centered”). Any suitable machine learning algorithm can be used to train the predictive model based on the document part feature vectors extracted from the training documents. Once the predictive model has been trained, it can receive feature-value pairs corresponding to a portion of an untagged document and make predictions with respect to how that document part should be categorized. The predictive model can therefore generate tag metadata that defines a structure of the untagged document in an automated fashion.
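The workflow described in the abstract can be illustrated with a minimal sketch. The abstract leaves the learning algorithm open ("any suitable machine learning algorithm"), so the naive Bayes classifier below is an assumption chosen for brevity, not the method claimed in the patent. The training data, the "size" feature, and the function names `train`, `predict`, and `tag_document` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical tagged training corpus: each entry pairs a document part
# category with its feature-value pairs (a "document part feature vector").
TRAINING_PARTS = [
    ("title", {"font": "Arial", "alignment": "centered", "size": "large"}),
    ("title", {"font": "Arial", "alignment": "centered", "size": "large"}),
    ("body paragraph", {"font": "Times", "alignment": "left", "size": "small"}),
    ("body paragraph", {"font": "Times", "alignment": "justified", "size": "small"}),
    ("body paragraph", {"font": "Arial", "alignment": "left", "size": "small"}),
]


def train(parts):
    """Count categories and (feature, value) occurrences per category."""
    category_counts = Counter()
    pair_counts = defaultdict(Counter)
    values_seen = defaultdict(set)
    for category, features in parts:
        category_counts[category] += 1
        for name, value in features.items():
            pair_counts[category][(name, value)] += 1
            values_seen[name].add(value)
    return category_counts, pair_counts, values_seen


def predict(model, features):
    """Return the most probable category (naive Bayes, add-one smoothing)."""
    category_counts, pair_counts, values_seen = model
    total = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, count in category_counts.items():
        score = math.log(count / total)  # log prior of the category
        for name, value in features.items():
            # The extra slot in the vocabulary reserves mass for unseen values.
            vocab = len(values_seen[name]) + 1
            hits = pair_counts[category][(name, value)]
            score += math.log((hits + 1) / (count + vocab))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def tag_document(model, untagged_parts):
    """Generate tag metadata: one predicted category per document part."""
    return [{"part": i, "tag": predict(model, f)} for i, f in enumerate(untagged_parts)]


model = train(TRAINING_PARTS)
untagged = [
    {"font": "Arial", "alignment": "centered", "size": "large"},
    {"font": "Times", "alignment": "left", "size": "small"},
]
tags = tag_document(model, untagged)
print(tags)
```

With this toy corpus, the first untagged part is tagged "title" and the second "body paragraph". A real system would presumably extract far richer features from the tagged training documents and could substitute any other supervised classifier.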

Infobox

Is a