Patent attributes
Some embodiments provide a program that receives a request to sectionize a document, uses a visual model to identify a set of candidate section headers in the document, and uses a language model to determine a type of section header for at least one candidate section header in the set of candidate section headers in the document. Some embodiments provide a program that receives a request to anonymize data in a document, uses a visual model to identify a set of candidate confidential sections in the document that are each predicted to include a collection of confidential data, uses a language model to identify terms in each candidate confidential section that are determined to be confidential data, analyzes the document to identify a set of terms in the document based on the identified terms in the set of candidate confidential sections, and redacts the set of terms in the document.