Patent attributes
Techniques for sectionalizing clinical documents are provided. In one set of embodiments, a computer system can, for each page of a clinical document: identify one or more section header candidates in the page and, for each section header candidate, attempt to classify the section header candidate as corresponding to one of a plurality of section types using a first classifier or a second classifier. The computer system can further partition the page into one or more sections based on corresponding section header candidates that have been successfully classified using either the first classifier or the second classifier, where the partitioning includes associating each section with a section type in the plurality of section types in accordance with the classification of the section's corresponding section header candidate. The computer system can then validate, for each section, the section's section type via an analysis of the body of the section.
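The per-page flow described above can be illustrated with a minimal sketch. It is not the claimed implementation: the abstract does not specify the classifiers, the section types, or the validation analysis. The sketch assumes that header candidates are short lines ending in a colon, that the first classifier is an exact lookup, that the second classifier is a keyword match tried only when the first fails, and that validation checks the section body for cue words. All names and tables (`EXACT_HEADERS`, `FUZZY_KEYWORDS`, `BODY_CUES`, `sectionalize`) are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical lookup tables; the abstract names no section types or features.
EXACT_HEADERS = {
    "medications": "medications",
    "allergies": "allergies",
    "history of present illness": "history",
}
FUZZY_KEYWORDS = [("med", "medications"), ("allerg", "allergies"), ("hx", "history")]
BODY_CUES = {
    "medications": {"mg", "tablet", "daily"},
    "allergies": {"nkda", "allergy", "reaction"},
    "history": {"patient", "presents", "reports"},
}


@dataclass
class Section:
    section_type: str
    header: str
    body: List[str] = field(default_factory=list)
    validated: bool = False


def is_header_candidate(line: str) -> bool:
    """Assumed heuristic: a short line that ends with a colon."""
    text = line.strip()
    return text.endswith(":") and len(text.split()) <= 6


def first_classifier(header: str) -> Optional[str]:
    """Assumed first classifier: exact match on the normalized header text."""
    return EXACT_HEADERS.get(header.strip().rstrip(":").strip().lower())


def second_classifier(header: str) -> Optional[str]:
    """Assumed second classifier: keyword match, standing in for a learned model."""
    text = header.lower()
    for keyword, section_type in FUZZY_KEYWORDS:
        if keyword in text:
            return section_type
    return None


def classify(header: str) -> Optional[str]:
    # Assumption: the second classifier is tried only if the first fails.
    return first_classifier(header) or second_classifier(header)


def validate(section: Section) -> bool:
    """Assumed validation: the body contains a cue word for the section type."""
    tokens = set(re.findall(r"[a-z]+", " ".join(section.body).lower()))
    return bool(tokens & BODY_CUES[section.section_type])


def sectionalize(pages: List[str]) -> List[List[Section]]:
    """Partition each page into typed sections and validate each section body."""
    result = []
    for page in pages:
        sections: List[Section] = []
        for line in page.splitlines():
            section_type = classify(line) if is_header_candidate(line) else None
            if section_type is not None:
                # A successfully classified candidate starts a new section.
                sections.append(Section(section_type, line.strip()))
            elif sections and line.strip():
                # Unclassified candidates and ordinary lines join the current body.
                sections[-1].body.append(line.strip())
        for section in sections:
            section.validated = validate(section)
        result.append(sections)
    return result
```

In this sketch a candidate that neither classifier recognizes does not open a section; it stays in the body of the preceding section, which mirrors the statement that partitioning relies only on successfully classified candidates. Text that precedes the first classified header on a page is simply dropped here, a simplification the abstract does not address.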