Patent attributes
Systems and methods are provided for generating classification data that is used to classify documents. The method includes reading documents in the form of spreadsheets; collecting the cell values in each of the documents; finding one or more common cell values among the collected values; counting, for each common cell value, the number of documents that contain it; storing the common cell value in a memory as a candidate header label if that number is equal to or larger than a predetermined number; calculating, for each document, the distance between the cell locations of the candidate header labels; choosing, for each document and according to the calculated distance, two or more of the candidate header labels; and storing one or more combinations of the chosen candidate header labels as the classification data.
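
The steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify the data model, the distance measure, or the selection rule, so the sketch assumes that each document is a dict mapping (row, column) to a cell value, that distance is the Manhattan distance between cells, and that a label is chosen when it lies within a hypothetical `max_distance` of another candidate label. The function names and the parameters `min_docs` and `max_distance` are likewise illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def find_candidate_labels(documents, min_docs):
    """Return cell values that appear in at least min_docs documents."""
    doc_count = defaultdict(int)
    for doc in documents:
        # set() ensures each document is counted once per value.
        for value in set(doc.values()):
            doc_count[value] += 1
    return {value for value, n in doc_count.items() if n >= min_docs}


def choose_labels(doc, candidates, max_distance):
    """Choose candidate labels lying close to another candidate label.

    Assumption: "close" means Manhattan distance <= max_distance.
    """
    located = [(pos, value) for pos, value in doc.items() if value in candidates]
    chosen = set()
    for ((r1, c1), v1), ((r2, c2), v2) in combinations(located, 2):
        if v1 != v2 and abs(r1 - r2) + abs(c1 - c2) <= max_distance:
            chosen.update((v1, v2))
    return frozenset(chosen)


def build_classification_data(documents, min_docs=2, max_distance=1):
    """Return the set of label combinations found across all documents."""
    candidates = find_candidate_labels(documents, min_docs)
    combos = set()
    for doc in documents:
        chosen = choose_labels(doc, candidates, max_distance)
        if len(chosen) >= 2:  # a combination needs two or more labels
            combos.add(chosen)
    return combos
```

Under these assumptions, two invoice sheets that both carry "Item" and "Price" in adjacent header cells would yield the single combination {"Item", "Price"}, which could then serve as a signature for classifying further documents of the same kind.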