In an illustrative embodiment, methods and systems for automatically labeling unstructured data include accessing unstructured data representing data entry and analyzing the unstructured data by applying natural language processing to a text component of the unstructured data to obtain a set of term counts of words and/or phrases identified in the text component. Analyzing may include applying at least one clustering algorithm to the set of term counts to determine a term cluster, identifying a preexisting term cluster most closely matching the term cluster, and applying, to the unstructured data, a predefined label corresponding to the preexisting term cluster. The unstructured data may be analyzed to obtain formatting counts of formatting elements, and a formatting cluster may be determined and applied to match to a preexisting formatting cluster, thus deriving a predefined label corresponding to the preexisting formatting cluster.