Patent attributes
Methods and apparatus related to clustering documents based on one or more classification terms and optionally based on similarity of structural paths of the documents. In some implementations, the documents are communications such as structured emails or other structured communications. In some of those implementations, clustering the communications includes identifying a plurality of classification terms indicative of a classification, identifying a corpus of communications that includes communications that are not labeled with an association to the classification, and determining a cluster of the communications based on occurrence of one or more of the classification terms in the communications of the cluster.