Patent attributes
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for clustering data elements. In one aspect, a method includes determining a respective linkage value for each of multiple cluster pairs, where each cluster pair includes a respective first cluster and a respective second cluster. Determining a linkage value for a cluster pair includes determining a set of pairwise similarity values for the cluster pair. Each pairwise similarity value defines a similarity measure between: (i) a particular data element from the first cluster of the cluster pair, and (ii) a given data element from the second cluster of the cluster pair. The linkage value for the cluster pair is assigned as a given percentile of the set of pairwise similarity values, wherein the given percentile is greater than 0 and less than 100. A cluster pair is merged based on the linkage values of the cluster pairs.
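The merge step described above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and it rests on several assumptions not stated in the abstract: all function names are hypothetical, the percentile is computed by linear interpolation between sorted values (one of several common conventions), higher similarity means "more alike", and "merged based on the linkage values" is read as merging the single pair with the highest linkage value.

```python
from itertools import combinations


def percentile_of(values, percentile):
    """Percentile of a non-empty list, using linear interpolation (assumed convention)."""
    ordered = sorted(values)
    rank = (percentile / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def percentile_linkage(first, second, similarity, percentile=50.0):
    """Linkage value for one cluster pair: a percentile of its pairwise similarities."""
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must be greater than 0 and less than 100")
    # One similarity value per (element of first cluster, element of second cluster).
    pairwise = [similarity(a, b) for a in first for b in second]
    return percentile_of(pairwise, percentile)


def merge_best_pair(clusters, similarity, percentile=50.0):
    """Merge the cluster pair with the highest linkage value (assumed merge rule)."""
    if len(clusters) < 2:
        raise ValueError("need at least two clusters to merge")
    i, j = max(
        combinations(range(len(clusters)), 2),
        key=lambda pair: percentile_linkage(
            clusters[pair[0]], clusters[pair[1]], similarity, percentile
        ),
    )
    merged = clusters[i] + clusters[j]
    # Keep the untouched clusters and append the merged one at the end.
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]


if __name__ == "__main__":
    # Hypothetical similarity for numeric data elements: closer values score higher.
    sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
    print(merge_best_pair([[1.0], [1.2], [5.0, 5.5]], sim, percentile=50.0))
```

With the example data, the clusters `[1.0]` and `[1.2]` have the highest linkage value and are merged, giving `[[5.0, 5.5], [1.0, 1.2]]`. Keeping the percentile strictly between 0 and 100 excludes the two extremes, which under this convention correspond to the minimum and maximum of the pairwise similarities.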