Patent attributes
An approximate data structure for representing clusters of observation records of a data set is identified. A hierarchical representation of a plurality of clusters is generated, the plurality including a targeted number of clusters among which the observation records are to be distributed. Each node of the hierarchy comprises an instance of the approximate data structure. Iterations of a selected clustering methodology are run until a set of termination criteria is met. In a given iteration, the distances of observation records from the cluster representatives of the current version of a clustering model are computed using the hierarchical representation, and a new version of the model with modified cluster representatives is generated.
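
The iteration described above can be illustrated with a minimal sketch. It is not the claimed implementation, and every specific in it is an assumption made for illustration: the approximate data structure is taken to be a simple count-and-sum summary (the hypothetical `Summary` class), the hierarchy is a binary tree built by halving the list of representatives (`build_hierarchy`), the clustering methodology is k-means-style centroid updating, the initial representatives are simply the first k records, and the termination criteria are an iteration cap plus a tolerance on representative movement. Distances are resolved by a greedy descent through the tree, which is exact for two clusters and only approximate for larger hierarchies.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Summary:
    """Hypothetical approximate structure: record count and per-dimension sum."""
    count: int
    total: List[float]

    @classmethod
    def empty(cls, dim: int) -> "Summary":
        return cls(0, [0.0] * dim)

    def add(self, p: Sequence[float]) -> None:
        self.count += 1
        for i, x in enumerate(p):
            self.total[i] += x

    def merged(self, other: "Summary") -> "Summary":
        return Summary(self.count + other.count,
                       [a + b for a, b in zip(self.total, other.total)])

    def centroid(self) -> List[float]:
        return [t / self.count for t in self.total]


@dataclass
class Node:
    """Hierarchy node; leaves carry a cluster id, inner nodes carry children."""
    summary: Summary
    cluster_id: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_hierarchy(reps: List[List[float]], ids: Optional[List[int]] = None) -> Node:
    """Build a binary tree whose leaves are the current cluster representatives."""
    if ids is None:
        ids = list(range(len(reps)))
    if len(ids) == 1:
        leaf = Summary.empty(len(reps[ids[0]]))
        leaf.add(reps[ids[0]])
        return Node(leaf, cluster_id=ids[0])
    mid = len(ids) // 2
    left = build_hierarchy(reps, ids[:mid])
    right = build_hierarchy(reps, ids[mid:])
    return Node(left.summary.merged(right.summary), left=left, right=right)


def nearest_cluster(root: Node, p: Sequence[float]) -> int:
    """Greedy descent: at each inner node, follow the child with the closer centroid."""
    node = root
    while node.cluster_id is None:
        d_left = sq_dist(p, node.left.summary.centroid())
        d_right = sq_dist(p, node.right.summary.centroid())
        node = node.left if d_left <= d_right else node.right
    return node.cluster_id


def cluster(points: List[Sequence[float]], k: int,
            max_iter: int = 50, tol: float = 1e-9) -> List[List[float]]:
    """Iterate until representatives stop moving or max_iter is reached."""
    dim = len(points[0])
    reps = [list(p) for p in points[:k]]  # assumed initialization: first k records
    for _ in range(max_iter):
        root = build_hierarchy(reps)
        sums = [Summary.empty(dim) for _ in range(k)]
        for p in points:
            sums[nearest_cluster(root, p)].add(p)
        # An empty cluster keeps its previous representative.
        new_reps = [s.centroid() if s.count else old for s, old in zip(sums, reps)]
        shift = max(sq_dist(a, b) for a, b in zip(reps, new_reps))
        reps = new_reps
        if shift <= tol:
            break
    return reps


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(cluster(data, k=2))
```

In this sketch the hierarchy is rebuilt from the representatives at the start of every iteration, so each new version of the model is reflected in the tree before distances are computed again. A production system would more plausibly update the node summaries in place and use a structure better suited to large or streaming data sets.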