Patent attributes
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for social identity clustering. In one aspect, a method includes receiving a connection graph representing public social data, in which each node represents a social identity and each edge is either a “me” edge between identities claimed to belong to the same person or a “friend” edge between identities claimed to belong to different people. The method further includes converting the connection graph to a cluster graph in which each cluster node initially corresponds to a single node of the connection graph. The method further includes updating the cluster graph by iteratively merging cluster nodes based on an analysis of the weight of the “me” edges connecting them, and replacing the merged cluster nodes within the cluster graph with a new cluster node that contains them, where the edges of the new cluster node are the aggregated edges of the merged cluster nodes.
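The merging procedure described above can be illustrated with a minimal Python sketch. It is not the claimed method: the abstract does not specify how the weight of the “me” edges is analyzed, so the sketch assumes a simple rule in which the pair of cluster nodes with the largest summed “me” weight is merged first, and merging stops once no pair reaches a threshold. The function name `cluster_identities`, the `threshold` parameter, the `(u, v, kind, weight)` edge tuples, and the use of summation to aggregate edges are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import count


def cluster_identities(edges, threshold=1.0):
    """Greedy sketch: edges are (u, v, kind, weight), kind is "me" or "friend"."""
    next_id = count()
    node_of = {}   # identity -> id of its initial single-identity cluster node
    members = {}   # cluster node id -> set of identities it contains
    # weights[kind][frozenset({a, b})] = aggregated weight between cluster nodes
    weights = {"me": defaultdict(float), "friend": defaultdict(float)}

    def cluster_for(identity):
        # Each connection-graph node starts as its own cluster node.
        if identity not in node_of:
            cid = next(next_id)
            node_of[identity] = cid
            members[cid] = {identity}
        return node_of[identity]

    for u, v, kind, w in edges:
        a, b = cluster_for(u), cluster_for(v)
        if a != b:
            weights[kind][frozenset((a, b))] += w

    while True:
        # Assumed rule: merge the pair with the heaviest aggregated "me" weight.
        best = max(weights["me"].items(), key=lambda kv: kv[1], default=None)
        if best is None or best[1] < threshold:
            break
        a, b = best[0]
        merged = next(next_id)
        members[merged] = members.pop(a) | members.pop(b)
        # Re-attach every edge of a or b to the new cluster node, summing weights.
        for table in weights.values():
            for pair in list(table):
                if pair & {a, b}:
                    w = table.pop(pair)
                    other = pair - {a, b}
                    if other:  # an edge between a and b becomes internal and is dropped
                        (c,) = other
                        table[frozenset((merged, c))] += w

    return sorted(sorted(m) for m in members.values())


# Hypothetical example data: identity names and weights are made up.
example_edges = [
    ("alice@blog", "alice@photos", "me", 1.0),
    ("alice@photos", "alice@micro", "me", 0.6),
    ("alice@blog", "alice@micro", "me", 0.5),
    ("alice@blog", "bob@blog", "friend", 1.0),
    ("bob@blog", "bob@photos", "me", 0.4),
]
clusters = cluster_identities(example_edges, threshold=1.0)
```

In this example neither “me” edge into `alice@micro` reaches the threshold on its own, but once `alice@blog` and `alice@photos` are merged their edges to `alice@micro` are aggregated and the combined weight does, which is the effect of giving the new cluster node the aggregated edges of the nodes it replaces. The two `bob` identities stay separate because their single “me” edge is below the assumed threshold.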