Patent attributes
Systems and methods for improving the time and cost to calculate connected components in a distributed graph are disclosed. One method includes reducing a quantity of map-reduce rounds used to determine a cluster assignment for a node in a large distributed graph by alternating between two hashing functions in the map stage of a map-reduce round and storing the cluster assignment for the node in a memory. Another method includes reducing a quantity of messages sent during map-reduce rounds by performing a predetermined quantity of rounds to generate, for each node, a set of potential cluster assignments, generating a data structure in memory to store a mapping between each node and its potential cluster assignment, and using the data structure during remaining map-reduce rounds, wherein the remaining map-reduce rounds do not send messages between nodes. The method can also include storing the cluster assignment for the node in a memory.