Patent attributes
Systems and methods for adding labels to a graph are disclosed. One system includes a plurality of computing devices including processors and memory. The memory stores an input graph generated from a source data set and distributed across the plurality of computing devices. Each edge in the input graph represents a similarity measure between the two nodes it connects, and some of the nodes are seed nodes associated with one or more training labels from a set of labels, each training label having an associated original weight. The memory may also store instructions that, when executed by the processors, cause the plurality of computing devices to propagate the training labels through the input graph using a sparsity approximation for label propagation, resulting in learned weights for respective node and label pairs, and to automatically update the source data set using node and label pairs selected based on the learned weights.
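The propagation step described above can be illustrated with a minimal single-machine sketch in Python. It is not the disclosed implementation: the abstract does not say what form the sparsity approximation takes or how the graph is distributed, so this sketch assumes that sparsity means keeping only the `top_k` highest-weight labels per node after each iteration, and it runs on one machine. The names `propagate_labels`, `select_pairs`, `top_k`, and `threshold` are hypothetical, as is the averaging update rule.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, num_iters=100, top_k=2):
    """Propagate seed labels over a similarity graph (illustrative sketch).

    edges: {(u, v): similarity} for an undirected graph.
    seeds: {node: {label: original_weight}} for the seed nodes.
    Returns {node: {label: learned_weight}} with at most top_k labels per node.
    """
    # Build a symmetric adjacency map: node -> {neighbor: similarity}.
    neighbors = defaultdict(dict)
    for (u, v), sim in edges.items():
        neighbors[u][v] = sim
        neighbors[v][u] = sim

    nodes = set(neighbors) | set(seeds)
    # Seed nodes start from their original weights; other nodes start empty.
    scores = {node: dict(seeds.get(node, {})) for node in nodes}

    for _ in range(num_iters):
        updated = {}
        for node in nodes:
            nbrs = neighbors.get(node, {})
            acc = defaultdict(float)
            # A seed node is pulled toward its original training weights.
            for label, weight in seeds.get(node, {}).items():
                acc[label] += weight
            # Every node is pulled toward its neighbors, scaled by similarity.
            for nbr, sim in nbrs.items():
                for label, weight in scores[nbr].items():
                    acc[label] += sim * weight
            norm = (1.0 if node in seeds else 0.0) + sum(nbrs.values())
            # Assumed sparsity step: keep only the top_k labels for this node.
            ranked = sorted(acc.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
            updated[node] = (
                {label: total / norm for label, total in ranked} if norm > 0 else {}
            )
        scores = updated
    return scores


def select_pairs(scores, seeds, threshold=0.5):
    """Pick (node, label) pairs for non-seed nodes whose learned weight is high."""
    return sorted(
        (node, label)
        for node, labels in scores.items()
        if node not in seeds
        for label, weight in labels.items()
        if weight >= threshold
    )


# Toy path graph a - b - c - d with one seed label at each end.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 1.0}
seeds = {"a": {"cat": 1.0}, "d": {"dog": 1.0}}
learned = propagate_labels(edges, seeds)
selected = select_pairs(learned, seeds)
```

In this toy graph the unlabeled node nearer each seed ends up favoring that seed's label, and `select_pairs` returns the pairs that would be written back to the source data set. Capping each node at `top_k` labels bounds per-node storage regardless of the size of the label set, which is the practical motivation assumed here for a sparsity approximation.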