Patent attributes
An iterative method of sampling real-world event data to generate a subset of data that is used to train a classifier. Graph Based Sampling uses an iterative process of evaluating randomly selected event data sets and adding them to a training data set. In Graph Based Sampling, at each iteration, two event data sets are randomly selected from a stored plurality of event data sets. A proximity function is used to generate a proximity value (a correlation or similarity value) between each of these randomly selected event data sets and the current training data set. One of the two randomly selected event data sets is then added to the training data set based on its proximity value. This process of selection and addition is repeated until the training data set reaches a predetermined size.
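The loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The abstract does not specify the proximity function or the rule for choosing between the two candidates. This sketch therefore assumes that each event data set is a numeric feature vector, that proximity is the mean cosine similarity between a candidate and the current training set, and that the candidate *less* similar to the training set is kept, which favors diversity. The names `cosine_similarity`, `proximity`, and `graph_based_sample` are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Assumption: each event data set is represented as a numeric feature vector.
Vector = Tuple[float, ...]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def proximity(candidate: Vector, training: List[Vector]) -> float:
    """Hypothetical proximity function: mean cosine similarity to the training set."""
    if not training:
        return 0.0  # nothing to compare against on the first iteration
    return sum(cosine_similarity(candidate, t) for t in training) / len(training)


def graph_based_sample(
    pool: Sequence[Vector],
    target_size: int,
    seed: Optional[int] = None,
    proximity_fn: Callable[[Vector, List[Vector]], float] = proximity,
) -> List[Vector]:
    """Grow a training set by repeatedly comparing two random candidates."""
    if target_size > len(pool):
        raise ValueError("target_size exceeds the number of stored event data sets")
    rng = random.Random(seed)
    remaining = list(pool)          # event data sets not yet in the training set
    training: List[Vector] = []
    while len(training) < target_size:
        if len(remaining) == 1:
            # Only one candidate left, so there is nothing to compare it with.
            training.append(remaining.pop())
            continue
        # Randomly select two distinct event data sets from the remaining pool.
        i, j = rng.sample(range(len(remaining)), 2)
        # Assumption: keep the candidate with the lower proximity value.
        # Ties go to the first candidate.
        if proximity_fn(remaining[i], training) <= proximity_fn(remaining[j], training):
            pick = i
        else:
            pick = j
        # The unselected candidate stays in the pool for later iterations.
        training.append(remaining.pop(pick))
    return training
```

For example, `graph_based_sample(pool, 5, seed=0)` returns five distinct vectors drawn from `pool`. Because the lower-proximity candidate wins each comparison, near-duplicates of data already in the training set tend to be passed over. The stopping condition is the predetermined size `target_size`. The pairwise comparison keeps each iteration cheap, since only two candidates are scored against the current training set rather than the entire pool.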