An embodiment may include a machine-learning-based classifier that maps input observations into respective categories, and a database containing a corpus of training data for the classifier. The training data includes a plurality of entries, each of which associates an observation with its ground-truth category. A computing device may be configured to select, from the training data, a plurality of subsets, each containing a different number of entries. The computing device may also be configured to, for each particular subset: (i) divide the particular subset into a training portion and a validation portion, (ii) train the classifier with the training portion, (iii) provide the validation portion as input to the trained classifier, and (iv) based on how the entries of the validation portion are mapped to the categories, determine a respective precision for the particular subset.
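The per-subset procedure (i) through (iv) can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: observations are numeric feature vectors, a nearest-centroid classifier stands in for the machine-learning-based classifier, a random 25% of each subset is held out for validation, and "precision" is computed as macro-averaged precision over the categories the classifier actually predicts. All names (`NearestCentroidClassifier`, `macro_precision`, `precision_by_subset_size`) are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Observation = Tuple[float, ...]
Entry = Tuple[Observation, str]  # (observation, ground-truth category)


class NearestCentroidClassifier:
    """Hypothetical stand-in for the ML-based classifier: maps an
    observation to the category whose mean training vector is closest."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def train(self, entries: Sequence[Entry]) -> None:
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = defaultdict(int)
        for obs, label in entries:
            if label not in sums:
                sums[label] = [0.0] * len(obs)
            for i, x in enumerate(obs):
                sums[label][i] += x
            counts[label] += 1
        self.centroids = {c: [s / counts[c] for s in v] for c, v in sums.items()}

    def predict(self, obs: Observation) -> str:
        # Choose the category with the smallest squared Euclidean distance.
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(obs, self.centroids[c])),
        )


def macro_precision(pairs: Sequence[Tuple[str, str]]) -> float:
    """Macro-averaged precision over (predicted, actual) pairs.
    Assumption: categories that are never predicted are skipped."""
    predicted: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for pred, actual in pairs:
        predicted[pred] += 1
        if pred == actual:
            correct[pred] += 1
    return sum(correct[c] / predicted[c] for c in predicted) / len(predicted)


def precision_by_subset_size(
    corpus: Sequence[Entry],
    sizes: Sequence[int],
    validation_fraction: float = 0.25,  # assumed split ratio
    seed: int = 0,
) -> Dict[int, float]:
    """Return {subset size: precision} for subsets of the given sizes."""
    rng = random.Random(seed)
    results: Dict[int, float] = {}
    for size in sizes:
        if size < 2:
            raise ValueError("each subset needs at least one training and one validation entry")
        subset = rng.sample(list(corpus), size)  # select a subset of this size
        n_val = max(1, int(size * validation_fraction))
        validation, training = subset[:n_val], subset[n_val:]  # (i) divide
        clf = NearestCentroidClassifier()
        clf.train(training)  # (ii) train on the training portion
        pairs = [(clf.predict(obs), label) for obs, label in validation]  # (iii) classify
        results[size] = macro_precision(pairs)  # (iv) precision for this subset
    return results


if __name__ == "__main__":
    # Synthetic corpus: two well-separated 2-D clusters labeled "a" and "b".
    gen = random.Random(42)
    data = [((gen.gauss(0, 1), gen.gauss(0, 1)), "a") for _ in range(200)]
    data += [((gen.gauss(5, 1), gen.gauss(5, 1)), "b") for _ in range(200)]
    print(precision_by_subset_size(data, [20, 50, 100, 400]))
```

The fixed `seed` makes subset selection and splitting reproducible, so precisions for different subset sizes can be compared across runs; any real classifier exposing a train/predict interface could be substituted for the nearest-centroid stand-in.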