Systems and techniques for improving the training of machine learning classifiers are disclosed. A classifier is first trained using a set of validated documents that are accurately associated with a set of class labels. A subset of non-validated documents is then identified and used to further train the classifier and improve its accuracy.
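The two-stage training described above might be sketched as follows. This is a minimal illustration, not the disclosed implementation: the abstract does not say how the subset of non-validated documents is identified, so the sketch assumes a document is selected when the initial classifier's prediction agrees with the document's existing label at or above a confidence threshold. The naive Bayes model, the function names (`train`, `predict`, `select_subset`), the `threshold` value, and the sample documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in docs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return (best_label, posterior probability of that label)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize in a numerically stable way to obtain posteriors.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def select_subset(model, non_validated, threshold=0.8):
    """Assumed selection rule: keep documents whose existing label matches
    the model's prediction with confidence at or above the threshold."""
    selected = []
    for text, label in non_validated:
        predicted, confidence = predict(model, text)
        if predicted == label and confidence >= threshold:
            selected.append((text, label))
    return selected


# Hypothetical sample data.
validated = [
    ("invoice payment due amount", "finance"),
    ("payment invoice bank transfer", "finance"),
    ("match goal team score", "sports"),
    ("team player goal league", "sports"),
]
non_validated = [
    ("invoice amount bank payment", "finance"),
    ("goal score league team", "finance"),  # label is likely wrong
    ("player match team", "sports"),
]

initial_model = train(validated)                      # stage 1: validated documents only
subset = select_subset(initial_model, non_validated)  # identify the usable subset
final_model = train(validated + subset)               # stage 2: further training
```

Under this assumed rule, the second non-validated document is excluded because the initial classifier disagrees with its label, while the other two are added to the training data. Other selection criteria, such as human review of low-confidence documents, would fit the same two-stage structure.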