Patent attributes
A computing device predicts an event or classifies an observation. A trained labeling model is executed with unlabeled observations to define a label distribution probability matrix, and a label is selected for each observation. For each label, a mean observation vector and a covariance matrix are computed from the unlabeled observations selected to have that label. A number of the smallest eigenvalues is selected from each covariance matrix and used to define a null space for the respective label. For each observation, a distance vector is computed to the mean observation vector and projected into the null space associated with the label selected for that observation, and a distance value is computed from the projected vector. A diversity rank is determined for each observation based on the minimum computed distance values. A predefined number of observations having the highest diversity rank values are added to the labeled observations and removed from the unlabeled observations.
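The selection procedure described above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the claimed implementation: the function name `select_diverse` and the parameters `n_null` and `n_select` are hypothetical, the label for each observation is assumed to be the most probable one in the probability matrix, the null space is assumed to be spanned by the eigenvectors paired with the smallest eigenvalues, and the distance value is assumed to be the Euclidean norm of the projected vector. The abstract does not state over which set the minimum distance is taken, so the sketch computes a single distance per observation and ranks on it directly.

```python
import numpy as np


def select_diverse(X, P, n_null, n_select):
    """Pick the n_select most 'diverse' unlabeled observations (sketch).

    X: (n, d) array of unlabeled observations.
    P: (n, c) label probability matrix from a trained labeling model.
    n_null: number of smallest eigenvalues used per label (hypothetical name).
    n_select: number of observations to move to the labeled set.
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=float)

    # Assumption: each observation takes its most probable label.
    labels = P.argmax(axis=1)
    dist = np.zeros(X.shape[0])

    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        if idx.size < 2:
            # A covariance matrix is undefined for one observation;
            # this sketch leaves its distance at zero.
            continue
        Xk = X[idx]
        mean = Xk.mean(axis=0)
        cov = np.atleast_2d(np.cov(Xk, rowvar=False))

        # eigh returns eigenvalues in ascending order, so the first
        # n_null eigenvectors belong to the smallest eigenvalues.
        _, vecs = np.linalg.eigh(cov)
        null_basis = vecs[:, :n_null]

        # Project each distance vector into the null space and take
        # its Euclidean norm as the distance value (assumption).
        dist[idx] = np.linalg.norm((Xk - mean) @ null_basis, axis=1)

    # Larger distance is treated as a higher diversity rank.
    order = np.argsort(-dist, kind="stable")
    chosen = order[:n_select]
    remaining = np.setdiff1d(np.arange(X.shape[0]), chosen)
    return chosen, remaining, labels
```

Calling `select_diverse(X, P, n_null=1, n_select=2)` returns the indices to move into the labeled set, the indices that stay unlabeled, and the label assigned to each observation. Projecting onto the low-variance directions means an observation scores highly when it deviates along an axis where its label's observations normally barely vary.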