Patent attributes
A computing device automatically classifies an observation vector. A label set defines permissible values for a target variable. Supervised data includes a labeled subset in which each observation vector has one of the permissible values. A converged classification matrix is computed based on the supervised data and an unlabeled subset using a prior class distribution matrix that includes a row for each observation vector. Each column of the prior class distribution matrix is associated with a single permissible value of the label set. A cell value in each column is a likelihood that the associated permissible value of the label set occurs based on prior class distribution information. The value of the target variable is selected for each observation vector using the converged classification matrix. A weighted classification label distribution matrix is computed from the converged classification matrix. The value of the target variable for each observation vector of the plurality of observation vectors is output to a labeled dataset.
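The flow described above can be illustrated with a minimal sketch. It is not the claimed method: the abstract does not give the update rule, so the sketch substitutes a generic label-spreading-style iteration, F <- alpha * S * F + (1 - alpha) * Y, where Y plays the role of the prior class distribution matrix and S is a row-normalized Gaussian affinity matrix. The function names, the kernel, and the parameters `gamma`, `alpha`, and `tol` are all hypothetical. The weighted classification label distribution matrix is left out because the abstract does not say how its weights are formed.

```python
import math

def rbf_affinity(points, gamma=1.0):
    """Row-normalized Gaussian affinity with a zero diagonal (assumed kernel)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-gamma * d2)
    return [[v / sum(row) for v in row] for row in w]

def prior_matrix(labels, label_set, class_prior):
    """One row per observation vector, one column per permissible value."""
    rows = []
    for y in labels:
        if y is None:
            # Unlabeled: use the prior class distribution.
            rows.append(list(class_prior))
        else:
            # Labeled: put all mass on the known value.
            rows.append([1.0 if y == c else 0.0 for c in label_set])
    return rows

def propagate(points, labels, label_set, class_prior,
              alpha=0.8, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha*S*F + (1-alpha)*Y until the largest change is below tol."""
    s = rbf_affinity(points)
    y0 = prior_matrix(labels, label_set, class_prior)
    f = [row[:] for row in y0]
    n, k = len(points), len(label_set)
    for _ in range(max_iter):
        new_f = [
            [alpha * sum(s[i][j] * f[j][c] for j in range(n))
             + (1 - alpha) * y0[i][c] for c in range(k)]
            for i in range(n)
        ]
        delta = max(abs(new_f[i][c] - f[i][c])
                    for i in range(n) for c in range(k))
        f = new_f
        if delta < tol:
            break
    return f

def select_labels(f, label_set):
    """Pick the permissible value with the largest cell in each row."""
    return [label_set[max(range(len(label_set)), key=lambda c: row[c])]
            for row in f]

# Toy data (made up for illustration): two 1-D clusters, one labeled point each.
label_set = ["a", "b"]
points = [(0.0,), (0.2,), (0.4,), (3.0,), (3.2,), (3.4,)]
labels = ["a", None, None, "b", None, None]

converged = propagate(points, labels, label_set, class_prior=[0.5, 0.5])
predicted = select_labels(converged, label_set)
print(predicted)  # ['a', 'a', 'a', 'b', 'b', 'b']
```

Because S is row-stochastic and every row of Y sums to 1, each row of the converged matrix also sums to 1, so a row can be read as a distribution over the label set. With `alpha` below 1 the iteration is a contraction and settles regardless of the starting point.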