Patent attributes
A computing device automatically classifies an observation vector. (a) A converged classification matrix is computed that defines a label probability for each observation vector. (b) The value of the target variable associated with a maximum label probability value is selected for each observation vector. Each observation vector is assigned to a cluster. A distance value is computed between observation vectors assigned to the same cluster. An average distance value is computed for each observation vector. A predefined number of observation vectors are selected that have minimum values for the average distance value. The supervised data is updated to include the selected observation vectors with the value of the target variable selected in (b). The selected observation vectors are removed from the unlabeled subset. (a) and (b) are repeated. The value of the target variable for each observation vector is output to a labeled dataset.
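The label selection in (b) and the subsequent selection of observation vectors can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: Euclidean distance is used as the distance value, the cluster assignments are supplied by some earlier clustering step, the average is taken over the other members of the same cluster, and a cluster with a single member is never selected. The function names `select_labels`, `average_intra_cluster_distance`, and `select_min_average_distance` are hypothetical and are not taken from the patent.

```python
import numpy as np


def select_labels(classification_matrix):
    """Step (b): pick, per observation, the label index with maximum probability."""
    return np.asarray(classification_matrix).argmax(axis=1)


def average_intra_cluster_distance(X, cluster_ids):
    """Average distance from each observation to the others in its own cluster.

    Assumption: Euclidean distance. Observations alone in a cluster get inf,
    so they are never preferred by a minimum-based selection.
    """
    X = np.asarray(X, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    avg = np.full(len(X), np.inf)
    for c in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == c)
        if len(members) < 2:
            continue  # no pairwise distance exists for a singleton cluster
        pts = X[members]
        # Pairwise Euclidean distances within the cluster.
        diffs = pts[:, None, :] - pts[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=2))
        # The self-distance is zero, so divide by the number of other members.
        avg[members] = dists.sum(axis=1) / (len(members) - 1)
    return avg


def select_min_average_distance(X, cluster_ids, n_select):
    """Return indices of the n_select observations with the smallest averages."""
    avg = average_intra_cluster_distance(X, cluster_ids)
    order = np.argsort(avg, kind="stable")  # ties keep original order
    return order[:n_select], avg


# Illustrative data only (not from the patent).
X = [[0, 0], [0, 1], [0, 2], [10, 10], [10, 14]]
cluster_ids = [0, 0, 0, 1, 1]
probs = [[0.2, 0.8], [0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.5, 0.5]]

labels = select_labels(probs)
selected, avg = select_min_average_distance(X, cluster_ids, n_select=2)
# avg is [1.5, 1.0, 1.5, 4.0, 4.0]; selected holds indices 1 and 0.
# These observations, with labels[selected], would be moved from the
# unlabeled subset into the supervised data before (a) and (b) are repeated.
```

In this sketch the observation in the middle of the first cluster has the smallest average distance, so it is selected first. A different distance measure or tie-breaking rule would change which observations are moved into the supervised data.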