Patent attributes
The present invention is a method of modeling a single class of data from data containing multiple classes of the same type of data. A collection of data is first received that includes data from multiple classes of the same type, where the amount of data of the single class exceeds that of any other class. A first statistical model of the received collection of data is generated. The collection of data is divided into subsets. Each subset of the collection of data is scored using the first statistical model. A set of scores is selected. The subsets corresponding to the selected scores are identified. The identified subsets are combined. A second statistical model, of the same type as the first statistical model, is generated for the combined subsets and used as the model of the single class of data.
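The steps recited above can be illustrated with a minimal sketch. The abstract does not specify the type of statistical model, the scoring function, how the subsets are formed, or which scores are selected, so everything concrete here is an assumption made for illustration: a one-dimensional Gaussian as the model, mean log-likelihood as the score, contiguous equal-sized subsets, and selection of the highest-scoring half. The function names and the parameters n_subsets and keep_fraction are hypothetical, and the data are synthetic.

```python
import numpy as np


def fit_gaussian(x):
    """Fit a 1-D Gaussian (assumed model type); returns (mean, variance)."""
    x = np.asarray(x, dtype=float)
    return x.mean(), x.var() + 1e-9  # small floor avoids a zero variance


def mean_log_likelihood(x, model):
    """Average per-sample log-likelihood of x under a (mean, variance) model."""
    mean, var = model
    x = np.asarray(x, dtype=float)
    return float(np.mean(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)))


def model_dominant_class(data, n_subsets=20, keep_fraction=0.5):
    """Return a second model fitted to the best-scoring subsets of data."""
    data = np.asarray(data, dtype=float)
    first_model = fit_gaussian(data)                 # first statistical model
    subsets = np.array_split(data, n_subsets)        # divide into contiguous subsets
    scores = [mean_log_likelihood(s, first_model) for s in subsets]  # score each subset
    n_keep = max(1, int(round(keep_fraction * n_subsets)))
    selected = np.argsort(scores)[-n_keep:]          # assumed rule: keep the highest scores
    combined = np.concatenate([subsets[i] for i in selected])  # combine identified subsets
    return fit_gaussian(combined)                    # second model, same type as the first


# Synthetic illustration: the dominant class is centred at 0, a minority class at 6.
rng = np.random.default_rng(0)
dominant = rng.normal(0.0, 1.0, size=800)
minority = rng.normal(6.0, 1.0, size=200)
data = np.concatenate([dominant, minority])  # classes occupy contiguous segments

first_mean, first_var = fit_gaussian(data)
second_mean, second_var = model_dominant_class(data)
print(f"first model mean:  {first_mean:.2f}")
print(f"second model mean: {second_mean:.2f}")
```

Because the dominant class supplies most of the data, the first model sits closer to it than to the minority class, so subsets drawn from the dominant class tend to score higher and the second model is fitted mostly to that class. In this sketch the selection rule also slightly favours subsets that happen to lie near the first model, so the second model is an approximation of the dominant class rather than an exact recovery.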