Patent attributes
A computing device receives training data representing different observations, where each observation is categorized into one of multiple options for a target variable. The device obtains one or more computer commands for categorizing an observation into one of the options for the target variable. The device generates a sampling scheme for sampling terms of the training data. The device then generates sampling models by performing, for each of N iterations of the sampling scheme, the following steps: determining a subset of the training data based on a training data index; sampling that subset, based on a term index, for a subset of terms; and generating, based on the subset of terms and according to the computer commands, a sampling model for the categorizing. Each sampling model is generated from a different subset of terms, so the sampling models are randomized. Finally, the device computes, from the sampling models, an aggregated model for categorizing test data into one of the options for the target variable.
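The iteration described above resembles bagging combined with random term (feature) subsets. The Python sketch below is a hypothetical illustration under stated assumptions, not the patented implementation. The names `make_sampling_scheme`, `train_nearest_centroid`, and `fit_aggregated_model` are invented, and so are the row and term fractions. A nearest-centroid classifier stands in for the unspecified computer commands. Majority voting is assumed as the aggregation rule, because the abstract does not say how the aggregated model is computed.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]
Model = Callable[[Observation], str]


def make_sampling_scheme(n_obs: int, n_terms: int, n_iter: int,
                         obs_frac: float = 0.8, term_frac: float = 0.5,
                         seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical scheme: each iteration gets a training-data index (rows)
    and a term index (columns), both drawn at random without replacement."""
    rng = random.Random(seed)
    k_obs = max(1, int(n_obs * obs_frac))
    k_terms = max(1, int(n_terms * term_frac))
    if n_iter > math.comb(n_terms, k_terms):
        raise ValueError("not enough distinct term subsets for n_iter")
    scheme, seen = [], set()
    while len(scheme) < n_iter:
        terms = tuple(sorted(rng.sample(range(n_terms), k_terms)))
        if terms in seen:
            continue  # enforce a different term subset for every model
        seen.add(terms)
        rows = sorted(rng.sample(range(n_obs), k_obs))
        scheme.append((rows, list(terms)))
    return scheme


def train_nearest_centroid(rows, labels, terms) -> Model:
    """Assumed stand-in for the computer commands: a nearest-centroid
    classifier that only looks at the sampled terms."""
    sums, counts = {}, Counter(labels)
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(terms))
        for j, t in enumerate(terms):
            acc[j] += row[t]
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x: Observation) -> str:
        def dist(c):
            return sum((x[t] - c[j]) ** 2 for j, t in enumerate(terms))
        return min(centroids, key=lambda lab: dist(centroids[lab]))

    return predict


def fit_aggregated_model(data, labels, n_iter: int, seed: int = 0) -> Model:
    """Train one sampling model per iteration, then aggregate by majority vote."""
    scheme = make_sampling_scheme(len(data), len(data[0]), n_iter, seed=seed)
    models = []
    for obs_idx, term_idx in scheme:
        subset = [data[i] for i in obs_idx]           # subset of the training data
        subset_labels = [labels[i] for i in obs_idx]
        models.append(train_nearest_centroid(subset, subset_labels, term_idx))

    def aggregated(x: Observation) -> str:
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return aggregated


# Usage: two well-separated classes described by four terms.
data = [[1.0, 0.9, 0.0, 0.1], [0.9, 1.1, 0.2, 0.0], [1.1, 1.0, 0.1, 0.2],
        [0.0, 0.1, 1.0, 0.9], [0.2, 0.0, 0.9, 1.1], [0.1, 0.2, 1.1, 1.0]]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit_aggregated_model(data, labels, n_iter=5)
print(model([1.0, 1.0, 0.0, 0.0]), model([0.0, 0.0, 1.0, 1.0]))  # → a b
```

Rejection sampling keeps every term subset distinct, which matches the statement that each sampling model is generated from a different subset of terms; as a consequence, N cannot exceed the number of possible term subsets (here C(4, 2) = 6).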