Patent attributes
Respective correlation metrics between token groups of a particular text attribute of a data set and a prediction target attribute are computed. Based on the correlation metrics, a predictive token group list is created. For various observation records of the data set, values of a derived categorical attribute corresponding to the particular text attribute are determined based on matches between the particular text attribute value and the predictive token group list. A measure of the predictive utility of the particular text attribute is obtained using correlations between the categorical attribute and the prediction target attribute.
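The four steps described above can be illustrated with a minimal sketch. It is not the claimed implementation, and it rests on several assumptions that the abstract does not state: each token group is a single whitespace-delimited token, the prediction target is binary (0/1), the correlation metric is the phi coefficient, the derived categorical value is the first matching entry in the predictive list, and the utility measure is Cramér's V. All function names and the `top_k` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def tokenize(text):
    # Assumption: a "token group" is a single lowercase whitespace-delimited token.
    return set(text.lower().split())


def phi(xs, ys):
    # Pearson correlation of two equal-length 0/1 sequences (phi coefficient).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def predictive_token_list(texts, targets, top_k=2):
    # Steps 1 and 2: score every token against the target, keep the top_k
    # by absolute correlation (ties broken alphabetically).
    token_sets = [tokenize(t) for t in texts]
    vocab = set().union(*token_sets)
    scores = {tok: phi([int(tok in s) for s in token_sets], targets)
              for tok in vocab}
    return sorted(vocab, key=lambda t: (-abs(scores[t]), t))[:top_k]


def derive_category(text, token_list):
    # Step 3: the derived categorical value is the first list entry found
    # in the text, or a placeholder when nothing matches.
    tokens = tokenize(text)
    for tok in token_list:
        if tok in tokens:
            return tok
    return "<none>"


def cramers_v(categories, targets):
    # Step 4: association between the derived attribute and the target,
    # computed from the chi-squared statistic of their contingency table.
    n = len(categories)
    joint = Counter(zip(categories, targets))
    rows, cols = Counter(categories), Counter(targets)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Made-up observation records, for illustration only.
texts = ["great product fast shipping", "terrible product broke",
         "great value", "terrible service slow",
         "great quality", "terrible fit"]
targets = [1, 0, 1, 0, 1, 0]

token_list = predictive_token_list(texts, targets)
derived = [derive_category(t, token_list) for t in texts]
utility = cramers_v(derived, targets)
print(token_list, derived, utility)
```

In this toy data the tokens "great" and "terrible" separate the two target classes perfectly, so the derived attribute receives the maximum utility score. On real data a score near zero would suggest the text attribute contributes little and could be left out of model training.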