There is a need for more effective and efficient predictive data analysis based at least in part on categorical input data. This need can be addressed by, for example, solutions for performing predictive data analysis that utilize at least one of categorical level merging, mutual-information-based feature filtering, feature-correlation-based feature filtering to generate training data feature value arrangements, as well as training and using categorical input machine learning models trained using the training data feature value arrangements.