Patent attributes
Techniques for intelligent compute resource selection and utilization for machine learning (ML) training jobs are described. At least a portion of an ML training job is executed a plurality of times using a plurality of different resource configurations, where each resource configuration differs from the others in at least the type or number of compute instances. A performance metric is measured for each of the executions and can be used, together with a desired performance characteristic, to generate a recommended resource configuration for the ML training job. The ML training job is then executed using the recommended resource configuration.