Abstract
Systems and methods are described for providing serverless inference against a trained machine learning (ML) model. Rather than obtaining one or more dedicated devices to conduct inferences, users can create a task on a serverless system that, when invoked, passes input data to a trained ML model and provides a result. To satisfy varying user requirements for inference speed, the system includes a variety of hardware configurations. The system can efficiently allocate resources between different tasks by invoking a task on a particular hardware configuration, selected based on the current availability of that configuration to host an execution environment in which the task is implemented and the expected time to invoke the task on that configuration. The system can therefore efficiently allocate resources among inferences using a variety of different ML models.
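The selection logic described above could be sketched as follows. This is a hypothetical illustration, not the patent's actual implementation: the `HardwareConfig` type, its fields, and the `select_configuration` heuristic are all assumptions chosen to show how availability and expected invocation time might jointly drive the choice.

```python
from dataclasses import dataclass

@dataclass
class HardwareConfig:
    """Illustrative record for one hardware configuration (assumed fields)."""
    name: str
    available_environments: int     # environments currently free to host the task
    expected_invoke_seconds: float  # estimated time to invoke the task (e.g., startup + inference)

def select_configuration(configs, max_wait_seconds):
    """Pick the fastest configuration that is both available and meets the
    user's latency requirement; otherwise fall back to the most available one."""
    candidates = [
        c for c in configs
        if c.available_environments > 0
        and c.expected_invoke_seconds <= max_wait_seconds
    ]
    if candidates:
        return min(candidates, key=lambda c: c.expected_invoke_seconds)
    # No configuration satisfies the requirement; queue on the most available.
    return max(configs, key=lambda c: c.available_environments)

configs = [
    HardwareConfig("cpu-small", available_environments=4, expected_invoke_seconds=2.5),
    HardwareConfig("gpu-large", available_environments=0, expected_invoke_seconds=0.3),
    HardwareConfig("gpu-small", available_environments=1, expected_invoke_seconds=0.8),
]
print(select_configuration(configs, max_wait_seconds=1.0).name)  # gpu-small
```

Here the fastest configuration (`gpu-large`) is skipped because it has no environment currently available, showing how availability and speed trade off in the selection.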