Patent attributes
Techniques for hosting machine learning models are described. In some instances, a method of receiving a request to perform an inference using a particular machine learning model; determining a group of hosts to route the request to, the group of hosts to host a plurality of machine learning models including the particular machine learning model; determining a path to the determined group of hosts; determining a particular host of the group of hosts to perform an analysis of the request based on the determined path, the particular host having the particular machine learning model in memory; routing the request to the particular host of the group of hosts; performing inference on the request using the particular host; and providing a result of the inference to a requester is performed.