Patent attributes
Efficient cloud service capacity scaling is disclosed. For example, a plurality of services is configured to execute on a plurality of isolated guests, each service being assigned either a real-time latency tolerance or a retriable latency tolerance. A first service with the real-time latency tolerance is added to a scheduling queue while second and third services with the retriable latency tolerance execute in the plurality of isolated guests. A scheduler determines that a current computing capacity of the plurality of isolated guests is below a minimum capacity threshold. The scheduler then determines whether to elevate the second service and/or the third service to the real-time latency tolerance. The scheduler determines to elevate the second service and elevates it to the real-time latency tolerance. The scheduler determines not to elevate the third service, which is then terminated, freeing computing capacity. The first service is then executed in the plurality of isolated guests.
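
The sequence described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. All names here (Tolerance, Service, schedule, demand, elevate, min_free) are hypothetical. The sketch assumes that "current computing capacity" means free capacity measured in abstract units, and it reduces the scheduler's elevation decision, whose criteria the abstract does not state, to a per-service boolean flag.

```python
from dataclasses import dataclass
from enum import Enum


class Tolerance(Enum):
    REAL_TIME = "real-time"
    RETRIABLE = "retriable"


@dataclass
class Service:
    name: str
    tolerance: Tolerance
    demand: int            # capacity units used while running (assumption)
    elevate: bool = False  # stand-in for the scheduler's elevation decision


def schedule(queue, running, total_capacity, min_free):
    """Free capacity for queued real-time services; return terminated services.

    `queue` and `running` are lists of Service and are modified in place.
    """
    terminated = []
    free = total_capacity - sum(s.demand for s in running)

    # Act only when free capacity is below the minimum threshold.
    if free < min_free:
        for svc in list(running):
            if svc.tolerance is not Tolerance.RETRIABLE:
                continue
            if svc.elevate:
                # Elevated services keep running, now as real-time.
                svc.tolerance = Tolerance.REAL_TIME
            else:
                # Non-elevated retriable services are terminated.
                running.remove(svc)
                terminated.append(svc)

    # Start queued real-time services that fit in the capacity now free.
    free = total_capacity - sum(s.demand for s in running)
    for svc in list(queue):
        if svc.tolerance is Tolerance.REAL_TIME and svc.demand <= free:
            queue.remove(svc)
            running.append(svc)
            free -= svc.demand

    return terminated
```

For instance, with a total capacity of 10 units, a threshold of 3 free units, and three services of 4 units each, a queued real-time first service would start only after the third (non-elevated) service is terminated, while the second service continues running with the real-time latency tolerance.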