In one embodiment, the present invention includes a method for performing rate limiting in a horizontally distributed and scalable manner. The method includes receiving a request in a rate limiter. A time value can be obtained from a distributed key value store using a key generated from the request. In turn, a sleep time can be obtained for the request based at least in part on the time value and an allotted time per request, and the request can be delayed according to the sleep time. After this sleep time, the request can be forwarded from the rate limiter to a handling server.
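By way of illustration only, the method of this embodiment might be sketched as follows. The sketch is not a definitive implementation and rests on several assumptions that the description above does not state: the time value stored under the key is taken to be the earliest time at which the next request for that key may proceed, the allotted time per request is taken to be the reciprocal of a permitted request rate, and the key is built from a client identifier and a request path. The names `InMemoryStore`, `make_key`, `compute_sleep_time`, and `rate_limit` are hypothetical, and an in-process dictionary stands in for the distributed key value store.

```python
import time
from typing import Any, Callable, Dict


class InMemoryStore:
    """Hypothetical stand-in for a distributed key value store."""

    def __init__(self) -> None:
        self._data: Dict[str, float] = {}

    def get(self, key: str, default: float = 0.0) -> float:
        return self._data.get(key, default)

    def set(self, key: str, value: float) -> None:
        self._data[key] = value


def make_key(request: Dict[str, str]) -> str:
    # Assumption: the key combines a client identifier and the request path.
    return f"ratelimit/{request['client']}/{request['path']}"


def compute_sleep_time(store: InMemoryStore, key: str,
                       allotted: float, now: float) -> float:
    # Assumption: the stored time value is the earliest time at which the
    # next request for this key may be forwarded.
    next_free = store.get(key, 0.0)
    sleep_time = max(0.0, next_free - now)
    # Reserve one allotted slot for this request, starting no earlier than now.
    store.set(key, max(next_free, now) + allotted)
    return sleep_time


def rate_limit(request: Dict[str, str], store: InMemoryStore,
               rate_per_second: float,
               forward: Callable[[Dict[str, str]], Any],
               clock: Callable[[], float] = time.time,
               sleep: Callable[[float], Any] = time.sleep) -> Any:
    # Assumption: allotted time per request = 1 / permitted rate.
    allotted = 1.0 / rate_per_second
    key = make_key(request)
    sleep_time = compute_sleep_time(store, key, allotted, clock())
    if sleep_time > 0:
        sleep(sleep_time)  # delay the request according to the sleep time
    return forward(request)  # hand the request to the handling server
```

In this sketch, any number of rate limiter instances could share the same store, which is what would allow the arrangement to scale horizontally. The read and update of the time value are performed here as two separate steps; with a real distributed store, that pair of operations would presumably need to be atomic (for example, by way of a compare-and-set operation) so that concurrent rate limiters do not reserve the same slot.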