Patent 10122602 was granted by the United States Patent and Trademark Office in November 2018 and assigned to Amazon.
Techniques are described for performing infrastructure testing of a distributed system. Such testing may be performed by an infrastructure testing service that includes, for example, a manager component and multiple agent components each executing on one of multiple computing devices that are implementing the distributed system. The manager utilizes failure information to schedule failures to occur on target host devices. The manager determines if the distributed system is in a healthy state, and if so, provides failure information to the agent on a target host device. The agent then executes one or more commands on the target host device to cause the failure to occur, and monitors the distributed system and the target host device as they recover from the failure. The infrastructure testing service utilizes this monitored information to initiate other actions based on the recovery.
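The manager and agent flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `FailureSpec`, `Agent`, `Manager`, `run_command`, `probe_recovered`, and `is_healthy` are hypothetical, the health check and recovery probe are stand-in callables, and the choice to stop scheduling after a failed recovery is only one assumed example of an "other action".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FailureSpec:
    """Hypothetical failure description: the target host and the commands to run on it."""
    target_host: str
    commands: List[str]


@dataclass
class Agent:
    """Runs on one host, causes the failure, and monitors the recovery."""
    host: str
    run_command: Callable[[str], None]   # assumed hook: executes a command on this host
    probe_recovered: Callable[[], bool]  # assumed hook: True once the host has recovered
    max_probes: int = 5

    def inject(self, spec: FailureSpec) -> dict:
        # Execute the commands that cause the failure.
        for command in spec.commands:
            self.run_command(command)
        # Probe repeatedly until recovery is observed or the probe budget runs out.
        for attempt in range(1, self.max_probes + 1):
            if self.probe_recovered():
                return {"host": self.host, "recovered": True, "probes": attempt}
        return {"host": self.host, "recovered": False, "probes": self.max_probes}


@dataclass
class Manager:
    """Schedules failures and only dispatches them while the system is healthy."""
    agents: Dict[str, Agent]
    is_healthy: Callable[[], bool]       # assumed hook: health check for the whole system
    reports: List[dict] = field(default_factory=list)

    def run(self, schedule: List[FailureSpec]) -> List[dict]:
        for spec in schedule:
            if not self.is_healthy():
                # Unhealthy system: do not send the failure to the agent.
                self.reports.append({"host": spec.target_host, "skipped": True})
                continue
            report = self.agents[spec.target_host].inject(spec)
            self.reports.append(report)
            if not report["recovered"]:
                # Assumed follow-up action: stop injecting further failures.
                break
        return self.reports


# Usage with in-memory stand-ins instead of real hosts.
executed: List[str] = []
probe_results = iter([False, False, True])  # host recovers on the third probe
agent = Agent(
    host="host-a",
    run_command=executed.append,
    probe_recovered=lambda: next(probe_results),
)
manager = Manager(agents={"host-a": agent}, is_healthy=lambda: True)
reports = manager.run([FailureSpec("host-a", ["stop-service example"])])
print(reports)
```

In a real deployment the two hooks on `Agent` would wrap actual command execution and monitoring on the host, and the manager would communicate with agents over the network rather than by direct method call; the sketch keeps everything in one process so the control flow is easy to follow.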