Patent attributes
Methods and apparatus are provided for fault-resilient distributed computing using a continuous data protection feature of virtual machines. An exemplary method, performed by a compute node for executing a process of a distributed application, comprises providing a virtual machine having continuous data protection to store a copy of a state of the process in a performance storage tier; and providing a virtual machine to intercept messages of the process and to store a copy of the intercepted messages in a message log, wherein the process communicates with a plurality of other processes executing on other compute nodes, and wherein the plurality of processes employ asynchronous checkpointing. The process optionally communicates with the other processes in the distributed application using one or more virtual networks. The previously stored state is optionally moved from the performance storage tier to a capacity storage tier when a new state is stored. The stored state and/or the message log can be purged using a stored epoch counter, or when an explicit checkpoint routine has completed.
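The bookkeeping described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class `ProtectedProcess`, its method names, and the rule that the epoch counter advances once per stored state are all assumptions made for illustration. The virtual machines, virtual networks, and storage hardware are reduced to in-memory lists.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class ProtectedProcess:
    """Toy model (hypothetical) of one process's protected state and message log."""
    # Newest (epoch, state) copy, held in the fast "performance" tier.
    performance_tier: Optional[Tuple[int, Any]] = None
    # Older (epoch, state) copies, demoted to the "capacity" tier.
    capacity_tier: List[Tuple[int, Any]] = field(default_factory=list)
    # Intercepted messages, each tagged with the epoch in which it was seen.
    message_log: List[Tuple[int, Any]] = field(default_factory=list)
    # Assumption: the epoch counter advances once per stored state.
    epoch: int = 0

    def store_state(self, state: Any) -> None:
        """Store a new state copy, demoting the previous one to the capacity tier."""
        if self.performance_tier is not None:
            self.capacity_tier.append(self.performance_tier)
        self.epoch += 1
        self.performance_tier = (self.epoch, state)

    def intercept(self, message: Any) -> Any:
        """Log a copy of a message, then pass the message through unchanged."""
        self.message_log.append((self.epoch, message))
        return message

    def purge(self, stable_epoch: int) -> None:
        """Drop demoted states and logged messages older than stable_epoch."""
        self.capacity_tier = [(e, s) for e, s in self.capacity_tier if e >= stable_epoch]
        self.message_log = [(e, m) for e, m in self.message_log if e >= stable_epoch]

    def recover(self) -> Tuple[Any, List[Any]]:
        """Return the newest stored state and the messages logged since it was stored."""
        if self.performance_tier is None:
            return None, [m for _, m in self.message_log]
        epoch, state = self.performance_tier
        return state, [m for e, m in self.message_log if e >= epoch]
```

In this sketch, recovery after a failure means restoring the newest stored state and replaying the messages logged since that state was taken, so the other processes do not need to roll back in lockstep. A purge keyed on the epoch counter then bounds how much older state and log data is retained.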