Patent attributes
An efficient and scalable method of managing data protection in a geographically distributed storage system is provided using erasure coding. Each zone in a geographically distributed storage system of multiple zones or clusters encodes primary backup chunks, replicated from original data chunks in remote zones, to create coding chunks. The zone retains one of the coding chunks and distributes the remaining coding chunks to remote zones for future recovery from dual and more complex failures. Once the coding chunks have been distributed, the primary backup chunks and peer backup chunks can safely be deleted to reduce storage overhead. Encoding k primary backup chunks to create m coding chunks and distributing the m coding chunks to different remote zones assures recovery from 1 to m zone, cluster, or chunk failures. After 1 to m zone failures, each unaffected zone can recover all of the unavailable chunks by retrieving from the other unaffected zones their distributed coding chunks and any still-available original data chunks, and then restoring the content of the unavailable chunks by decoding the combination of the retained coding chunk, the distributed coding chunks, and any still-available original data chunks.
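The encode-and-recover cycle described above can be illustrated with a minimal sketch. The abstract does not name a particular erasure code, so the systematic Cauchy-matrix code over GF(2^8) used here is an assumption, as are all function names (`encode`, `recover`, `coef`) and the in-memory dictionaries that stand in for zones. It encodes k = 4 equal-length chunks into m = 2 coding chunks and restores two lost chunks from the survivors.

```python
# Hypothetical sketch: systematic Cauchy-matrix erasure code over GF(2^8).
# The choice of code and every name below are assumptions for illustration.

# Log/antilog tables for GF(2^8), primitive polynomial 0x11d, generator 2.
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    return EXP[255 - LOG[a]]

def coef(i, j):
    # Cauchy entry 1 / (x_i + y_j) with x_i = i and y_j = 255 - j.
    # The two index sets stay disjoint while k + m <= 256.
    return gf_inv(i ^ (255 - j))

def xor_scaled(buf, c, chunk):
    # buf ^= c * chunk, byte by byte, in GF(2^8).
    for b, v in enumerate(chunk):
        buf[b] ^= gf_mul(c, v)

def encode(data_chunks, m):
    # Produce m coding chunks from k equal-length data chunks.
    coding = []
    for i in range(m):
        buf = bytearray(len(data_chunks[0]))
        for j, chunk in enumerate(data_chunks):
            xor_scaled(buf, coef(i, j), chunk)
        coding.append(bytes(buf))
    return coding

def gf_mat_inv(mat):
    # Gauss-Jordan inversion over GF(2^8).
    n = len(mat)
    aug = [list(r) + [int(a == c) for c in range(n)] for a, r in enumerate(mat)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = gf_inv(aug[col][col])
        aug[col] = [gf_mul(v, inv) for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [a ^ gf_mul(f, b) for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def recover(data, coding, k):
    # data / coding: dicts mapping chunk index -> surviving chunk bytes.
    missing = [j for j in range(k) if j not in data]
    if not missing:
        return {}
    rows = sorted(coding)[:len(missing)]
    if len(rows) < len(missing):
        raise ValueError("more failures than surviving coding chunks")
    syndromes = []
    for i in rows:
        s = bytearray(coding[i])
        for j, chunk in data.items():  # cancel the known data chunks
            xor_scaled(s, coef(i, j), chunk)
        syndromes.append(s)
    inv = gf_mat_inv([[coef(i, j) for j in missing] for i in rows])
    restored = {}
    for r, j in enumerate(missing):
        buf = bytearray(len(syndromes[0]))
        for c, s in enumerate(syndromes):
            xor_scaled(buf, inv[r][c], s)
        restored[j] = bytes(buf)
    return restored

chunks = [b"zoneA-00", b"zoneB-01", b"zoneC-02", b"zoneD-03"]  # k = 4
coding = encode(chunks, m=2)
# Two zones fail: data chunks 1 and 3 become unavailable.
restored = recover({0: chunks[0], 2: chunks[2]}, {0: coding[0], 1: coding[1]}, k=4)
assert restored == {1: chunks[1], 3: chunks[3]}
```

Because every square submatrix of a Cauchy matrix is invertible, any k of the k + m chunks suffice to rebuild the rest, which matches the claimed tolerance of 1 to m failures. A production system would more likely use an optimized erasure coding library than pure-Python field arithmetic.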