Patent attributes
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for data retention and modification. One of the methods includes dividing partitions into a set of generations according to a retention policy; accumulating modification and deletion events that define changes to be applied to data of the distributed dataset; and when a triggering event occurs for a triggered generation in the set of generations, rolling an oldest partition out of the triggered generation, the rolling comprising: if the oldest partition has reached the end of a retention period for the dataset, marking the oldest partition for deletion in the triggered generation; otherwise: creating a new partition corresponding to the data of the oldest partition, wherein the data is cleaned using a scrubbing process; adding the new partition to a next generation in the set of generations; and marking the oldest partition for deletion in the triggered generation.