Patent attributes
Embodiments are directed to a perfect physical garbage collection (PPGC) process that dynamically estimates duplicate containers using a Bloom filter-based dead vector by: scanning an index containing a mapping of fingerprints to container IDs for a plurality of containers; returning, for each fingerprint, a fingerprint sequence associating the fingerprint with each respective unique container ID, wherein the last entry of the sequence is preserved and the remaining entries are considered duplicates; and maintaining a duplicate array of counts of the duplicates indexed by container ID. The duplicate array comprises, for each container, a duplicate counter that keeps track of the number of live duplicated segments in that container, wherein a live segment is a live duplicate segment if a segment with the same fingerprint exists in another container with a higher container ID.
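The duplicate-counting step can be illustrated with a minimal sketch. It is not the patented implementation: the function name `count_live_duplicates`, the list-of-pairs index layout, and the sample data are hypothetical, and a plain Python set of live fingerprints stands in for the Bloom filter-based vector named above. For each live fingerprint the sketch sorts the container IDs that hold it, preserves the last (highest) entry, and increments the duplicate counter of every other container.

```python
from collections import defaultdict


def count_live_duplicates(index_entries, live_fingerprints, num_containers):
    """Return a duplicate array: per-container counts of live duplicate segments.

    index_entries: iterable of (fingerprint, container_id) pairs (assumed layout).
    live_fingerprints: set of live fingerprints (assumed stand-in for the
        Bloom filter-based vector; a real filter would be probabilistic).
    num_containers: container IDs are assumed to be 0 .. num_containers - 1.
    """
    # Scan the index and collect the unique container IDs for each fingerprint.
    containers_by_fp = defaultdict(set)
    for fingerprint, container_id in index_entries:
        containers_by_fp[fingerprint].add(container_id)

    duplicate_counts = [0] * num_containers
    for fingerprint, container_ids in containers_by_fp.items():
        if fingerprint not in live_fingerprints:
            continue  # dead segments are not counted as live duplicates
        sequence = sorted(container_ids)
        # The last entry (highest container ID) is preserved; every earlier
        # entry is a live duplicate in its container.
        for container_id in sequence[:-1]:
            duplicate_counts[container_id] += 1
    return duplicate_counts


# Hypothetical sample index: fpA is stored in containers 0, 1 and 2,
# fpB in containers 0 and 1, fpC only in 1 (dead), fpD only in 2.
index_entries = [
    ("fpA", 0), ("fpB", 0), ("fpA", 2), ("fpC", 1),
    ("fpB", 1), ("fpA", 1), ("fpD", 2),
]
live = {"fpA", "fpB", "fpD"}
counts = count_live_duplicates(index_entries, live, num_containers=3)
print(counts)
```

In this sample, container 0 holds two live duplicates (fpA and fpB both reappear in higher-numbered containers), container 1 holds one (fpA), and container 2 holds none because it carries the preserved copies.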