Is a
Patent attributes
Patent Applicant
Patent Jurisdiction
Patent Number
Patent Inventor Names
Guilherme Menezes
Abdullah Reza
Date of Patent
May 28, 2019
Patent Application Number
14974989
Date Filed
December 18, 2015
Patent Citations Received
...
Patent Primary Examiner
Patent abstract
Clustering files in deduplication systems is based on an estimate of similarity between files in a file system. The estimates of similarity are based on how much content the files share, where the estimate of how much content is shared is based on an estimate of segments shared. The estimate of segments shared is based on segment offsets found in the files' bitmap vectors of segment offsets. The found segment offsets are used to generate a cluster definition approximating an optimal data structure for clustering files that share content. The approximated optimal data structure defines clusters hierarchically arranged based on the offset numbers of the found segment offsets.
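The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. It assumes each file's bitmap vector is held as a Python integer in which bit i is set when segment offset i was found in the file. The function names, the Jaccard-style similarity ratio, the `depth` parameter, and the sample file names are all hypothetical choices made for illustration. Grouping by the lowest found offsets is only one simple way to obtain a hierarchy keyed on offset numbers.

```python
from collections import defaultdict


def to_bitmap(offsets):
    """Pack segment offsets into an int used as a bitmap vector (assumed layout)."""
    bitmap = 0
    for off in offsets:
        bitmap |= 1 << off
    return bitmap


def found_offsets(bitmap):
    """Return the offsets whose bits are set, in ascending order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def shared_segments(a, b):
    """Estimate shared segments as the number of offsets set in both bitmaps."""
    return bin(a & b).count("1")


def similarity(a, b):
    """Jaccard-style ratio of shared offsets to all offsets (an assumed metric)."""
    union = bin(a | b).count("1")
    return shared_segments(a, b) / union if union else 0.0


def cluster_by_offsets(bitmaps, depth=1):
    """Group files by the tuple of their `depth` lowest found offsets.

    Raising `depth` splits each cluster into sub-clusters, which gives a
    simple hierarchy arranged by offset number.
    """
    clusters = defaultdict(list)
    for name, bitmap in bitmaps.items():
        key = tuple(found_offsets(bitmap)[:depth])
        clusters[key].append(name)
    return dict(clusters)


# Hypothetical sample data: three files and the segment offsets found in each.
files = {
    "a.img": to_bitmap([1, 4, 7, 9]),
    "b.img": to_bitmap([1, 4, 8]),
    "c.img": to_bitmap([2, 5, 9]),
}

top_level = cluster_by_offsets(files, depth=1)
second_level = cluster_by_offsets(files, depth=2)
```

In this sample, `a.img` and `b.img` share offsets 1 and 4, so they land in the same cluster at both levels, while `c.img` stays apart even though it shares offset 9 with `a.img`. A real system would need a cluster definition that accounts for such partial overlaps.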