Patent attributes
Example apparatus, methods, and computers perform sampling based data de-duplication. One example method controls a data de-duplication computer to compute a sampling sequence for a sub-block of data and to use the sampling sequence to locate a stored sub-block known to the data de-duplication computer. Upon finding a stored sub-block to compare to, the method includes controlling the data de-duplication computer to determine a degree of similarity (e.g., duplicate, very similar, somewhat similar, very dissimilar, completely dissimilar, x % similar) between the sub-block and the stored sub-block and to control whether and how the sub-block is stored and/or transmitted based on the degree of similarity. The degree of similarity can also control whether and how the data de-duplication computer updates a dedupe data structure(s) that stores information for finding groups of similarity sampling sequence related sub-blocks.