Patent attributes
Current Assignee
Patent Jurisdiction
Patent Number
Date of Patent
February 4, 2014
Patent Application Number
13349414
Date Filed
January 12, 2012
Patent Citations Received
Patent Primary Examiner
Patent abstract
Dynamic blocking determines which pairs of records in a data set should be examined as potential duplicates. Records are grouped into blocks by shared properties that indicate duplication. Blocks that are too large to process efficiently are further subdivided by additional properties chosen in a data-driven way. We demonstrate the viability of this algorithm for large data sets, and have scaled the system to work on billions of records on an 80-node Hadoop cluster.
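The blocking idea in the abstract can be illustrated with a small single-machine sketch. It is not the patented method or its Hadoop implementation: the function names `dynamic_blocks` and `candidate_pairs`, the `max_block_size` threshold, the fixed property order used for subdividing, and the choice to drop blocks that stay oversized after all properties are used are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def dynamic_blocks(records, properties, max_block_size):
    """Group record ids into blocks by shared property values.

    records: dict mapping record id -> dict of property values (hypothetical layout).
    properties: property names, tried in this order (an assumed, simplified policy).
    max_block_size: blocks larger than this are subdivided instead of accepted.
    Returns a dict mapping a block key (tuple of (property, value)) to record ids.
    """
    accepted = {}

    def split(key, ids, start):
        # Subdivide `ids` by each property not yet used in `key`.
        for i in range(start, len(properties)):
            prop = properties[i]
            groups = defaultdict(list)
            for rid in ids:
                value = records[rid].get(prop)
                if value is not None:
                    groups[value].append(rid)
            for value, sub_ids in groups.items():
                consider(key + ((prop, value),), sub_ids, i + 1)

    def consider(key, ids, next_prop):
        if len(ids) < 2:
            return  # a single record yields no pairs
        if len(ids) <= max_block_size:
            accepted[key] = ids
        else:
            # Oversized block: subdivide further. If no properties remain,
            # the block is silently dropped (an assumption of this sketch).
            split(key, ids, next_prop)

    split((), sorted(records), 0)
    return accepted


def candidate_pairs(blocks):
    """Collect the distinct record pairs that share at least one accepted block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs
```

With a small threshold, a common value such as a frequent last name produces an oversized block that is split by a second property, so only records agreeing on both are compared. Raising the threshold accepts the larger block as-is and increases the number of pairs to examine.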