Various embodiments of a system and method for highly scalable data clustering are described. Embodiments may include generating contexts of information from item-level data for multiple items; each context may include tokens that each represent an aggregate characteristic of the items associated with that context. Embodiments may also include comparing the tokens of different contexts to determine measures of similarity between the contexts. Embodiments may also include grouping at least some of the contexts into clusters with other contexts based on the determined measures of similarity. Embodiments may include, subsequent to detecting a first context and a second context as being members of a common cluster, correcting item-level data of an item associated with the second context based on item-level data of an item associated with the first context.
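The following Python sketch walks through the four steps summarized above. It depends on several assumptions that the abstract leaves open. Contexts are formed by grouping items on a hypothetical `context` field. A token is an `attr=value` pair shared by at least half of a context's items. Similarity is the Jaccard overlap of token sets, and clusters come from single-linkage grouping above a threshold. Correction fills missing attribute values in the second context with the most common value seen in the first. All names (`build_contexts`, `jaccard`, `cluster`, `correct`, `ITEMS`) are illustrative and are not taken from the patent.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_contexts(items, key="context", min_share=0.5):
    """Group items by a (hypothetical) context key and summarize each group
    as a set of tokens. A token "attr=value" is kept when at least
    `min_share` of the context's items carry that value; this is an assumed
    notion of an "aggregate characteristic"."""
    groups = defaultdict(list)
    for item in items:
        groups[item[key]].append(item)
    contexts = {}
    for ctx, members in groups.items():
        counts = Counter(
            f"{attr}={value}"
            for item in members
            for attr, value in item.items()
            if attr not in ("id", key) and value is not None
        )
        contexts[ctx] = {t for t, n in counts.items() if n / len(members) >= min_share}
    return groups, contexts


def jaccard(a, b):
    """One possible similarity measure between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(contexts, threshold=0.5):
    """Single-linkage clustering: union-find over pairwise similarities."""
    parent = {ctx: ctx for ctx in contexts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sorted(contexts), 2):
        if jaccard(contexts[a], contexts[b]) >= threshold:
            parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for ctx in contexts:
        clusters[find(ctx)].add(ctx)
    return list(clusters.values())


def correct(groups, first, second):
    """Assumed correction rule: fill each missing attribute of an item in
    `second` with the most common value of that attribute in `first`."""
    for item in groups[second]:
        for attr, value in list(item.items()):
            if value is None:
                observed = [o[attr] for o in groups[first] if o.get(attr) is not None]
                if observed:
                    item[attr] = Counter(observed).most_common(1)[0][0]


# Hypothetical item-level data; item 4 is missing its brand.
ITEMS = [
    {"id": 1, "context": "A", "brand": "Acme", "category": "tools"},
    {"id": 2, "context": "A", "brand": "Acme", "category": "tools"},
    {"id": 3, "context": "B", "brand": "Acme", "category": "tools"},
    {"id": 4, "context": "B", "brand": None, "category": "tools"},
    {"id": 5, "context": "C", "brand": "Zeta", "category": "toys"},
]

GROUPS, CONTEXTS = build_contexts(ITEMS)
CLUSTERS = cluster(CONTEXTS)
for members in CLUSTERS:
    # Correct every later context in a cluster from every earlier one.
    for first, second in combinations(sorted(members), 2):
        correct(GROUPS, first, second)

print(CLUSTERS)  # contexts A and B share a cluster; C stands alone
print(ITEMS[3])  # item 4 now carries brand "Acme", inferred from context A
```

Union-find with single linkage was chosen only because it is short. The abstract does not commit to a particular similarity measure or clustering algorithm. The all-pairs comparison shown here is also quadratic in the number of contexts, so it illustrates the logic of the steps rather than the scalability the abstract claims.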