Patent attributes
Computer-implemented systems and methods for efficiently searching large data volumes for one or more items with a definable degree of similarity. The systems and methods may include functionality directed to selecting at least one token from one or more tokens in a target item, the token including an identifiable character string defining, fully or partially, at least one of a name, an address, an entity or other identifier associated with the target item; extracting a character from the identifiable character string after the character string is standardized to a known common version of the character string; responsive to a character distribution lookup, determining that the extracted character corresponds to a first shard from among a plurality of discrete shards; and grouping the target item into the first shard, the character distribution lookup being adjustable over time to provide for a balanced distribution of items across the plurality of discrete shards.
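The steps recited above (standardize a token, extract a character, consult a character distribution lookup, group the item into a shard) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `STANDARD_FORMS` table, the choice of the first character as the extracted character, the greedy balancing used in `build_lookup`, and the fallback `default_shard`. Rebuilding the lookup from fresh character counts is one possible way to model a lookup that is adjustable over time.

```python
from collections import Counter

# Hypothetical standardization table: maps token variants to one common form.
STANDARD_FORMS = {
    "st": "street", "st.": "street",
    "ave": "avenue", "ave.": "avenue",
    "corp": "corporation", "corp.": "corporation",
}


def standardize(token):
    """Return the assumed common version of a token (lowercased, trimmed)."""
    cleaned = token.strip().lower()
    return STANDARD_FORMS.get(cleaned, cleaned)


def extract_character(token, position=0):
    """Extract one character from the standardized token ('' if too short)."""
    standardized = standardize(token)
    return standardized[position] if len(standardized) > position else ""


def count_characters(tokens):
    """Count how often each extracted character occurs in observed tokens."""
    return Counter(ch for ch in map(extract_character, tokens) if ch)


def build_lookup(char_counts, num_shards):
    """Build a character-to-shard lookup that roughly balances shard load.

    Assumed heuristic: visit characters from most to least frequent and
    place each one on the currently least-loaded shard.
    """
    loads = [0] * num_shards
    lookup = {}
    ordered = sorted(char_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for ch, count in ordered:
        shard = loads.index(min(loads))  # first shard with the smallest load
        lookup[ch] = shard
        loads[shard] += count
    return lookup


def assign_shard(token, lookup, default_shard=0):
    """Return the shard for a token, or default_shard for unseen characters."""
    return lookup.get(extract_character(token), default_shard)


# Example: build a lookup from assumed counts, then route a few tokens.
example_lookup = build_lookup({"s": 50, "a": 30, "b": 20, "c": 10}, num_shards=2)
example_shards = [assign_shard(t, example_lookup) for t in ("St.", "Ave", "Baker")]
```

To adjust the lookup as the data changes, one option under these assumptions is to call `count_characters` on a recent sample of tokens and pass the result to `build_lookup` again; items whose character moves to a different shard would then need to be regrouped.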