Patent attributes
Techniques herein improve computational efficiency for wildcard searches by using numeric string hashes. In an embodiment, a plurality of query K-gram tokens is generated for a term in a query. Using a first index, an intersection of hash tokens is determined, wherein said first index indexes each query K-gram token of said plurality of query K-gram tokens to a respective subset of hash tokens of a plurality of hash tokens, each hash token of said plurality of hash tokens corresponding to a term found in one or more documents of a corpus of documents. The intersection of hash tokens comprises only hash tokens indexed to all of said plurality of query K-gram tokens by said first index. Using a second index, documents of said corpus of documents that contain said term are determined, said second index indexing said hash tokens to a plurality of terms in said corpus of documents and, for each term of said plurality of terms, to a respective subset of documents of said corpus of documents that contain said each term.
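The two-index lookup described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. Several details are assumptions made only for illustration: K = 3, a "$" marker for term boundaries, CRC32 as the numeric string hash, whitespace tokenization, support for the "*" wildcard only, and a final pattern check that discards candidate terms which share every query K-gram but do not actually match. All function names are hypothetical.

```python
import fnmatch
import zlib
from collections import defaultdict

K = 3  # assumed K-gram length; the abstract does not fix a value


def kgrams(text, k=K):
    """Return the set of all length-k substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def term_hash(term):
    """Numeric hash token for a term (CRC32 is an assumption)."""
    return zlib.crc32(term.encode("utf-8"))


def build_indexes(corpus):
    """Build both indexes from a {doc_id: text} mapping."""
    first_index = defaultdict(set)  # K-gram -> hash tokens
    second_index = defaultdict(lambda: defaultdict(set))  # hash -> term -> doc ids
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            h = term_hash(term)
            second_index[h][term].add(doc_id)
            for gram in kgrams(f"${term}$"):  # "$" marks term boundaries
                first_index[gram].add(h)
    return first_index, second_index


def query_kgrams(pattern):
    """K-grams of the literal pieces of a pattern that uses '*' wildcards."""
    grams = set()
    for piece in f"${pattern}$".split("*"):
        grams |= kgrams(piece)
    return grams


def wildcard_search(pattern, first_index, second_index):
    """Return ids of documents containing a term that matches pattern."""
    grams = query_kgrams(pattern)
    if not grams:
        return set()  # pattern too short to yield a K-gram; not handled here
    # Keep only hash tokens indexed to every query K-gram.
    candidates = set.intersection(*(first_index.get(g, set()) for g in grams))
    docs = set()
    for h in candidates:
        for term, doc_ids in second_index.get(h, {}).items():
            # Assumed post-filter: sharing all K-grams does not guarantee a match.
            if fnmatch.fnmatchcase(term, pattern):
                docs |= doc_ids
    return docs


corpus = {1: "wildcard search", 2: "wild card index", 3: "mild winter"}
first_index, second_index = build_indexes(corpus)
print(sorted(wildcard_search("wild*", first_index, second_index)))  # [1, 2]
```

In this sketch the intersection operates on integers rather than term strings, which is the point at which numeric hashes would be expected to reduce comparison cost. Because the second index maps each hash token to its terms before mapping to documents, two terms that happen to share a hash remain distinguishable.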