A computer-implemented method searches a database for a particular string. One or more processors receive data as an input string and identify multiple k-grams in the input string, the unique characters in the input string, and the length of the input string. The one or more processors perform binary locality sensitive hashing on the k-grams, the unique characters, and the length of the input string, and then sum the resulting binary locality sensitive hashes to create a first addition vector, which is used to generate a first binary vector. The same process is performed on a particular string being requested to generate a second binary vector. The one or more processors then search the database for the requested string using the second binary vector in a large-scale Hamming distance query process that determines a Hamming distance between the first binary vector and the second binary vector.
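
The pipeline described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes a SimHash-style scheme as a stand-in for the binary locality sensitive hashing step: each feature is hashed to a vector of +1/-1 values, the vectors are summed into an addition vector, and the sign of each sum gives one bit of the binary vector. The code width `N_BITS`, the k-gram size `k`, the feature tags, the use of SHA-256, the `max_distance` threshold, and all function names are hypothetical choices made for illustration. A linear scan stands in for the large-scale Hamming distance query.

```python
import hashlib

N_BITS = 64  # assumed code width; the source does not specify one


def features(s, k=3):
    """Collect k-grams, unique characters, and length as tagged feature strings."""
    kgrams = [s[i:i + k] for i in range(max(len(s) - k + 1, 0))]
    feats = ["kgram:" + g for g in kgrams]
    feats += ["char:" + c for c in sorted(set(s))]
    feats.append("len:" + str(len(s)))
    return feats


def feature_bits(feature):
    """Hash one feature to N_BITS values in {-1, +1} (assumed stand-in hash)."""
    digest = hashlib.sha256(feature.encode("utf-8")).digest()
    value = int.from_bytes(digest[:N_BITS // 8], "big")
    return [1 if (value >> i) & 1 else -1 for i in range(N_BITS)]


def binary_vector(s, k=3):
    """Sum the per-feature hashes into an addition vector, then threshold at zero."""
    addition = [0] * N_BITS
    for f in features(s, k):
        for i, b in enumerate(feature_bits(f)):
            addition[i] += b
    return [1 if total > 0 else 0 for total in addition]


def hamming(a, b):
    """Count the positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def search(database, query, max_distance=16):
    """Linear scan: return (distance, string) pairs within max_distance, nearest first."""
    q = binary_vector(query)
    hits = [(hamming(binary_vector(s), q), s) for s in database]
    return sorted(h for h in hits if h[0] <= max_distance)
```

Calling `search(["hello world", "goodbye"], "hello world")` returns a list whose first entry is the exact match at distance 0. In a real deployment the database vectors would be computed once and indexed rather than rehashed on every query.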