Patent attributes
Text can be encoded into DNA sequences. Each word from a document or other text sample can be encoded in a DNA sequence or DNA sequences and the DNA sequences can be stored for later retrieval. The DNA sequences can be stored digitally, or actual DNA molecules containing the sequences can be synthesized and stored. In one example, the encoding technique makes use of a polynomial function to transform words based on the Latin alphabet into k-mer DNA sequences of length k. Because the whole bits required for the DNA sequences are smaller than the actual strings of words, storing documents using DNA sequences may compress the documents relative to storing the same documents using other techniques. In at least one example, the mapping between words and DNA sequences is one-to-one and the collision ratio for the encoding is low.