Techniques are described for tokenizing information to be stored in an untrusted environment. During tokenization, one or more strings in a file or data stream are replaced with a token. The token may be generated as a random number or a counter, such that the replaced string cannot be derived from the token. Token-to-string mapping data may be stored in a trusted environment, and the tokenized information may be stored in the untrusted environment. Users may search the tokenized information using non-sensitive search terms present in a whitelist that is accessible from the untrusted environment, the whitelist providing a token-to-string mapping for the non-sensitive terms. The search results may be provided as redacted information, in which the non-sensitive strings have been detokenized based on the whitelist while the sensitive strings remain tokenized.
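
The flow described above can be illustrated with a minimal Python sketch. It is one possible arrangement under stated assumptions, not the implementation the techniques require: the names `Tokenizer`, `trusted_map`, `whitelist`, `search`, and `redact` are hypothetical, tokens take the assumed form `T<counter>`, strings are treated as alphanumeric words, and words are lowercased before mapping for simplicity.

```python
import re
from itertools import count

WORD = re.compile(r"[A-Za-z0-9]+")   # assumption: a "string" is an alphanumeric word
TOKEN = re.compile(r"\bT\d+\b")      # assumption: tokens look like T1, T2, ...


class Tokenizer:
    """Runs in the trusted environment (hypothetical helper class)."""

    def __init__(self, whitelist_terms):
        self._counter = count(1)      # counter-based tokens reveal nothing about the string
        self._string_to_token = {}
        self.trusted_map = {}         # token -> string; stays in the trusted environment
        self.whitelist = {}           # token -> string for non-sensitive terms only
        self._whitelist_terms = {t.lower() for t in whitelist_terms}

    def _token_for(self, word):
        key = word.lower()            # simplification: original case is not preserved
        token = self._string_to_token.get(key)
        if token is None:
            token = f"T{next(self._counter)}"
            self._string_to_token[key] = token
            self.trusted_map[token] = key
            if key in self._whitelist_terms:
                self.whitelist[token] = key
        return token

    def tokenize(self, text):
        # Replace every word with its token; punctuation and spacing are kept.
        return WORD.sub(lambda m: self._token_for(m.group(0)), text)


def redact(tokenized, whitelist):
    """Untrusted side: detokenize whitelisted tokens, leave the rest tokenized."""
    return TOKEN.sub(lambda m: whitelist.get(m.group(0), m.group(0)), tokenized)


def search(tokenized_docs, whitelist, term):
    """Untrusted side: search by a non-sensitive term using only the whitelist."""
    string_to_token = {s: t for t, s in whitelist.items()}
    token = string_to_token.get(term.lower())
    if token is None:
        return []                     # term is not whitelisted, so it is not searchable
    pattern = re.compile(rf"\b{token}\b")
    return [redact(d, whitelist) for d in tokenized_docs if pattern.search(d)]


# Usage: tokenize in the trusted environment, then search from the untrusted one.
tokenizer = Tokenizer({"login", "failed", "for", "user"})
stored = [tokenizer.tokenize("Login failed for user alice")]   # "T1 T2 T3 T4 T5"
print(search(stored, tokenizer.whitelist, "failed"))  # ['login failed for user T5']
print(search(stored, tokenizer.whitelist, "alice"))   # []
```

In this sketch only `stored` and `tokenizer.whitelist` would be placed in the untrusted environment; `trusted_map` is the only structure that can recover the sensitive string behind `T5`.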