Patent attributes
Techniques are provided for assessing the uniqueness of information using string-based collection frequencies. One method comprises obtaining multiple collections of documents from at least one data source; determining a collection frequency for a given character string based on a number of the collections comprising the given character string relative to a total number of the collections; assigning a uniqueness rating to the given character string based at least in part on a comparison of the collection frequency of the given character string to a collection frequency of one or more additional character strings in one or more of the multiple collections; and performing an automated action using the given character string based on the assigned uniqueness rating. The automated action may comprise protecting the given character string and/or identifying the given character string as important information satisfying one or more importance criteria.
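The method summarized above may be illustrated with a minimal sketch in Python. Everything specific in it is an assumption made for illustration and is not taken from the abstract: character strings are taken to be lowercase word tokens, the uniqueness rating is taken to be the fraction of other strings whose collection frequency is at least as high as that of the given string, and the automated action is reduced to returning the strings whose rating meets a hypothetical threshold. The function names (`collection_frequencies`, `uniqueness_rating`, `strings_to_protect`) and the sample data are likewise hypothetical.

```python
import re
from collections import Counter


def collection_frequencies(collections):
    """Map each string to (collections containing it) / (total collections).

    `collections` is a list of collections, each a list of document strings.
    Tokenizing on word characters is an assumption of this sketch.
    """
    total = len(collections)
    counts = Counter()
    for documents in collections:
        seen = set()  # count a string at most once per collection
        for doc in documents:
            seen.update(re.findall(r"\w+", doc.lower()))
        counts.update(seen)
    return {s: n / total for s, n in counts.items()}


def uniqueness_rating(string, freqs):
    """Illustrative rating in [0, 1]: the share of other strings whose
    collection frequency is at least as high as that of `string`."""
    others = [f for s, f in freqs.items() if s != string]
    if not others:
        return 1.0
    return sum(f >= freqs[string] for f in others) / len(others)


def strings_to_protect(freqs, threshold=0.5):
    """Stand-in for the automated action: select strings rated unique enough."""
    return sorted(s for s in freqs if uniqueness_rating(s, freqs) >= threshold)


# Hypothetical sample data: three collections of short documents.
collections = [
    ["the quarterly report for project falcon", "the budget"],
    ["the meeting notes", "report on the budget"],
    ["the lunch menu"],
]
freqs = collection_frequencies(collections)
protected = strings_to_protect(freqs)
```

In this sample, "the" appears in every collection and so receives the lowest rating, while a string such as "falcon", which appears in only one collection, receives the highest. A rank-style comparison is only one possible reading of "a comparison of the collection frequency"; other comparisons could be substituted in `uniqueness_rating` without changing the rest of the sketch.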