Patent attributes
Sets of files may be represented using signatures. As described, an audit system can scan a file hierarchy that includes a root directory and a plurality of elements (e.g., directories, data files, and archive files) to identify elements satisfying an element selection criterion. The audit system creates element descriptors by identifying, for each identified element, one or more element component values and creating an element descriptor from those values. The audit system forms a string descriptor comprising an aggregation of the element descriptors and generates a signature for the string descriptor. The signature may be stored in association with metadata for the root directory. The audit system can then identify multiple sets of files represented by equivalent signatures and record those sets of files compactly.
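
The flow described above can be illustrated with a minimal Python sketch. It is not the patented implementation; several choices here are assumptions made only for illustration: the component values (element type, path relative to the root, and file size), the "|" and newline separators, sorting the descriptors before aggregation, and SHA-256 as the signature function. The function names (element_descriptor, directory_signature, group_by_signature) are hypothetical, and archive files are treated as ordinary files rather than being expanded.

```python
import hashlib
import os
from collections import defaultdict


def element_descriptor(path, root):
    """Build a descriptor from assumed component values for one element."""
    rel = os.path.relpath(path, root)
    if os.path.isdir(path):
        # Assumption: a directory is described by its type and relative path.
        return f"D|{rel}"
    # Assumption: a file is described by type, relative path, and size.
    return f"F|{rel}|{os.path.getsize(path)}"


def directory_signature(root, select=lambda path: True):
    """Scan the hierarchy under root and return a signature for it."""
    descriptors = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # Keep only elements that satisfy the selection criterion.
            if select(path):
                descriptors.append(element_descriptor(path, root))
    # Sort so the result does not depend on directory traversal order.
    descriptors.sort()
    string_descriptor = "\n".join(descriptors)
    return hashlib.sha256(string_descriptor.encode("utf-8")).hexdigest()


def group_by_signature(roots):
    """Map each signature to the root directories that produce it."""
    groups = defaultdict(list)
    for root in roots:
        groups[directory_signature(root)].append(root)
    return dict(groups)
```

Under these assumptions, two root directories with the same relative layout and file sizes fall into one group, so an audit record could store the element list once per signature instead of once per root. A stricter variant might add a content hash to each file descriptor, at the cost of reading every file.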