Patent attributes
To perform multi-pattern searching, a preprocessing engine populates a SUFFIX table, a PREFIX table, and a PATTERN table. The SUFFIX table combines data conventionally stored in separate SHIFT and HASH tables. Pointers in the SUFFIX table refer to corresponding segments in the PREFIX table, and each PREFIX table segment is sorted by prefix hash. The PATTERN table holds a hash of each full pattern, sorted and grouped into segments, with each segment corresponding to one combination of suffix hash and prefix hash. Pointers in the PREFIX table refer to corresponding segments in the PATTERN table. The PREFIX and PATTERN tables can be kept in secondary storage, allowing potentially billions of patterns to be used. After preprocessing, patterns are evaluated against a source file. A document metric is determined to qualitatively describe the similarity between the source file and each pattern file.
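The three-table layout described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes all patterns share one fixed length, uses a two-byte block for both suffix and prefix, uses CRC-32 as a stand-in hash, and keeps every table in memory rather than in secondary storage. The names `build_tables`, `search`, `BLOCK`, and `h` are hypothetical and chosen only for illustration.

```python
import bisect
import zlib
from collections import defaultdict

BLOCK = 2  # assumed block size for suffix and prefix hashing


def h(data: bytes) -> int:
    """Stand-in hash (assumption): CRC-32 of the bytes."""
    return zlib.crc32(data)


def build_tables(patterns):
    """Preprocess fixed-length patterns into SUFFIX, PREFIX and PATTERN tables."""
    m = len(patterns[0])
    if m < BLOCK or any(len(p) != m for p in patterns):
        raise ValueError("sketch assumes equal-length patterns of at least BLOCK bytes")
    default_shift = m - BLOCK + 1
    shift = {}
    # groups[suffix_hash][prefix_hash] -> list of full-pattern hashes
    groups = defaultdict(lambda: defaultdict(list))
    for p in patterns:
        # Record the smallest safe shift for every block inside the pattern.
        for end in range(BLOCK, m + 1):
            block = h(p[end - BLOCK:end])
            shift[block] = min(shift.get(block, default_shift), m - end)
        groups[h(p[m - BLOCK:])][h(p[:BLOCK])].append(h(p))

    suffix_table = {}   # suffix hash -> (shift, prefix segment start, end)
    prefix_table = []   # entries: (prefix hash, pattern segment start, end)
    pattern_table = []  # full-pattern hashes, sorted within each segment
    for block, s in shift.items():
        if s > 0:
            suffix_table[block] = (s, 0, 0)  # shift only, no segment
            continue
        seg_start = len(prefix_table)
        for pre in sorted(groups[block]):  # each segment is sorted by prefix hash
            pat_start = len(pattern_table)
            pattern_table.extend(sorted(groups[block][pre]))
            prefix_table.append((pre, pat_start, len(pattern_table)))
        suffix_table[block] = (0, seg_start, len(prefix_table))
    return m, suffix_table, prefix_table, pattern_table


def search(text, m, suffix_table, prefix_table, pattern_table):
    """Return start offsets in text whose m-byte window hashes to a known pattern."""
    hits = []
    default_shift = m - BLOCK + 1
    end = m  # exclusive end of the current window
    while end <= len(text):
        window = text[end - m:end]
        s, lo, hi = suffix_table.get(h(window[m - BLOCK:]), (default_shift, 0, 0))
        if s > 0:
            end += s
            continue
        # Shift is zero: binary-search the PREFIX segment, then the PATTERN segment.
        pre = h(window[:BLOCK])
        k = bisect.bisect_left(prefix_table, (pre,), lo, hi)
        if k < hi and prefix_table[k][0] == pre:
            _, plo, phi = prefix_table[k]
            full = h(window)
            j = bisect.bisect_left(pattern_table, full, plo, phi)
            if j < phi and pattern_table[j] == full:
                hits.append(end - m)
        end += 1
    return hits


tables = build_tables([b"abcd", b"bcde", b"xbcd"])
print(search(b"zzabcdexbcd", *tables))
```

Because only hashes are stored, a reported offset is a candidate match that a real system might verify against the pattern text. Keeping each segment sorted is what allows the PREFIX and PATTERN lookups to be binary searches, which is the property that would make the tables practical to keep on disk.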