Patent attributes
To protect a dataset with low overhead, a cybersecurity appliance uses multiple structures that facilitate efficient matching (the “matching infrastructure”) when applying data leakage prevention (DLP) rules. The cybersecurity appliance creates the matching infrastructure in advance by scanning the dataset to be protected, and it differentiates among tokens according to how frequently they occur in the dataset: unique, infrequent (rare), and frequent. Differentiating tokens into these frequency classes allows efficient matching that is biased towards the less frequently occurring tokens, which are more likely to be sensitive, while still allowing efficient matching of frequent tokens that form a restricted data pattern of a DLP rule.
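
As a rough illustration of the frequency-based differentiation described above, the following minimal Python sketch partitions tokens into unique, rare, and frequent classes and then checks inspected text against the least frequent classes first. It is not the claimed implementation: the function names (`build_matching_infrastructure`, `classify_tokens`), the whitespace tokenization, the lowercasing, the use of plain sets as the matching structures, and the `rare_threshold` cutoff are all assumptions made for the example.

```python
from collections import Counter


def build_matching_infrastructure(records, rare_threshold=3):
    """Scan the protected dataset once and group its tokens by frequency class.

    Assumption: a token seen once is "unique", a token seen up to
    rare_threshold times is "rare", and anything above that is "frequent".
    """
    counts = Counter(tok for rec in records for tok in rec.lower().split())
    infra = {"unique": set(), "rare": set(), "frequent": set()}
    for tok, n in counts.items():
        if n == 1:
            infra["unique"].add(tok)
        elif n <= rare_threshold:
            infra["rare"].add(tok)
        else:
            infra["frequent"].add(tok)
    return infra


def classify_tokens(text, infra):
    """Return protected tokens found in text, grouped by frequency class.

    Each token is looked up in the unique set first, then rare, then
    frequent, which biases matching towards the less frequent tokens.
    """
    hits = {"unique": [], "rare": [], "frequent": []}
    for tok in text.lower().split():
        for cls in ("unique", "rare", "frequent"):
            if tok in infra[cls]:
                hits[cls].append(tok)
                break
    return hits


# Hypothetical protected dataset: one record per string.
records = [
    "alice 4111 visa",
    "bob 5500 visa",
    "carol 4000 visa",
    "dave 4000 visa",
    "erin 4111 visa",
]
infra = build_matching_infrastructure(records)
hits = classify_tokens("Pay with VISA 5500 today", infra)
```

In this toy dataset, `visa` appears in every record and lands in the frequent class, while `5500` appears once and lands in the unique class. A DLP rule could then weight a unique or rare hit more heavily than a frequent hit, and require frequent tokens to appear together in a restricted pattern before triggering.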