Patent attributes
An online system enforces policies on content items distributed on its platform and blocks content items that violate one or more of those policies. To identify content items that are slight variations of one another, the online system generates an embedding for each known content item that has already been determined to be noncompliant with one or more policies. The online system then groups the known noncompliant content items that are clustered together in the embedding space. The texts of each group of known noncompliant content items are converted to finite state automata, which are merged to generate a common automaton. The common automaton is used to generate a common regular expression, which is then used to screen new content items. When a new content item matches the textual pattern defined by the common regular expression, the online system may block the new content item.
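The pipeline above (embed, cluster, convert to automata, merge, derive a regular expression, screen) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. Character-trigram counts stand in for a learned embedding, and greedy threshold grouping stands in for whatever clustering the system actually uses. Each text is treated as a linear token automaton, and the automata are merged into a prefix-tree automaton. All function names and the `threshold` parameter are hypothetical. This merge only unions the observed token sequences; tolerance to variation comes from flexible whitespace and case-insensitive matching. A production system would likely generalize further, for example by merging equivalent states.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a learned embedding: character-trigram counts.
def embed(text):
    t = f"  {text.lower()}  "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Assumed clustering: join the first group whose seed is similar enough.
def group_by_similarity(texts, threshold=0.5):
    groups = []  # list of (seed_embedding, member_texts)
    for text in texts:
        vec = embed(text)
        for seed, members in groups:
            if cosine(seed, vec) >= threshold:
                members.append(text)
                break
        else:
            groups.append((vec, [text]))
    return [members for _, members in groups]

# Merge linear token automata into one prefix-tree automaton.
# A state is a dict mapping token -> next state; key None marks acceptance.
def build_merged_automaton(texts):
    root = {}
    for text in texts:
        tokens = text.lower().split()
        if not tokens:
            continue
        state = root
        for token in tokens:
            state = state.setdefault(token, {})
        state[None] = {}
    return root

# Regex for the tokens that may follow `state`, including leading whitespace.
def _tail(state):
    branches = [re.escape(tok) + _tail(nxt)
                for tok, nxt in state.items() if tok is not None]
    if not branches:
        return ""
    alt = branches[0] if len(branches) == 1 else "(?:" + "|".join(branches) + ")"
    body = r"\s+" + alt
    # An accepting state with outgoing edges makes the continuation optional.
    return f"(?:{body})?" if None in state else body

# Convert the acyclic merged automaton into an equivalent regular expression.
def automaton_to_regex(root):
    branches = [re.escape(tok) + _tail(nxt)
                for tok, nxt in root.items() if tok is not None]
    return "(?:" + "|".join(branches) + ")"

# One common, case-insensitive regex per group of known noncompliant texts.
def compile_group_patterns(known_texts, threshold=0.5):
    texts = [t for t in known_texts if t.split()]
    return [re.compile(automaton_to_regex(build_merged_automaton(g)), re.IGNORECASE)
            for g in group_by_similarity(texts, threshold)]

# Screen a new content item against every common regex.
def should_block(text, patterns):
    return any(p.fullmatch(text.strip()) for p in patterns)
```

For example, merging "Buy cheap pills now" and "Buy cheap meds now" yields the single pattern `(?:buy\s+cheap\s+(?:pills\s+now|meds\s+now))`, so a new item such as "BUY  cheap   meds now" is flagged, while "buy cheap pills later" is not.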