A system and method for deduplicating messages is provided. Duplicate copies of messages are excluded from a set of deduplicated messages. The set of deduplicated messages can then be sampled to obtain a sample set usable for ensuring compliance according to a set of rules. One method for deduplicating messages involves receiving a message, determining whether the message is a duplicate copy, and adding the message to the set of deduplicated messages, if it is determined that the message is not a duplicate copy.