Patent attributes
Methods, apparatus, systems, and computer-readable media are provided for selecting pattern matching segments suitable for electronic communication clustering. A set of pattern matching segments may be identified that match at least one of a corpus of electronic communication addresses. A measure of coverage of each of the set of pattern matching segments across the corpus of electronic communication addresses may be determined. A score associated with each pattern matching segment may be determined based on the measure of coverage and one or more measures of flexibility associated with each of the set of pattern matching segments. One or more of the pattern matching segments may be selected based on the determine scores. A corpus of electronic communications may then be grouped into a plurality of clusters based on a comparison of the one or more selected pattern matching segments to electronic communication addresses associated with the corpus of electronic communications.