Patent attributes
Nonsense words are removed from incoming emails and visually similar (look-alike) characters are replaced with the actual, corresponding characters, so that the emails can be more accurately analyzed to see if they are spam. More specifically, an incoming email stream is filtered, and the emails are normalized to enable more accurate spam detection. In some embodiments, the normalization comprises the removal of nonsense words and/or the replacement of look-alike characters according to a set of rules. In other embodiments, more and/or different normalization techniques are utilized. In some embodiments, the language in which an email is written is identified in order to aid in the normalization. Once incoming emails are normalized, they are then analyzed to detect spam or other forms of undesirable email, such as phishing emails.