Patent attributes
Various embodiments describe a website analyzer for automatically identifying unauthorized or malicious websites. The website analyzer can include heuristics that automatically identify a collection of behaviors typical of unauthorized websites. Some embodiments automatically scan content hosted across server computers in a virtual environment and proactively identify potentially malicious websites. The embodiments can also be used to automatically scan content on public networks, such as the Internet. In particular embodiments, the website analyzer can include a semantic analysis engine and a link analysis engine. The semantic analysis engine can use the tag-level structure of HTML pages to formulate metrics that quantify the similarity of web page content. The link analysis engine can compare the structure of embedded URIs and scripts to define metrics that quantify the difference in links between an authorized site and a potentially malicious site.
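The two engines described above can be illustrated with a minimal sketch. The abstract does not specify the metrics themselves, so the choices here are assumptions made purely for illustration: tag-level similarity is approximated as Jaccard similarity over consecutive tag pairs, and link difference as the fraction of link hosts on the suspect page that never appear on the authorized page. The names `scan`, `tag_similarity`, and `link_host_difference` are hypothetical and do not come from the patent.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _PageScanner(HTMLParser):
    """Collects the start-tag sequence and embedded URIs of one HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        for name, value in attrs:
            # Assumption: these three attributes are treated as "embedded URIs".
            if name in ("href", "src", "action") and value:
                self.links.append(value)


def scan(html):
    """Return (tags, links) for an HTML string."""
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.tags, scanner.links


def tag_similarity(html_a, html_b):
    """Jaccard similarity over consecutive tag pairs, in the range 0.0 to 1.0.

    Illustrative stand-in for a semantic (tag-level) similarity metric.
    """
    def bigrams(tags):
        return set(zip(tags, tags[1:]))

    a = bigrams(scan(html_a)[0])
    b = bigrams(scan(html_b)[0])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def link_host_difference(authorized_html, suspect_html):
    """Fraction of the suspect page's link hosts absent from the authorized page.

    Illustrative stand-in for a link-difference metric; relative links
    (which have no host) are ignored.
    """
    def hosts(links):
        found = set()
        for link in links:
            host = urlparse(link).netloc.lower()
            if host:
                found.add(host)
        return found

    known = hosts(scan(authorized_html)[1])
    seen = hosts(scan(suspect_html)[1])
    if not seen:
        return 0.0
    return len(seen - known) / len(seen)
```

Under these assumptions, a page that reproduces an authorized page's tag structure almost exactly while pointing its forms or scripts at unfamiliar hosts would score high on both metrics, which is the kind of combination such an analyzer might flag for review. Any thresholds would need to be chosen empirically.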