Patent attributes
Systems and methods of classifying domain names are disclosed. Character-based n-grams are derived from a domain name in order to classify the domain name into one or more categories. In one aspect, a geometrical approach is used: the domain name's character-based n-grams are mapped to a vector point in a multidimensional space, and the relationship between that vector point and the vector points of other domain names is used as an indicator of the domain name's classification. In another aspect, a statistical approach is used: the relative frequencies of one or more character-based n-grams in the various classifications are used as indicators. Each character-based n-gram can be associated with a respective probability that indicates the likelihood that the n-gram is found in a domain name of a given classification. Such a probability can serve as an estimator of the classification of a new domain name that contains the n-gram.
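The two aspects described above can be illustrated with a minimal Python sketch. It is not the claimed method, and it rests on several assumptions that do not come from the abstract: character trigrams (n=3), taking only the label before the first dot (so ".com" is dropped), cosine similarity with a k-nearest-neighbour vote as the "relationship between vector points", and, for the statistical aspect, the share of an n-gram's occurrences that fall in each category, summed over the new domain's n-grams. All function names and the toy training pairs are hypothetical.

```python
from collections import Counter, defaultdict
import math

def char_ngrams(domain, n=3):
    """Character n-grams of the first label (assumption: the TLD is ignored)."""
    label = domain.lower().split(".")[0]
    return [label[i:i + n] for i in range(len(label) - n + 1)]

# --- Geometric aspect: a domain becomes a sparse count vector over n-grams. ---
def to_vector(domain, n=3):
    return Counter(char_ngrams(domain, n))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def knn_classify(domain, labeled, k=3, n=3):
    """Majority vote among the k labeled domains whose vectors are most similar."""
    q = to_vector(domain, n)
    ranked = sorted(labeled, key=lambda item: cosine(q, to_vector(item[0], n)), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# --- Statistical aspect: per-n-gram category probabilities from relative frequencies. ---
def ngram_category_probs(labeled, n=3):
    """Map each n-gram to {category: fraction of its occurrences seen in that category}."""
    counts = defaultdict(Counter)
    for domain, label in labeled:
        for g in char_ngrams(domain, n):
            counts[g][label] += 1
    return {g: {c: cnt / sum(cs.values()) for c, cnt in cs.items()}
            for g, cs in counts.items()}

def statistical_classify(domain, probs, n=3):
    """Sum the category probabilities of the domain's known n-grams; None if none are known."""
    scores = Counter()
    for g in char_ngrams(domain, n):
        for c, p in probs.get(g, {}).items():
            scores[c] += p
    return scores.most_common(1)[0][0] if scores else None

# Toy, made-up training pairs used only to exercise the functions.
labeled = [
    ("cheapflights.com", "travel"),
    ("flightdeals.net", "travel"),
    ("pokerstars.com", "gaming"),
    ("pokerroom.net", "gaming"),
]

if __name__ == "__main__":
    print(knn_classify("flightfinder.com", labeled, k=3))
    probs = ngram_category_probs(labeled)
    print(statistical_classify("pokerface.org", probs))
```

With this toy data, "flightfinder.com" shares the trigrams "fli", "lig", "igh", "ght" with both travel domains, so the neighbour vote returns "travel", while "pokerface.org" scores only on gaming trigrams. A naive Bayes model over P(n-gram | category) would be an equally reasonable reading of the statistical aspect; the summed-probability score is used here only because it is the simplest form to show.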