Patent attributes
Character string analysis and classification can be useful in a variety of contexts, including examining web URLs to determine whether a URL indicates that a user is attempting to take a particular action on an electronic service platform. In some cases, however, URLs or other string data may contain “noise,” such as random sub-strings, that prevents a string from being properly classified. It may nonetheless be useful to classify such a string into a category, and it may be important to do so quickly (e.g., during an active user interaction with a website). Learning tables that allow O(1) lookup can be established by tokenizing strings and then using probability analysis to eliminate tokens that appear an insufficient number of times. This allows for quick and accurate string classification, which may be useful in numerous circumstances.
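
The approach summarized above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the source: the delimiter set used for tokenizing, the function names (tokenize, build_learning_table, classify), the min_count and min_confidence thresholds, the example URLs, and the category labels. A plain Python dict stands in for the learning table, giving average-case O(1) lookup per token.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split a string on common URL delimiters (an assumed delimiter set)."""
    return [tok for tok in re.split(r"[/?&=.:_\-]+", text.lower()) if tok]


def build_learning_table(labeled_strings, min_count=2, min_confidence=0.9):
    """Build a token -> category dict from (string, category) pairs.

    Tokens seen in fewer than min_count strings are treated as noise and
    dropped. A surviving token is kept only if the estimated probability of
    its most frequent category is at least min_confidence.
    Both thresholds are hypothetical values chosen for this sketch.
    """
    token_totals = Counter()
    token_by_category = defaultdict(Counter)
    for text, category in labeled_strings:
        for tok in set(tokenize(text)):  # count each token once per string
            token_totals[tok] += 1
            token_by_category[tok][category] += 1

    table = {}
    for tok, total in token_totals.items():
        if total < min_count:
            continue  # appears too rarely, e.g. a random session ID
        category, hits = token_by_category[tok].most_common(1)[0]
        if hits / total >= min_confidence:
            table[tok] = category
    return table


def classify(text, table):
    """Vote over the tokens found in the table; return None if none match."""
    votes = Counter(table[tok] for tok in tokenize(text) if tok in table)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical training data: URLs with random "sid" values acting as noise.
TRAINING = [
    ("https://shop.example.com/cart/checkout?sid=a8f3k2", "checkout"),
    ("https://shop.example.com/checkout/pay?sid=zz91qx", "checkout"),
    ("https://shop.example.com/account/login?sid=p0o9i8", "login"),
    ("https://shop.example.com/login?next=home&sid=m4n5b6", "login"),
]

table = build_learning_table(TRAINING)
label = classify("https://shop.example.com/checkout?sid=qqq111", table)
```

In this sketch the random session identifiers are discarded because each appears only once, and tokens shared by every URL (such as the host name parts) are discarded because they do not favor any one category. Classifying a new string then costs one dict lookup per token.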