Indexing and querying in multiple languages are accomplished using an ordered chain of filters and/or other such components. When information is received for indexing or as a query, it can be tokenized, and each token can be typed based at least in part on its language. The character types can be adjusted where appropriate for the languages, and the tokens can be further segmented using a dictionary for the respective language type. Once the appropriate tokens are determined, relevant synonyms in each appropriate language can be determined and typed accordingly. If necessary, the case of the tokens and synonyms can be adjusted, and they can be further segmented based on punctuation. The resulting terms and synonyms then can be used as part of the index or as part of the search query, so that other terms or phrases relevant to the original information are included.
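A minimal Python sketch of such an ordered chain follows. It is illustrative only. The `Token` class, the type labels (`LATIN`, `CJK`, `NUM`), the single Unicode block used to detect CJK text, the toy dictionary and synonym tables, and the `build_index`/`search` helpers are all hypothetical. Dictionary segmentation uses forward maximum matching, which is a swapped-in technique and not necessarily the one the patent describes. Each stage is a function from a token list to a token list, so the chain's order is explicit and stages can be reordered or replaced.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical script detection: only the CJK Unified Ideographs block counts as "CJK";
# every other non-space run is labeled "LATIN" (a deliberate simplification).
CJK = r"\u4e00-\u9fff"

@dataclass(frozen=True)
class Token:
    text: str
    type: str               # hypothetical type labels: "LATIN", "CJK", "NUM"
    synonym: bool = False   # True if the token was added by synonym expansion

Filter = Callable[[List[Token]], List[Token]]

def script_type(s: str) -> str:
    return "CJK" if re.match(rf"[{CJK}]", s) else "LATIN"

def tokenize(text: str) -> List[Token]:
    # Split on whitespace and at script boundaries; type each run by its script.
    return [Token(r, script_type(r)) for r in re.findall(rf"[{CJK}]+|[^\s{CJK}]+", text)]

def retype_numbers(tokens: List[Token]) -> List[Token]:
    # Type adjustment: all-digit tokens become "NUM" whatever script they came from.
    return [Token(t.text, "NUM") if t.text.isdigit() else t for t in tokens]

def make_segmenter(dictionaries: Dict[str, Set[str]], max_len: int = 4) -> Filter:
    # Dictionary segmentation by forward maximum matching, per token type.
    def segment(tokens: List[Token]) -> List[Token]:
        out: List[Token] = []
        for t in tokens:
            words = dictionaries.get(t.type)
            if not words:
                out.append(t)
                continue
            i = 0
            while i < len(t.text):
                # Longest dictionary word starting at i; fall back to a single character.
                for n in range(min(max_len, len(t.text) - i), 0, -1):
                    if n == 1 or t.text[i:i + n] in words:
                        out.append(Token(t.text[i:i + n], t.type))
                        i += n
                        break
        return out
    return segment

def make_synonym_filter(synonyms: Dict[str, Dict[str, List[str]]]) -> Filter:
    # Keep each token and append its synonyms, typed by the synonym's own script.
    def expand(tokens: List[Token]) -> List[Token]:
        out: List[Token] = []
        for t in tokens:
            out.append(t)
            for syn in synonyms.get(t.type, {}).get(t.text.lower(), []):
                out.append(Token(syn, script_type(syn), synonym=True))
        return out
    return expand

def lowercase(tokens: List[Token]) -> List[Token]:
    # Case adjustment; str.lower() leaves uncased scripts such as CJK unchanged.
    return [Token(t.text.lower(), t.type, t.synonym) for t in tokens]

def split_punctuation(tokens: List[Token]) -> List[Token]:
    # Further segmentation on punctuation; empty pieces are dropped.
    return [Token(p, t.type, t.synonym) for t in tokens for p in re.split(r"\W+", t.text) if p]

def analyze(text: str, chain: List[Filter]) -> List[Token]:
    tokens = tokenize(text)
    for f in chain:  # filters run strictly in list order
        tokens = f(tokens)
    return tokens

# Toy resources, invented for illustration only.
CHAIN: List[Filter] = [
    retype_numbers,
    make_segmenter({"CJK": {"电脑", "手机", "无线"}}),
    make_synonym_filter({"CJK": {"电脑": ["computer"]},
                         "LATIN": {"wi-fi": ["wireless", "无线"]}}),
    lowercase,
    split_punctuation,
]

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for t in analyze(text, CHAIN):
            index.setdefault(t.text, set()).add(doc_id)
    return index

def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    # The same chain runs on the query, so a term can match via a synonym in another language.
    hits: Set[int] = set()
    for t in analyze(query, CHAIN):
        hits |= index.get(t.text, set())
    return hits
```

With these toy resources, `analyze("Buy 2 电脑手机 Wi-Fi", CHAIN)` yields `buy`, `2`, `电脑`, `computer`, `手机`, `wi`, `fi`, `wireless`, `无线`. After indexing that text alongside a document containing only "Computer", a query for `电脑` matches both documents because the synonym filter contributes the English term. The sketch expands synonyms on both the index and the query side only for brevity. A real system would typically apply expansion on one side, as the text's "index or ... search query" suggests.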