Patent attributes
Methods, systems, and articles of manufacture are disclosed for analyzing large document archives by matching queries against documents. The disclosed techniques include generating an ordered sequence of query tokens for each query; generating an ordered sequence of document tokens for each document; selecting an ordered sequence of document tokens from the one or more tokenized documents; selecting an ordered sequence of query tokens from the one or more tokenized queries; configuring a buffer to hold a subsequence of the selected ordered sequence of document tokens; comparing the selected ordered sequence of query tokens to successive subsequences of the selected document token sequence held in the configured buffer, where each successive subsequence has the same length, in tokens, as the selected query token sequence; and determining a match result based on the comparison.
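The buffered comparison described above amounts to sliding a fixed-length window over the document tokens and comparing each window with the query tokens. The following is a minimal sketch that rests on several assumptions not stated in the abstract. The tokenizer here lowercases the text and splits it on whitespace. The buffer is a `collections.deque` whose `maxlen` equals the query length. The "match result" is the list of token offsets at which a window equals the query. The names `tokenize`, `match_positions`, and `match_all` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer (an assumption): lowercase, then split on whitespace.
    return text.lower().split()


def match_positions(query_tokens: Sequence[str], doc_tokens: Iterable[str]) -> List[int]:
    """Return the start offsets of every document window equal to the query."""
    n = len(query_tokens)
    if n == 0:
        return []
    target = tuple(query_tokens)
    # Fixed-size buffer: once it is full, appending a token evicts the oldest one,
    # so the buffer always holds the most recent n document tokens.
    buffer: deque = deque(maxlen=n)
    matches: List[int] = []
    for i, token in enumerate(doc_tokens):
        buffer.append(token)
        # Compare only when the window has the same length (in tokens) as the query.
        if len(buffer) == n and tuple(buffer) == target:
            matches.append(i - n + 1)  # offset of the first token in the window
    return matches


def match_all(queries: List[str], documents: List[str]) -> Dict[Tuple[int, int], List[int]]:
    # Hypothetical driver: tokenize each query and document, then compare every pair.
    doc_tokens = [tokenize(d) for d in documents]
    results: Dict[Tuple[int, int], List[int]] = {}
    for qi, query in enumerate(queries):
        q_tokens = tokenize(query)
        for di, tokens in enumerate(doc_tokens):
            results[(qi, di)] = match_positions(q_tokens, tokens)
    return results
```

For example, `match_positions(tokenize("brown fox"), tokenize("The quick brown fox jumps"))` returns `[2]`. `doc_tokens` accepts any iterable, so a document can be streamed token by token without building every window up front. Each comparison costs time proportional to the query length. A rolling hash could reduce that cost, but the abstract does not say how the comparison is performed.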