Patent attributes
A system and method textually analyze documents. A word-frequency distribution is generated for the documents, and the intersection of the words in a first document and a second document is determined. For each word in the intersection, the frequency of the word in the first document is compared with its frequency in the second document, and the lower of the two frequencies is selected. A similarity measure between the first document and the second document is then determined as a function of the count of words in the intersection, the count of words in the second document, the selected lower frequencies, and the frequency distribution of the words in the second document.
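The steps above can be illustrated with a minimal Python sketch. The abstract does not state how the four quantities are combined, so the formula below is an assumption chosen only for illustration. It multiplies the fraction of the second document's distinct words that are shared by the fraction of the second document's total word frequency covered by the selected lower frequencies. The names `word_frequencies` and `similarity`, the use of raw occurrence counts as frequencies, and the lowercase letter-and-apostrophe tokenization are also hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    # Frequency distribution: raw occurrence count per lowercased word.
    # Tokenizing on runs of letters and apostrophes is an assumption.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def similarity(first: str, second: str) -> float:
    freq1 = word_frequencies(first)
    freq2 = word_frequencies(second)
    if not freq2:
        # No words in the second document: nothing to compare against.
        return 0.0

    # Intersection: words that appear in both documents.
    intersection = freq1.keys() & freq2.keys()

    # For each shared word, keep the lower of its two frequencies.
    lower_freqs = {w: min(freq1[w], freq2[w]) for w in intersection}

    # ASSUMED combination of the four quantities named in the abstract:
    # (intersection count / distinct-word count of the second document)
    # scaled by (sum of lower frequencies / total frequency of the
    # second document's words).
    word_overlap = len(intersection) / len(freq2)
    mass_overlap = sum(lower_freqs.values()) / sum(freq2.values())
    return word_overlap * mass_overlap
```

For example, `similarity("the the cat", "the dog")` returns `0.25`. The only shared word is "the", which accounts for 1 of the 2 distinct words in the second document. Its lower frequency (1) covers 1 of that document's 2 word occurrences. Because both denominators come from the second document, this sketch yields an asymmetric measure, so swapping the arguments can change the result.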