An approach for determining similar text documents. The approach can calculate a first set of vectors for a first cluster of text documents and a first comparison vector for a text document of interest. The approach can then select a subset of text documents from the first cluster by comparing each vector in the first set of vectors with the first comparison vector and keeping a predetermined number of the closest documents. The approach can calculate a second set of vectors for the subset and a second comparison vector for the document of interest. The approach can generate similarity ratings for the documents in the subset from pairwise comparisons of the second comparison vector with each vector in the second set of vectors. Finally, the approach can generate a ranked list of the subset (the second cluster of text documents) based on the similarity ratings.
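The two-stage flow described above (a coarse filter to a fixed number of candidates, then a finer re-ranking of only those candidates) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not say how the vectors are built or compared, so this sketch assumes word-count vectors for the first stage, character trigram counts for the second stage, and cosine similarity for every comparison. The function names `term_vector`, `char_ngram_vector`, `cosine`, and `rank_similar` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Stage-1 vector (assumption): raw word counts of the document."""
    return Counter(text.lower().split())


def char_ngram_vector(text, n=3):
    """Stage-2 vector (assumption): character n-gram counts, a finer representation."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(max(len(s) - n + 1, 0)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, cluster, k):
    """Return (document, rating) pairs for the k coarse matches, best first."""
    # Stage 1: compare the first set of vectors with the first comparison vector.
    first_query = term_vector(query)
    coarse = sorted(cluster, key=lambda doc: cosine(term_vector(doc), first_query),
                    reverse=True)
    subset = coarse[:k]  # keep the predetermined number of closest documents

    # Stage 2: pairwise comparisons of the second vectors with the second comparison vector.
    second_query = char_ngram_vector(query)
    ratings = [(doc, cosine(char_ngram_vector(doc), second_query)) for doc in subset]

    # Ranked list of the subset, highest similarity rating first.
    return sorted(ratings, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "neural network training on text",
        "text classification with neural networks",
        "cooking pasta at home",
        "training deep networks for text",
    ]
    for doc, score in rank_similar("training neural networks on text", docs, k=3):
        print(f"{score:.3f}  {doc}")
```

The design point the sketch illustrates is cost: the cheaper first-stage vectors are computed for the whole cluster, while the more detailed second-stage vectors are computed only for the `k` surviving documents.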