Patent attributes
A computer-implemented method performed at a server system having one or more processors and memory, the method comprising: receiving a set of curated documents comprising one or more documents identified as being relevant to a sector; analyzing the set of curated documents to determine one or more words and a count of each of the one or more words across all documents of the set of curated documents; further analyzing the set of curated documents by analyzing one or more n-grams based on the one or more words; determining a first score based on a term frequency and a global document frequency of each of the one or more words of each of the one or more n-grams; determining a document vector based on averages of the first score, wherein the document vector represents a perfect document for the sector; and storing the document vector in a data store.
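The claim leaves tokenization, n-gram length, and the exact scoring formula open. The following is a minimal Python sketch under several assumptions that are not part of the claim: whitespace tokenization, bigrams, a smoothed inverse document frequency computed over the curated set standing in for the "global document frequency", an n-gram's first score taken as the mean TF-IDF of its words, and the document vector taken as each n-gram's average score over all curated documents. The names `tokenize`, `ngrams`, and `build_sector_vector` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def ngrams(words: list[str], n: int) -> list[tuple[str, ...]]:
    # Contiguous n-word windows over a tokenized document.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_sector_vector(docs: list[str], n: int = 2) -> dict[tuple[str, ...], float]:
    tokenized = [tokenize(d) for d in docs]
    num_docs = len(tokenized)

    # Document frequency: number of curated documents containing each word.
    doc_freq = Counter(w for words in tokenized for w in set(words))
    # Smoothed IDF (assumption: scikit-learn-style smoothing) so that words
    # appearing in every curated document still receive a nonzero weight.
    idf = {w: math.log((1 + num_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}

    totals: dict[tuple[str, ...], float] = {}
    for words in tokenized:
        counts = Counter(words)  # per-document count of each word
        length = len(words)
        for gram in set(ngrams(words, n)):
            # First score (assumption): mean TF-IDF of the words in the n-gram.
            score = sum(counts[w] / length * idf[w] for w in gram) / len(gram)
            totals[gram] = totals.get(gram, 0.0) + score

    # Document vector: average first score per n-gram over all curated
    # documents, treating an n-gram absent from a document as scoring 0.
    return {gram: total / num_docs for gram, total in totals.items()}
```

For example, `build_sector_vector(["solar panel cost", "solar panel efficiency"])` gives the shared bigram `("solar", "panel")` a score of 1/3, while each document-specific bigram is weighted by its rarer word but diluted by averaging over both documents. A production system would more plausibly take document frequencies from a larger background corpus and persist the resulting mapping in the data store.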