Patent attributes
A system for short text identification can determine a plurality of topics, and a representative noun that identifies each of the topics, in a data repository. The system can determine a co-occurrence matrix for training words stored in a corpus and determine a word vector embedding for each of the training words, relating each training word to the other training words in an n-dimensional vector space. The system can determine word tokens for words in short text in documents in the data repository, the data repository being separate and distinct from the corpus. The system can determine sentence vectors for the short text based on the word vectors in each short text, and determine a plurality of topics in the documents based on clustering of the sentence vectors, wherein the plurality of topics indicates topics that are predominant in the documents in the data repository.
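
The steps described above can be illustrated with a minimal sketch. It is not the patented implementation, and every name in it is hypothetical. It makes several simplifying assumptions the abstract does not specify: the word embedding is taken to be the L2-normalized rows of a windowed co-occurrence matrix (so n equals the vocabulary size), a sentence vector is the average of its word vectors, clustering is plain k-means with a deterministic farthest-point start, and the "representative noun" is approximated by the most frequent token in a cluster, with no part-of-speech tagging.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase whitespace tokenizer; keeps alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def cooccurrence_vectors(corpus, window=2):
    """Build a windowed co-occurrence matrix over the training corpus and
    return its L2-normalized rows as word vectors (assumed embedding)."""
    vocab = sorted({tok for sent in corpus for tok in tokenize(sent)})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for sent in corpus:
        toks = tokenize(sent)
        for i, word in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    matrix[index[word]][index[toks[j]]] += 1.0
    vectors = {}
    for word, i in index.items():
        norm = math.sqrt(sum(x * x for x in matrix[i])) or 1.0
        vectors[word] = [x / norm for x in matrix[i]]
    return vectors


def sentence_vector(text, vectors):
    """Average the vectors of the known words in a short text."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokenize(text) if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def topic_label(texts):
    """Most frequent token in a cluster; a stand-in for the representative
    noun, since no part-of-speech tagging is done here."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    return counts.most_common(1)[0][0]


# Toy training corpus, kept separate from the short texts to be clustered.
corpus = [
    "printer paper tray jams",
    "paper jams printer tray",
    "login password reset fails",
    "password login fails reset",
]
short_texts = ["printer jams", "paper tray", "login fails", "password reset"]

vectors = cooccurrence_vectors(corpus)
points = [sentence_vector(text, vectors) for text in short_texts]
labels = kmeans(points, k=2)

for cluster in sorted(set(labels)):
    members = [t for t, lab in zip(short_texts, labels) if lab == cluster]
    print(cluster, topic_label(members), members)
```

In this toy data the two subject areas share no vocabulary, so their sentence vectors are orthogonal and the two clusters separate cleanly. A practical system would more likely use a learned, lower-dimensional embedding and a part-of-speech tagger to select an actual noun for each topic; those choices are outside what the abstract states.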