Patent attributes
Systems and methods are disclosed for hierarchical categorical and sub-categorical topic modeling. In response to a natural-language query, the topic modeling allows a set of results to be determined that is both semantically relevant to the user and diverse, in that it contains information complementary or adjacent to that of the query. Such a paradigm permits exploration and discovery of new topics and ideas in large collections of documents. In some embodiments, one or more non-negative matrix factorization (“NMF”) algorithms are applied in determining a hierarchical topic model that includes the semantically related categories and sub-categories. The dataset may include data obtained through authorized social media data collection, and machine learning techniques can optimize the generation of the topic model and/or the search results.
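
By way of illustration only, the following Python sketch shows one way a two-level hierarchy of categories and sub-categories could be derived with NMF. It is not the disclosed embodiment. The multiplicative update rule (Lee and Seung), the hard assignment of each document to its highest-weight topic, and the names `nmf`, `assign`, and `hierarchical_topics` are all assumptions introduced here. `V` is assumed to be a non-negative document-term matrix with one row per document.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (n x m) as W (n x k) times H (k x m).

    Uses multiplicative updates for the Frobenius-norm objective
    (an assumed choice; the source does not name a specific NMF variant).
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num = matmul(Wt, V)                # k x m
        den = matmul(matmul(Wt, W), H)     # k x m
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        Ht = transpose(H)
        num = matmul(V, Ht)                # n x k
        den = matmul(W, matmul(H, Ht))     # n x k
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def assign(W):
    """Hard-assign each row (document) to its highest-weight topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]


def hierarchical_topics(V, k_top, k_sub):
    """Return {category: {sub_category: [document indices]}}.

    Hypothetical two-level scheme: factor all documents into k_top
    categories, then factor each category's documents into k_sub
    sub-categories.
    """
    W, _ = nmf(V, k_top)
    top = assign(W)
    tree = {}
    for c in range(k_top):
        docs = [i for i, t in enumerate(top) if t == c]
        if len(docs) <= k_sub:
            # Too few documents to split further; keep them together.
            tree[c] = {0: docs} if docs else {}
            continue
        Wc, _ = nmf([V[i] for i in docs], k_sub)
        tree[c] = {}
        for d, s in zip(docs, assign(Wc)):
            tree[c].setdefault(s, []).append(d)
    return tree
```

Under these assumptions, a call such as `hierarchical_topics(V, k_top=2, k_sub=2)` returns a nested dictionary in which every document index appears exactly once. Results related to a query's own sub-category could be treated as semantically relevant, and results from sibling sub-categories under the same category as adjacent or complementary, though how the disclosed system balances relevance and diversity is not specified in this abstract.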