Embodiments of the present disclosure relate generally to semantic indexing for improving search results over a large publication corpus. In some embodiments, at least one keyword of a search query is encoded as a semantic vector in a semantic vector space, and a plurality of candidate publications is identified in the publication corpus, the candidate publications being encoded by a cluster of semantic vectors in the same space. The candidate publications are identified based on proximity in the semantic vector space between the at least one query keyword and keywords in the candidate publications. This proximity is determined by a first machine-learned model that projects both the query keyword(s) and the publication keywords into the semantic vector space.
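The retrieval step described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract. A hypothetical lookup table (`TOY_EMBEDDINGS`, via the hypothetical `project` function) stands in for the first machine-learned model. Each publication's cluster is the set of vectors for its keywords, and proximity is taken as the maximum cosine similarity between the query vector and any vector in that cluster. The candidates are the `top_k` closest publications. A production system would use a trained encoder and an approximate nearest-neighbor index instead.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]

# HYPOTHETICAL stand-in for the "first machine-learned model": a fixed table
# mapping keywords to 3-dimensional semantic vectors. Values are illustrative.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "laptop": (0.9, 0.1, 0.0),
    "notebook": (0.85, 0.15, 0.05),
    "computer": (0.8, 0.2, 0.1),
    "sneakers": (0.0, 0.1, 0.95),
    "shoes": (0.05, 0.05, 0.9),
    "charger": (0.6, 0.7, 0.0),
}


def project(keyword: str) -> Vector:
    """Project a keyword into the semantic vector space (toy lookup, assumed)."""
    return TOY_EMBEDDINGS[keyword.lower()]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def publication_cluster(keywords: Sequence[str]) -> List[Vector]:
    """Encode a publication as the cluster of its keyword vectors."""
    return [project(k) for k in keywords]


def candidate_publications(
    query_keyword: str, corpus: Dict[str, Sequence[str]], top_k: int = 2
) -> List[str]:
    """Return the top_k publication ids closest to the query keyword.

    Proximity (an assumption of this sketch) is the highest cosine similarity
    between the query vector and any vector in a publication's cluster.
    """
    q = project(query_keyword)
    scored = []
    for pub_id, keywords in corpus.items():
        score = max(cosine(q, v) for v in publication_cluster(keywords))
        scored.append((score, pub_id))
    # Sort by descending score; break ties by id for deterministic output.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pub_id for _, pub_id in scored[:top_k]]
```

For example, with `corpus = {"pub-1": ["notebook", "charger"], "pub-2": ["sneakers"], "pub-3": ["computer"]}`, the call `candidate_publications("laptop", corpus)` returns `["pub-1", "pub-3"]`. Neither publication contains the literal keyword "laptop"; they are retrieved because their keyword vectors lie near it in the semantic space.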