Patent attributes
Provided are processes for balancing exploration and optimization in knowledge discovery processes applied to unstructured data under tight interrogation budgets. A natural language processing (NLP) model may process natural language texts, for example into respective vectors. An output vector of (or intermediate vector within) an example NLP model may include over 500 dimensions, and in many cases 700-800 dimensions. A process may manage and measure semantic coverage by defining, based on the vectors of the natural language texts, geometric characteristics (such as a size or a relative distance matrix) of a semantic space corresponding to an evaluation during which the natural language texts are obtained. A system executing the process may generate a visualization of the semantic space, which may be, or may be reduced to, a latent embedding space, by reducing the dimensionality of the vectors while preserving their relative distances between the high-dimensionality and reduced-dimensionality forms.
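The geometric characteristics and the distance-preserving reduction described above may be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the disclosed method: the embeddings are random 768-dimensional stand-ins for NLP model vectors, the function names are hypothetical, the "size" of the semantic space is taken to be the largest pairwise Euclidean distance, and a Gaussian random projection is swapped in as one simple technique that approximately preserves relative distances.

```python
import math
import random


def pairwise_distances(vectors):
    """Return the symmetric matrix of Euclidean distances between vectors."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            dist[i][j] = dist[j][i] = d
    return dist


def random_projection(vectors, target_dim, seed=0):
    """Project vectors to target_dim with a Gaussian random matrix.

    Entries are scaled by 1/sqrt(target_dim) so that squared distances
    are preserved in expectation (an assumed, illustrative technique).
    """
    rng = random.Random(seed)
    source_dim = len(vectors[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(w * x for w, x in zip(row, vec)) for row in matrix]
            for vec in vectors]


def max_relative_distortion(high, low):
    """Largest relative change in any pairwise distance after reduction."""
    worst = 0.0
    n = len(high)
    for i in range(n):
        for j in range(i + 1, n):
            if high[i][j] > 0:
                change = abs(low[i][j] - high[i][j]) / high[i][j]
                worst = max(worst, change)
    return worst


# Hypothetical stand-ins for NLP model output vectors (768 dimensions).
data_rng = random.Random(42)
embeddings = [[data_rng.gauss(0.0, 1.0) for _ in range(768)]
              for _ in range(20)]

# Geometric characteristics of the high-dimensional semantic space.
high_dist = pairwise_distances(embeddings)
space_size = max(max(row) for row in high_dist)

# Reduce dimensionality, then compare the two relative distance matrices.
reduced = random_projection(embeddings, target_dim=64)
low_dist = pairwise_distances(reduced)
distortion = max_relative_distortion(high_dist, low_dist)

print("semantic space size (max pairwise distance):", space_size)
print("max relative distortion after reduction:", distortion)
```

A target of 64 dimensions is used here only so that the distortion stays modest. A two- or three-dimensional visualization would typically rely on a different reduction technique, and the same distortion measure could be used to check how well it preserves the relative distances.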