Patent attributes
An apparatus in one embodiment comprises a processing platform implementing a data set discovery engine. The data set discovery engine comprises a data set indexer configured to generate similarity indexes for a plurality of data sets, and a relativistic retriever coupled to the data set indexer and configured to obtain a suitability template for a query and to execute the query against one or more of the similarity indexes based at least in part on the suitability template. A given one of the similarity indexes comprises at least first and second auxiliary information generated from respective ones of at least first and second different similarity measures of a plurality of different similarity measures. The first and second similarity measures comprise selected ones of the plurality of different similarity measures that are supported by the data set discovery engine, with the supported similarity measures comprising both frequency-based and non-frequency-based similarity measures.
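
The arrangement described above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that data sets are lists of tokens, that term-frequency cosine similarity stands in for a frequency-based measure and Jaccard similarity for a non-frequency-based one, and that a suitability template is simply a mapping from measure name to weight. All names (`MEASURES`, `build_index`, `retrieve`) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def cosine_tf(a: Counter, b: Counter) -> float:
    """Frequency-based measure (assumed): cosine similarity of term counts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: frozenset, b: frozenset) -> float:
    """Non-frequency-based measure (assumed): Jaccard similarity of token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Each supported measure pairs a builder for its auxiliary information
# with the function that compares two pieces of that information.
MEASURES = {
    "cosine_tf": (Counter, cosine_tf),
    "jaccard": (frozenset, jaccard),
}


def build_index(data_sets, measures=("cosine_tf", "jaccard")):
    """Hypothetical indexer: one auxiliary structure per selected measure."""
    return {
        name: {m: MEASURES[m][0](tokens) for m in measures}
        for name, tokens in data_sets.items()
    }


def retrieve(index, query_tokens, template):
    """Hypothetical retriever: the template weights each measure's score."""
    query_aux = {m: MEASURES[m][0](query_tokens) for m in template}
    scores = {
        name: sum(w * MEASURES[m][1](query_aux[m], aux[m]) for m, w in template.items())
        for name, aux in index.items()
    }
    # Highest combined score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


data_sets = {
    "sales": ["region", "price", "price", "quantity"],
    "weather": ["region", "temperature", "humidity"],
}
index = build_index(data_sets)
ranked = retrieve(index, ["price", "quantity"], {"cosine_tf": 0.5, "jaccard": 0.5})
```

In this sketch each index entry holds two kinds of auxiliary information, one per similarity measure, and changing the weights in the template changes how the same index is ranked for a given query without rebuilding it.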