Patent attributes
Techniques are described for fast approximate conditional sampling by randomly sampling a dataset and then performing a nearest neighbor search on the pre-sampled dataset to reduce the data over which the nearest neighbor search must be performed and, according to an embodiment, to effectively reduce the number of nearest neighbors that are to be found within the random sample. Furthermore, KD-Tree-based stratified sampling is used to generate a representative sample of a dataset. KD-Tree-based stratified sampling may be used to identify the random sample for fast approximate conditional sampling, which reduces variance in the resulting data sample. As such, using KD-Tree-based stratified sampling to generate the random sample for fast approximate conditional sampling ensures that any nearest neighbor selected, for a target data instance, from the random sample is likely to be among the nearest neighbors of the target data instance within the unsampled dataset.