Patent attributes
Evaluating risk of sensitive data associated with a target data set includes a computer system receiving a pattern that defines sensitive data and a selection of a data set as the target data set for evaluating. The system determines portions of the target data set from which to select sample data sets and determines, responsive to a confidence limit and sizes of the respective portions of the target data, a size of a sample data set for each respective target data set portion. The system randomly samples the target data set portions to provide sample data sets of the determined sample data set sizes and determines whether there is an occurrence of the sensitive data in each sample data set by searching for the pattern in the sample data sets. The system determines a proportion of the sample data sets that have the occurrence of the sensitive data.