Patent attributes
Data drift, or dataset shift, between a training dataset and a test dataset is detected by: training a scoring function using a pooled dataset, the pooled dataset including a union of the training dataset and the test dataset; obtaining an outlier score for each instance in the training dataset and the test dataset based at least in part on the scoring function; assigning a weight to each outlier score based at least in part on training contamination rates; determining a test statistic based at least in part on the outlier scores and the weights; determining a null distribution of no dataset shift for the test statistic; determining a threshold in the null distribution; and, when the test statistic is greater than or equal to the threshold, identifying dataset shift between the training dataset and the test dataset.
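
The steps recited above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and everything specific in it is an assumption made for illustration: the scoring function (standardized distance from the pooled mean), the weight formula (one minus the training contamination rate, squared), the permutation-based null distribution, the quantile used as the threshold, and the names `fit_scorer`, `weighted_statistic`, and `detect_shift`. The abstract does not specify any of these.

```python
import bisect
import math
import random
import statistics


def fit_scorer(pooled):
    """Fit a toy outlier scorer on the pooled rows (assumed scorer, not from the source)."""
    columns = list(zip(*pooled))
    means = [statistics.fmean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]

    def score(row):
        # Standardized Euclidean distance from the pooled mean.
        return math.sqrt(
            sum(((x - m) / s) ** 2 for x, m, s in zip(row, means, stdevs))
        )

    return score


def weighted_statistic(train_scores, test_scores):
    """Mean weight of the test scores, weighted by training contamination rates."""
    ordered = sorted(train_scores)
    n = len(ordered)
    total = 0.0
    for s in test_scores:
        # Contamination rate: share of training scores at or above s.
        contamination = (n - bisect.bisect_left(ordered, s)) / n
        # Assumed weight: larger when the score is rarer in the training set.
        total += (1.0 - contamination) ** 2
    return total / len(test_scores)


def detect_shift(train, test, alpha=0.05, n_permutations=500, seed=0):
    """Return (shift_detected, test_statistic, threshold)."""
    rng = random.Random(seed)
    pooled = train + test
    score = fit_scorer(pooled)  # scoring function trained on the union
    scores = [score(row) for row in pooled]
    n_train = len(train)
    observed = weighted_statistic(scores[:n_train], scores[n_train:])

    # Null distribution of no shift: shuffle the train/test labels. The scorer
    # was fit on the pooled data, so the scores themselves do not change.
    shuffled = scores[:]
    null = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null.append(weighted_statistic(shuffled[:n_train], shuffled[n_train:]))
    null.sort()

    # Threshold: the (1 - alpha) empirical quantile of the null distribution.
    index = min(len(null) - 1, max(0, math.ceil((1 - alpha) * len(null)) - 1))
    threshold = null[index]
    return observed >= threshold, observed, threshold


if __name__ == "__main__":
    rng = random.Random(1)
    train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    test = [[rng.gauss(0, 3), rng.gauss(0, 3)] for _ in range(200)]
    print(detect_shift(train, test))
```

Because the scorer is fit once on the pooled data, each permutation only reshuffles existing scores and needs no refitting. This toy scorer only reacts to shifts that push test instances away from the pooled center; a density-based or tree-based outlier detector could be swapped in without changing the rest of the sketch.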