A first classification is received from a neural network in response to a training dataset sent to the neural network. A modified training dataset, which includes a perturbation of the training dataset, is identified such that the modified training dataset causes the neural network to return a second classification. The perturbation is analyzed to identify a negative rule of the neural network.
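
The flow described above can be illustrated with a minimal sketch. It is not the claimed method, and everything in it is an assumption made for illustration: `toy_network` is a hypothetical stand-in for a trained neural network, a single feature vector stands in for the training dataset, the perturbation search is a simple one-feature-at-a-time grid search, and a "negative rule" is read here as a condition under which the network stops returning the first classification.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Classifier = Callable[[Sequence[float]], str]

# Hypothetical input standing in for the training dataset (assumption).
SAMPLE: List[float] = [1.0, 0.5, 0.25]


def toy_network(x: Sequence[float]) -> str:
    """Hypothetical stand-in for a trained neural network (assumption)."""
    score = 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]
    return "approve" if score > 1.0 else "deny"


def find_flipping_perturbation(
    classify: Classifier,
    sample: Sequence[float],
    step: float = 0.25,
    max_steps: int = 20,
) -> Optional[Tuple[int, float]]:
    """Search for the smallest single-feature change that alters the label.

    Returns (feature_index, delta), or None if no tested change flips it.
    """
    first = classify(sample)  # first classification
    best: Optional[Tuple[int, float]] = None
    for index in range(len(sample)):
        for sign in (1.0, -1.0):
            for k in range(1, max_steps + 1):
                delta = sign * step * k
                modified = list(sample)  # modified dataset
                modified[index] += delta  # apply the perturbation
                if classify(modified) != first:  # second classification
                    if best is None or abs(delta) < abs(best[1]):
                        best = (index, delta)
                    break  # smallest flip in this direction found
    return best


def describe_negative_rule(
    sample: Sequence[float],
    perturbation: Tuple[int, float],
    first_label: str,
) -> str:
    """Summarize the perturbation as an observed negative rule (assumed form)."""
    index, delta = perturbation
    new_value = sample[index] + delta
    return (
        f"not '{first_label}' when feature[{index}] is moved from "
        f"{sample[index]:g} to {new_value:g}, other features unchanged"
    )


if __name__ == "__main__":
    first_label = toy_network(SAMPLE)
    found = find_flipping_perturbation(toy_network, SAMPLE)
    if found is not None:
        print(describe_negative_rule(SAMPLE, found, first_label))
```

A grid search is used only because it is easy to follow; a gradient-based or other adversarial search could play the same role against a real network, and the rule extracted from the perturbation could take a richer form than the single-feature statement shown here.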