Patent attributes
A method for protecting a machine learning model includes:
generating a first adversarial example by modifying an original input using an attack tactic, wherein the model accurately classifies the original input but does not accurately classify at least the first adversarial example;
training a defender to protect the model from the first adversarial example by updating a strategy of the defender based on predictive results from classifying the first adversarial example;
updating the attack tactic based on the predictive results from classifying the first adversarial example;
generating a second adversarial example by modifying the original input using the updated attack tactic, wherein the trained defender does not protect the model from the second adversarial example; and
training the defender to protect the model from the second adversarial example by updating the strategy of the defender based on predictive results from classifying the second adversarial example.
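
The claim describes an iterative attacker/defender loop: an attack tactic produces an adversarial example, the defender is trained on it, the tactic is updated, and the defender is trained again on the resulting second adversarial example. The sketch below is a minimal, hypothetical illustration of that loop in PyTorch, assuming an FGSM-style perturbation stands in for the "attack tactic" and adversarial fine-tuning of the protected model stands in for the "defender strategy"; the function names (generate_adversarial, train_defender), the toy model, and the epsilon update rule are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def generate_adversarial(model, x, y, epsilon):
    """Attack tactic (assumed FGSM-style): perturb the original input to raise the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the classification loss.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def train_defender(model, optimizer, x_adv, y):
    """Defender strategy (assumed adversarial fine-tuning): update the model on the adversarial example."""
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy classifier and data stand in for the protected model and the original input.
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 20), torch.randint(0, 2, (8,))

epsilon = 0.1                                          # initial attack tactic
x_adv_1 = generate_adversarial(model, x, y, epsilon)   # first adversarial example
train_defender(model, optimizer, x_adv_1, y)           # train the defender on it

epsilon *= 2                                           # update the attack tactic (illustrative rule)
x_adv_2 = generate_adversarial(model, x, y, epsilon)   # second adversarial example
train_defender(model, optimizer, x_adv_2, y)           # retrain the defender on it
```

In this sketch the "predictive results from classifying" each adversarial example are represented by the classification loss that drives both the defender update and the (here, simplified) strengthening of the attack; the patent does not specify these mechanisms, and other attack tactics or defender strategies could be substituted.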