Patent attributes
A computer-implemented method relates to training a machine learning system to detect an adversarial attack. The method includes classifying a first sequence as belonging to a first class indicative of a nominal sequence based on a first prediction that the first sequence includes an unperturbed version of sensor data. The method also includes classifying a second sequence as belonging to a second class indicative of an adversarial sequence based on a second prediction that the second sequence includes a perturbed version of the sensor data. Combined loss data is generated for a collection of sequences and is based on a first average loss with respect to incorrect classifications of the first class and a second average loss with respect to incorrect classifications of the second class. Parameters of the machine learning system are updated based on the combined loss data. Once trained, the machine learning system is operable to generate a first label to indicate that an input sequence is classified as belonging to the first class and generate a second label to indicate that the input sequence is classified as belonging to the second class, thereby enabling a control system to operate in a nominal manner based on the first class and a defensive manner based on the second class.
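The combined loss described above can be illustrated with a minimal sketch. The abstract does not specify the loss function, how the two averages are combined, or how labels are encoded, so the following are assumptions made only for illustration: a binary cross-entropy loss per sequence, a simple sum of the two per-class averages, the label encoding `NOMINAL = 0` and `ADVERSARIAL = 1`, and the hypothetical helper names `binary_cross_entropy`, `combined_loss`, and `label_for`.

```python
import math

# Hypothetical label encoding (not specified in the abstract).
NOMINAL, ADVERSARIAL = 0, 1


def binary_cross_entropy(p_adv, label, eps=1e-12):
    """Per-sequence loss, given the predicted probability that the
    sequence is adversarial. Cross-entropy is an assumed choice."""
    p = min(max(p_adv, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if label == ADVERSARIAL else -math.log(1.0 - p)


def combined_loss(probs, labels):
    """Average the loss separately over nominal and adversarial sequences,
    then combine the two averages (here by summing, an assumption)."""
    nominal = [binary_cross_entropy(p, y)
               for p, y in zip(probs, labels) if y == NOMINAL]
    adversarial = [binary_cross_entropy(p, y)
                   for p, y in zip(probs, labels) if y == ADVERSARIAL]
    avg_nominal = sum(nominal) / len(nominal) if nominal else 0.0
    avg_adversarial = sum(adversarial) / len(adversarial) if adversarial else 0.0
    return avg_nominal + avg_adversarial


def label_for(p_adv, threshold=0.5):
    """Map a predicted probability to a class label; the 0.5 threshold
    is illustrative only."""
    return ADVERSARIAL if p_adv >= threshold else NOMINAL
```

Because each class is averaged on its own before the averages are combined, a collection with many nominal sequences and few adversarial ones contributes equally from both classes in this sketch. In a training loop, the value returned by `combined_loss` would be the quantity used to update the parameters of the machine learning system, and `label_for` stands in for the label a downstream control system would read to choose between nominal and defensive operation.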