Patent attributes
An approach to identifying a corrective action for a data storage device (DSD), such as one deployed in a fleet of DSDs in a data center, involves receiving error data about excursions from the normal operational behavior of the DSD. Data representing a particular excursion is input into a probabilistic decision network, which characterizes a set of DSD operational metrics and certain DSD controller rules that represent internal controls of the DSD and corresponding conditional relationships among the operational metrics. From the decision network, the likelihood that each of one or more possible causes was a contributing factor to the particular excursion is determined, and a corrective action for the particular excursion is then determined based on the likelihood of a particular cause. The corrective action may then be shared with the DSD for in-situ execution of corresponding self-repair operations.
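The flow described above (excursion data in, cause likelihoods out, corrective action selected) can be illustrated with a minimal Python sketch. It is not the patented decision network: it swaps in a much simpler single-cause, naive-Bayes-style model in which each metric is conditionally independent given the cause. Every cause name, metric name, probability, action name, and the 0.5 threshold is a hypothetical value chosen for illustration only.

```python
from math import prod

# Hypothetical prior probability of each candidate cause (assumed to sum to 1).
PRIORS = {
    "head_degradation": 0.2,
    "media_defect": 0.3,
    "vibration": 0.5,
}

# Hypothetical conditional probabilities P(metric is abnormal | cause).
LIKELIHOODS = {
    "head_degradation": {"read_error_rate": 0.9, "seek_time": 0.2},
    "media_defect": {"read_error_rate": 0.8, "seek_time": 0.1},
    "vibration": {"read_error_rate": 0.3, "seek_time": 0.9},
}

# Hypothetical mapping from a likely cause to a self-repair action.
ACTIONS = {
    "head_degradation": "depopulate_head",
    "media_defect": "reformat_affected_zone",
    "vibration": "retune_servo",
}


def cause_posteriors(observed):
    """Return P(cause | observed metrics) under the naive-Bayes assumption.

    `observed` maps a metric name to True (abnormal) or False (normal).
    """
    scores = {}
    for cause, prior in PRIORS.items():
        terms = [
            LIKELIHOODS[cause][metric] if abnormal
            else 1.0 - LIKELIHOODS[cause][metric]
            for metric, abnormal in observed.items()
        ]
        scores[cause] = prior * prod(terms)
    total = sum(scores.values())
    if total == 0:
        raise ValueError("observation is impossible under every modeled cause")
    # Normalize so the posteriors sum to 1.
    return {cause: score / total for cause, score in scores.items()}


def choose_action(observed, threshold=0.5):
    """Pick the action for the most likely cause, or None if below threshold."""
    posteriors = cause_posteriors(observed)
    best_cause = max(posteriors, key=posteriors.get)
    if posteriors[best_cause] < threshold:
        return None, posteriors
    return ACTIONS[best_cause], posteriors


# Example excursion: read errors are abnormal, seek time is normal.
action, posteriors = choose_action({"read_error_rate": True, "seek_time": False})
print(action, posteriors)
```

In this sketch the returned action name stands in for the corrective action that would be shared with the DSD. A fuller model would also encode the controller rules and the dependencies among metrics instead of assuming the metrics are independent given the cause.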