Provided are systems, methods, and computer program products for evaluating subsystem performance. In some embodiments, a method comprises: perturbing a first attribute of a first subsystem of a system that includes a plurality of subsystems; determining a change in a second attribute of a second subsystem of the system in response to the perturbing of the first attribute, where at least one output of the first subsystem is passed to the second subsystem; and determining a value for a performance metric of the system based on a correlation of the performance metric with the first and second attributes. In some embodiments, the system is a software stack of an autonomous vehicle (AV) and the performance metric is an objective function output that measures a quality of the AV's driving behavior.
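
The perturb, observe, and correlate steps summarized above can be illustrated with a minimal sketch. It is not the claimed implementation: the two toy subsystems (`perception`, `prediction`), the additive position bias used as the first attribute, the prediction shift used as the second attribute, and the negative-squared-error `objective` are all hypothetical stand-ins chosen only to make the data flow concrete.

```python
import math

def perception(true_position, bias):
    """Hypothetical first subsystem; `bias` is the perturbed first attribute."""
    return true_position + bias

def prediction(observed_position, velocity, horizon=1.0):
    """Hypothetical second subsystem; consumes the first subsystem's output."""
    return observed_position + velocity * horizon

def objective(predicted, actual):
    """Hypothetical performance metric: negative squared error (higher is better)."""
    return -(predicted - actual) ** 2

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def evaluate(biases, true_position=10.0, velocity=2.0, horizon=1.0):
    """Perturb the first attribute and record the downstream change and metric."""
    actual_future = true_position + velocity * horizon
    # Unperturbed run, used as the reference for the second attribute.
    baseline = prediction(perception(true_position, 0.0), velocity, horizon)
    changes, metrics = [], []
    for bias in biases:
        predicted = prediction(perception(true_position, bias), velocity, horizon)
        changes.append(abs(predicted - baseline))  # change in second attribute
        metrics.append(objective(predicted, actual_future))
    return changes, metrics

biases = [0.0, 0.5, 1.0, 1.5, 2.0]                 # perturbation sweep (assumed values)
changes, metrics = evaluate(biases)
r_first = pearson(biases, metrics)                 # metric vs. first attribute
r_second = pearson(changes, metrics)               # metric vs. second attribute
print(f"corr(metric, first attribute)  = {r_first:.3f}")
print(f"corr(metric, second attribute) = {r_second:.3f}")
```

In this toy setup the metric degrades as the perturbation grows, so both correlations come out strongly negative. A real software stack would replace the toy functions with actual subsystem runs and would likely need many scenarios per perturbation level before the correlations are meaningful.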