The disclosed embodiments provide a system for identifying root causes of performance issues. During operation, the system obtains a call graph containing a set of call paths for a set of services. Next, the system determines, based on a load test of the set of services, severity scores for the set of services, wherein the severity scores represent levels of abnormal behavior in the set of services. The system then groups the severity scores by the set of call paths and identifies, based on the grouped severity scores, one or more services as potential root causes of performance issues in the set of services. Finally, the system outputs the identified one or more services as the potential root causes of the performance issues.
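The four steps above (obtain call paths, score services from a load test, group scores by call path, and pick likely root causes) can be illustrated with a minimal Python sketch. The abstract does not specify how scores are computed or how the grouped scores are interpreted. Everything below is a hypothetical illustration, not the disclosed method:

- Each call path is assumed to be an ordered list of service names, running from caller to deepest callee.
- The severity score is assumed to be the relative latency increase between a baseline run and the load test.
- The `threshold` value of 0.5 is an arbitrary choice.
- The selection rule is an assumed heuristic: a service is blamed only if it is abnormal and nothing downstream of it on any path is also abnormal.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a call path is an ordered list of service
# names, from the entry-point service down to the deepest callee.
CallPath = Sequence[str]
ScoredPath = List[Tuple[str, float]]


def severity_scores(baseline_ms: Dict[str, float],
                    load_test_ms: Dict[str, float]) -> Dict[str, float]:
    """Assumed scoring rule: relative latency increase under load.

    0.0 means no degradation (speedups are clamped to 0.0);
    1.0 means latency doubled during the load test.
    """
    return {
        svc: max(0.0, (load_test_ms[svc] - base) / base)
        for svc, base in baseline_ms.items()
        if svc in load_test_ms and base > 0
    }


def group_by_call_path(paths: List[CallPath],
                       scores: Dict[str, float]) -> List[ScoredPath]:
    """Attach each service's severity score to its position on every path."""
    return [[(svc, scores.get(svc, 0.0)) for svc in path] for path in paths]


def find_root_causes(grouped: List[ScoredPath],
                     threshold: float = 0.5) -> List[str]:
    """Assumed heuristic for picking potential root causes.

    A service is a candidate when it is abnormal (score >= threshold) and
    no service below it on that path is abnormal. It is discarded if, on
    any path, an abnormal service downstream of it could explain its
    degradation, because slowness in a callee propagates up to callers.
    """
    candidates = set()
    explained = set()
    for path in grouped:
        for i, (svc, score) in enumerate(path):
            if score < threshold:
                continue
            if any(s >= threshold for _, s in path[i + 1:]):
                explained.add(svc)
            else:
                candidates.add(svc)
    return sorted(candidates - explained)
```

For example, with paths `gateway → orders → inventory` and `gateway → search`, where the load test slows `inventory` by 200% while `search` degrades only slightly, the sketch reports only `inventory`. The `gateway` and `orders` services are also abnormal, but the sketch treats them as explained by `inventory` on the first path. The per-service clamping and fixed threshold are simplifications; a real system would likely use a statistically grounded anomaly score.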