Patent attributes
A system and method for adaptively identifying and correcting issues in a computing system, such as a distributed node computing system, are described. The method includes receiving node data from a group of nodes, the node data describing one or more operational characteristics of each node. The operational characteristics may include CPU load, memory load, latency, or other operational data that describes node performance. Reachability data for the group of nodes is generated by attempting to contact each node. Code version data is generated for each node, identifying which version of code the applications on that node are running. The nodes are grouped into clusters using density-based clustering, and nodes that fall outside every cluster are identified as outliers. A correlation among the reachability data, the code version data, and the outlier data is then determined to identify problems and issue corrective actions.
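The clustering and correlation steps described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes DBSCAN as the density-based algorithm, metrics already normalised to the range 0 to 1, and a simple rule table for mapping findings to corrective actions. The field names (`cpu`, `mem`, `latency`, `reachable`, `version`), the action labels, the `eps` and `min_pts` values, and the sample nodes are all hypothetical.

```python
from collections import Counter
from math import dist

NOISE = -1  # label for points that belong to no cluster (outliers)


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: return one label per point (cluster id or NOISE)."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
    return labels


def diagnose(nodes, eps=0.15, min_pts=3):
    """Correlate outlier status with reachability and code version.

    The rule table below is an assumption made for illustration only.
    """
    points = [(n["cpu"], n["mem"], n["latency"]) for n in nodes]
    labels = dbscan(points, eps, min_pts)
    majority = Counter(n["version"] for n in nodes).most_common(1)[0][0]
    actions = {}
    for node, label in zip(nodes, labels):
        if label != NOISE:
            continue  # node behaves like its peers
        if not node["reachable"]:
            actions[node["id"]] = "restart"
        elif node["version"] != majority:
            actions[node["id"]] = "redeploy"
        else:
            actions[node["id"]] = "investigate"
    return actions


# Hypothetical sample data: four healthy nodes and two misbehaving ones.
NODES = [
    {"id": "n1", "cpu": 0.30, "mem": 0.40, "latency": 0.10, "reachable": True, "version": "1.4"},
    {"id": "n2", "cpu": 0.32, "mem": 0.42, "latency": 0.12, "reachable": True, "version": "1.4"},
    {"id": "n3", "cpu": 0.28, "mem": 0.38, "latency": 0.11, "reachable": True, "version": "1.4"},
    {"id": "n4", "cpu": 0.31, "mem": 0.41, "latency": 0.09, "reachable": True, "version": "1.4"},
    {"id": "n5", "cpu": 0.95, "mem": 0.90, "latency": 0.80, "reachable": True, "version": "1.3"},
    {"id": "n6", "cpu": 0.05, "mem": 0.05, "latency": 0.99, "reachable": False, "version": "1.4"},
]
```

Calling `diagnose(NODES)` on this sample flags `n5` (reachable, but running a minority code version) and `n6` (unreachable) as outliers and leaves the four healthy nodes in a single cluster. A production system would likely need per-metric scaling and tuned `eps` and `min_pts` values; the ones used here were chosen only to suit the sample data.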