Patent attributes
An apparatus comprises a multi-cluster distributed data processing platform. Each of the clusters of the platform is configured to perform processing operations utilizing local data resources accessible within a corresponding data zone. A first application is initiated in one of the clusters, and the data resources to be utilized by the first application are determined. For one or more of the data resources identified as local data resources for the associated cluster, processing operations are performed utilizing those local data resources in that cluster in accordance with the first application. For one or more of the data resources identified as remote data resources for the associated cluster, one or more additional applications are initiated in one or more additional ones of the clusters. The determination of data resources, the local processing, and the initiation of further applications are repeated recursively for each additional application until all processing required by the first application is complete. Processing results from the clusters are aggregated and provided to a client.
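The recursive flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The `Cluster` class, the per-cluster `remote_catalog` lookup, the `run_application` function, and the use of a simple sum as the "processing operation" are all hypothetical names and simplifications introduced here for illustration, and the sketch assumes the catalogs contain no cycles.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for one cluster and its data zone."""
    name: str
    # Resource name -> records held locally in this cluster's data zone.
    local_data: dict[str, list[int]] = field(default_factory=dict)
    # Resource name -> name of the cluster to which work on it is delegated.
    remote_catalog: dict[str, str] = field(default_factory=dict)


def run_application(cluster: Cluster, resources: list[str],
                    clusters: dict[str, Cluster]) -> dict[str, int]:
    """Run an application in `cluster`; return partial results keyed by cluster name."""
    results: dict[str, int] = {}

    # Determine which requested resources are local and which are remote.
    local = [r for r in resources if r in cluster.local_data]
    remote = [r for r in resources if r not in cluster.local_data]

    # Process local resources inside this cluster (a sum stands in for real work).
    if local:
        results[cluster.name] = sum(sum(cluster.local_data[r]) for r in local)

    # Group remote resources by the cluster they are delegated to.
    # A resource missing from the catalog raises KeyError in this sketch.
    by_target: dict[str, list[str]] = {}
    for r in remote:
        by_target.setdefault(cluster.remote_catalog[r], []).append(r)

    # Initiate one additional application per target cluster, recursively.
    for target, names in by_target.items():
        partial = run_application(clusters[target], names, clusters)
        for key, value in partial.items():
            results[key] = results.get(key, 0) + value

    return results


# Example: A holds x, delegates y and z to B; B holds y and delegates z to C.
clusters = {
    "A": Cluster("A", {"x": [1, 2, 3]}, {"y": "B", "z": "B"}),
    "B": Cluster("B", {"y": [10, 20]}, {"z": "C"}),
    "C": Cluster("C", {"z": [100]}),
}
per_cluster = run_application(clusters["A"], ["x", "y", "z"], clusters)
total = sum(per_cluster.values())  # aggregated result returned to the client
print(per_cluster, total)  # {'A': 6, 'B': 30, 'C': 100} 136
```

In this sketch only the partial results (here, per-cluster sums) cross cluster boundaries, while the underlying records stay in their own data zones. The recursion reaches cluster C through B even though A has no direct knowledge of where resource `z` resides.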