Patent attributes
The disclosed computer-implemented method for provisioning distributed datasets may include (1) identifying a dataset, where a production cluster stores a primary instance of the dataset by distributing data objects within the dataset across the production cluster according to a first partitioning scheme, (2) receiving a request for a testing instance of the dataset on a testing cluster, where the testing cluster is to distribute storage of data objects across the testing cluster according to a second partitioning scheme, (3) locating a copied instance of the dataset, (4) partitioning the copied instance of the dataset according to the second partitioning scheme, thereby generating a plurality of partitions, and (5) providing the testing instance of the dataset by providing storage access for each node within the testing cluster to a corresponding partition within the plurality of partitions. Various other methods, systems, and computer-readable media are also disclosed.
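As an illustration only, steps (4) and (5) of the abstract can be sketched in Python. The abstract does not say what the second partitioning scheme is, so this sketch assumes a simple hash-modulo scheme. The names `partition_for`, `repartition`, and `provision_testing_instance`, and the representation of the copied instance as an in-memory dictionary, are hypothetical and not taken from the disclosure.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Assumed second partitioning scheme: stable CRC32 hash of the key,
    reduced modulo the number of partitions (hypothetical choice)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(copied_instance: dict, num_partitions: int) -> dict:
    """Step (4): split the copied instance into a plurality of partitions."""
    partitions = {index: {} for index in range(num_partitions)}
    for key, value in copied_instance.items():
        partitions[partition_for(key, num_partitions)][key] = value
    return partitions


def provision_testing_instance(copied_instance: dict, testing_nodes: list) -> dict:
    """Step (5): give each testing node access to its corresponding partition.

    "Storage access" is modeled here as a plain mapping from node name to
    partition contents; a real system would expose volumes or snapshots.
    """
    if not testing_nodes:
        raise ValueError("the testing cluster must contain at least one node")
    partitions = repartition(copied_instance, len(testing_nodes))
    return {node: partitions[index] for index, node in enumerate(testing_nodes)}


# Hypothetical copied instance of the dataset (step 3 is assumed already done).
copied = {f"obj-{i}": i for i in range(10)}
access = provision_testing_instance(copied, ["test-node-a", "test-node-b", "test-node-c"])
```

In this sketch the production cluster's first partitioning scheme plays no role, because the copied instance is repartitioned from scratch to match the layout of the testing cluster.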