Patent attributes
One embodiment of the present invention sets forth a technique for optimizing data in a dataset. The technique includes determining, based on one or more attributes of a dataset, an optimization that is associated with at least one of a file encoding, a file size, and a sort column. The technique also includes identifying a plurality of candidate configurations associated with the dataset and corresponding to the optimization, and for each candidate configuration, generating a corresponding set of evaluation metrics associated with the optimization. The technique further includes determining, based on the sets of evaluation metrics corresponding to the plurality of candidate configurations, a set of configurations in the plurality of candidate configurations to be applied to the dataset. Finally, the technique includes modifying the dataset based on the set of configurations.
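
The steps above can be illustrated with a minimal Python sketch, assuming the sort-column optimization has already been chosen for the dataset. Everything here is hypothetical and not taken from the source: the function names, the in-memory list-of-dicts dataset, and the evaluation metric, which counts runs of equal adjacent values across all columns as a rough proxy for how well the sorted data would compress under run-length encoding.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def count_runs(values: List[object]) -> int:
    """Count runs of equal adjacent values (fewer runs compress better)."""
    runs = 0
    prev = object()  # sentinel that equals no real value
    for v in values:
        if v != prev:
            runs += 1
            prev = v
    return runs


def evaluate_sort_column(rows: List[Row], column: str) -> Dict[str, float]:
    """Generate evaluation metrics for one candidate configuration.

    Assumed metric: total run count over all columns after sorting by
    the candidate column. A real system might measure file size or
    query latency instead.
    """
    ordered = sorted(rows, key=lambda r: r[column])
    total = sum(count_runs([r[c] for r in ordered]) for c in rows[0].keys())
    return {"total_runs": float(total)}


def select_configurations(
    metrics: Dict[str, Dict[str, float]], keep: int = 1
) -> List[str]:
    """Pick the candidates with the lowest total run count."""
    ranked = sorted(metrics, key=lambda c: metrics[c]["total_runs"])
    return ranked[:keep]


def optimize(
    rows: List[Row], candidate_columns: List[str], keep: int = 1
) -> Tuple[List[Row], List[str]]:
    """Evaluate every candidate, select the best, and modify the dataset."""
    metrics = {c: evaluate_sort_column(rows, c) for c in candidate_columns}
    chosen = select_configurations(metrics, keep)
    modified = sorted(rows, key=lambda r: tuple(r[c] for c in chosen))
    return modified, chosen


rows = [
    {"country": "US", "id": 3},
    {"country": "DE", "id": 1},
    {"country": "US", "id": 2},
    {"country": "DE", "id": 4},
]
modified, chosen = optimize(rows, ["country", "id"])
print(chosen)
print([r["country"] for r in modified])
```

In this toy dataset, sorting by `country` groups the repeated values together, so it yields fewer runs than sorting by `id` and is the configuration applied. Candidates for file encoding or file size would follow the same evaluate, select, and apply structure with different metrics.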