Patent attributes
Embodiments maintain a data pool that includes heterogeneous data sets, and receiving a first data batch of a data set from a data source into the data pool. Embodiments determine a current state of the data set based on a data set state diagram including a plurality of data set states, and identify a condition of the first data batch. Embodiments further set a data batch state for the first data batch, based on a data batch state diagram, and update the data batch state of a prior data batch received before the first data batch, based on the condition of the first data batch. Embodiments additionally transition the data set state diagram, based on the condition of the first data batch, to an updated data set state. Embodiments maintain a data state repository storing the data set state for each of the plurality of heterogeneous data sets.