Patent attributes
Embodiments of the system include a memory that stores a metamodel including a plurality of predefined characteristics for data sets. A data repository stores a plurality of heterogeneous data sets, each of the plurality of data sets comprising a plurality of data batches received over time. An interface receives a new data set for storage into the data repository, and a data health reasoner retrieves the stored metamodel from the memory, the stored metamodel including the plurality of predefined characteristics. The data health reasoner determines measured values of a subset of the plurality of predefined characteristics identified based on the stored metamodel, and determines a set of data health metrics for the data set based on the measured values of the subset of the plurality of predefined characteristics. The data health reasoner formulates a plurality of data validation assertions for the data set and applies the plurality of data validation assertions to each instance of the data set. A user interface receives a request from a service that consumes data from the data set, and provides the plurality of data validation assertions to the service with the data from the data set, wherein each of the plurality of heterogeneous data sets in the data repository has a plurality of data validation assertions derived from the stored metamodel.
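
The flow described above (measure characteristics named by the metamodel, derive assertions, apply them to each batch) might be sketched as follows. This is a minimal illustration only, not the claimed implementation: the names `Characteristic`, `Assertion`, `METAMODEL`, `formulate_assertions`, and `apply_assertions` are hypothetical, and the single characteristic (completeness) and the threshold rule (lowest historical value minus a tolerance) are assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Batch = List[Row]


@dataclass
class Characteristic:
    """One predefined characteristic in the (hypothetical) metamodel."""
    name: str
    measure: Callable[[Batch, str], float]  # (batch, column) -> measured value


@dataclass
class Assertion:
    """A validation assertion derived from a measured characteristic."""
    column: str
    characteristic: str
    threshold: float

    def check(self, measured: float) -> bool:
        return measured >= self.threshold


def completeness(batch: Batch, column: str) -> float:
    """Fraction of rows whose value for `column` is present (not None)."""
    if not batch:
        return 0.0
    present = sum(1 for row in batch if row.get(column) is not None)
    return present / len(batch)


# Assumed metamodel: a registry of predefined characteristics.
METAMODEL: Dict[str, Characteristic] = {
    "completeness": Characteristic("completeness", completeness),
}


def formulate_assertions(history: List[Batch], columns: List[str],
                         tolerance: float = 0.1) -> List[Assertion]:
    """Derive one assertion per (column, characteristic) from past batches.

    Assumed rule: threshold = lowest historical value minus `tolerance`.
    """
    assertions = []
    for column in columns:
        for char in METAMODEL.values():
            values = [char.measure(batch, column) for batch in history]
            threshold = max(0.0, min(values) - tolerance)
            assertions.append(Assertion(column, char.name, threshold))
    return assertions


def apply_assertions(batch: Batch, assertions: List[Assertion]) -> Dict[str, bool]:
    """Return 'column.characteristic' -> pass/fail for one batch."""
    results = {}
    for a in assertions:
        measured = METAMODEL[a.characteristic].measure(batch, a.column)
        results[f"{a.column}.{a.characteristic}"] = a.check(measured)
    return results
```

In this sketch, a consuming service would receive the list returned by `formulate_assertions` alongside the data, and could call `apply_assertions` on each new batch to decide whether the batch is healthy enough to use.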