Patent attributes
Embodiments are directed to data ingestion over a network. Raw data and integrated data associated with a plurality of separate data sources may be provided such that the raw data includes content associated with a plurality of subjects. Categorization models may be employed to categorize the raw data based on various features, such as format, structure, data source, variability, volume, or associated entities. Matching models may be determined based on the categorization of the raw data, the integrated data, and the content associated with the plurality of subjects. The matching models may generate a plurality of unified facts based on the raw data and the integrated data such that each unified fact is associated with a score associated with a quality of its match with a unified schema.
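
The flow described above (categorize the raw data, determine a matching model from the category, then generate scored unified facts) may be illustrated by the following minimal sketch. It is not the claimed implementation. Every name in it (`UNIFIED_SCHEMA`, `UnifiedFact`, `categorize`, `match_structured`, `match_unstructured`, `MATCHING_MODELS`, `ingest`) is hypothetical, the "models" are simple rule-based stand-ins, and the score is assumed to be the fraction of unified-schema fields that end up populated.

```python
from dataclasses import dataclass

# Hypothetical unified schema; the field names are illustrative only.
UNIFIED_SCHEMA = ("subject_id", "name", "email")


@dataclass
class UnifiedFact:
    subject_id: str
    fields: dict
    score: float  # assumed metric: fraction of unified-schema fields populated


def categorize(raw_record):
    """Stand-in categorization model: uses structure and data source only."""
    structure = "structured" if isinstance(raw_record["payload"], dict) else "unstructured"
    return (structure, raw_record["source"])


def match_structured(payload):
    """Stand-in matching model for dict payloads; the aliases are assumptions."""
    aliases = {"id": "subject_id", "full_name": "name", "mail": "email"}
    renamed = {aliases.get(key, key): value for key, value in payload.items()}
    return {key: value for key, value in renamed.items() if key in UNIFIED_SCHEMA}


def match_unstructured(payload):
    """Stand-in matching model for 'key=value; key=value' text payloads."""
    pairs = (item.split("=", 1) for item in payload.split(";") if "=" in item)
    cleaned = ((key.strip(), value.strip()) for key, value in pairs)
    return {key: value for key, value in cleaned if key in UNIFIED_SCHEMA}


# The matching model is selected from the structure part of the category.
MATCHING_MODELS = {"structured": match_structured, "unstructured": match_unstructured}


def ingest(raw_records, integrated):
    """Produce one scored unified fact per raw record."""
    facts = []
    for record in raw_records:
        structure, _source = categorize(record)
        fields = MATCHING_MODELS[structure](record["payload"])
        subject_id = fields.get("subject_id")
        # Combine with integrated data already held for the same subject;
        # values from the raw record take precedence.
        merged = {**integrated.get(subject_id, {}), **fields}
        populated = sum(1 for name in UNIFIED_SCHEMA if merged.get(name))
        facts.append(UnifiedFact(subject_id, merged, populated / len(UNIFIED_SCHEMA)))
    return facts


if __name__ == "__main__":
    raw = [
        {"source": "crm", "payload": {"id": "s1", "full_name": "Ada"}},
        {"source": "log", "payload": "subject_id=s2; name=Bob"},
    ]
    integrated = {"s1": {"email": "ada@example.com"}}
    for fact in ingest(raw, integrated):
        print(fact)
```

In this sketch the first record is completed by the integrated data and so populates every schema field, while the second record has no integrated counterpart and scores lower. A real system would presumably replace the rule-based functions with trained models and use a richer quality measure.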