Patent attributes
Techniques for configuring and validating a data pipeline system deployment are described. In an embodiment, a template is a file or data object that describes a package of related jobs. For example, a template may describe a set of jobs necessary for deduplication of data records or a set of jobs performing machine learning on a set of data records. The template can be defined in a file, such as a JSON blob or XML file. For each job specified in the template, the template may identify a set of dataset dependencies that are needed as input for the processing of that job. For each job specified in the template, the template may further identify a set of configuration parameters needed for deployment of the job. In an embodiment, a server uses the template and the configuration parameter values collected via the GUI to generate code for the package of jobs. The code may be stored in a version control system. In an embodiment, the code may be compiled, executed, and deployed to a server for processing the data.