MLOps, or Machine Learning Operations, is a set of practices that streamline the process of taking machine learning (ML) models to production as well as maintaining and monitoring them. MLOps aims to increase the quality of the management process and automate the deployment of machine learning and deep learning (DL) models in large-scale production environments. A collaborative process, MLOps often involves data scientists, DevOps engineers, and IT staff. The ML lifecycle consists of many complex processes. By adopting MLOps practices, data scientists and engineers can increase the pace of model development and production, implementing continuous integration and deployment (CI/CD) practices with proper monitoring, validation, and governance of ML models.
The intended benefits of MLOps include efficiency, scalability, and risk reduction. MLOps applies to the entire ML lifecycle. Key phases include the following:
- Data gathering
- Data analysis
- Data transformation/preparation
- Model training & development
- Model validation
- Model serving
- Model monitoring
- Model re-training
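The phases above can be illustrated with a minimal end-to-end sketch. The example below assumes scikit-learn, a small tabular dataset, an illustrative 0.9 accuracy threshold, and the artifact name "model.joblib"; a production MLOps pipeline would wrap each step in an orchestrated, versioned, and monitored workflow rather than a single script.

```python
# Minimal sketch of the lifecycle phases listed above (assumptions: scikit-learn,
# the built-in iris dataset, an illustrative 0.9 accuracy gate, "model.joblib").
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import joblib

# Data gathering
X, y = load_iris(return_X_y=True)

# Data transformation/preparation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Model training & development
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# Model validation: block promotion if accuracy falls below the threshold
accuracy = accuracy_score(y_test, model.predict(X_test))
assert accuracy >= 0.9, f"Validation failed: accuracy={accuracy:.3f}"

# Model serving (here reduced to persisting the artifact for a serving layer)
joblib.dump({"scaler": scaler, "model": model}, "model.joblib")
```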
Different reports have valued the MLOps market at $983.6 million in 2021 and $1.1 billion in 2022. With the proliferation of ML models, the market is expected to grow significantly, with a CAGR (compound annual growth rate) estimated between 37.5% and 41%.
There are a number of similarities between MLOps and DevOps, and many MLOps principles were derived from DevOps. DevOps offers a continuously iterative approach to shipping applications, and MLOps borrows these principles and applies them to taking machine learning models to production. The following are key differences:
- MLOps is much more experimental in nature. Data scientists and ML engineers have to tweak various features, hyperparameters, parameters, and models, while also keeping track of and managing the data and the code base to achieve reproducible results (see the experiment-tracking sketch after this list).
- Hybrid team composition: ML projects involve a range of personnel beyond software developers. They usually include data scientists or ML researchers who focus on exploratory data analysis, model development, and experimentation, but who may not be experienced software engineers able to build production-class services.
- Testing an ML system involves model validation, model training, and similar checks in addition to conventional code tests such as unit testing and integration testing (an example model-validation test is sketched after this list).
- Automated deployment: organizations can't simply deploy an offline-trained ML model as a prediction service. A multi-step pipeline that automatically retrains and deploys the model is required. This pipeline adds complexity because it must automate the steps that data scientists would otherwise perform manually before deployment to train and validate new models (a pipeline sketch follows this list).
- ML models in production can lose performance not only because of suboptimal coding but also because of constantly evolving data profiles. Models can decay in more ways than conventional software systems, and MLOps teams need to plan for it.
- ML models in production need to be monitored, along with the summary statistics of the data that built the model, so that the model can be refreshed when needed (see the drift-check sketch below).
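The experimentation point above is easier to see in code. The helper below is a hypothetical, plain-Python sketch of experiment tracking: it records hyperparameters, metrics, and a hash of the training data so a run can be reproduced later. The function name, file layout, and fields are assumptions; teams typically use a tracking service such as MLflow or Weights & Biases instead.

```python
# Illustrative experiment-tracking helper (hypothetical names and layout).
import hashlib
import json
import time
from pathlib import Path

def log_experiment(hyperparameters: dict, metrics: dict, data_path: str, run_dir: str = "runs") -> Path:
    """Record what is needed to reproduce a training run."""
    data_hash = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    record = {
        "timestamp": time.time(),
        "hyperparameters": hyperparameters,  # e.g. learning rate, tree depth
        "metrics": metrics,                  # e.g. validation accuracy
        "data_sha256": data_hash,            # ties the run to an exact dataset version
    }
    out = Path(run_dir)
    out.mkdir(exist_ok=True)
    path = out / f"run_{int(record['timestamp'])}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# Usage:
# log_experiment({"lr": 0.01, "max_depth": 6}, {"val_accuracy": 0.93}, "train.csv")
```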
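For the testing point, the sketch below shows how ML-specific validation can sit alongside conventional code tests in a pytest-style suite. It assumes the "model.joblib" artifact from the lifecycle sketch earlier and an illustrative 0.9 accuracy bar.

```python
# ML-specific tests next to conventional code-level tests (pytest style).
import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

def test_model_meets_accuracy_threshold():
    # Model validation: the trained artifact must clear a quality bar
    # on held-out data before it can be promoted.
    artifact = joblib.load("model.joblib")
    X, y = load_iris(return_X_y=True)
    preds = artifact["model"].predict(artifact["scaler"].transform(X))
    assert accuracy_score(y, preds) >= 0.9

def test_prediction_shape_and_label_set():
    # Conventional code-level check: predictions have the expected shape
    # and fall within the known label set.
    artifact = joblib.load("model.joblib")
    X, _ = load_iris(return_X_y=True)
    preds = artifact["model"].predict(artifact["scaler"].transform(X))
    assert preds.shape == (X.shape[0],)
    assert set(np.unique(preds)).issubset({0, 1, 2})
```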
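The automated-deployment point can be sketched as a multi-step retraining pipeline. The function below uses hypothetical callables (train_fn, validate_fn, deploy_fn) to stand in for the steps a data scientist would otherwise run by hand; in practice an orchestrator such as Airflow or Kubeflow Pipelines would schedule and track them.

```python
# Sketch of an automated retrain-validate-deploy pipeline (hypothetical callables).
def retraining_pipeline(train_fn, validate_fn, deploy_fn, baseline_metric: float) -> dict:
    """Automate the retraining and promotion steps ahead of deployment."""
    model, candidate_metric = train_fn()       # retrain on fresh data
    if not validate_fn(model):                 # data and model sanity checks
        raise RuntimeError("Candidate model failed validation checks")
    if candidate_metric < baseline_metric:     # never promote a regression
        return {"deployed": False, "reason": "worse than current model"}
    deploy_fn(model)                           # promote to the prediction service
    return {"deployed": True, "metric": candidate_metric}
```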
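Finally, the monitoring and decay points above amount to comparing live data against the summary statistics captured at training time. The sketch below uses a simple 3-sigma heuristic on feature means; the threshold and function names are illustrative assumptions, not a standard.

```python
# Minimal drift check against training-time summary statistics (illustrative heuristic).
import numpy as np

def summarize(features: np.ndarray) -> dict:
    """Summary statistics to store alongside the trained model."""
    return {"mean": features.mean(axis=0), "std": features.std(axis=0)}

def drifted(live_batch: np.ndarray, training_summary: dict, z_threshold: float = 3.0) -> bool:
    """Flag the model for retraining when live data shifts away from training data."""
    z = np.abs(live_batch.mean(axis=0) - training_summary["mean"]) / (training_summary["std"] + 1e-9)
    return bool((z > z_threshold).any())

# Usage:
# baseline = summarize(X_train)              # saved at training time
# if drifted(todays_requests, baseline):
#     ...                                    # trigger the retraining pipeline
```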