Machine Learning Operations (MLOps) is a set of practices that simplifies and automates machine learning (ML) workflows and deployments in large-scale production environments. An MLOps workflow makes it possible to set up monitoring pipelines, scale ML operations, and improve collaboration among data scientists.
In real-world projects, ML code makes up only a small fraction of the overall system; automation, model analysis, serving infrastructure, monitoring, and similar components account for the rest. This disparity arises because, unlike traditional software, an ML system is code combined with constantly changing data, and a disconnect between the two can cause major issues such as slow deployment, data drift, and lack of reproducibility.
What is the need for MLOps?
MLOps helps address various problems across the ML project lifecycle, such as deployment, monitoring, and governance. Some of the issues it resolves are listed below:
- Slow deployment or backlog of models waiting to be deployed
- No standardized process for pushing models from development to production
- Slow troubleshooting process
- Lack of monitoring of models in production
- Model decay going unnoticed by data scientists
- Models not being updated in production
These are some of the general issues that MLOps addresses through its principles, which are mentioned in the next section.
What are the principles of MLOps?
Version Control
Version control is the practice of tracking changes so that they can be rolled back to a previous version if necessary, which enables reproducibility. By keeping track of changes in the dataset and the ML model configuration, it eases model development, governance, and accountability. Tools such as DVC and ModelDB can version training sets, model artifacts, and metadata.
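To make the idea concrete, here is a minimal, hedged sketch in pure Python of what content-based dataset versioning looks like in principle: each saved version is identified by a hash of the file's contents, so any change produces a new, traceable version. The function names and the JSON registry are illustrative assumptions, not the mechanism DVC or ModelDB actually use.

```python
import hashlib
import json
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Return a short content hash identifying this exact file version."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:12]

def register_version(data_file: Path, registry: Path) -> str:
    """Append the current hash of data_file to a simple JSON registry
    (a toy stand-in for a real versioning tool's metadata store)."""
    versions = json.loads(registry.read_text()) if registry.exists() else {}
    digest = fingerprint(data_file)
    versions.setdefault(str(data_file), []).append(digest)
    registry.write_text(json.dumps(versions, indent=2))
    return digest
```

With this in place, editing the dataset and calling `register_version` again yields a different hash, so every training run can record exactly which data version it used.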
Automation
Automating different stages of an ML pipeline ensures consistency, scalability, and reproducibility. Tasks such as data validation, model training and testing, deployment, and monitoring can be automated to speed up the workflow, make it more reliable, and surface problems earlier. DVC, Kubeflow, and MLflow are some of the tools for automating ML pipelines.
Continuous X
In an MLOps pipeline, any change in the system triggers four activities:
- Continuous integration for validating and testing the code.
- Continuous delivery for automatically deploying the newly trained model.
- Continuous training for retraining the ML model.
- Continuous monitoring of the model based on metrics related to the business.
GitHub Actions and Jenkins are a few examples of tools that help set up such pipelines.
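The continuous-monitoring and continuous-training activities above can be sketched as a simple feedback loop: when a business-relevant metric for the deployed model falls below a threshold, a retraining run is triggered. The function name, metric, and threshold here are illustrative assumptions, not part of any specific CI/CD tool.

```python
from typing import Callable

def monitor_and_retrain(live_accuracy: float, threshold: float,
                        retrain: Callable[[], None]) -> bool:
    """Trigger continuous training when the monitored metric degrades.

    Returns True if retraining was triggered, False otherwise.
    """
    if live_accuracy < threshold:
        retrain()  # e.g. kick off the training pipeline or a CI job
        return True
    return False
```

In a real setup, the check would run on a schedule or on each monitoring event, and `retrain` would invoke the automated training pipeline rather than a local function.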
Model governance
Model governance fosters collaboration between data scientists, data engineers, and business stakeholders. It relies on clear documentation, effective communication, and feedback mechanisms to improve the model's performance, and it ensures data privacy.
What are the benefits of MLOps?
The surge in the human digital footprint over the past few years has made it necessary to leverage this data for solving real-life business problems. This has highlighted the importance of scalable ML models that can handle the growing demand for data-driven insights, and MLOps plays a key role in achieving this. Some of its benefits are listed below:
- Faster deployment: MLOps helps organizations achieve their data science goals faster and more efficiently. The use of automation pipelines results in better go-to-market times with lower operational costs.
- Better productivity: MLOps also helps boost an organization's productivity by standardizing the development process. Since reusability is an important aspect of MLOps, data scientists and engineers can reuse ML models across different applications and build on existing code for rapid experimentation and model training.
- Efficient model deployment: MLOps improves model management in production through model monitoring systems that allow for better troubleshooting. Moreover, data scientists can choose the best model for their task through model versioning and even reuse the same model for similar applications. Lastly, through CI/CD pipelines, the quality of the model is maintained, and performance degradation is limited.
Resources: