Applying DevOps Practices of Continuous Automation for Machine Learning
Abstract
1. Introduction
- RQ1: Which DevOps tools have been selected to design and implement automated deployment pipelines?
- RQ2: What challenges have been reported for adopting DevOps continuous practices?
- RQ3: How does DevOps impact the ML manual pipeline method in terms of performance, scalability, and monitoring?
- RQ4: How can we find a new approach for the ML manual pipeline method to improve the ML tracking lifecycle?
- a review of the two principal DevOps components—continuous integration (CI) and continuous delivery (CD)—in the ML context,
- an ML manual pipeline design with its components in detail,
- an ML automated pipeline design with CI/CD components in detail.
2. Background
2.1. Continuous Software Engineering
2.2. DevOps
2.2.1. DevOps Principles
- Culture represented by human communication, technical processes, and tools
- Automation of processes
- Measurement of KPIs
- Sharing feedback, best practices, and knowledge
2.2.2. DevOps Model and Practices
2.3. Existing Literature Reviews
3. Machine Learning Lifecycle Methodologies
3.1. CRISP-DM Methodology
- Business Understanding Phase: In this phase, the team determines the business objectives, assesses the situation, establishes the data mining goals, and produces the project plan.
- Data Understanding Phase: In this phase, the initial data is made available for exploratory analysis and for evaluation of the data quality.
- Data Preparation Phase: In this phase, the preparation of data is a multistage process that comprises several individual steps. These steps are feature extraction, data cleaning, data reduction, data selection, and transformation.
- Modeling Phase: In this phase, the machine learning model is selected for the specific problem.
- Evaluation Phase: In this phase, the results of the selected ML model are assessed. A review may also be performed to check whether the business objectives have been achieved.
- Deployment Phase: In this phase, the steps are deployment planning, monitoring planning, and maintenance.
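The Data Preparation steps listed above (data cleaning, data selection, and transformation) can be illustrated with a minimal sketch. It uses only the Python standard library. The record fields (`age`, `income`, `city`) and the helper names `clean`, `select`, `transform`, and `prepare` are hypothetical and chosen for illustration; they are not part of CRISP-DM itself, and dropping incomplete records is only one of several possible cleaning policies.

```python
# Hypothetical raw records; None marks a missing value.
RAW = [
    {"age": 34, "income": 52000.0, "city": "Dubai"},
    {"age": None, "income": 61000.0, "city": "Athens"},
    {"age": 45, "income": None, "city": "Dubai"},
    {"age": 29, "income": 48000.0, "city": "Athens"},
]


def clean(records):
    """Data cleaning: drop records that contain any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def select(records, fields):
    """Data selection: keep only the fields needed for modeling."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records, field):
    """Transformation: min-max scale one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [{**r, field: (r[field] - lo) / span} for r in records]


def prepare(records):
    """Chain the steps in order: clean, select, transform."""
    prepared = clean(records)
    prepared = select(prepared, ["age", "income"])
    return transform(prepared, "income")


if __name__ == "__main__":
    for row in prepare(RAW):
        print(row)
```

Keeping each step as a separate function mirrors the multistage character of the phase: any step can be replaced (for example, imputing missing values instead of dropping records) without touching the others.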
3.2. SEMMA Methodology
- Sample step: In this step, sample data is limited to collection and analysis of the data contained in form.
- Explore step: Understand the data by exploring outliers, patterns, and relationships.
- Modify step: Modify the data by selecting, transforming, and deriving the required features to reach an outcome.
- Model step: Model the data using data analytics algorithms and tools to establish the results.
- Assess step: In this step, the resulting outcome is assessed in multiple stages by evaluating the usability and reliability of the findings from the data mining process.
3.3. Team Data Science Process (TDSP) Methodology
3.3.1. Business Understanding
3.3.2. Data Acquisition and Understanding
3.3.3. Modeling
3.3.4. Deployment
3.3.5. Customer Acceptance
3.3.6. Team Definition
4. Machine Learning Pipelines
4.1. Machine Learning Manual Pipeline
- Business problem framing
- Dataset features and storage
- ML analytical methodology
- ML trained model
- Model registry storage
- ML testing model
- ML results
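The components of the manual pipeline above can be sketched end to end in a few lines. This is an illustrative sketch only, not the authors' implementation: the one-feature nearest-centroid classifier, the in-memory `ModelRegistry` class, and the toy `TRAIN` and `TEST` data are all assumptions made to keep the example self-contained. In a manual pipeline, each of the marked steps is executed by hand.

```python
import pickle
from statistics import mean


def train(samples):
    """Fit a one-feature nearest-centroid classifier.

    samples: list of (feature, label) pairs. Returns {label: centroid}.
    """
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))


class ModelRegistry:
    """In-memory stand-in for model registry storage (versioned blobs)."""

    def __init__(self):
        self._versions = []

    def register(self, model):
        self._versions.append(pickle.dumps(model))
        return len(self._versions)  # 1-based version number

    def load(self, version):
        return pickle.loads(self._versions[version - 1])


def evaluate(model, samples):
    """ML results: accuracy on held-out samples."""
    hits = sum(predict(model, x) == y for x, y in samples)
    return hits / len(samples)


# Hypothetical dataset features, already split into train and test sets.
TRAIN = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
TEST = [(1.5, "low"), (7.0, "high"), (4.0, "low")]

registry = ModelRegistry()
version = registry.register(train(TRAIN))          # manual step: train, then register
accuracy = evaluate(registry.load(version), TEST)  # manual step: test the stored model
```

Loading the model back from the registry before testing, rather than testing the in-memory object, ensures that the artifact being evaluated is the one that was actually stored.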
4.2. Proposed Machine Learning Automated Pipeline with CI/CD
- Business problem analysis
- Dataset features and storage
- ML analytical methodology
- Pipeline CI components
- Pipeline CD components
- Automated ML triggering
- Model registry storage
- Monitoring and performance
- Production ML service
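One way the "Automated ML triggering" and "Monitoring and performance" components could interact is sketched below. The outline does not specify a triggering rule, so the rule used here is an assumption: retraining is triggered when the rolling mean of a monitored metric (for example, accuracy on recent production data) falls below a threshold. The class name `RetrainTrigger` and its parameters are hypothetical.

```python
from collections import deque


class RetrainTrigger:
    """Fire when the rolling mean of a monitored metric drops below a threshold."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # keeps only the latest `window` scores

    def observe(self, score):
        """Record one metric value; return True if retraining should start."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        rolling = sum(self.scores) / len(self.scores)
        # Wait for a full window so a single early reading cannot trigger.
        return window_full and rolling < self.threshold


if __name__ == "__main__":
    trigger = RetrainTrigger(threshold=0.8, window=3)
    for score in [0.9, 0.9, 0.85, 0.7, 0.6]:
        if trigger.observe(score):
            print(f"score={score}: rolling mean below threshold, trigger pipeline")
```

A rolling window is used rather than a single reading so that one noisy measurement does not start a full pipeline run.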
- Source code management (SCM)
- Push/pull changes to the repository to trigger a continuous delivery build
- Check out the latest code and the associated data version from the data repository storage.
- Run the unit tests
- Build/run machine learning model code
- Testing and validation
- Package the model and build the container image
- Push the container image to the registry
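The CI steps listed above run in a fixed order, and a later step should not run if an earlier one fails. A minimal sketch of that fail-fast behavior follows. It is not tied to any specific CI server: `run_pipeline`, `StageResult`, and `STAGES` are hypothetical names, and each lambda is a placeholder for a real command (a version-control checkout, a test runner, a container build) that would report success or failure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


@dataclass
class StageResult:
    name: str
    passed: bool


def run_pipeline(stages: List[Stage]) -> List[StageResult]:
    """Run stages in order and stop at the first failure (fail fast)."""
    results = []
    for name, action in stages:
        passed = bool(action())
        results.append(StageResult(name, passed))
        if not passed:
            break  # later stages, such as pushing the image, never run
    return results


# Placeholder actions: each lambda stands in for a real build command.
STAGES: List[Stage] = [
    ("check out code and data version", lambda: True),
    ("run unit tests", lambda: True),
    ("build/run model code", lambda: True),
    ("testing and validation", lambda: True),
    ("package model and build image", lambda: True),
    ("push image to registry", lambda: True),
]

if __name__ == "__main__":
    for result in run_pipeline(STAGES):
        status = "ok" if result.passed else "FAILED"
        print(f"{result.name}: {status}")
```

Stopping at the first failure means a model that fails validation is never packaged or pushed, so the registry only ever receives images that passed every preceding stage.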
- Recreate: Version A is terminated, and then version B is rolled out.
- Ramped: Version B is rolled out incrementally to replace version A.
- Blue/Green: Version B is released alongside version A; once B has been validated and its functionality confirmed, traffic is switched to version B.
- Canary: A subset of the traffic is routed to version B while version A is still serving; version B is then rolled out to all users.
- A/B testing: Both versions run at the same time, and their results are compared.
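The Canary and Ramped strategies above both come down to routing a share of requests to version B. A minimal sketch follows, assuming per-request random routing. Real deployments usually implement the split in a load balancer or service mesh rather than in application code, and the names `make_router` and `ramp` are hypothetical.

```python
import random


def make_router(canary_share, rng):
    """Return a function that picks version 'A' or 'B' for each request."""
    if not 0.0 <= canary_share <= 1.0:
        raise ValueError("canary_share must be within [0, 1]")

    def route():
        # rng.random() is uniform on [0, 1), so B receives about canary_share of traffic.
        return "B" if rng.random() < canary_share else "A"

    return route


def ramp(shares, requests_per_step, seed=0):
    """Simulate a rollout: one step per share, counting requests sent to B."""
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    counts = []
    for share in shares:
        route = make_router(share, rng)
        counts.append(sum(route() == "B" for _ in range(requests_per_step)))
    return counts


if __name__ == "__main__":
    # Canary at 10%, then a ramp toward full rollout.
    shares = [0.1, 0.25, 0.5, 1.0]
    for share, hits in zip(shares, ramp(shares, requests_per_step=1000)):
        print(f"share={share:.2f}: {hits} of 1000 requests served by B")
```

A share of 0.0 keeps all traffic on version A and a share of 1.0 completes the rollout, so the same router covers the whole range from the first canary release to full replacement.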
5. Techniques for Scalable Machine Learning Models
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Karamitsos, I.; Albarhami, S.; Apostolopoulos, C. Applying DevOps Practices of Continuous Automation for Machine Learning. Information 2020, 11, 363. https://doi.org/10.3390/info11070363