From DevOps to MLOps: Overview and Application to Electricity Market Forecasting

In the Software Development Life Cycle (SDLC), Development and Operations (DevOps) has been proven to deliver reliable, scalable software within a shorter time. Due to the explosion of Machine Learning (ML) applications, the term Machine Learning Operations (MLOps) has gained significant interest among ML practitioners. This paper explains the DevOps and MLOps processes relevant to the implementation of MLOps. The contribution of this paper towards the MLOps framework is threefold: First, we review the state of the art in MLOps by analyzing the related work in MLOps. Second, we present an overview of the leading DevOps principles relevant to MLOps. Third, we derive an MLOps framework from the MLOps theory and apply it to a time-series forecasting application in the hourly day-ahead electricity market. The paper concludes with how MLOps could be generalized and applied to two more use cases with minor changes.


Introduction
Sufficient motivation for the DevOps process emerged around 2009 [1]. At this time, Development and Operations teams struggled to achieve smooth rollouts of software products. The main reason for this struggle was that the software developers were not concerned about deployments and the operations teams were not concerned about the development processes. DevOps is a set of processes that utilizes cross-functional teams to build, test, and release software faster, in a reliable and repeatable manner, through automation [1][2][3]. Recently, investment in Machine Learning (ML) applications has enabled stakeholders to solve complex business use cases that were previously difficult to solve. However, in most cases, ML applications are only a tiny part of a more extensive software system, and this small fraction of ML code is surrounded by a variety of software, libraries, and configuration files [4]. Hence, the main challenge in ML applications is to build continuous software engineering practices [5,6], such as DevOps [7], which can promise stakeholders the seamless integration and deployment known as MLOps [8,9]. MLOps refers to DevOps principles applied to ML applications. This paper introduces both DevOps and MLOps, provides a detailed explanation of both, and explains how to implement MLOps from the perspective of DevOps. Before diving deep into these technologies, it is helpful to understand some history behind DevOps and how MLOps has evolved from DevOps. This paper makes three contributions:
1. The literature on the motivations and the state of the art of MLOps is reviewed.
2. An overview of MLOps theory and DevOps theory relevant to the implementation of MLOps is presented, and an MLOps framework is proposed.
3. The proposed framework is applied to a time-series forecasting application as a case study. The case study is implemented with MLOps pipelines.
Most importantly, this paper systematically presents the concept of MLOps from DevOps and explores how to implement and extend the generic MLOps pipeline to multiple use cases. These two aspects are the motivation and significance of this paper. The remainder of the article is organized as follows: Section 2 reviews related work in the context of MLOps. Section 3 provides an overview of the DevOps principles that led to the development of MLOps, along with our generic MLOps framework. Section 4 presents the application of the proposed generic MLOps framework for forecasting hourly day-ahead electricity market prices.

Related Work

Software Development Life Cycle
The Software Development Life Cycle (SDLC) is a methodology with defined processes for creating high-quality software [10]. SDLC processes include different phases, such as planning, analysis, design, and implementation. The Waterfall model, Spiral model, Iterative model, Prototype model, V-model, Rapid Application Development (RAD) model, and Agile model are some of the major SDLC models [10,11]. For successful project implementation, it is crucial to select a proper SDLC model depending on different parameters, such as software complexity and type [12,13]. Dayal Chauhan et al. [14] analyzed the impacts of various SDLC methods on the cost and risk of projects. Several authors have classified the available SDLC models into two types of methodology: heavyweight and lightweight [10,11,15]. Heavyweight methodologies are mainly process-oriented, might not entertain requirement changes, and emphasize documentation. The Waterfall, Spiral, and Incremental models are a few examples of heavyweight methodologies [15,16]. Lightweight methodologies are mainly people-oriented, entertain frequent requirement changes, have short development life cycles, and involve the customer. The Prototyping, RAD, and Agile models are a few examples of lightweight methodologies [15,16].
Several authors have compared lightweight and heavyweight methodologies and provided more insight into the SDLC selection process. For instance, Ben-Zahia and Jaluta [11] discussed the criteria for selecting a proper SDLC model based on people, process, or plan orientation. A few authors have defined a third methodology, called the hybrid development methodology, which uses both heavyweight and lightweight methods [17]. Khan et al. [10] used the analytic hierarchy process to select the best SDLC model from all three methodologies. Among the lightweight, heavyweight, and hybrid SDLC methodologies, Waterfall and Agile are the most used, based on different parameters such as usability [18,19], cost [20,21], safety [22,23], and customer involvement [24]. Several authors have shown that customers are transitioning from the traditional Waterfall SDLC to Agile due to the advantages of Agile, including a short development life cycle, frequent changes, customer involvement, and usability [19,22]. However, Agile SDLC methods have also been criticized by some practitioners in cases where the requirements do not change often [25] or where there are human-related challenges, such as a lack of emotional intelligence. Various authors [26,27] have analyzed issues such as quality and productivity with traditional software development strategies such as Waterfall and compared them to the Agile methodology, where clear advantages can be seen in Agile.

Agile and DevOps
Agile methodologies are the most widely implemented project management approaches in modern software systems. The 12 principles of the Agile Manifesto [28,29] characterize the process integrity and methods of Agile project management, which are applied to different Agile methodologies. Scrum, extreme programming, lean software development, and crystal methodologies are some of the Agile methodologies [14,30,31]. Ever-changing business needs demand a continuous process in software development, delivery, and system operations [32]. Implementing these continuous software practices in Agile has enabled fast delivery of software [33]. Martin Fowler introduced the idea of Continuous Integration (CI) and, later, J. Humble and D. Farley extended these ideas into Continuous Delivery (CD) as a concept of the deployment pipeline [34]. Several authors, including Arachchi et al. [34] and Süß et al. [35], conducted research on automating CI and CD for Agile. With Agile, faster software development, quality improvement, frequent requirement changes, and customer involvement are achieved throughout the project. Nevertheless, the structural gap between the Development and Operations teams remains with the Agile methodologies. Development and Operations (DevOps) practices close this gap [36].
The working nature of software developers and operational professionals is different, and they often work in isolation. This isolation might cause conflicts between them [3]. The Development and Operations teams work under different departments and leadership [37]. Working in isolation leads to different Key Performance Indicators (KPIs) [38], which are important in most organizations, as they are used to define the performance of an individual or a team [39]. KPIs do not define performance alone; they are also a function of several metrics [40]. An example of a developer KPI could be the time needed to roll out a feature with minor or no bugs. For the operations team, one KPI could be the time taken to roll out a feature with little or no downtime. To improve the KPIs, each team must concentrate on its own tasks. As the size of the software grows, maintainability and scalability become increasingly serious concerns [41,42]. For instance:
• An error might occur when the operations team tries to deploy a new feature to the production environment [43,44]. However, the same code might work on the developer's machine and the Quality Assurance (QA) instance.
• The software might not serve its core purpose at all.
These are only a subset of the issues that can arise when the Development and Operations teams work in isolation and chase their own KPIs. None of the product owners [45], developers [46], or team leaders from either team are responsible for addressing the issues mentioned above. Even though these issues can be solved, this requires more discussion and code exchange between the Development and Operations teams. Various studies [47][48][49] have been performed to understand such effects. Any software project has one more key stakeholder: the customer [50,51]. The customer plays a significant role in the software life cycle [13], as the customer's tolerance to risk and ability to support collaborative routines influence key parameters such as project cost and implementation time.
These issues are the driving factors that led to the formulation of the Agile and DevOps methodologies [52][53][54]. Agile focuses on formal requirement gathering, on small-to-medium but rapid releases and, finally, on continuous customer feedback. It enforces collaboration between various teams to ensure rapid reaction times to ever-changing consumer needs [55,56]. This collaboration contrasts with the traditional project management approach, which concentrates on long timelines and schedules. DevOps is the extension of the Agile methodology [57,58], and the two can work in tandem [37,51], as shown in Figure 1. The Agile workflow assumes that the software development process is divided into multiple sprints [59,60] and that the customer is notified about the changes. A sprint is the smallest unit of duration in Agile, in which a team works on an assigned task [60]. The main goal for the Development and Operations teams is to produce a stable software release in every sprint cycle [61]. Combining Agile activities with the DevOps framework is reasonable considering the software development and delivery aspects. One such example is integrating Agile tools with DevOps tools.

MLOps from DevOps
With the successful adoption of DevOps, various organizations are trying to incorporate continuous practices in ML system development [62]. ML developers and operations teams are trying to adopt DevOps concepts for end-to-end life-cycle automation of ML systems. Containerized microservices and cloud-based DevOps have shown good stability and a reasonable success rate in production deployments [63]. For example, Kubeflow [64,65] is an ML toolkit for Kubernetes, available for almost all major cloud providers. Such toolkits aim to create ML workflows in which containers are already present. Karamitsos et al. [66] discussed such concepts and the issues of applying DevOps practices to ML applications, while John et al. [62] discussed DevOps applications in ML systems. Data play an influential role in any ML application. Unlike conventional software development practices, the SDLC of an ML application revolves around data. In most cases, the core ML code is minimal, but it must integrate with significant components of a bigger system. Sculley et al. [4] analyzed such systems and explored their hidden technical debt. The complexity increases when ML applications are deployed on the cloud or interfaced with web APIs. Banerjee et al. [67] proposed operationalizing ML applications for such an environment, and MLOps could be implemented for such hybrid cloud deployments.
Several authors have investigated MLOps. Makinen et al. [9] studied the state of ML to understand the extent of MLOps required and to analyze the issues with data; however, the authors did not consider deployment. Some research has been performed on the trends and challenges in MLOps. Tamburri et al. [68] recapped trends in terms of properties such as the fairness and accountability of sustainable ML operations of a software system. Several works propose MLOps frameworks for applications [62,69] such as the automotive industry, supply chain management, and IoT. Granlund et al. [70] presented issues related to the MLOps pipeline when multiple organizations are involved, highlighting factors such as scalability. Different cloud providers offer MLOps solutions as a service so that the whole ML product life cycle can be managed on the cloud. Azure MLOps [71] and AWS SageMaker for MLOps [72] are examples of such cloud services. A few cloud providers have published detailed guides on the MLOps life cycle for building ML applications and MLOps pipelines on the cloud or on on-premises servers [73].

Summary of MLOps from DevOps
This section presents the state-of-the-art MLOps articles mentioned in Section 2.3 in terms of methodology, novelty, and results. Table 1 provides the summary.

Table 1. Summary of the reviewed MLOps articles.

Ref. | Methodology | Novelty | Result
[62] | Systematic literature review along with a grey literature review to derive a framework | MLOps framework that describes the activities involved in the continuous development of the ML model | Framework validation in three embedded-systems case companies
[65] | Performance evaluation of ML pipeline platforms characterized by Kubeflow on different models according to various metrics | Verified the feasibility of creating an ML pipeline with CI/CD capabilities on various appliances with specific hardware configurations | Consumption of time and resources concerning the ML platform and computational models

DevOps
DevOps is a set of practices or fault-tolerant workflows built to increase software quality by providing continuous development and delivery through end-to-end automation [35,53,74]. DevOps practices bring together the development, testing, and operations teams through automation [2]. DevOps enables a shorter code-build-deploy loop with a better end product [4,75]. A typical DevOps workflow is shown in Figure 2.

DevOps Workflow and Components
The main focus of DevOps is to automate the software delivery process throughout, thereby ensuring continuous delivery and a feedback loop for the software [76]. Continuous delivery combines the development, testing, and deployment processes into one streamlined operation. The primary goal is to quicken the whole process through automation. If both the Development and Operations teams practice DevOps, they can quickly deploy code improvements, improving transparency between the two teams and allowing the end user to see changes quickly [53,74,77]. Due to continuous deployment and the involvement of customers in the DevOps workflow, customers do not have to wait for a monthly, quarterly, or yearly software release cycle [53,78] to test or provide feedback about the software.
The DevOps workflow shown in Figure 2 helps teams to build, test, and deploy software quickly and efficiently through a combination of tools and practices, from development to maintenance. DevOps reduces Time to Market (TTM) and enables Agile software development processes [79,80]. The DevOps components are closely related to Agile; DevOps is the next step in the evolution of Agile methodologies [53,79]. The following subsections discuss the components (or phases [53]) within the DevOps workflow [58,79,81].

Plan
In this phase, requirements are defined, and the initial execution plan is created. The tools used for DevOps help the user and the developers to be constantly in sync. In this phase, if Agile is used as the SDLC, the user stories and tech stories should be defined with the teams [82]. Various issue-tracking and project-management tools, such as Jira [83], are used in this phase to help with the planning.

Code
Coding is the first step in DevOps automation, and all of the team members should adhere to the agreed coding standards and best practices [2,84]. Test-Driven Development (TDD), Acceptance Test-Driven Development (ATDD), and Behavior-Driven Development (BDD) are a few of the best practices [84]. One of the significant areas in coding that is still highly neglected is versioning of the software via source control. Versioning not only helps to maintain the software, but also helps to automate the DevOps workflow properly [85,86]. It is essential to implement good practices within the source control, such as pull requests, proper branching, and commit messages [87][88][89]. Several authors [90,91] have shown that artifact traceability can be achieved if versioning is performed adequately with good practices.
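As an illustration of one such source-control practice, the following minimal Python sketch checks whether a commit message follows an agreed convention, of the kind a repository hook might enforce. The allowed prefixes and length rules are illustrative assumptions, not part of any cited standard.

```python
import re

# Hypothetical convention: "<type>(<scope>): <description>", where the
# description is 10-72 characters. Both the types and the limits are
# assumptions chosen for illustration.
COMMIT_PATTERN = re.compile(
    r"^(feat|fix|docs|refactor|test|chore)(\([\w-]+\))?: .{10,72}$"
)

def is_valid_commit_message(message: str) -> bool:
    """Return True if the first line of the message follows the convention."""
    first_line = message.splitlines()[0] if message else ""
    return COMMIT_PATTERN.match(first_line) is not None

print(is_valid_commit_message("feat(auth): add token refresh endpoint"))  # True
print(is_valid_commit_message("fixed stuff"))                             # False
```

A check like this can run as a pre-commit or server-side hook, rejecting pushes whose messages would break automated changelog or traceability tooling.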
A branching strategy is used to share and collaborate on code changes among development teams. Feature-branch-based versioning and trunk-based versioning are the two most widely used branching strategies. Some versioning tools include Git, Subversion (SVN), and Team Foundation Server (TFS). Figure 3 shows a feature branching strategy with different types of branches [92,93], while Figure 4 shows typical trunk-based versioning [94,95].

Build
The software build usually refers to the whole software package, consisting of the business logic, the software dependencies, and the environment [96]. A few authors refer to the build phase as the verify phase [81]. In either case, the main aim of DevOps build systems is to evaluate the correctness of software artifacts [81,97]. Typically, three standard deployment instances are used while developing an application: development, Quality Assurance (QA), and production instances [98,99]. The build systems in DevOps make sure that code integrity is maintained. Software build systems are highly dependent on software configuration management systems, and multiple builds are possible among these environments [97,100], including private system builds. Several authors, including Leite et al. [101], mention various build tools for DevOps.


Test
In the test phase of DevOps, automated testing is performed continuously to ensure the quality of the software artifact [84,102]. There are different ways to include test cases, such as unit and integration tests, while the software is being written. One such method is Test-Driven Development (TDD) [103], in which the developer writes the test cases first and then the actual functionality. Another approach, Behavior-Driven Development (BDD), is an extension of TDD [104]. Some good practices, such as code coverage [105], are part of the DevOps pipeline, and some cloud DevOps services, such as Azure DevOps [106], provide this within the service.
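The TDD cycle described above can be sketched in Python: the test cases are written first and drive the implementation of a small, hypothetical price-rounding helper (the helper and its behavior are assumptions chosen purely for illustration).

```python
import unittest

# Step 1 (TDD): write the tests first. They specify the behavior of a
# helper, round_price, that does not exist yet.
class TestRoundPrice(unittest.TestCase):
    def test_rounds_to_two_decimals_by_default(self):
        self.assertEqual(round_price(10.456), 10.46)

    def test_keeps_exact_values_unchanged(self):
        self.assertEqual(round_price(5.10), 5.10)

# Step 2 (TDD): write just enough implementation to make the tests pass.
def round_price(value: float, decimals: int = 2) -> float:
    """Round a monetary value to a fixed number of decimal places."""
    return round(value, decimals)

# Step 3: run the suite, as a CI pipeline would on every commit.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestRoundPrice)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

In a real pipeline, the same suite would be executed by the CI tool on every commit, and a failing test would mark the build as failed.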

Release
Once all test cases are passed, the software is ready for deployment. By this phase, a code change has already passed a series of manual and automated tests [107]. Feature releases can be carried out according to a regular schedule or once milestones are met. A manual approval process can be added at the release stage, allowing only a few people within an organization to authorize a release for production [74,78].

Deploy
In the deployment stage, the focus is on continuously deploying or redeploying the software. The deployment varies depending on the type and nature of the application. The deployment process can be automated easily for the production environment, using virtualization or containerization as the orchestration technology. Jenkins is one of the most widely used deployment tools in the industry [53,87].

Operate
The operate phase involves maintaining and troubleshooting applications in a production environment. Teams ensure system reliability and high availability, aiming for zero downtime while reinforcing security [108]. Selecting the proper hardware size for scaling, or implementing the scaling methods on the cloud, is crucial for the application's uptime and ability to handle high loads [109,110]. This configuration is performed in the operate phase.

Monitor
After the application is deployed and configured to handle the ever-changing load, it is essential to monitor the application to ensure that it is stable enough to deliver the promised uptime [111]. The monitor phase helps perform operations such as application health tracking and incident management. New Relic is one such tool for monitoring applications.

DevOps Pipeline
In practice, some of the DevOps components mentioned above are combined to form a sequence of operations called a pipeline. There is no single generalized pipeline structure [80,112]. Every pipeline is unique, and the pipeline structure depends on the nature of the application and the implementation technology. Common DevOps pipelines combine the components described in the previous section [53,54,58,77,84], and most DevOps implementations consider CI and CD as the core components.

Continuous Integration
Continuous integration is the practice of integrating code changes from multiple developers into a single source via automation [77]. An important practice of CI is that all developers commit code frequently to the main or trunk branch [113] described in Section 3.1, subsection 'Code'. After each commit, the code is built as explained in Section 3.1, subsection 'Build'. As soon as the build succeeds, the unit test cases are run as explained in Section 3.1, subsection 'Test'. CI primarily uses Software Configuration Management (SCM) tools such as Git, as explained in Section 3.1, subsection 'Code', to merge the code changes into the SCM for code versioning. CI also performs automated code quality tests, syntax style reviews, and code validation [114]. Since many developers integrate code, the following issues may occur:
• Code or merge conflicts: This is the state when the SCM is unable to resolve the difference in code between two commits automatically. Until the merge conflict is resolved, the SCM will usually not allow the code to be merged. Usually, merge conflicts happen when the changes are inconsistent while merging changes from two branches [115].
• Dead code: Dead code is a block of code never reached by the execution flow [116]. Dead code can be introduced at any step in the programming as part of a code merge. For instance, consider two feature branches, Feature A and Feature B. Feature A's developers might have considered some exceptions from Feature B and created code blocks to handle them. However, if this exception never happens or some logic in the Feature B branch changes, this exception block will never be executed [117,118].
• Code overwrites: As the software evolves, there is a high possibility that old code needs to be updated to meet ever-changing requirements. Nevertheless, this might also affect old features, resulting in code breaks after merging.
One of the primary benefits of CI is that it saves time during the development cycle by identifying and addressing conflicts early [119]. The first step in avoiding the abovementioned issues is to set up an automated testing pipeline [80,120].
In a CI testing pipeline, the code should be built successfully, with no code conflicts, before the tests are run. The CI pipeline tests vary depending on the nature of the application. To get an early warning of possible issues, it is suggested to run the CI pipeline for every branch of the SCM and not only on the main or trunk branches. The basic CI pipeline is ready once the SCM and an automated unit testing framework are in place. The following are some of the critical points to keep in mind while implementing CI:
• Unit tests are fast and cost less in terms of code execution time, as they check a smaller block of code [121].
• UI tests are slower and more complex to run, more complex to set up, and might also have high costs in terms of execution time and resource usage. Furthermore, a mobile development environment with multiple emulators and environments might add more complexity. Hence, UI tests should be selected more carefully and run less frequently than unit tests [103,122].
• Running these tests automatically for every code commit is preferred, but doing this on a development or feature branch might be costlier than manual testing [123].
• To meet the code coverage criteria [105], both white-box and black-box testing could be part of CI. However, white-box testing can be time-consuming, depending on the codebase [124].
• Combining code coverage metrics with a test pipeline is the most effective way to know how much code the test suite covers [105]. This coverage report can help in eliminating dead code [125].
The following two practices help to avoid, or detect early, merge conflicts [115]:
• Local changes should be pushed to the SCM early and often.
• Changes should be pulled from the SCM before pushing. This frequent code pulling reveals any merge conflicts early.
Various software builds, such as integration builds, should be rebuilt and retested after every change [119]. In a CI tool, if every test case runs smoothly without issues, and if there are no merge conflicts, the CI tool shows the current build status as "Pass". If anything goes wrong, this is changed to "Fail". The priority of the whole development team is to keep the builds "Passing" [126].
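The "Pass"/"Fail" status logic described above can be sketched as a toy pipeline runner; the stage names and the pass/fail rules here are illustrative assumptions, not the behavior of any specific CI tool.

```python
# A toy sketch of how a CI tool derives the overall build status from a
# sequence of pipeline stages: run each stage in order and stop at the
# first failure.
def run_pipeline(stages):
    """stages: list of (name, callable) pairs; each callable returns bool."""
    for name, stage in stages:
        if not stage():
            return f"Fail ({name})"
    return "Pass"

stages = [
    ("merge-check", lambda: True),  # no merge conflicts detected
    ("build",       lambda: True),  # compilation / packaging succeeded
    ("unit-tests",  lambda: True),  # all unit tests are green
]
print(run_pipeline(stages))  # Pass
```

Reporting the name of the first failed stage mirrors how CI dashboards help developers locate a breakage quickly and keep the build "Passing".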

Continuous Delivery and Continuous Deployment
Continuous Delivery (CD) and Continuous Deployment are implemented in the DevOps pipeline after CI. In Continuous Delivery, the aim is to keep the application ready for production deployment. At the least, unit test cases, a few optional quality checks, and other tests should have been completed before continuous delivery [54,127]. Continuous Delivery/Deployment is a process of deploying the application to various deployment instances, such as testing or production. The key difference is as follows: in Continuous Delivery, code is shipped frequently to production or test instances with a manual release step, whereas in Continuous Deployment, code is deployed to the instances automatically. In both cases, the code is kept ready for deployment at any point [53,58,120].
Continuous Delivery (CD) is an extension of the previously discussed CI. CD automatically deploys all code changes to the testing and/or production environment after the code-building stage. It is also possible to deploy the application manually from the CD pipeline using a manual trigger. CD is mainly used for automated deployments as soon as the build artifacts are ready. In this paper, a build artifact is defined as a binary, such as a container or a web portal build, after this binary has passed all of the tests and the build stage [128,129]. CD is beneficial when the build artifacts are deployed to production as soon as possible. This frequent deployment ensures that each software release contains small batches of changes, which are easy to debug in case of any issues after production deployment. Figure 5 shows a sample CI/CD pipeline.
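The distinction between Continuous Delivery and Continuous Deployment can be sketched as a simple gating function: both require all checks to be green, but Delivery additionally waits for a manual approval before release. The check names below are assumptions for illustration.

```python
# Hedged sketch of a delivery/deployment gate. An artifact ships only when
# every required check is green; Continuous Delivery additionally requires
# an explicit manual approval, while Continuous Deployment does not.
def should_deploy(checks: dict, manual_approval_required: bool = False,
                  approved: bool = False) -> bool:
    """checks maps check names (e.g. build, unit_tests) to pass/fail."""
    all_green = all(checks.values())
    if manual_approval_required:  # Continuous Delivery
        return all_green and approved
    return all_green              # Continuous Deployment

checks = {"build": True, "unit_tests": True, "quality_gate": True}
print(should_deploy(checks))                                 # True
print(should_deploy(checks, manual_approval_required=True))  # False
```

In a real pipeline, the approval flag would come from a release manager's sign-off in the CI/CD tool rather than a function argument.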

Continuous Monitoring
Continuous Monitoring (CM) is an automated DevOps pipeline for monitoring the vitals of the deployed application on the production instance. CM comes at the end of the DevOps pipeline and provides real-time data from the monitoring instance. CM helps to avoid and track system downtime and to evaluate application performance, security threats, and compliance concerns [130].
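A minimal sketch of the kind of vitals check a CM probe might evaluate is shown below; the metric names and thresholds are illustrative assumptions, not industry standards.

```python
# Illustrative health evaluation for continuous monitoring: compare a few
# application vitals against thresholds and return the names of any that
# are breached, so an alert can be raised for each.
def evaluate_health(metrics: dict) -> list:
    """metrics maps vital names to current values; returns breached vitals."""
    thresholds = {
        "error_rate": 0.05,      # fraction of failed requests
        "p95_latency_ms": 500,   # 95th-percentile response time
        "cpu_load": 0.90,        # normalized CPU utilization
    }
    return [name for name, limit in thresholds.items()
            if metrics.get(name, 0) > limit]

sample = {"error_rate": 0.02, "p95_latency_ms": 620, "cpu_load": 0.45}
print(evaluate_health(sample))  # ['p95_latency_ms']
```

A monitoring tool such as New Relic runs checks of this kind continuously and feeds the results into dashboards and incident-management workflows.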

MLOps
Machine Learning Operations (MLOps) is a set of practices that aims to maintain and deploy Machine Learning code and models with high reliability and efficiency. MLOps is primarily based on DevOps practices such as CI and CD to manage the ML life cycle [131,132]. The main target of MLOps is to achieve faster development and deployment of ML models with high quality, reproducibility, and end-to-end tracking. Like DevOps, MLOps enables a shorter code-build-deploy loop and aims to automate and monitor all steps of ML [9,68,69].

MLOps Workflow and Components
Like DevOps, the focus of MLOps is to automate the software delivery process throughout, ensuring continuous delivery and a feedback loop of the software. However, in most cases, the ML application must work with other software assets in a DevOps-based CI/CD environment. In such cases, additional steps are introduced to the existing DevOps process because existing DevOps tools and pipelines cannot be applied directly to ML applications [9]. The adoption of MLOps practices is still in its initial stages, as there is little research on MLOps compared with DevOps [62]. For ML systems, data scientists and operations teams are trying to automate the end-to-end ML life cycle by utilizing DevOps concepts [66]. Due to the variations in ML methodologies, it is challenging to generalize MLOps components.
To date, numerous designs with different components have been proposed, such as Iterative/Incremental processes [132] and Continuous Delivery for Machine Learning (CD4ML) [133]. In 2015, Sculley et al. [4] highlighted that in most real-world ML applications, the quantity of actual ML code is significantly smaller than, and supported by, a vast surrounding infrastructure. The authors also discussed technical issues and challenges in ML systems, such as model complexity, reproducibility of the results, testing, and monitoring. Most of these are also relevant for DevOps components. Hence, it is vital to include the data, infrastructure, and core ML code in the MLOps life cycle. Numerous ML life-cycle processes, such as CRISP-ML(Q) [134], have been proposed to establish a standard process model for ML development. John et al. [62] presented a maturity model outlining the stages by which companies evolve their MLOps approaches.
A generic MLOps workflow is shown in Figure 6. As mentioned, the concerns cited by Sculley et al. [4], such as code maintenance problems and system-level issues, are also present in traditional software development, and DevOps solves most of them via CI/CD. This CI/CD process creates reliable pipelines with assured quality to release the software into production. A cross-functional team is a way of involving expertise from different functional areas [78,135], such as data scientists and ML engineers [131]. In MLOps, a cross-functional team produces ML applications in small increments based on three parameters: code, data, and models. These can be released and reproduced at any time, using a constant seed value for random sample initialization to set the weights of the trainable layers, if applicable [136]. The following subsections define the generic MLOps process model illustrated in Figure 6.

The MLOps workflow in Figure 6 is similar to the DevOps workflow in Figure 2 but introduces two new components: data and model. Furthermore, the MLOps components testing, deployment, and monitoring differ slightly from their DevOps counterparts. Along with data and model, these differences are explained in the following sections.

Data
ML is driven by data; hence, data analysis and operations are vital to MLOps [137,138]. Unlike DevOps, ML operations are experimental, which is true in almost all of the steps of MLOps. For instance, hyperparameter optimization is varied during implementation. The same is true for data, and the following operations are involved in data analysis [139,140]. The sequence and usage of these components depend on the type of ML and the nature of the application.
Data extraction is mainly concerned with gathering data from different sources. The data sources, such as online APIs, cloud-based data lakes, CSV files, or a combination of these, can be diverse. Extra precautions should be taken while extracting the data for some ML tasks. For instance, for a classification problem, the data must be balanced after they are extracted; failing to do so might degrade the classifier's performance [141]. The data extraction component is usually the first step in the MLOps pipeline, in which data from one or more sources are integrated for further processing [139].
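As a sketch of the data extraction step described above, the snippet below merges hourly records from two hypothetical sources (a market price feed and a weather feed) on their timestamps; the field names and the rule of discarding timestamps missing in either source are illustrative assumptions, not part of the pipeline itself.

```python
from datetime import datetime, timedelta

def extract_and_merge(price_rows, weather_rows):
    """Merge records from two hypothetical sources (e.g., a market API and
    a weather API) on their timestamp, keeping only complete rows."""
    weather_by_ts = {r["ts"]: r for r in weather_rows}
    merged = []
    for row in price_rows:
        w = weather_by_ts.get(row["ts"])
        if w is not None:  # discard timestamps missing in either source
            merged.append({"ts": row["ts"], "price": row["price"],
                           "temperature": w["temperature"]})
    return merged

start = datetime(2022, 1, 1)
prices = [{"ts": start + timedelta(hours=h), "price": 10.0 + h} for h in range(24)]
weather = [{"ts": start + timedelta(hours=h), "temperature": -5.0 + 0.5 * h}
           for h in range(23)]  # one hour of weather data is missing

rows = extract_and_merge(prices, weather)
print(len(rows))  # 23 complete hourly rows survive the merge
```

A real pipeline would pull these rows from REST endpoints and write them to raw data storage, but the merge-on-timestamp logic is the same.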
To detect inaccurate data early and avoid training ML models with flawed data, the suggested technique is to incorporate a data validation process [140]. Common data quality issues include the following [140,142]:
• Incomplete data; for instance, the presence of null values.
• Inconsistent data, such as issues with the data type.
• Inaccurate data; for example, data collected with the wrong measurements.
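A minimal validator covering the three issue classes above might look as follows; the schema, the field names, and the negative-price plausibility rule are illustrative assumptions.

```python
def validate(rows, schema):
    """Check rows against expected types and report basic quality issues.
    Illustrative only; field names and rules are assumptions."""
    issues = []
    for i, row in enumerate(rows):
        for field, expected_type in schema.items():
            value = row.get(field)
            if value is None:
                issues.append((i, field, "incomplete: null value"))
            elif not isinstance(value, expected_type):
                issues.append((i, field, "inconsistent: wrong type"))
        price = row.get("price")
        if isinstance(price, float) and price < 0:
            issues.append((i, "price", "inaccurate: negative price"))
    return issues

schema = {"hour": int, "price": float}
sample_rows = [
    {"hour": 0, "price": 17.5},
    {"hour": 1, "price": None},      # incomplete: null value
    {"hour": "2", "price": 18.0},    # inconsistent: wrong type
    {"hour": 3, "price": -4.0},      # inaccurate: implausible measurement
]
for issue in validate(sample_rows, schema):
    print(issue)
```

In a pipeline, such a validator would run before training, and any reported issue would stop the run rather than let flawed data reach the model.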
Data analysis is a crucial step in the creation of a model. In this step, Exploratory Data Analysis (EDA) is performed to understand the characteristics of the data [138]. Depending on this knowledge, feature engineering is performed in the following steps, and a suitable model is designed.
In data preparation, the validated data are split into three standard datasets: training, test, and validation [143]. Features are selected from the data, data-cleaning operations are performed, and extra features can be added after EDA. EDA shows the trends and patterns in the data, and more features from new data sources can be added to support the existing data [144]. In data preparation, common data quality issues can be fixed. If required, data transformations, such as matching date-time formats from different data sources [145], are performed on the data; finally, the three sets of data are sent to the model.
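The three-way split above can be sketched as follows for time-series data; the split fractions are assumed values, and the split is chronological (no shuffling) so that no future information leaks into training.

```python
def split_time_series(samples, train_frac=0.7, val_frac=0.15):
    """Chronological split into training/validation/test sets; shuffling is
    avoided so that no future information leaks into the training data."""
    n = len(samples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test

hourly_samples = list(range(1000))  # stand-in for 1000 hourly observations
train, val, test = split_time_series(hourly_samples)
print(len(train), len(val), len(test))  # 700 150 150
```

For non-temporal data, a shuffled split would be used instead; the chronological variant is the safer default for price forecasting.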

Model
The ML model is the heart of any ML application. Neural networks are the most common type of ML model, and the rest of this paper assumes that the ML model is a neural network. Once the model structure is defined, model training, model evaluation, and model validation operations are performed.
Model training trains one or several models with the prepared data. Hyperparameter optimization is performed, where model variables such as the number of layers and the number of nodes in each layer are optimized in different iterations. After this optimization, the model is trained until it is well fitted [146,147].
Once the model is trained, it is evaluated on the held-out validation datasets to measure its quality. Model validation gives a measure of the performance of the model [148]. Metrics such as absolute error and mean absolute error are used to quantify the model quality. These metrics are helpful in testing and comparing different models [149,150].
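For concreteness, the mean absolute error mentioned above can be computed as follows; the persistence baseline (using the previous hour's price as the forecast) is an assumption added here purely to have something to compare against.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: the average absolute deviation between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [20.0, 22.0, 19.0, 25.0]          # observed hourly prices
naive = [actual[0]] + actual[:-1]          # persistence baseline forecast
mae = mean_absolute_error(actual, naive)
print(mae)  # (0 + 2 + 3 + 6) / 4 = 2.75
```

A trained model would be considered useful only if its MAE on the validation set beats such a naive baseline.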

Testing
As in DevOps, unit and integration tests should be performed. Testing of ML models is mostly limited to checks related to the convergence of models, the shapes of passed tensors/vectors, and other model-related variables. However, there is a lot of code surrounding an ML model, which should also be tested [151]. Moreover, white-box testing for ML-based systems can entail high test effort due to the large test input space [152]. In MLOps, test cases should check for the proper input and output data formats.
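A sketch of such interface-level tests is shown below, using a stub in place of a real model; the `predict` signature and the 24-hour window size are assumptions made for illustration.

```python
import unittest

def predict(batch):
    """Hypothetical model stub: each input window of 24 hourly values maps to
    a 24-value forecast. A real implementation would call the trained model."""
    if any(len(window) != 24 for window in batch):
        raise ValueError("each input window must contain 24 hourly values")
    return [[sum(w) / len(w)] * 24 for w in batch]

class TestPredictInterface(unittest.TestCase):
    def test_output_shape_matches_input(self):
        out = predict([[1.0] * 24, [2.0] * 24])
        self.assertEqual(len(out), 2)
        self.assertTrue(all(len(forecast) == 24 for forecast in out))

    def test_malformed_input_is_rejected(self):
        with self.assertRaises(ValueError):
            predict([[1.0] * 23])

# Run the checks directly; in CI these would be collected by the test runner.
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestPredictInterface))
print(result.wasSuccessful())
```

The same pattern extends to the surrounding code: data loaders, feature transforms, and serialization all get shape and format checks.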

Deployment
Unlike DevOps, deploying ML applications is not straightforward, especially if the ML application is part of a DevOps application such as a web API. In ML applications, the arrival of new data triggers the retraining and redeployment of models. An automated pipeline must be created to perform these actions [153].

Monitoring
It is crucial to monitor the performance of the deployed ML model. Continuous monitoring helps to understand the model performance and trigger retraining if required [154]. In DevOps, the main concern is ensuring that the application is healthy and able to handle the load. If the ML is part of an application such as a web API, the monitoring component should additionally check ML parameters such as data drift and model performance [131,155].
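A crude sketch of such a data drift check follows; the mean-shift rule and the 30% threshold are assumptions, and production systems typically use proper statistical tests (e.g., Kolmogorov-Smirnov) instead.

```python
def detect_mean_drift(reference, live, threshold=0.3):
    """Flag drift when the live window's mean deviates from the reference
    mean by more than `threshold` (relative). A crude stand-in for proper
    statistical drift tests."""
    ref_mean = sum(reference) / len(reference)
    live_mean = sum(live) / len(live)
    relative_shift = abs(live_mean - ref_mean) / abs(ref_mean)
    return relative_shift > threshold

reference_prices = [20.0, 21.0, 19.0, 20.0]  # distribution seen at training time
live_prices = [33.0, 35.0, 34.0, 36.0]       # recent production window
drifted = detect_mean_drift(reference_prices, live_prices)
print(drifted)  # True: the live mean has shifted well beyond 30%
```

When the check fires, the monitoring component would raise an alert or trigger the retraining pipeline discussed below.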
As in DevOps, some of these steps are combined to form a pipeline, and these are usually combined with DevOps pipelines. As in DevOps, there is no single generalized pipeline structure. Every pipeline is unique, and the pipeline structure depends on the nature of the application, the type of ML, and the implementation technology.

Continuous Integration
As in DevOps, the CI pipeline is about the testing and validation of code components. For ML applications, data and model validations are added along with classical unit and integration tests [65]. Unit tests are written to cover changes in feature engineering and in the different methods used to implement the models. Moreover, tests should be written to check the convergence of model training. During training, a machine learning model reaches a convergence state when the model loss value settles within an error range, after which additional training might not improve the model's accuracy [156].
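The convergence criterion described above can be tested as a simple window check over the recorded loss values; the window size and tolerance are assumed values.

```python
def has_converged(losses, window=5, tolerance=1e-3):
    """Consider training converged when the loss values from the last
    `window` epochs all lie within `tolerance` of one another."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) <= tolerance

epoch_losses = [1.0, 0.5, 0.2, 0.1005, 0.1003, 0.1002, 0.1001, 0.1001]
print(has_converged(epoch_losses))  # True: the last five losses have settled
```

In a CI pipeline, a short training run would be executed and a unit test would assert that `has_converged` eventually returns True, failing the build if training diverges.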

Continuous Deployment
There are considerable changes in the Continuous Deployment pipeline compared to DevOps. As the ML models evolve continuously, it is essential to verify the models' compatibility with the target deployment environments with respect to computing power, as well as any changes in the deployment environments. The process changes depending on the use case and on whether the ML prediction is online or batch processing [157].

Continuous Training
This pipeline component is new and unique to MLOps. The Continuous Training (CT) pipeline automatically retrains the model [158]. The different ML components explained in Section 3.2.1 are automated to work in a sequence to achieve this. Retraining the model is essential, as the data keep changing or updating in any ML application; to cope with the new incoming data, the model needs retraining. This includes automating several model retraining, data validation, and model validation processes, and triggers are included in the pipeline to initiate them. Some additional components, such as feature storage and metadata management, are used along with these pipelines in MLOps. They help with data management and reproducibility, and are discussed below.
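As a sketch, a retraining trigger combining two common conditions, accumulated new data and degraded live performance, might look as follows; all thresholds here are illustrative assumptions, not values from the case study.

```python
def should_retrain(new_samples, live_mae, baseline_mae,
                   min_samples=168, degradation=1.2):
    """Trigger retraining when roughly a week of new hourly data
    (168 samples) has accumulated, or when the live error exceeds the
    validation baseline by 20%. Thresholds are illustrative assumptions."""
    enough_data = new_samples >= min_samples
    degraded = live_mae > degradation * baseline_mae
    return enough_data or degraded

print(should_retrain(new_samples=200, live_mae=2.0, baseline_mae=2.0))  # True
print(should_retrain(new_samples=24, live_mae=3.0, baseline_mae=2.0))   # True
print(should_retrain(new_samples=24, live_mae=2.1, baseline_mae=2.0))   # False
```

In a CT pipeline, this predicate would run on a schedule, and a True result would kick off the automated data validation, training, and model validation sequence.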

Feature Store
Due to the many variations, such as the components involved in ML applications, feature stores are used. A feature store acts as a central repository for standardizing the definition, access, and storage of a feature set. A feature store helps to achieve the following [159]:
• Store commonly used features;
• Build feature sets from the available raw data;
• Reuse custom feature sets;
• Support model monitoring and data drift detection;
• Transform and store the data for training or inference purposes.
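To make the role of a feature store concrete, a minimal in-memory sketch is shown below; real feature stores add persistence, point-in-time correctness, and serving APIs, and the class and method names here are assumptions.

```python
class FeatureStore:
    """Minimal in-memory feature store: named, versioned feature sets that
    can be reused for both training and inference. Illustrative only."""
    def __init__(self):
        self._store = {}

    def put(self, name, values):
        """Store a new version of a feature set; returns the version number."""
        versions = self._store.setdefault(name, [])
        versions.append(list(values))
        return len(versions)  # versions start at 1

    def get(self, name, version=None):
        """Fetch a specific version, or the latest if none is given."""
        versions = self._store[name]
        return versions[-1] if version is None else versions[version - 1]

store = FeatureStore()
store.put("hourly_price_lag_24", [19.0, 20.5, 21.0])
v = store.put("hourly_price_lag_24", [19.0, 20.5, 21.0, 22.5])
print(v, store.get("hourly_price_lag_24"))  # 2, latest feature values
```

Versioning is the key property: the training pipeline and the inference pipeline can both request the same named, versioned feature set, which keeps the two consistent.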

Metadata Management
The fundamental nature of ML is experimentation. With many experiments across all of the different components, it would not be easy to track the steps that lead to an ideal model or the best-performing dataset. With the help of metadata management, almost all of the metadata for the whole ML process can be tracked and used to repeat the desired result. The most significant metadata include [160]:
• Experiment and training metadata, such as environment configuration, hardware metrics, code versions, and hyperparameters.
• Artifact metadata, such as dataset paths, model hashes, dataset previews, and artifact descriptions.
• Model metadata, such as model versions, data and performance drifts, and hardware usage.
• Pipeline metadata, such as node information and completion reports of each pipeline.
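The metadata categories above can be illustrated with a minimal run registry; the field names and the hyperparameter-derived model hash are illustrative assumptions, standing in for what a tool such as MLflow would record.

```python
import hashlib
import json

def log_run(registry, params, dataset_path, metrics):
    """Record one training run's metadata so the experiment can be
    reproduced; the model hash is derived from its hyperparameters."""
    model_hash = hashlib.sha256(
        json.dumps(params, sort_keys=True).encode()).hexdigest()[:12]
    registry.append({
        "run_id": len(registry) + 1,
        "hyperparameters": params,   # experiment/training metadata
        "dataset_path": dataset_path,  # artifact metadata
        "model_hash": model_hash,      # model metadata
        "metrics": metrics,
    })
    return model_hash

runs = []
h1 = log_run(runs, {"layers": 2, "lr": 1e-3}, "data/v3/train.csv", {"mae": 2.7})
h2 = log_run(runs, {"layers": 2, "lr": 1e-3}, "data/v4/train.csv", {"mae": 2.5})
print(h1 == h2, len(runs))  # identical hyperparameters yield identical hashes
```

Deriving the hash deterministically from the sorted hyperparameters means two runs with the same configuration are immediately recognizable, even if they used different dataset versions.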
MLOps pipelines might look highly automated. Nevertheless, not all of the pipelines or pipeline components are necessary to implement; the pipeline and its components can be selected depending on the implementation and use case. Sculley et al. [4] mentioned that only a minority of the application is ML code, since most of the system is composed of data collection and verification components, model analysis and building, resource and metadata management, automation, and configuration. Based on the work of Sculley et al. [4], Google Cloud [161] proposes three MLOps process levels for implementing the CI/CD pipeline for different needs; these define the different levels of maturity of the MLOps processes:
• MLOps level 0: Manual process;
• MLOps level 1: ML pipeline automation;
• MLOps level 2: CI/CD pipeline automation.

Research Gap
In the next section, MLOps is applied to a time-series forecasting application, which predicts the price of a day-ahead hourly electricity market. The MLOps level 2 automated CI/CD pipeline is implemented, and CI and CD services are utilized for reliable delivery of the ML results.
This use case addresses the following research gaps observed in Sections 2 and 3:
• Even though the core ML is a tiny part of the whole software ecosystem, the ML application needs various new tools for the MLOps implementation.
• MLOps tools might not be compatible with the DevOps tools, burdening the complete system.
• Creating an MLOps pipeline alongside traditional software, such as a web application where DevOps is already implemented, is an issue.
Additionally, to demonstrate the generalization of the created MLOps pipeline and tools, Case Study 1 is extended to two more case studies. In the last two case studies, the price forecast solution (Case Study 1) is adapted with minimal work.

Case Study 1: Forecasting an Hourly Day-Ahead Electricity Market Using an MLOps Pipeline
The reliable operation of the electric power grid relies on a complex system of markets, which ensures that electric power consumption and generation are matched at every point in time. This matching is crucial for the stability of the grid, since the grid cannot store electric energy. From an MLOps perspective, forecasting the price of any electricity market depends on the market schedule, and the ML experts do not need to understand how the market contributes to stabilizing the grid. The timing characteristics of the market represent important knowledge for the developers of time-series forecasting solutions. In particular, two characteristics are crucial:
• Does the price change weekly, daily, hourly, or at some other interval?
• When does the market participation occur and, thus, how far into the future should forecasts be available?
There are numerous markets in a single country, and significant differences can exist between countries. Thus, an ML team working on electricity price forecasting would do well to avoid hardcoding assumptions related to the questions above, and should instead parameterize them. A common market structure in Europe and elsewhere is the hourly day-ahead market [162,163], which is used as a concrete example in this paper. Such markets have the following characteristics:
• The market interval is one hour; in other words, there is a separate price for each hour.
• The market participants need to place their bids on the previous day; for example, before the market deadline today, separate bids should be placed for each hour of the next day.
In our case study, we look at a specific hourly day-ahead market: the Finnish Frequency Containment Reserves for Normal Operations (FCR-N) market. The FCR markets compensate participants for maintaining a reserve that can be activated to generate or consume energy when such an activation is required due to a momentary imbalance in the power grid. An offline neural-network-based forecasting solution for this market is presented in [164].

The Frequency Containment Reserves Market
With the advent of smart grids and Virtual Power Plants (VPPs), various Distributed Energy Resources (DERs), such as smart loads, batteries, photovoltaics (PVs), and wind power, are being exploited on various electricity markets, including frequency reserves [165][166][167][168]. Frequency reserves with a fast response time for frequency deviations are generally called Primary Frequency Reserves (PFRs), and traditionally they consist of fossil-fuel-burning spinning reserves. These reserves are now being replaced with DERs in the push towards reducing carbon emissions [169]; due to this, under the currently allowed delays for PFRs, the reduced grid inertia is becoming a threat to the stability of power systems [170]. The FCR-N market was selected among other ancillary services for this case study, as the FCR-N has neither hard real-time constraints nor a minimum power bid for market participation. The FCR-N is a day-ahead market, and bidders must submit all of the bids for the hours of the next day before 6 p.m. on the current day [171,172].
If the day-ahead reserve market prices can be predicted, then the DER owners can anticipate variations in price peaks and low or zero prices. Thus, this case study presents a solution based on artificial neural networks, deployed online, so that the predictions are automatically updated and available before the bidding deadline of the day-ahead market. A Transformer-based ANN model is exploited to predict the day-ahead ancillary energy market prices. The MLOps pipelines are configured to ingest the data at 1 p.m. on the current day, so that the forecasts are available by, e.g., 2 p.m., giving the person or system doing the bidding the forecast a few hours before the bid must be submitted.

Prediction Model
ANNs are widely used for energy price prediction in ML. For instance, based on the architecture defined in [173], Recurrent Neural Networks (RNNs) and feed-forward neural networks are the two major ANN categories. RNNs can better predict high energy spikes, whereas feed-forward networks can predict the spot market prices for day-ahead prediction [174]. An ANN was employed to predict ancillary market prices by considering different data sources, where the results outperformed Support-Vector Regression (SVR) and Autoregressive Integrated Moving Average (ARIMA) [164]. For implementing the FCR-N market price forecasting using the MLOps pipeline, this case study uses the Temporal Fusion Transformer (TFT) model structure defined in [175].
Energy price prediction datasets have a time component, and forecasting future price values can provide significant value in multi-horizon forecasting, i.e., predicting variables of interest at multiple future time steps. Deep Neural Networks (DNNs) have been used in multi-horizon forecasting, showing substantial performance improvements over traditional time-series models. However, most existing RNN models do not consider the different inputs commonly present in multi-horizon forecasting, and either assume that all exogenous inputs are known in the future or do not consider static covariates. Conventional time-series models are influenced by complex nonlinear interactions between many parameters, making it difficult to explain how such models arrive at their predictions. Attention-based models have been proposed for sequential data such as energy price prediction datasets. However, multi-horizon forecasting has many different types of inputs; attention-based models can provide an understanding of appropriate time steps, but they cannot contrast the importance of different features at a given time step. TFT solves these issues in terms of accuracy and interpretability.
TFT is an attention-based architecture that combines multi-horizon forecasting with interpretable insights into temporal dynamics.TFT utilizes recurrent layers for local processing and interpretable self-attention layers for learning long-term dependencies with the knowledge of temporal relationships at diverse scales [175].The major components of TFT are summarized below.
• Gating mechanisms skip any unused components of the model.
• Variable selection networks choose relevant input variables at each time step.
• Static covariate encoders integrate static features, which can have a meaningful impact on forecasts, into the modeling of temporal dynamics.
• A sequence-to-sequence temporal processing layer learns long-term and short-term temporal relationships.
Table 2 provides additional information on TFT hyperparameters for practitioners who may wish to reproduce the results.

MLOps Pipeline for FCR-N Market Price Forecasting
The main scope of this example is to define an MLOps pipeline for such applications and to determine how the whole process could be automated. First, the data are ingested via the Fingrid and Finnish Meteorological Institute (FMI) REST APIs into raw data storage. The data from Fingrid and FMI are available online via the REST APIs, but not as manual file downloads; hence, it is possible to automate this pipeline. Next, the data are prepared for further processing by selecting the essential features from the raw data storage; this feature selection is based on past experimentation. The data are then validated for missing values or the presence of NaN values and stored in a feature store for data reusability. Additionally, EDA is performed on the data to understand any data drifts. The next step is to build the model through model training and evaluation. Once the model is evaluated, the prediction is performed on the new dataset. The performance of the model is evaluated for monitoring purposes. The resulting pipeline architecture is shown in Figure 7.


MLOps Runtime Environment
As mentioned in Section 3.1, subsection 'Code', the first step is to ensure that the code is appropriately versioned. GIT was selected as the version control system for this implementation, and GIT feature branching was followed. Figure 8 shows the commits and GIT branches.
Two feature branches were created for the development of the solution, namely, "feature/build-model" and "feature/ingest-data". Additionally, one "develop" branch and one "release" branch were created. The develop branch was branched out from the main branch, and the feature branches were branched out of the develop branch. The feature branches were frequently merged back into the develop branch, and all of the feature branches were updated with the recent changes. Once the final code was merged back into the develop branch, the code was planned for release from the release branch. The code was versioned correctly, and a stable version of the code was always kept in the main branch after correct tagging.
In this use case, MLOps level 2 CI/CD implementation was followed, as explained at the end of Section 3. Hence, the CI/CD should include source control, ML pipelines, test services, model services, feature storage, and deployment services. The first step in the CI/CD process is to create pipelines. These pipelines run in a sequence to implement the price forecast task. Moreover, the pipelines should provide a supportive environment, including runtimes and libraries. We selected PyTorch as the ML library and Python as the scripting language. Python libraries such as NumPy and Pandas were installed in a virtual environment and loaded by the pipeline at the beginning to prepare the environment for the forecasting job. For this implementation, 10 pipelines were selected, and modular code was created to run in each component. The required unit tests were also implemented and included along with data validation. The trigger was set to any commit changes on the SCM main branch; the pipelines would be triggered upon such changes, and the code would be deployed to generate the forecasting results. There is a variety of software available for creating CI/CD pipelines. Jenkins was selected as the CI/CD server because it is the most commonly used CI/CD server for DevOps. The MLOps pipelines were incorporated using a Jenkinsfile and Python. In this use case, we also used the DevOps process for implementing a web UI, where the Django web framework was used for viewing the predicted result. All of the required software, including GIT, Jenkins, Python, Anaconda, and Docker, was installed on the CSC cloud.
Figure 9 shows the pipelines created for the FCR-N market price forecaster.A new SCM commit change at the main branch would trigger the pipeline.The following pipelines were created in this example implementation: • Deployment and monitoring pipelines with a few sub-pipelines.
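The staged, sequential execution described above can be sketched in plain Python. This is an illustrative sketch only: the stage names mirror Figure 9, but the `run_pipeline` helper and the stage functions are hypothetical stand-ins, not the paper's Jenkins code.

```python
def run_pipeline(stages, context=None):
    """Run stages in sequence; each stage receives and returns a context dict.
    The first failing stage raises, so a CI/CD server can mark the build red."""
    context = context or {}
    for name, stage in stages:
        print(f"[pipeline] running: {name}")
        context = stage(context)
    return context

# Hypothetical stage implementations for illustration only.
def data_fetch(ctx):
    ctx["raw"] = [10.2, 11.5, 9.8]  # stand-in for data from the Fingrid/FMI APIs
    return ctx

def data_validator(ctx):
    if any(x is None for x in ctx["raw"]):
        raise ValueError("null values in raw data")
    return ctx

result = run_pipeline([
    ("Data Ingestion-Data Fetch", data_fetch),
    ("Data Preparation-Data Validator", data_validator),
])
```

Keeping each stage behind a uniform interface is what makes it easy to plug modules in and out, as discussed later in this section.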
"Data Ingestion-Data Fetch" fetches the data from the Fingrid and FMI APIs. The "Data Ingestion-Store Raw Data" pipeline stores these data in an SQL database; PostgreSQL is used as the database for the raw data. "Data Preparation-Feature Selection" performs feature selection on the raw data. Different experiments were performed in the past, based on which this pipeline selects the features. "Data Preparation-Data Validator" validates the data against different validation rules and checks data consistency, including null values, the presence of NaN, and data types. Along with this, code-level unit test cases are also executed to ensure code integrity. These data are then stored in a feature store for future use and analysis using the "Data Preparation-Feature Store" pipeline.
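The validation rules listed above (null values, the presence of NaN, and data types) could look like the following sketch. The `validate` helper and the schema format are illustrative assumptions, not the paper's actual validator code.

```python
import pandas as pd

def validate(df, schema):
    """Check a raw-data frame against simple consistency rules: missing
    columns, null/NaN values, and expected dtypes. Schema format is
    illustrative: {column_name: expected_dtype_string}."""
    errors = []
    for col, dtype in schema.items():
        if col not in df.columns:
            errors.append(f"missing column: {col}")
            continue
        if df[col].isna().any():
            errors.append(f"null/NaN values in: {col}")
        if str(df[col].dtype) != dtype:
            errors.append(f"{col}: expected {dtype}, got {df[col].dtype}")
    return errors

schema = {"price": "float64", "hour": "int64"}
ok = validate(pd.DataFrame({"price": [10.2, 11.5], "hour": [0, 1]}), schema)
bad = validate(pd.DataFrame({"price": [10.2, float("nan")]}), schema)
```

A non-empty error list would fail the "Data Preparation-Data Validator" stage, stopping bad data from reaching the feature store.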
Data analysis is performed automatically with the "EDA" pipeline to detect data drift and other data quality issues. The next step is to normalize the data and build the model, which is done via the "Model Building-Model Training" and "Model Building-Model Evaluation" pipelines. It is then necessary to create a Docker container for the prediction model. The "Deployment-Prediction" pipeline does that, and then forecasts the FCR-N market price for the next 24 h. The forecasted result is stored in the PostgreSQL database. The model's accuracy and performance are monitored via the "Monitoring-Performance Monitor" pipeline. The EDA and performance monitoring results are reviewed regularly, and the pipelines are updated if the performance has degraded.
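One very simple way to flag the kind of data drift the "EDA" pipeline watches for is a mean-shift test against the training data. This is a deliberately minimal sketch, assuming a three-sigma rule; the `drifted` helper is illustrative and not the paper's actual EDA logic, which may use richer statistics.

```python
import numpy as np

def drifted(train, new, threshold=3.0):
    """Flag drift when the mean of a new data batch deviates from the
    training mean by more than `threshold` training standard deviations."""
    mu, sigma = float(np.mean(train)), float(np.std(train))
    z = abs(float(np.mean(new)) - mu) / (sigma if sigma > 0 else 1.0)
    return z > threshold

train_prices = np.array([10.0, 11.0, 9.0, 10.5, 9.5])
flag_recent = drifted(train_prices, np.array([10.2, 9.9, 10.4]))    # similar regime
flag_shifted = drifted(train_prices, np.array([25.0, 26.0, 24.0]))  # clear shift
```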
Due to the modular nature of the CI/CD pipeline design, it is easy to plug a module in or out. If a build fails, for instance at "Data Ingestion-Data Fetch", the CI/CD system shows a detailed log on the CI/CD UI, as shown in Figure 10. These logs are maintained in the CI/CD system so that it is easy to debug and replicate the errors on the developer's machine using the local build discussed in Section 3.1, subsection 'Build'. Figure 10 shows the failed pipeline. Once the pipeline finishes the job execution, depending on the build's status, an email is triggered from the CI/CD system to the configured addresses, notifying them about the new build job. This notification system is created while building the pipelines.
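The status e-mail described above amounts to mapping a build result to a message. The following is a minimal sketch; the message format and the `build_notification` helper are invented for illustration, since the real notification is configured inside the Jenkins pipeline itself.

```python
def build_notification(job, status, recipients):
    """Compose the post-build notification; `status` is 'SUCCESS' or 'FAILURE'.
    On failure, the body points the reader at the CI/CD UI log."""
    subject = f"[CI/CD] {job}: {status}"
    hint = "" if status == "SUCCESS" else " See the CI/CD UI log to debug."
    body = f"Build of {job} finished with status {status}.{hint}"
    return {"to": recipients, "subject": subject, "body": body}

msg = build_notification("fcr-n-forecast", "FAILURE", ["dev-team@example.org"])
```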
MLflow was selected as the ML application life-cycle management tool. MLflow was integrated with the ML model operations to track the model parameters and performance metrics. Figure 11 shows the MLflow UI hosted as a SaaS service. MLflow supports experiment tracking and reproducibility, which is achieved by logging the metrics and parameters of each experiment. As shown in Figure 11, the results of several experiments can be logged and compared. MLflow stores artifacts such as configuration files that record information about the model input and the model under training. These artifacts can be stored in a cloud object store such as MinIO for future reproducibility, as shown in Figure 12.

FCR-N Forecasting Results
Figure 13 shows the one-day predicted vs. actual prices for the FCR-N market. The ML model runs daily, and the predicted values are stored in the database. A Django-based web UI was designed for viewing these values and hosted in the CSC cloud as a SaaS, as shown in Figure 14. This SaaS service can also act as a REST API to expose the predicted data to a third-party system.
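Exposing the stored forecasts through the REST API amounts to shaping the hourly values as JSON. Here is a minimal sketch of what such a response body could look like; the field names and the `prediction_payload` helper are illustrative assumptions, not the actual Django API.

```python
from datetime import datetime, timedelta

def prediction_payload(start, prices, market="FCR-N"):
    """Shape a day-ahead forecast as the kind of JSON body a REST endpoint
    could return to a third-party system: one timestamped entry per hour."""
    return {
        "market": market,
        "values": [
            {"time": (start + timedelta(hours=h)).isoformat(), "price": p}
            for h, p in enumerate(prices)
        ],
    }

payload = prediction_payload(datetime(2022, 1, 1), [10.0, 11.2, 9.7])
```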

Case Study 2: National Electricity Consumption Forecast Adapted from the Price Forecast MLOps Pipeline
In the above use case, the MLOps pipeline was defined and implemented as shown in Figure 7, while in this use case, the same MLOps pipeline was adapted with minimal changes to forecast the Finnish national electricity consumption. The data were collected from the Finnish Transmission System Operator (TSO) via an online API. The TSO defines the electricity consumption as follows:
This use case was implemented with all of the pipeline components defined in Figure 7. However, the following changes were made to the previous MLOps source code:
• In the TFT, the FCR_N price variable was replaced with electricity_consumption as the prediction variable.
• In Git, a new feature branch was created.
• A new experiment name was created for tracking the ML model parameters using MLflow.
• A new Jenkins project was created, and the MLOps pipelines were incorporated using the same Jenkinsfile.
• Unit test cases were updated.
• To explore the predicted values via the web UI shown in Figure 14, the corresponding legend and variable names were changed.
Except for the test cases, most of the abovementioned changes were in the configuration files and, most importantly, the same MLOps pipeline and tools were reused on the same cloud platform. Figure 15 shows the one-day predicted vs. actual electricity consumption forecasts for Finland.
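Since the adaptation above is mostly confined to configuration, it can be pictured as a base configuration plus per-use-case overrides. The keys and values below are illustrative assumptions; the paper does not show its actual configuration files.

```python
# Base configuration for the FCR-N price forecaster (illustrative keys).
BASE_CONFIG = {
    "target": "FCR_N_price",
    "mlflow_experiment": "fcr_n_price_forecast",
    "git_branch": "main",
}

def adapt(config, **overrides):
    """Return a new configuration with use-case-specific overrides applied,
    leaving the base configuration untouched."""
    out = dict(config)
    out.update(overrides)
    return out

# Case Study 2: swap the prediction target, experiment name, and branch.
consumption_config = adapt(
    BASE_CONFIG,
    target="electricity_consumption",
    mlflow_experiment="consumption_forecast",
    git_branch="feature/consumption-forecast",
)
```

Because everything else (pipelines, tools, cloud platform) stays fixed, only these overrides plus updated unit tests separate the two use cases.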


Case Study 3: National Electricity Generation Forecast Adapted from the Price Forecast MLOps Pipeline
In the previous use case, electricity consumption was forecasted by adapting the MLOps pipelines created for FCR-N market price forecasting. In this example, electricity production was forecasted, and the MLOps pipeline changes made for the previous use case were used here. As in the electricity consumption forecasting, the data were collected from the Finnish TSO via an online API.
However, the following changes were made to the previous MLOps source code:
• In the TFT, the electricity_consumption variable was replaced with electricity_production as the prediction variable.
• In Git, a new feature branch was created.
• A new experiment name was created for tracking the ML model parameters using MLflow. Creating a new experiment name is helpful in grouping the new experiments within a project or use case.
• A new Jenkins project was created, and the MLOps pipelines were incorporated using the Jenkinsfile used for the above use case.
• Unit test cases were updated.
• To explore the predicted values via the web UI shown in Figure 14, the corresponding legend and variable names were changed.
Figure 16 shows the one-day predicted vs. actual electricity production forecasts for Finland.

Case Studies 2 and 3 were implemented with minimal work by adapting the implementation of Case Study 1, and all three case studies used the same MLOps pipeline structure and the same set of tools. This generalization is applicable when forecasting similar variables, for instance wind power or photovoltaic power generation, with 1 h as the sampling interval. However, if the sampling interval changes, for example to 15 min, considerable work must be performed in the MLOps pipeline, such as in "Data ingestion" and the "Prediction service".

Conclusions
The main aim of MLOps is to bring ML products to production by avoiding Development and Operations bottlenecks and by automating the workflows. The MLOps system and workflow design need to be modular to accommodate such a system. Such a modular design cannot be fully generalized and must be specific to the application. This ensures a system with fewer development, deployment, and monitoring issues, and makes the end-to-end life-cycle management of MLOps straightforward. This paper presents a use case where we built modular pipelines for an ML time-series forecasting system. There are several ways to implement MLOps, but the principles of DevOps and modularity should be considered the primary factors. Even though it is hard to generalize the pipeline structure, in this paper we explain what to consider when creating the MLOps design architecture.
The case study is generalizable with minor changes to other hourly day-ahead electricity markets. The main changes are related to identifying the relevant features. The case study is further generalizable to other electricity markets operating on a similar timescale. For example, the market interval in some countries is 30 min or 15 min instead of an hour. For a day-ahead market, the market interval will impact the size of the input layer as well as the output layer of the ANN. For electricity markets that are not day-ahead, for example intraday markets, the bidding deadlines are different, which needs to be taken into account when scheduling the execution of the deployed containers. All of these generalizations require minor effort in the form of manual work. A topic for further research would be the development of a generic MLOps solution for day-ahead or intraday electricity markets, which would further reduce this manual work. The current case studies are only integrated with the web application to explore the forecasted values. This MLOps work could be extended to Virtual Power Plants, which use the predicted data for managing assets via the REST API, where concurrency and response time matter. Currently, the REST API implements basic authentication for exposing the predicted values, and this could be further improved.



Figure 12. ML artifact storage using the MinIO object store.


Figure 14. Web UI for exploring the model prediction values.


• The new codebase might break old features.
• It is difficult to track what has changed.

Table 1. Summary of state-of-the-art MLOps articles.