1. Introduction
There is a growing need to automate the continuous operation and management of various systems. We call these self-adaptive systems if they are built around a controller that manages the system and its resources. A controller generates actions to maintain such a system within expected quality ranges, based on monitored system data as the input. Self-adaptive systems are widely used in environments where manual adjustment is neither feasible nor reliable. This ranges from industrial production machines and power management systems to fully software-based environments such as cloud applications, which are all examples of virtualised or cyber-physical systems suited to being governed by a control-theoretic solution that continuously and automatically adjusts the system [1,2,3,4]. In particular, the resources used by a complex system can be adjusted dynamically, which requires dedicated quality management.
Today, we can observe that the automation of control also extends to its construction. For instance, during development, AI techniques can be employed, if a sufficient amount of data is available, to generate the models that form the core functions of this self-adaptation [5]. Thus, we focus here on the automated construction of controllers and their required quality properties, and not on the application systems that these controllers control.
Performance and robustness are the most frequent quality concerns found for self-adaptive controller-managed systems, i.e., controllers monitor and analyse the target system in terms of the dynamic properties of the latter. Consequently, these quality concepts for the system need to be reinterpreted for a controller as a software-based system. Our objective here is to conduct a wider review of quality metrics beyond these two, also including fairness, sustainability, and explainability, which are common concerns for AI techniques such as machine learning (ML) in general, as part of a responsible AI concern, but need a specific investigation for the controller context here.
Machine learning (ML) is a good example of a construction mechanism that motivates the need for this investigation. In ML model construction, accuracy, precision, and recall are generally used to assess the quality of prediction or classification models. However, the construction raises concerns related to the responsible AI concept, which aims at fairness, transparency, explainability, and ethical behaviour of these machine-generated controllers. The quality concerns of reinforcement learning (RL), as a widely used mechanism for controller construction, have been investigated for specific concerns [6,7,8]. Two challenges arise: firstly, the construction outcome looks different for software controllers than for other typical ML applications, and secondly, the quality management of these controllers themselves needs to be automated, which is often referred to as DevOps automation in the software domain [9,10].
In this paper, we present a coherent system of broad principles and practices to design and manage controllers for self-adaptive systems. Specifically, we present a catalogue of classified controller quality metrics as the main contribution to controller construction. To frame this metrics catalogue, we introduce a reference architecture for controllers within a quality management framework. This is often considered part of a continuous change process in a DevOps style (or AIOps if the software operations process is AI-controlled) that allows for continuous quality monitoring to remediate quality deficiencies. Since change is ubiquitous, we address change in a two-layered quality management framework covering the system and the controller layer.
Our method is based on (i) a review of survey papers as secondary contributions that we systematically selected to cover AI and controller perspectives and (ii) a further consideration of individual primary solution contributions for specific aspects, from machine learning to resource management to change detection, in order to supplement the survey findings and define our framework, taking the state of the art into account. A two-layered feedback loop has already been proposed as a natural consequence of mapping quality control into a continuous feedback loop to monitor quality [11,12]. We base this work on the two-layered loop [13] and use the review papers covering their respective focal concerns to form a coherent conceptual framework that provides a comprehensive set of metrics to allow this two-layered controller loop to be realised. While different controller construction mechanisms are covered, reinforcement learning and control theory are specifically singled out, due to their popularity and relevance as established mechanisms, to provide insights into the potential implementation of the proposed results.
For automatically generated controllers, their automated continued quality management becomes an important concern. While metrics for the qualities of systems that a controller manages have been explored, our investigation specifically focuses on AI-related properties of AI-constructed controllers. The novelty lies in the definition of common qualities for AI-generated controller software, rather than the quality metrics that the system under observation is meant to maintain, which have been proposed so far. Furthermore, the AI qualities are combined into a comprehensive catalogue with specific definitions for controllers, which has also not been done before.
The paper is structured as follows. In Section 2, we review related work. In Section 3, we present a reference architecture as the first contribution component that frames the outcome. Section 4 provides an overview of reinforcement learning and control theory as two prevalent techniques to give a more concrete context. Section 5 then defines a metrics catalogue for controllers. Section 6 discusses the findings and presents a use case. Finally, Section 7 concludes the paper with some future work indications.
2. Related Work
In this section, we discuss the controller quality perspective covering individual metrics but also general quality frameworks. The detection of quality change is a central task.
Traditional system-level quality metrics such as performance or robustness have been widely investigated as the goal of controllers. Performance is often interpreted as the latency or response-time behaviour of a system, and robustness as its behaviour under disturbances arising during system execution and monitoring. In contrast, we focus here only on the quality metrics of the controller itself. We consider specifically AI techniques for controller construction and consequently propose a catalogue of AI-related metrics, which has not yet been presented in a comprehensive framework [14,15,16].
A range of individual quality metrics have been investigated, in particular for the construction of controllers employing reinforcement learning, which is a widely used technique. Reinforcement learning (RL) is a suitable approach to derive solutions for control systems. The work in [7] covers the link between RL performance and the notion of stability that stems from the control area. Ref. [8] is a good example of an RL application for a control problem that requires a high degree of performance. Robustness and performance are covered in [6], which also covers recent deep reinforcement learning trends. Robustness is also investigated in [17]. The ability to deal with disturbances from the environment is often seen as an important property of control systems that act in environments with a lot of uncertainty.
However, beyond classical performance metrics, other concerns such as explainability or sustainability have recently been examined in the wider ML and AI context. Attention has been given to these from the perspective of the environment and the users and/or subjects of a solution. Another concrete direction is the fairness of the solution. Ref. [18] looks at this in the context of Markov processes, which define the central probabilistic behaviour of control systems. While explainability has now been widely recognised for prediction and classification approaches, RL has received less attention. One example is [19], which reviews explainability concerns for RL. A survey of this aspect is provided by [20]. Explanations for self-adaptive systems are addressed in [21,22]. As a wider societal concern that also has a cost impact for users, sustainability, for example in terms of energy and resource consumption, is also investigated [23].
In [24], RL is discussed in the context of a MAPE-K (Monitor, Analyse, Plan, and Execute stages based on a shared Knowledge component) loop integration. Meta-learning refers to the need to relearn if adaptations are required to adjust to drift. A meta-policy can deal with adaptation to new tasks and environmental conditions. Here, specifically, the cost of RL is considered, since meta-learning should work with small volumes of data and quick convergence.
If a set of metrics needs to be implemented, i.e., needs to be monitored, analysed, and converted into recommendations or remedial actions when quality concerns are detected, then a systematic engineering approach is needed that explains the architecture of the system in question and devises a process for quality management. Ref. [25] provides an overview of the AutoML domain, a notion that covers automated approaches to managing AI model creation and quality management. Another term used in this context is AI engineering. For instance, Ref. [26] approaches such a generic framework from a software engineering perspective, aiming to define principles for a systematic engineering approach. Similarly, Ref. [5] investigates common engineering practices and how they change in the presence of ML.
In [27], a three-layered reference architecture based on the MAPE-K pattern is presented that separates dynamic monitoring feedback (for managing context information for the monitoring and adaptation setting), adaptation feedback (to control the adaptive behaviour of the target system in terms of functional and non-functional properties), and objectives feedback (to control changes to the adaptation goals such as settling time or stability).
Model checking of controller models is the concern of [28]. The application system is managed in terms of throughput, resource usage, cost, and safety, as these systems are strongly affected by sensor failures or response-time deficiencies. The controller is assessed in terms of liveness and safety properties using the UPPAAL model checker. The solution is embedded in a comprehensive process model for controller construction, where the verification of controller models, e.g., in terms of possible paths of the four MAPE processing steps, is most relevant to our perspective.
Our objective is to monitor quality and to react to changes and unexpected behaviour or uncertainties, which are pointed out in [29] as important concerns. The distinction between drift and anomaly as central forms of unexpected behaviour (and also other concepts such as rare events or novelty detection) is often not clear, as Ref. [30] shows with a review of connected concepts. We will provide formal definitions later, after motivating the metrics selection, where we also refer in more detail to concrete techniques. Thus, we introduce only some selected review papers here. Several drift detectors have been proposed, as surveyed by Refs. [31,32]. Examples of drift analysis are found in [33,34]. For instance, Ref. [33] states that different diversity levels in an ensemble of learning machines are required to attain better accuracy, i.e., different classifiers are used. Another example is [34], which uses Spiking Neural Networks for drift detection, combined with a notion of evolution, to detect changes. Equally, for anomaly detection, the number of proposed solutions is large, as the review in [35] shows. We will later introduce anomaly detection techniques specific to our context of continuous sensor data in a technical systems setting.
Another layered architecture is presented in [36], where the lower layer, as usual, deals with the managed system adaptation. The upper layer, called the lifelong learning loop, focuses on changes in ML-generated controllers. In particular, concept drift is addressed here as a cause of the needed adaptation of the controller to changing requirements. The solution is knowledge management, e.g., encoding unlabelled data and detecting new tasks to deal with changes.
In summary, we can note that many controller quality frameworks exist, but a comprehensive analysis of AI properties has not been carried out. Reinforcement learning is the prevalent ML technique used, which is why we focused our investigation on this construction technique. However, the reviewed surveys and other papers show that only aspects such as robustness or performance are addressed for controller construction. Explainability and fairness are not well covered for RL approaches in general and are only investigated for more general AI-constructed software contexts. The quality of RL-constructed controllers is specifically a research gap. These latter quality concerns in particular need a formalisation, which we will provide in Section 5, after introducing a reference architecture and RL basics to frame these concerns in Section 3 and Section 4.
3. Reference Architecture for Controller Quality
Our focus is self-adaptive systems, specifically the construction and architecture of controllers that deal with quality problems and adapt the resources used by the system.
3.1. State-of-the-Art and Motivation of Framework
The review of the state of the art reveals two concerns that have not been sufficiently addressed:
Firstly, relevant quality metrics are performance, robustness, fairness, explainability and sustainability, but these need to be adjusted to AI-constructed controllers;
Secondly, a systematic quality management framework is needed to embed these metrics into a coherent framework.
A respective framework consisting of metrics and a quality management process is needed to provide a systematic, quality-driven controller construction and management approach. The presented work provides core principles and practices in this context. The need to automate the quality management of control is exacerbated by the fact that this context concerns environments where manual adjustment is neither feasible nor reliable.
Therefore, we need to identify and define quality criteria for the automation of system adaptation where AI is used to construct controllers.
3.2. Controller Quality Requirements
Run-time monitoring of critical systems is always important. Since ML is used to construct the software, there is no direct control by expert development engineers, and thus quality needs to be managed differently. This requires, for instance, explainability of the ML models to understand quality implications. Machine learning models are normally evaluated for their effectiveness, generally in terms of metrics such as accuracy, precision, and recall, which capture the performance. Two challenges emerge for controllers and their quality:
The usual ML performance measures of accuracy, precision, and recall do not naturally apply for all construction methods, requiring a different notion of performance and also the need to take uncertainty and disturbances in the form of robustness in the environment into account;
To better judge the quality of controllers themselves, other concerns such as explainability, but also fairness or sustainability, are important.
What is effectively needed is an engineering approach for self-adaptation controllers, which is part of what is often called AI Engineering [26]. We investigate this specifically for controller construction for self-adaptive software systems.
3.3. Resource Management and Motivational Example
We now return to ML and the prediction and classification model accuracy examples and translate these to reinforcement learning and control theory and relevant quality concerns. A motivational use case is controllers for resource management [37,38,39,40,41] that can be applied in virtualised and cyber-physical environments, e.g., cloud or IoT settings. Hong et al. [42] review this for resource management. System adaptation is required for the resource configuration, including the monitoring of resource utilisation and respective application performance and the application of generated rules for resource management to meet quality requirements [12,43,44]. The rules adjust the resource configuration (e.g., size) to improve performance and other qualities. A prevalent example is RL, which employs a reward principle applicable in the self-adaptation loop to reward improvements and penalise deteriorations.
We describe the problem using a concrete example:
The problem could be a resource controller for cloud adaptation. This could employ the following rule: if Workload > 80% then double (Resource-Size). The concrete problem is whether this rule is optimal and whether the recommendation space considered by the controller is complete.
The remediation can be an RL-generated controller that provides an actionable recommendation for a 60% workload as a verified test case. Then, this recommendation can be scaled up to 80%.
The model could reward a good utilisation rate (e.g., 60–80%);
The model could penalise costly resource consumption, e.g., high monetary costs for cloud resources or energy consumption.
Performance in this case would be determined by how accurately the controller maintains a 60–80% workload; robustness would be determined by the extent to which noise in the training data affects the controller's behaviour. A sketch of a corresponding reward function is given below.
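To make the reward design concrete, the following is a minimal Python sketch of a reward function along the lines described above, rewarding utilisation in the 60–80% band and penalising resource cost. The thresholds and the cost weight are hypothetical choices, not values from this work:

```python
def reward(utilisation: float, hourly_cost: float, cost_weight: float = 0.5) -> float:
    """Reward utilisation in the target band and penalise resource cost.

    utilisation: observed workload as a fraction (0.0-1.0).
    hourly_cost: monetary or energy cost of the current configuration (assumed scale).
    cost_weight: hypothetical weighting of the cost penalty.
    """
    if 0.6 <= utilisation <= 0.8:
        band_reward = 1.0  # inside the target utilisation band
    else:
        # linear penalty proportional to the distance from the band
        band_reward = -min(abs(utilisation - 0.6), abs(utilisation - 0.8))
    return band_reward - cost_weight * hourly_cost
```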
In particular, AI-driven controller generation requires proper quality monitoring and analysis of a defined set of comprehensive quality criteria. Furthermore, detected quality deficiencies need to be analysed as part of a root cause analysis. From this, suitable remedies need to be recommended and enacted.
3.4. Reference Architecture for Quality Management
The quality management of controller models is the challenge here. The accuracy or effectiveness against the real world could mean that a predictor predicts accurately or that an adaptor manages to effectively improve the system’s performance. However, as the above discussion shows, more than the traditional performance and cost improvement needs to be addressed. While discrimination is not a direct issue for software, a notion of technical fairness and aspects of accountability and explainability need to be dealt with.
Self-adaptive systems and the decision models at the core of respective controllers are suitable for AI-based creation due to the availability of monitoring and actuation data for training. The controller model aims to enact an action, e.g., to adapt resources in the edge, divert traffic in IoT settings, or instruct the machines that are controlled. This implements a dynamic control loop, governed by predefined quality goals [10]. These self-adaptive systems governed by a controller implement a feedback loop, see Figure 1, following the MAPE-K architecture pattern. MAPE-K control loops provide a template for self-managing systems [45]. The managed element is often part of a cyber-physical environment. The MAPE-K loop consists of four phases in the form of a continuous loop:
Monitor: The environment with the managed element is continuously monitored. The observed data can cover various data formats about the physical and digital environment. Usually, some form of aggregation or filtering happens.
Analyse: The current system situation based on the monitored observations is analysed. The reasoning can derive higher-level information that might result in a change or management request.
Plan: Based on a change request, a plan with corresponding management actions is created. The aim is actions that allow the managed element to fulfil its goals and objectives again if concerns have been identified. This can include basic decisions such as an undo or redo, but also complex strategies including structural modification and migration of the underlying system.
Execute: A change plan would be executed by an actuator in the operational environment of the managed element.
Knowledge as a key part of the control loop is shared between all phases. Relevant knowledge for these autonomic systems can range from topology information to historical logs to metrics definitions to management policies. This knowledge, or model, can also be continuously updated.
Figure 1.
Base system architecture—a controller loop based on the MAPE-K pattern with (M)onitor, (A)nalyse, (P)lan, and (E)xecute stages based on a shared (K)nowledge component.
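To make the loop structure concrete, the following is a minimal Python sketch of the MAPE-K phases. All component interfaces (sensor, actuator, knowledge) are hypothetical assumptions for illustration, not part of the pattern definition:

```python
class MapeKLoop:
    """Minimal MAPE-K skeleton; the component interfaces are hypothetical."""

    def __init__(self, sensor, actuator, knowledge):
        self.sensor = sensor        # observes the managed element
        self.actuator = actuator    # enacts changes on the managed element
        self.knowledge = knowledge  # shared between all phases

    def run_once(self):
        observation = self.monitor()
        concern = self.analyse(observation)
        if concern is not None:
            plan = self.plan(concern)
            self.execute(plan)

    def monitor(self):
        # aggregate/filter raw observations and record them in the knowledge
        data = self.sensor.read()
        self.knowledge.record(data)
        return data

    def analyse(self, observation):
        # derive higher-level information; return a change request or None
        return self.knowledge.detect_violation(observation)

    def plan(self, concern):
        # create management actions, from simple undo/redo to migration
        return self.knowledge.strategy_for(concern)

    def execute(self, plan):
        self.actuator.apply(plan)
```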
We propose a two-layered MAPE-K-based controller reference architecture, where the knowledge component K (the 'model') is the item of concern for the second layer, together with the full MAPE-K actions. The system is the item of concern for the first layer. The reference architecture provides a meta-learning layer that monitors several quality metrics, but also validates the metrics and their respective success criteria in a second loop, see Figure 2. The lower layer (Layer 1) is a controller for self-adaptive systems. The upper layer (Layer 2) is an intelligent quality monitoring and analysis framework aligned with the requirements of a (generated) controller for self-adaptive systems. Figure 2 builds on the MAPE-K loop in these two layers. The upper loop is the focus of this paper, but it needs to take the lower layer's behaviour into account.
We define controllers as a state–transition system of states $S$, actions $A$, a probabilistic transition function $T$, and a reward function $R$ that allows us to measure the quality of the chosen action in a given state, see Algorithm 1.
| Algorithm 1 Controller structure—definition |
| States $S$ |
| Actions $A$ |
| Probabilistic transition function $T: S \times A \times S \rightarrow [0, 1]$ |
| Reward function $R: S \times A \rightarrow \mathbb{R}$ |
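In code, the controller definition of Algorithm 1 could be represented along the following lines. This is a sketch; the sampling-based step interface is an assumption for illustration:

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = str
Action = str

@dataclass
class Controller:
    """Controller as a state-transition system in the sense of Algorithm 1."""
    states: List[State]
    actions: List[Action]
    transition: Callable[[State, Action, State], float]  # T(s, a, s') in [0, 1]
    reward: Callable[[State, Action], float]             # R(s, a)

    def step(self, s: State, a: Action) -> Tuple[State, float]:
        """Sample a successor state under T and return it with the reward R(s, a)."""
        weights = [self.transition(s, a, s2) for s2 in self.states]
        s_next = random.choices(self.states, weights=weights, k=1)[0]
        return s_next, self.reward(s, a)
```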
3.5. Controller Quality Management at the Upper Layer
An important question concerns the automation of the upper loop. If full automation is the ultimate objective, this creates some challenges regarding metrics and their measurability and should thus guide the solution. These challenges are as follows: automate model construction, automate the model use and evaluation, and recommend a model learning configuration adjustment. The upper meta-learning loop design follows the MAPE-K pattern:
Monitor: We need reward functions for model quality, i.e., adaptor quality based on a range of metrics that are linked to the application system and its aspects, from data quality in the input (requiring robustness) [46,47] to sustainability (requiring lower cost and environmental damage) as two examples.
Analyse: We need to implement an analysis consisting of problem identification and root cause analysis for model quality problems and feeding into explainability aspects through a (partial) dependency determination to identify which of the system factors would improve the targeted quality the most.
Plan/Execute: The solution would need to recommend and, if possible, also enact these recommendations. For example, a rule update for the cloud adaptor could be recommended, with the model being recreated. In very concrete terms, this could be a readjustment of the training data size or ratio. Also, remedial actions for an identified root cause can be considered.
This upper loop would implement a meta-learning process, which is a learning process to adapt the controller through a continuous testing and experimentation process. We call this the knowledge-learning layer.
In order to develop a solution, we follow the Engineering Research guidelines proposed by the ACM SIGSOFT Empirical Standards committee as the methodological framework [48]. In [10], a systematic construction process based on the identification of goals and actuation needs, resulting in suitable models and the final controller, is discussed. To frame the controller design, we use the MAPE-K architecture pattern for system adaptation.
5. Controller Quality Metrics
The aim is to automatically monitor and analyse the quality of the system controller for the lower layer, as far as possible, using a defined set of metrics. This metrics set takes the controller construction into account. Our selection and definition rationale below is given for an RL-constructed controller type. In this section, we introduce our metrics framework, starting with a conceptual outline, before defining each of the five selected metrics in more detail.
5.1. Method
Two activities are needed for this metrics framework: the selection of suitable metric categories and the precise definition of the selected metrics.
For the first step, we systematically surveyed review articles covering common bibliographic databases such as Scopus, Google Scholar, and IEEE Xplore. We used suitable search terms both for AI-generated software in general (AI, ML, and RL as acronyms and in their extended term variant) and specific controller-oriented ones (controller, self-adaptation, DevOps, AIOps, and also quality) to identify publications with relevant metrics. The surveys included here were selected based on publication quality (considering only peer-reviewed conferences and journals) and relevance (assessing the extent of recent AI technology coverage).
For the second step, again the review papers served as a baseline. Here, however, metric definitions were either selected among options or were adapted to meet the needs of controllers as a specific type of AI-generated software model, in contrast to a wider coverage in the first step.
5.2. Metrics—Selection and Catalogue
In Figure 3, the selected quality metrics are indicated. Performance and fairness directly affect the system quality. Robustness is a guard against external uncertainties and influences. Sustainability has an effect on the environment. Explainability allows understandability, e.g., to provide a governance assurance on the actual controller function.
We can classify the metrics based on whether they relate to the control task at hand or affect the resources that this task consumes. For the task, we also indicate whether the control function is directly affected (core), whether it could be influenced by its direct context (environment), whether it could skew the results by favouring certain outcomes (bias), whether it relates to the responsibility for the task (governance), or whether it has an impact on resources (such as energy as one selected concern). The properties are those of a controller, not the application system.
Performance addresses the core quality management function of a controller;
Robustness relates to the input of monitoring the environment and subsequent MAPE phases;
Explainability concerns the governance of the control and quality framework in order to allow for trustworthy behaviour;
Fairness refers to the avoidance of bias in controller decisions and the effect on controller or system performance;
Sustainability relates to the effect of controller actions on the consumption of resources in the environment in terms of cost and energy.
Both positive and negative measurements can be valued, e.g., by rewarding or penalising them, respectively. We measure concerns that directly influence the task at hand, i.e., how well the solution can perform its job. In a second category of quality targets, the environment is addressed. This includes resources and their consumption, e.g., in terms of energy consumption, but also the human or organisation in charge of the system in a governance concern, e.g., in terms of explainability. We also add an impact direction, i.e., whether the concern is internal, influenced by external forces (inwards through disturbances), or influences external aspects (outwards on parts of the environment). This is indicated in Table 1.
We add another metric at a meta-level that deals with the encompassing concern of responsible AI in order to adjust and tailor the above foundational metrics.
Responsible AI serves as a meta-level aspect that allows us to prioritise and balance the direct quality metrics (performance, robustness, explainability, fairness, sustainability).
All of these metrics will be individually motivated and defined in the next subsections in the following format covering a number of perspectives:
Justification: why the metric is needed;
Assumptions: contextual assumptions;
Rationale: motivation of the definition;
Definition: defined metric;
Example: illustration of metric.
5.3. Performance
Justification: Generally, the performance of a controller refers to the ability to effectively achieve (and optimise) a technical objective. This could be expressed in terms of a function $f$ of system performance and cost, with maximising $f$ as a sample goal. While this works for many monitored application systems, we need here a definition that in more concrete terms takes the objectives and the construction of the controller into account.
Assumptions: We assume that the quality of a controller action in a given state can be quantified through a reward function R. The performance for controllers is the overall achievement of the model towards an optimal reward. We look at reinforcement learning as the prevalent construction technique.
Rationale: The reward is the central guiding performance measure of RL-based controllers, as discussed in the previous section. Reward optimisation is built into approaches like SARSA or Q-learning [6,50]. The performance of a reinforcement learning algorithm can be determined by defining the cumulative reward as a function of the number of learning steps. Better rewards mean better performance. Thus, this concept is the basis of the performance definition here.
Definition: Different performances emerge depending on the chosen learning rate $\alpha$ for the Q-function (which influences to what extent newly acquired information overrides old information). In a system where the reward varies over time, see Figure 4 for an illustration showing an initial decrease and subsequent increase while the controller is learning, this time-varying performance can be analysed with three parameters:
Convergence: the asymptotic slope of a performance graph indicates the quality of the policy after stabilisation of the RL algorithm at the end;
Initial Loss: the lowest point of the curve (minimum) indicates how much reward is often sacrificed before the performance begins to improve;
Return on Investment: the zero crossing after the initial loss gives an indication of recovery time, i.e., of how long it takes to recover from initial, often unavoidable learning costs.
Note that the second and third cases only apply if there are positive and negative rewards. Also, note that the cumulative reward is a measure of the total reward, but algorithms such as Q-learning or SARSA use discounted rewards, modelled using the discount factor $\gamma$. A flattened graph would indicate that the learning process has finished with a defined policy.
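Assuming the cumulative reward is recorded per learning step, the three parameters could be estimated as in this Python sketch; the tail length used for the slope estimate is a hypothetical choice:

```python
from typing import List, Optional, Tuple

def performance_parameters(cum_reward: List[float],
                           tail: int = 100) -> Tuple[float, float, Optional[int]]:
    """Estimate the three performance parameters from a cumulative-reward curve.

    cum_reward: cumulative reward per learning step.
    tail: number of final steps used for the asymptotic slope (hypothetical).
    """
    tail = min(tail, len(cum_reward) - 1)
    # Convergence: asymptotic slope of the curve after stabilisation
    convergence = (cum_reward[-1] - cum_reward[-1 - tail]) / tail
    # Initial loss: the lowest point (minimum) of the curve
    initial_loss = min(cum_reward)
    # Return on investment: first zero crossing after the minimum
    low = cum_reward.index(initial_loss)
    roi_step = next((t for t in range(low, len(cum_reward))
                     if cum_reward[t] >= 0.0), None)
    return convergence, initial_loss, roi_step
```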
A slightly different perspective emerges for control theory-based controllers. Control theory aims for the stability of the system, i.e., the signal (output action of a controller) should be bounded and minimise fluctuations of the technical system objective, defined in terms of SLA metrics. The notions of convergence, loss, and return on investment are applicable to assess the achievement of stability. A reward function could consider these parameters.
Another possibility to define performance is, instead of referring to accumulated rewards, measuring the average reward. This would be a measure of the quality of the learned policy.
Example: The metric is connected to the performance of the learning process. A positive impact on the convergence and loss criteria can be generated by using expert models in the training. For instance, a fuzzy logic rule system has been successfully used for this purpose [52].
5.4. Robustness
Justification: Robustness is the ability to accept, i.e., deal with, a wide range of monitored input cases. This includes, for instance, uncertainties, noise, non-deterministic behaviour, and other disturbances. These are typical for cyber-physical and distributed systems where sensors, connections, or computation can fail in different locations. Robustness concerns arise in non-deterministic behaviour situations and need repeated experiments in the evaluation. When controllers are constructed with disturbances present, robustness against these is required.
Assumptions: We use the term disturbances to capture the multitude of external factors [17]. Disturbances can be classified into three possible contexts: observations, actions, and dynamics of the environment that the controller interacts with, thus giving the following disturbance classes:
Observation disturbances happen when the observers (e.g., sensors) used cannot detect the exact state of the system;
Action disturbances happen when the actuation of the system is ultimately not the same as the one specified by the control output, thus causing a difference between actual and expected action;
External dynamics disturbances are disturbances that are applied directly to the system. These are environmental factors or external forces.
Rationale: Disturbances can be classified into different behavioural patterns for the observed system quality over time, see Figure 5. The patterns are important, as the evaluation of a controller's robustness is often carried out using simulations in which disturbances following these patterns are injected into the system. Figure 5 shows the five patterns in two broader categories, three non-periodic and two periodic ones, following the classification in [17].
Definition: The patterns are important to identify disturbance types in a robustness metric. Thus, we define them first.
Non-periodic disturbance patterns are the following:
White Noise Disturbances: These indicate the natural stochastic noise that an agent encounters in real world situations. Noise can be applied, ranging from zero with increasing values of standard deviation.
Step Disturbances: These model a system change with one sudden and sustained change. The magnitude of the step can vary.
Impulse Disturbances: These show system behaviour with a sudden, but only very short temporary change. The impulse magnitude can vary, as above.
Periodic disturbance patterns are the following:
Saw Wave Disturbances: These are cyclic waves that increase linearly to a given magnitude and instantaneously drop back to a starting point in a repeated way. Thus, this pattern combines characteristics of the step and impulse disturbances but is, in contrast, applied periodically.
Triangle Wave Disturbances: These are also cyclic waves that, as above, repeatedly increase linearly to a given magnitude and decrease at the same rate to a starting point (and not suddenly as above). This is very similar to the saw wave, but exhibits a more sine-wave-like behaviour.
We can quantify robustness by relating the performance metric (as defined in the previous subsection) in an ideal and a disturbed setting as a ratio:
$\mathit{Robustness} = P_{\mathit{disturbed}} / P_{\mathit{ideal}}$
This can be calculated for all disturbance patterns. The closer the ratio is to 1, the more robust the controller is.
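As an illustration, a minimal sketch of how the ratio could be computed for injected disturbance patterns. The evaluation harness evaluate_performance and the concrete pattern parameters (onset, period, magnitude) are assumptions; positive performance values are assumed for the ratio to be meaningful:

```python
from typing import Callable

Disturbance = Callable[[int], float]  # disturbance value at time step t

def step_disturbance(t: int, onset: int = 50, magnitude: float = 1.0) -> float:
    """Non-periodic step pattern: one sudden, sustained change after `onset`."""
    return magnitude if t >= onset else 0.0

def saw_wave(t: int, period: int = 20, magnitude: float = 1.0) -> float:
    """Periodic saw-wave pattern: linear increase, then an instantaneous drop."""
    return magnitude * (t % period) / period

def robustness_ratio(evaluate_performance: Callable[[Disturbance], float],
                     disturbance: Disturbance) -> float:
    """Ratio of disturbed to ideal performance; closer to 1 means more robust."""
    ideal = evaluate_performance(lambda t: 0.0)  # undisturbed baseline run
    return evaluate_performance(disturbance) / ideal
```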
Example: White noise is often caused in distributed environments such as edge clouds due to outages and anomalies in the system operation and monitoring. Here, the training needs to consider these disturbances. Another example is extended, hardly noticeable slopes that might be caused by drift in the application.
5.5. Explainability
Justification: Explainability is important, in general, for AI, in order to improve the trustworthiness of the solutions. For technical settings such as controllers, explainability is beneficial since it could aid a root cause analysis for quality deficiencies. The explainability of the controller actions is a critical factor in mapping observed model deficiencies back to system-level properties via the monitored input data to the model creation.
Explainability is a meta-quality that aids the controller quality assessment. Since how and why ML algorithms create their models is generally not obvious, a notion of explainability can help to understand deficiencies in the other four criteria and remedy them.
Assumptions: Explainability differs depending on the construction mechanism, being typically more challenging for ML-constructed controllers than for fuzzy or probability-based ones where the construction is manual on an explicitly defined function or theory. For example, if fuzzy logic is applied, the deciding function is the membership function. Explainability for RL is less mature than for other ML approaches.
Rationale: While explainability is a very broad concern, we introduce here definitions and taxonomies that are relevant for a technical controller setting and allow us to define basic metrics based on observations that can be obtained in the controller construction and deployment. The metrics defined can be used to automatically create indicators or recommendations. Here, however, a full automation of a reconfiguration or other correction would be difficult.
Definition: A number of taxonomies for ML explanations have been proposed in recent years. We focus here on [20] to propose a classification of explainability into three types. Note that we extend these three classes with concrete measurements for our setting.
Feature importance (FI) explanations: These identify features (or metrics in our case) that have an effect on an action a proposed by a controller for an input state s. FI explanations provide an action-level perspective of the controller. For each action, the immediate situation causing an action selection is considered.
Assume system metrics $m_1, \ldots, m_n$. Then, metric $m_i$ has an effect if $|\Delta Q_{m_i}(s, a)| \geq |\Delta Q_{m_j}(s, a)|$ for all $j$, meaning that the quality difference between two subsequent states from state $s$ with action $a$ is the greatest for metric (or feature) $m_i$.
Learning process and MDP (LPM) explanations: These show past experiences or the components of the Markov decision process (MDP) that have led to the current controller behaviour. LPM explanations provide information about the effects of the training or the MDP, e.g., how the controller handles the rewards.
The effect can be measured in terms of the quality function $Q$ for two controller (or underlying MDP) instances $C_1$ and $C_2$ with fixed data for training and testing, i.e., $C_1$ is better than $C_2$ if $Q_{C_1}(s, a) \geq Q_{C_2}(s, a)$ for selected system states $s$ and actions $a$. A code sketch of the FI and LPM measurements is given after this classification.
Policy-level (PL) explanations: These show the long-term behaviour of the controller as caused by its policy. They are used to evaluate the overall competency of the controller to achieve its objectives.
The PL explainability definition follows the LPM measurement 'is better than', but here considers variations in data sets, even in terms of the features or metrics used.
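The FI and LPM measurements above could be sketched as follows, assuming per-metric quality values are available for the states before and after an action and that the quality functions of both controller instances can be queried on fixed test cases; all interfaces are assumptions:

```python
from typing import Callable, Dict, Iterable, Tuple

def most_influential_metric(q_before: Dict[str, float],
                            q_after: Dict[str, float]) -> str:
    """FI: return the metric with the largest quality difference between two
    subsequent states, i.e., the feature with the greatest effect on the action."""
    return max(q_before, key=lambda m: abs(q_after[m] - q_before[m]))

QualityFn = Callable[[str, str], float]  # Q(s, a) of a controller instance

def is_better(q1: QualityFn, q2: QualityFn,
              cases: Iterable[Tuple[str, str]]) -> bool:
    """LPM: controller 1 is better than controller 2 if its quality dominates
    on all selected (state, action) test cases with fixed training/test data."""
    return all(q1(s, a) >= q2(s, a) for s, a in cases)
```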
Other explainability taxonomies also exist. For instance, [19] distinguishes two types in terms of their temporal scope:
Reactive explanations focus on the immediate moment, i.e., only consider a short time horizon and momentary information, such as direct state changes of the controller, as in FI;
Proactive explanations focus on longer-term consequences, thus considering information about an anticipated future and changes, such as those that LPM and PL represent.
A reactive explanation provides an answer to the question “what happened?”. A proactive explanation answers the question “why has something happened?”. These can then be further classified in terms of how an explanation was created. Beyond the simple measurements that can serve as indicators of the explanation, more complex explanatory models could be constructed. Reactive explanations could be policy simplifications, reward decompositions, or feature contributions and visual methods. Proactive explanations could be structural causal models, explanations in terms of consequences, hierarchical policies, and relational reinforcement learning solutions. More research is needed here.
Example: In a concrete situation, it might need to be determined whether some lower-layer metric (such as cost over performance) or some strategy (for instance, for storage management) is favoured. A remediation could use explicit weightings to level out imbalances.
5.6. Fairness
Justification: Specifically where humans are involved, the fairness of decisions becomes crucial, and any bias towards or against specific groups needs to be identified. This concern can also be transferred from the human viewpoint to the technical domain, creating a notion of technical fairness that avoids preferences that could be given to specific technical settings without a reason. An example of an unfair controller would be a bias toward a specific management strategy. For instance, historically an overprovisioning strategy could have been followed, without that having been objectively justified.
Assumptions: We assume that, as for humans, past technical behaviour and decisions can be detrimental, i.e., can introduce bias.
Rationale: The definition we adopt here originally referred to states and the quality of actions in these states for RL, but it can be adapted to all controllers that follow our definition, as there is always a state in which an action is executed, which in turn can be evaluated. The definition ensures that the long-term reward of a chosen action $a$ is greater than that of an alternative action $a'$, and that there is no bias that would lead to a selection not guided by optimal performance. Fair controllers must implement a distribution over actions, with a somewhat heavier weight put on better-performing actions, judged in terms of (possibly discounted) long-term reward. Actions cannot be suggested without having a positive effect on the objective performance of the system as defined above.
Definition: Fairness can be defined in a precise way. We follow the definition given by [18], assuming a quality function $Q$ or reward function $R$ is defined:
A policy is fair if, in a given state $s$, a controller does not choose a possible action $a$ with probability higher than another action $a'$ unless its quality is better, i.e., $Q(s, a) > Q(s, a')$.
The above definition is often referred to as exact fairness, i.e., quality measured as the potential long-term discounted reward. Possible alternatives shall also be introduced. Approximate-choice fairness requires a controller to never choose a worse action with a probability substantially higher than that of a better action. Approximate-action fairness requires a controller to never favour an action of substantially lower quality than that of a better action.
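A sketch of an exact-fairness check for a single state, assuming access to the action-selection probabilities pi(a|s) and the quality function Q for that state; both interfaces are assumptions:

```python
from typing import Callable, Dict

def is_exactly_fair(pi: Dict[str, float],
                    q: Callable[[str], float],
                    eps: float = 1e-9) -> bool:
    """Check that, in a fixed state, no action is chosen with higher probability
    than another unless its quality (long-term discounted reward) is strictly better.

    pi: action-selection probabilities pi(a|s) for the fixed state s.
    q:  quality Q(s, a) of each action in that state.
    """
    actions = list(pi)
    for a in actions:
        for b in actions:
            if pi[a] > pi[b] + eps and not q(a) > q(b):
                return False
    return True
```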
A number of fairness remedies known in the ML domain include the improvement of data labelling through so-called protected attributes. An example is the automation of critical situation assessment, where technical bias could treat specific failures or failure management strategies differently. For instance, a high risk of failure based on past experience might be considered, and the controller could be biased against or towards certain events that have occurred, be that through pre-processing (before ML training) or in-processing (while training). The challenges are to find bias and to remove it through a control loop, e.g., using favourable data labels as protected attributes to manage fairness. Examples in the edge controller setting are smaller or bigger device clusters being favoured wrongly, or specific types of recommended topologies or recommended configuration sizes (messages, storage, etc.) being preferred.
Example: Often, underprovisioning in the form of a static resource allocation happens in manual systems due to unawareness of scalability options or the aim for simplicity. In case of high demand, this would slow down the system and possibly disadvantage users. Here, favourable labels reflecting user impact could be identified.
5.7. Sustainability
Justification: General economic and ecological sustainability goals are important societal concerns that also should find their application in computing, here specifically in terms of cost- and energy-efficiency of the model creation and model-based decision processes within a controller.
Assumptions: Sustainability is often used synonymously with environmentally sustainable, e.g., in terms of lower carbon emissions [23]. Thus, we focus on environmental sustainability here.
Rationale: While different measures can be proposed in this environmental sustainability context, we choose energy consumption here as the core of our sustainability definition because it can often be determined in computing environments.
Definition: Energy efficiency can be defined through widely accepted and measurable metrics, e.g., the energy consumed during controller construction (training) and during controller operation, which can be determined with common measurement tools.
These metrics are often compared with performance metrics to assess performance in terms of costs. Similarly to the robustness case, a ratio allows us to indicate a possible trade-off between performance and sustainability and thus to define sustainability. We can relate the performance to the cost or resource consumption it causes:
$\mathit{Sustainability} = P / C$
where $P$ is the achieved performance and $C$ the cost or resource consumption. This can be applied to various resource or cost types. Sustainability focusing on the consumption of resources is often considered and measured through penalties in the value or quality calculation.
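A minimal sketch of this ratio; the performance value and the energy measurement are assumed to be obtained externally (e.g., from a power meter or a software energy profiler) over the same interval:

```python
def sustainability_ratio(performance: float, energy_kwh: float) -> float:
    """Relate achieved performance to the energy it consumed.

    performance: e.g., cumulative or average reward of the controller.
    energy_kwh: measured energy consumption over the same interval (assumed unit).
    Higher values indicate more sustainable operation.
    """
    if energy_kwh <= 0:
        raise ValueError("energy consumption must be positive")
    return performance / energy_kwh
```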
Example: Overprovisioning is an effective resource allocation strategy from a system performance perspective but generally causes the highest energy consumption overhead. Explicit penalisation of sustainability defects needs to be introduced.
5.8. Responsible AI (Meta-Level)
Responsible AI serves as a meta-level concern to prioritise and balance the five core quality metrics of performance, robustness, explainability, fairness, and sustainability that were defined above.
Justification: Responsible AI acts as a guiding principle, ensuring that the implementation and configuration of AI-based controllers align with societal, ethical, and operational goals.
Assumptions: Responsible AI metrics are derived as aggregate quality measures influencing controller priorities. Instead of direct measurements, these metrics act as evaluative tools guiding trade-offs among competing quality objectives. They reflect the risks and impacts of quality failures at a systemic level.
Rationale: Framing Responsible AI as a meta-level concern integrates it seamlessly with the PREFS-R framework. Metrics like "Risk of Quality Failure" and "Impact Assessment" serve as overarching tools to evaluate the likelihood, severity, and consequences of deviations from expected behaviours across the foundational metrics (e.g., performance, robustness).
Definition: Our analysis prioritises ethical considerations in Responsible AI, focusing on its role as a meta-level framework that evaluates and balances the foundational core quality metrics (fairness, explainability, sustainability, robustness, and performance). Responsible AI addresses trade-offs and contextual priorities by introducing two overarching meta-metrics:
Risk of Quality Failure (Meta-Risk Analysis): Measures the likelihood and severity of performance or robustness falling below critical thresholds for each identified risk $i$.
Likelihood: Based on statistical evaluation of operational data, considering external disturbances, internal errors, or drift;
Severity: Defined by the impact on system reliability, safety, or user trust.
Ethical Damage/Cost Score (Impact Evaluation): Quantifies potential damage across societal, environmental, and technical dimensions using weighted criteria.
Damage refers to measurable consequences, such as resource wastage and privacy breaches;
Weights prioritise different concerns based on the context of the application (e.g., sustainability vs. robustness).
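As an illustration, the meta-risk analysis could aggregate likelihood, severity, and context-dependent weights over the five core metrics as in this sketch; all metric names, weights, and numbers are hypothetical:

```python
from typing import Dict

def meta_risk_score(likelihood: Dict[str, float],
                    severity: Dict[str, float],
                    weights: Dict[str, float]) -> float:
    """Aggregate risk of quality failure over the five core metrics.

    likelihood[m]: probability of metric m falling below its threshold.
    severity[m]:   impact of that failure on reliability, safety, or trust.
    weights[m]:    context-dependent priority of metric m (hypothetical values).
    """
    return sum(weights[m] * likelihood[m] * severity[m] for m in weights)

# Hypothetical usage for a resource-constrained deployment, where
# sustainability is weighted more heavily than the other concerns:
score = meta_risk_score(
    likelihood={"performance": 0.10, "robustness": 0.20, "explainability": 0.05,
                "fairness": 0.05, "sustainability": 0.30},
    severity={"performance": 0.7, "robustness": 0.8, "explainability": 0.3,
              "fairness": 0.4, "sustainability": 0.9},
    weights={"performance": 0.2, "robustness": 0.2, "explainability": 0.1,
             "fairness": 0.1, "sustainability": 0.4},
)
```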
Example: In traffic systems, performance and robustness often dominate, but in medical contexts with humans directly involved as the controlled subjects, fairness and explainability play a more significant role. Sustainability might be more dominating in resource-constrained environments.