As outlined before, there is a reasonable amount of work trying to quantify the degree of self-organization in an overall SASO system. However, the observable effects of SASO mechanisms also manifest in modifications of the behavior of subsystems that are subject to a modifiable configuration. We propose to develop a measurement framework for quantifying the degree to which a SASO (sub-)system changes its behavior in a given observation period. With such a technique at hand, the system can quantify the heterogeneity of configurations, e.g., in response to dynamics of the environment or external disturbances, and consequently assess the impact of an adaptation. Such a measurement comes with a second advantage: it can be used to compare SASO systems under identical conditions to analyze which solution requires higher adaptation effort and which achieves stable solutions faster.

In the following section, we first define the challenges that need to be addressed on the way towards such a measurement framework. Afterwards, we present a first approach for such a measurement framework that addresses aspects of the derived challenges. This approach makes use of generative probabilistic models, which are defined in this context.

#### 4.1. Challenges for Defining a Measurement Framework

In the following paragraphs, we define challenges that need to be addressed towards a measurement framework for self-adapting systems. We assume that self-adaptation manifests itself by means of configurations that are altered through internal decision mechanisms. We further assume that this actual configuration of all parameters relevant for controlling the productive behavior of a subsystem is accessible through interfaces. Consequently, each subsystem ${a}_{i}\in S$ has a current configuration ${c}_{i}\in {C}_{i}$.

**Challenge 1: Quantification of changes**

We assume that a subsystem ${a}_{i}$ does not need to modify its configuration if all conditions are stable. This implies that any adaptation is a response to environmental dynamics, reorganization effects, or disturbances. Conceptually, we model each subsystem ${a}_{i}$ as a process that generates observable samples (i.e., configurations). We further model the observation process as a snapshot of all configurations ${c}_{i}$ that is processed with a pre-defined sampling rate. In a first approach, we may quantify the change statistically by accumulating the number of observed changes. The advantage of this approach is that it is easy to compute. This requires research on the following questions: (a) Which size of the sampling interval ${d}_{adaptation}$ is appropriate for measuring self-adaptation? (b) How does the proposed measurement behave if used as a basis for comparisons between different self-adaptation mechanisms? and (c) If configurations cover a continuous space, they have to be discretized first: how does such pre-processing affect the measurement?
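The counting approach described above can be sketched in a few lines of Python. This is a minimal illustration only; the snapshot format, the function name, and the sample values are hypothetical, and real configurations would first need to be discretized as discussed in question (c):

```python
from typing import Hashable, Sequence

def count_adaptations(snapshots: Sequence[Sequence[Hashable]]) -> int:
    """Count configuration changes across periodic snapshots.

    `snapshots[t][i]` holds the configuration c_i of subsystem a_i at
    sampling step t; a change is any c_i differing from the previous step.
    """
    changes = 0
    for prev, curr in zip(snapshots, snapshots[1:]):
        changes += sum(1 for p, c in zip(prev, curr) if p != c)
    return changes

# Three subsystems sampled over four steps: two changes in total.
history = [
    ("A", "B", "C"),
    ("A", "B", "C"),   # no change
    ("A", "X", "C"),   # a_2 adapts
    ("A", "X", "Y"),   # a_3 adapts
]
print(count_adaptations(history))  # 2
```

The result depends directly on the chosen sampling interval: a coarser interval may miss intermediate configurations, which is exactly the concern raised in question (a).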

**Challenge 2: Heterogeneity of changes in the subsystems’ configurations**

The outlined approach has the disadvantage of not incorporating the heterogeneity of configurations within the system during the observation period. This means that only the change itself is considered, but not the variety of different configurations and the diversity of observed adaptations. In this sense, the previous approach provides just an easy-to-compute indicator for the adaptation behavior but neglects a detailed description of that behavior. To overcome these limitations, we propose to consider the frequencies and the heterogeneity of the adaptations. Expressed in a probabilistic framework, we represent each subsystem ${a}_{i}$ as a process that generates observable samples (i.e., configurations). We assume an observation period of a given length in which the actual configurations of all contained subsystems are collected at a certain sampling frequency. When measuring the degree of self-adaptation from the outside (i.e., without access to the internals of the ${a}_{i}$), we either use the configurations as they are, use pre-processing techniques to extract the values of attributes (features) from the samples (observations), or use a hybrid approach.

Based on this idea, we define the degree of self-adaptation as an unexpected or unpredictable change of the distribution underlying the observed samples (i.e., the configurations). We propose to measure the expected amount of information contained in a new distribution with respect to a reference distribution of samples by using divergence measures (e.g., the Kullback–Leibler divergence [37]): the higher the difference between both distributions, the more self-adaptation took place. Compared to the concept outlined in Challenge 1, probability distributions have the advantage that they take all occurred configurations into account. For instance, switching only between two different configurations will result in a lower degree of self-adaptation than taking on highly diverse configurations, whereas the simple counting approach from Challenge 1 would merely register that changes occurred. The basic challenge here is to compare the different divergence measures and change detection techniques known in the machine learning domain and to determine which provides the most reliable values for quantifying the severity of a change. Such a model also requires research on the following questions: (i) How to handle configurations that consist of mixed variable types (e.g., including categorical or Boolean values), and (ii) How to handle adaptations that occur at different frequencies or asynchronously within $S$ (as a result of isolated decisions by each ${a}_{i}$, negotiated adaptations, or chain reactions, for instance).
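The intuition that diverse configurations yield a higher degree of self-adaptation than mere switching between two configurations can be illustrated with a small sketch over discrete configuration labels. The epsilon smoothing, the function name, and all sample values are illustrative assumptions, not part of the framework:

```python
import math
from collections import Counter

def discrete_kl(p_samples, q_samples, eps=1e-9):
    """Approximate KL(p||q) over the union of observed configuration
    labels, with a small epsilon to avoid division by zero."""
    support = set(p_samples) | set(q_samples)
    p_cnt, q_cnt = Counter(p_samples), Counter(q_samples)
    n_p, n_q = len(p_samples), len(q_samples)
    kl = 0.0
    for x in support:
        p = p_cnt[x] / n_p + eps
        q = q_cnt[x] / n_q + eps
        kl += p * math.log(p / q)
    return kl

reference = ["A"] * 8           # stable behavior in the reference window
two_configs = ["A", "B"] * 4    # oscillating between two configurations
diverse = list("ABCDEFGH")      # highly diverse configurations

# Diverse behavior diverges more strongly from the reference.
assert discrete_kl(two_configs, reference) < discrete_kl(diverse, reference)
```

A simple change counter would report the same number of changes (one per step) for both windows, while the divergence-based view distinguishes them.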

**Challenge 3: Stability of self-adaptation processes**

The two previous challenges proposed to measure self-adaptation as a change in the configuration behavior of the contained subsystems. Conceptually, this is based on the assumption that self-adaptation is a response to certain events and influences (e.g., disturbances, outages, or environmental dynamics): To maintain a certain utility or to self-improve this utility over time, certain configurations are more beneficial than others in a particular context. However, this basic idea of self-adaptation also assumes that in case of static conditions (or only slowly changing situations) no adaptations are needed. Consequently, a self-adaptation process is assumed to result in stable and optimized solutions. A resulting challenge is then to develop techniques which can analyze this stabilization process in detail. This especially means techniques to identify cases where re-adaptation is not converging towards (optimal) behavior or runs into oscillating patterns.

The basic idea is illustrated in Figure 2: The green arrows indicate the comparison of probability distributions as discussed in the context of Challenge 2. We define periods of consecutive estimations and compare the results of the measurements as an aggregated value for a certain time frame (${T}_{Stability}$) against the previous time frame. Here, ${T}_{Stability}$ specifies the size of the considered window and $d$ refers to the fixed/sliding window size. In particular, we now shift the focus towards long-term comparisons instead of one-by-one comparisons. Again, a suitable approach may be the usage of a hybrid technique, i.e., considering both windows: an initial pre-defined window and the previous sliding window. From this general concept, the following research questions have to be addressed: (a) What is a meaningful lower bound for a factor $k$ (number of frames) such that ${T}_{Stability}=k\ast d$ provides meaningful results for a stability estimation? (b) How can the stability be expressed to consider the differences between the observed behavior within different time frames (of size ${T}_{Stability}$)? and (c) How can we distinguish between a short-term level of stability (i.e., $k=1$, in response to an identified disturbance using techniques such as [38]) and the long-term stability of the system (i.e., $k>1$, taking a certain interval into account, such as a day)?
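The frame-wise aggregation over ${T}_{Stability}=k\ast d$ can be sketched as follows. The aggregation by mean values, the function name, and the sample divergence values are illustrative assumptions; other aggregates (e.g., maxima) would be equally conceivable:

```python
def frame_stability(divergences, k):
    """Aggregate per-window divergence values into frames of k windows
    (frame length T_Stability = k * d) and report the change between
    consecutive frames; small values suggest a stabilizing system."""
    frames = [divergences[i:i + k]
              for i in range(0, len(divergences) - k + 1, k)]
    means = [sum(f) / len(f) for f in frames]
    return [abs(b - a) for a, b in zip(means, means[1:])]

# Divergence measurements per window d: a disturbance around windows 3-4,
# after which the system settles again (hypothetical values).
d_values = [0.1, 0.1, 0.9, 0.8, 0.4, 0.1, 0.1, 0.1]
print(frame_stability(d_values, k=2))
```

A sequence of shrinking frame-to-frame differences indicates convergence, whereas persistently large differences or a recurring up-down pattern would hint at the oscillating behavior mentioned above.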

**Challenge 4: Sliding windows**

The probability-based approaches outlined before require inherent comparability of two probability distributions. Transferred to the temporal behavior of a SASO system, this implies that the potential adaptation process manifests itself in the difference between a current and a referential distribution of attribute occurrences. For comparison, both measurement windows must cover periods of the same length. This can be done using a sliding window approach: A fixed period $d$ is used to observe samples for the current estimation process (i.e., between time ${t}_{-1}$ and ${t}_{0}$) and the same duration is used for a reference observation (i.e., the period directly before the current observations: between ${t}_{-2}$ and ${t}_{-1}$). Alternatively, the reference window might be fixed (i.e., static), e.g., at the beginning of the observation (here, slow changes can be detected more easily, but changes in the composition of $S$ always require new reference models). Similar considerations have been presented in the context of measuring emergence in [35]. However, it may be beneficial to use a hybrid approach that combines both concepts: estimating the change compared to the previous period and against a static distribution to cover all aspects. We propose to investigate the impact of choosing the window: (i) in terms of which period should serve as the reference distribution, and (ii) how $d$ has to be configured depending on the message frequency.
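The window bookkeeping for the hybrid approach (a fixed first window plus a sliding previous window) can be sketched as follows; the class and method names are hypothetical:

```python
from typing import Optional

class WindowedObserver:
    """Collect configuration samples in consecutive windows of length d
    and expose a sliding reference (previous full window) as well as a
    fixed reference (the first full window)."""

    def __init__(self, d: int):
        self.d = d
        self.current: list = []
        self.previous: Optional[list] = None   # sliding reference
        self.fixed: Optional[list] = None      # static reference

    def observe(self, config) -> None:
        self.current.append(config)
        if len(self.current) == self.d:
            if self.fixed is None:
                self.fixed = self.current      # keep the first window
            self.previous = self.current       # most recent full window
            self.current = []

    def references(self) -> dict:
        return {"sliding": self.previous, "fixed": self.fixed}

obs = WindowedObserver(d=2)
for sample in ["A", "A", "B", "B"]:
    obs.observe(sample)
print(obs.references())
```

A divergence measure would then be evaluated twice per window, once against each reference, covering both slow drift (fixed reference) and abrupt change (sliding reference).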

**Challenge 5: Mutual influences among configurations**

Until now, we assumed that configurations are chosen independently from each other, and further that the overall distributions can be approximated by Gaussians. However, both assumptions may not hold in real-world conditions. For instance, there is a temporal dependency among configurations of an individual subsystem, since a configuration is selected in response to external conditions. These conditions typically develop gradually rather than shifting abruptly. Consequently, subsequent configurations are typically 'similar' according to a similarity metric covering the configuration space (e.g., Euclidean distance). Furthermore, the decision of an individual subsystem may also have an impact on the utility of a neighboring subsystem, and consequently on the decision about its next configuration. We call this concept 'mutual influence' [10]. Especially when detected mutual influences are considered in the self-adaptation mechanism as outlined in [39,40], the measurement of a degree of self-adaptation is directly affected. This results in the challenge to incorporate this mutual influence information in the measure and to distinguish between effects on the actual value of the measure, i.e., to what extent did acting in shared environments influence the value, and to what extent the basic self-adaptation mechanism itself? This can be augmented with information about correlations between configurations and their occurrence in distributions.
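As a simple indicator of such dependencies between neighboring subsystems, a lagged correlation over numeric configuration parameters may be computed. This is an illustrative sketch with hypothetical values; it is not the mutual influence detection of [10], which is more elaborate:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Numeric configuration parameters of two neighboring subsystems over
# time (hypothetical values): a_2 roughly follows a_1 with one step delay.
a1 = [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
a2 = [1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 2.0]
lagged = pearson(a1[:-1], a2[1:])   # compare a_1(t) with a_2(t+1)
print(lagged)
```

A high lagged correlation would suggest that part of the measured adaptation of $a_2$ is induced by $a_1$ acting in the shared environment rather than by $a_2$'s own decision mechanism.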

**Challenge 6: Self-explanation of SASO behavior**

Finally, the last challenge that takes all information from the previous challenges into account is dedicated to developing mechanisms for self-explanation. Given the possibility that we can quantify a degree of self-adaptation including a measure of stability, we can use this information to identify abnormally high adaptation behavior. This may be perceived as incomprehensible by users and consequently requires explanations. Consequently, several questions arise in this context: (a) How can we determine abnormally high adaptation behavior, possibly based on a certain context (i.e., situation)? (b) How can we identify root causes that triggered these adaptations? and (c) How can we generate comprehensible explanations for this behavior that are acceptable for users?

In this article, we present a first concept that proposes possible solutions for Challenge 2 and Challenge 4. We will outline this in the following paragraphs.

#### 4.2. Measuring a Degree of Adaptation Based on Generative Probabilistic Models

We define self-adaptation as an unexpected or unpredictable change of the distribution underlying the observed samples (i.e., the configurations of the subsystems). Consequently, a divergence measure can be applied to compare two density functions. We will refer to a density function $p(x)$ representing an earlier point in time and to $q(x)$ as a density function representing the current observation cycle. A well-known divergence measure is the Kullback–Leibler (KL) divergence $KL(p||q)$ (also called *relative entropy*), see [37]. It is defined for continuous variables as follows:

$$KL(p||q) = \int p(x)\, \ln \frac{p(x)}{q(x)}\, dx \qquad (1)$$

The advantage of $KL$ is that it fulfils some important requirements: (1) if $p(x)=q(x)$, the measure $KL(p||q)$ is 0, and (2) $KL(p||q)\ge 0$. Reformulating Equation (1) results in:

$$KL(p||q) = -\int p(x)\, \ln q(x)\, dx + \int p(x)\, \ln p(x)\, dx \qquad (2)$$

The formula can be interpreted as follows: it measures the expected amount of information contained in a new distribution with respect to a reference distribution of samples. However, KL is not symmetric, which means that it yields different results depending on which of the two distributions we use as reference and which as comparison. We can easily turn it into a symmetric variant as follows:

$$KL_{sym}(p, q) = \frac{1}{2}\left(KL(p||q) + KL(q||p)\right) \qquad (3)$$

Based on this symmetric variant, we can reformulate Equation (2) as a symmetric variant as follows:

$$KL_{sym}(p, q) = \frac{1}{2}\int \left(p(x) - q(x)\right) \ln \frac{p(x)}{q(x)}\, dx \qquad (4)$$

We propose to use Equation (4) as a measure for quantifying self-adaptation processes. The basic idea is that the measure increases if the two distributions begin to diverge. This increase is a result of comparing the distributions of the observed samples, or more precisely, the distribution of densities of the observed samples within the input space during a certain time interval. The more subsystems ${a}_{i}$ adapt their configuration due to changing conditions, the higher the divergence to the previous distribution. As a result, the measure will indicate a higher degree of self-adaptation.
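For the special case of univariate Gaussian approximations of the two observation windows, the symmetric measure can be computed in closed form. The following sketch uses the well-known closed-form KL divergence between two Gaussians; the function names and parameter values are illustrative assumptions:

```python
import math

def kl_gauss(mu_p, sig_p, mu_q, sig_q):
    """Closed-form KL(p||q) for two univariate Gaussians
    p = N(mu_p, sig_p^2) and q = N(mu_q, sig_q^2)."""
    return (math.log(sig_q / sig_p)
            + (sig_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sig_q ** 2)
            - 0.5)

def kl_sym(mu_p, sig_p, mu_q, sig_q):
    """Symmetric variant: the average of both divergence directions."""
    return 0.5 * (kl_gauss(mu_p, sig_p, mu_q, sig_q)
                  + kl_gauss(mu_q, sig_q, mu_p, sig_p))

# Identical distributions yield 0; a stronger shift in the configuration
# behavior (shifted mean, wider spread) yields a strictly larger value.
assert abs(kl_sym(0.0, 1.0, 0.0, 1.0)) < 1e-12
assert kl_sym(0.0, 1.0, 2.0, 1.5) > kl_sym(0.0, 1.0, 0.5, 1.0)
```

In practice, the means and standard deviations would be estimated from the configuration samples of the reference and current observation windows, respectively.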

In comparison to other possible measurement techniques, this approach is characterized by a set of advantages:

- In comparison to approaches following the concept of measuring autonomy as outlined, e.g., in [26], our method does not rely on a static encoding of possible configurations pre-defined in a number of bits.
- In comparison to approaches using the discrete entropy (e.g., outlined for measuring emergence in [34]), it is continuous and does not rely on binning (which introduces more parameters and a certain bias).
- It is independent of the notion of other concepts such as self-organization or emergence.
- Although it makes use of measurement periods (windows), it is applied continuously. It does not need a trigger (e.g., the detection of a disturbance as used in [27], for instance).
- It does not require a model of the internal decision processes of the autonomous subsystems ${a}_{i}$, since it only considers the externally visible configuration settings.
- It can easily be applied to, e.g., hierarchical structures of SASO systems in terms of considering only those subsystems ${a}_{i}$ that belong to a certain authority.

Although KL comes with some limitations (e.g., the absolute value is hard to interpret), it fulfils the most urgent requirements we formulated for measuring a degree of self-adaptation:

- It is zero if both distributions (i.e., derived from the reference and the current observation period) are identical. This means that the same configurations are observed. Theoretically, two agents may swap their configurations with each other, but in general this is a reliable approximation of constant behavior.
- In turn, high values of KL indicate strong changes in the configurations of the contained subsystems, which corresponds to a high degree of self-adaptation.

A major issue in this context is that the values of KL highly depend on the considered feature vector (i.e., the number, the type, and the resolution of the configuration parameters) as well as on the frequency in which adaptations are done. These values are highly application-dependent. However, the comparison is always made within the domain, meaning that the ordering is correct, but the individual value does not say much about the severity of the change. This can be defined in relation to a set of observations, i.e., based on experiences with the application under investigation. This leaves open the question of how to derive an actual 'level' of self-adaptation from the raw KL values.
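One simple way to derive such a relative 'level' from experience is to interpret a new measurement as an empirical quantile over past measurements of the same system. This is an illustrative sketch with hypothetical values, not a validated calibration scheme:

```python
def adaptation_level(value, history):
    """Map a raw divergence value to a relative level in [0, 1] as its
    empirical quantile within previously observed values."""
    if not history:
        return 0.5  # no experience yet: report a neutral level
    below = sum(1 for h in history if h <= value)
    return below / len(history)

past = [0.1, 0.2, 0.2, 0.3, 0.9]     # hypothetical past KL measurements
print(adaptation_level(0.25, past))  # 0.6: moderately high for this system
```

Because the level is defined relative to the system's own history, it remains comparable over time even though the absolute KL values are application-dependent.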