Article

A Formal Definition of Scale-Dependent Complexity and the Multi-Scale Law of Requisite Variety

by Alexander F. Siegenfeld 1,2,* and Yaneer Bar-Yam 2

1 Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2 New England Complex Systems Institute, Cambridge, MA 02139, USA
* Author to whom correspondence should be addressed.
Entropy 2025, 27(8), 835; https://doi.org/10.3390/e27080835
Submission received: 16 May 2025 / Revised: 25 July 2025 / Accepted: 31 July 2025 / Published: 6 August 2025
(This article belongs to the Section Complexity)

Abstract

Ashby’s law of requisite variety allows a comparison of systems with their environments, providing a necessary (but not sufficient) condition for system efficacy: A system must possess at least as much complexity as any set of environmental behaviors that require distinct responses from the system. However, to account for the dependence of a system’s complexity on the level of detail—or scale—of its description, a multi-scale generalization of Ashby’s law is needed. We define a class of complexity profiles (complexity as a function of scale) that is the first, to our knowledge, to exhibit a multi-scale law of requisite variety. This formalism provides a characterization of multi-scale complexity and generalizes the law of requisite variety’s single constraint on system behaviors to a class of multi-scale constraints. We show that these complexity profiles satisfy a sum rule, which reflects a tradeoff between smaller- and larger-scale degrees of freedom, and we extend our results to subdivided systems and systems with a continuum of components.

1. Introduction

Defining complexity in general terms has been a persistent challenge in the study of complex systems [1,2,3,4]. A first-pass definition of a system’s complexity would be the information necessary to describe its state in full detail. However, this definition would assign the highest complexity to maximally disordered systems (e.g., an ideal gas where molecules move independently and unpredictably). Conversely, defining complexity as the degree of order would assign the maximum complexity to perfectly ordered systems (e.g., all the molecules moving in unison). Neither extreme captures what is typically meant by “complex”. Scale-dependent complexity—which has been formalized for general collections of random variables [5,6,7] and time series in particular [8,9,10,11] and used in contexts such as chaos [12,13], biological signals [14,15,16,17,18], traffic patterns [19,20], financial data [21,22,23], Gaussian processes [24], and fluid dynamics [25,26]—offers a promising path. Rather than attempt to quantify a system’s complexity with a single number, this approach recognizes that the information required to describe a system’s state—i.e., its complexity—is inherently dependent on the level of detail or scale at which that description is made (see Figure 1). In this framework, complexity is not an intrinsic scalar quantity of a system but a property of scale: It reflects how much structured variation remains after a system has been coarse-grained to a particular level of resolution. Systems typically viewed as “complex” exhibit complexity that varies across a range of scales [27], potentially in a scale-free/fractal fashion [28,29].
But in terms of real-world consequences, what does it mean if one system possesses more complexity than another? Ashby’s law of requisite variety [30] provides one possible answer: in order to be effective, the complexity of a system must equal or exceed that of its environment. Here, a system’s environment must be defined to include only the set of behaviors that requires a distinct response from the system (see Figure 2).
However, Ashby’s law does not account for the multi-scale nature of complexity. For instance, two individuals who lack the ability to cooperate may have enough small-scale complexity to independently move various objects but would lack sufficient complexity at the scale necessary to move a couch. Thus, while the two individuals would possess sufficient complexity (i.e., a sufficient variety of behaviors), it would not be at the right scale (i.e., at the level of description that includes only behaviors that involve coordination between the two individuals). This example does not violate Ashby’s law, as Ashby’s law merely provides a necessary rather than a sufficient condition for system efficacy. Nonetheless, this example and others (see below) motivate us to seek a stronger necessary condition for system efficacy that takes into account not only the complexity but also the scale of system behaviors. We thus propose a multi-scale law of requisite variety: In order to be effective, the complexity of a system must equal or exceed that of its environment at all scales. Our goal is to provide a definition of scale-dependent complexity—a class of complexity profiles—that satisfies this multi-scale version of the law of requisite variety. For a pedagogical introduction to complexity profiles and the multi-scale law of requisite variety, please see ref. [27].
For an existing formal definition of a complexity profile [7,32], it has been proven that a multi-scale law of requisite variety applies for systems and environments that are block-independent, i.e., systems/environments with components that can be partitioned into mutually independent blocks such that components within the same block have identical behavior [33]. However, the law of requisite variety does not apply to this complexity profile more generally. For instance, adding additional components to the system (without changing the existing components) can actually reduce this complexity profile at larger scales due to the possibility of negative interaction information for more than two variables [34]; thus, a system that is capable of effectively interacting with its environment could, nonetheless, end up with less complexity than its environment at larger scales. Given the desirability and usefulness of a complexity profile satisfying Ashby’s law at each scale (a property that has been implicitly used in many analyses, such as management [35,36,37], military defense [31,38,39], governance [40], multi-agent coordination [41,42,43], and evolutionary dynamics [44,45,46]), we therefore seek a formal definition of the complexity profile that reflects this property. (We will show that for block-independent systems, the formalism introduced here reduces to the definition of complexity profiles discussed above.)
The one other constraint that we desire for a complexity profile is a sum rule, i.e., that the area under the complexity profile depends not on the interdependencies between components but only on the individual components’ behaviors. Such a constraint reflects the tradeoff between complexities at various scales: In order for a system to have complexity at larger scales, its components must be correlated, which constrains the fine-scale configurations of the system (and thus its smaller-scale complexity) [27]. Without a sum rule or some other similar constraint, the multi-scale law of requisite variety would be no more than many copies of the single-scale version of Ashby’s law—one for each scale—with no structure relating the various scales to each other.
In order to define a complexity profile for which the multi-scale law of requisite variety holds, we first have to define what it means for a multi-component system to effectively match its environment (Section 2). Then, in Section 3, we formally define what constitutes a complexity profile and what criteria must be satisfied for it to capture the multi-scale law of requisite variety and the tradeoff between complexities at various scales. In Section 4, we define a class of complexity profiles that satisfies such criteria and examine some of its properties.
Such a class does not provide a single complexity profile; rather, a complexity profile is assigned for each way of partitioning the system. Choosing a method of partitioning the system is analogous to choosing a coordinate system onto which the multi-scale complexity of the system can be projected (breaking the permutation symmetry corresponding to the relabeling of system components). The fact that the profile depends on the partitioning method reflects the fact that there is no single way to coarse-grain a system, although some coarse-graining choices are more useful/better reflect the system’s structure than others. However, the efficacy of a system in a particular environment is independent of the choice of the coordinate system; thus, regardless of which partitioning scheme is chosen, an effective system will have at least as much complexity as the environment at all scales. This formalism, therefore, gives an entire class of constraints that must be satisfied: As long as the system and its environment are partitioned/coarse-grained in the same way, the system’s complexity matching/exceeding its environment’s at all scales provides a necessary condition for system efficacy.

2. Generalizing Ashby’s Law to Multiple Components

Ashby’s law claims that to effectively regulate an environment, the system must have a degree of freedom or behavior for each distinct environmental behavior. In other words, there cannot be two environmental states for a given system state. It then follows (by the pigeon-hole principle) that the number of behaviors of the system must be greater than or equal to the number of behaviors of the environment.
More formally, let X and Y be random variables or collections of random variables, let $H(X)$ denote the Shannon entropy of X, which is the minimum average number of bits needed to describe the state of X, and let $H(Y|X)$ denote the expected value of the Shannon entropy of Y given the state of X [47]. Then, each system state of X corresponding to no more than one environmental state of Y can be written as $H(Y|X) = 0$, from which it follows that $H(X) \geq H(Y)$ (i.e., the complexity of the environment cannot exceed that of the system). It is important to note that in this formulation, the environment Y is defined to be the set of states that require distinct behaviors of the system. Two environmental states that do not require different system behaviors should be represented by a single state of Y.
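As a concrete illustration, the following Python sketch (our own illustrative code; the joint distribution and all names are assumptions, not taken from the paper) checks this single-scale statement on a small example in which each system state determines the environmental state:

```python
# An illustrative numerical check of the single-scale law of requisite variety.
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array, ignoring zero entries."""
    p = np.asarray(p, dtype=float).flatten()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# joint[x, y] = P(system state x, environment state y). Each row (system state)
# has a single nonzero column, so Y is determined by X and H(Y|X) = 0.
joint = np.array([[0.25, 0.0, 0.0],
                  [0.25, 0.0, 0.0],
                  [0.0,  0.3, 0.0],
                  [0.0,  0.0, 0.2]])

H_X = entropy(joint.sum(axis=1))      # system complexity H(X)
H_Y = entropy(joint.sum(axis=0))      # environment complexity H(Y)
H_Y_given_X = entropy(joint) - H_X    # H(Y|X) = H(X,Y) - H(X)

assert H_Y_given_X < 1e-12            # the matching condition holds...
assert H_X >= H_Y                     # ...so Ashby's law follows: H(X) >= H(Y)
```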
In order to consider multi-scale behavior, let us describe the system X as consisting of N components such that $X = \{x_1, \ldots, x_N\}$.
Definition 1. 
A system X of size $N = |X|$ is defined as a set of N random variables. These random variables are referred to as components of the system.
Remark 1. 
In this formulation, all the components of one or more systems are treated as the same size; while this condition may seem like a limitation, any system or systems can be described in this way to arbitrary precision: Components of different sizes can be accounted for by defining a new set of components of size equal to the greatest common factor of the sizes of the original components (irrational relative sizes—for which no greatest common factor exists—can be approximated to arbitrary precision by rational relative sizes). If the new components are all of size l, each original component $\tilde{x}_i$ of size $l_i$ can then be replaced with $l_i/l$ new components $x_{j+1}, x_{j+2}, \ldots, x_{j+l_i/l}$ for which the state of one of these $l_i/l$ variables completely determines the state of all the others, i.e., $x_{j+1} = x_{j+2} = \cdots = x_{j+l_i/l}$ (see, e.g., Figure 3).
The assumption that the system X must have at least one distinct response for each environmental state Y (i.e., $H(Y|X) = 0$) is generalized as follows: An “environmental component” $y_i$ is defined for each system component $x_i \in X$, such that each $y_i$ is a random variable representing the environmental states that require a distinct response from the system component $x_i$. Then, for the system to effectively interact with its environment, $H(y_i|x_i) = 0$ for each i, i.e., there cannot be two environmental component states for a given state in the corresponding system component. (Note that this condition is necessary but not sufficient for the system to effectively interact with its environment—just because the system components can choose a different response for each environmental condition does not guarantee that the responses are appropriate.) Letting $Y = \{y_1, \ldots, y_N\}$, we see that $H(y_i|x_i) = 0$ for all i implies $H(Y|X) = 0$, and, thus, $H(y_i|x_i) = 0$ is a stronger condition: Not only must the system match the environment overall, but this matching must be properly organized in a specific way. This formulation allows for constraints among the environmental components to induce constraints among the system components.
Note that each environmental component represents the set of behaviors that are required by its corresponding system component, which, as shown in Figure 4, can correspond to multiple physical parts of the environment. Depending on how the system is connected to the environment—or depending on which constraints arising from interactions between the system and its environment are to be examined—the environmental components may need to be defined differently. Although it may not seem so at first, any interaction between a system and its environment can be formulated as above: If we start with a more general formulation in which each system component $x_i$ interacts with environmental components $\tilde{y}_{i1}, \tilde{y}_{i2}, \ldots, \tilde{y}_{in_i}$ (i.e., $\forall i, j$, $H(\tilde{y}_{ij}|x_i) = 0$), which allows for each system component to interact with multiple environmental components and vice versa, we can redefine the environmental components such that each $x_i$ is associated with the random variable $y_i \equiv (\tilde{y}_{i1}, \tilde{y}_{i2}, \ldots, \tilde{y}_{in_i})$ (see Figure 4 for an example). This redefinition may result in new entanglements among the environmental components.
Definition 2. 
An environment $(Y, f)$ for system X is a system Y together with a bijection $f: Y \to X$.
Definition 3. 
A system X matches its environment $(Y, f)$ iff $H(y|f(y)) = 0$ for all $y \in Y$.
Example 1. 
Consider a system of two thermostats, $x_1$ in room 1 and $x_2$ in room 2, that can each be either on or off. The environment can be described by two variables $y_1$ and $y_2$ that represent whether or not room 1 or room 2, respectively, should be heated (the bijection f in Definitions 2 and 3 mapping $y_1$ to $x_1$ and $y_2$ to $x_2$). In order for the system to match the environment, it must be that $H(y_1|x_1) = H(y_2|x_2) = 0$. Thus, if the need for each room to be heated is independent of that of the other room, the thermostats must be able to operate independently of one another; likewise, if the two rooms’ needs for heat are correlated, the thermostats must also be correlated.
Example 2. 
Consider a system X in which each $x_i \in X$ represents some aspect of policy (e.g., educational policy) being applied in region i of a given country (the regions could, for instance, be towns/cities). The environment $(Y, f)$ could be defined by the random variables $y_i$ (where $f(y_i) = x_i$), such that each $y_i$ corresponds to conditions in region i that require a distinct policy in order for the region to be effectively governed. If the $y_i$ values vary independently of one another while the $x_i$ values cannot, then the system will not be able to match its environment (e.g., an education policy that is determined entirely at the national level will not be able to effectively interact with locales if each locale has specific educational needs). Conversely, if there are correlations among the $y_i$ values that are lacking in the $x_i$ values, the system will also be unable to match its environment (e.g., it would be ineffective for each city to independently set its own policy with respect to international trade or with respect to regulating a national corporation that spans many cities).
Note that the possible states of a system or the probabilities assigned to these states cannot be defined without specifying the environment with which the system is interacting, for the same system may behave differently in different environments. (Alternatively, each individual environment need not be treated separately; if each individual environment is assigned a probability, this ensemble of possible individual environments can itself be treated as a single environment of the system.) In either case, this formalism concerning a system matching its environment is purely descriptive and does not require the specification of the mechanism by which the system and the environment are related.
Example 3. 
Returning to the example of the two thermostats, if the system (the thermostats) and the environment (the rooms) are connected so that the state of thermostat i depends directly on the state of room i, the thermostat states will have precisely as much correlation as the room states do. The thermostat states will be independent random variables if and only if the room states are.
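A minimal numerical sketch of this point (illustrative code; the distribution is an assumed example): when each thermostat state $x_i$ is a deterministic, invertible function of its room’s needs $y_i$, the thermostats inherit exactly the rooms’ correlation.

```python
# Illustrative check that deterministically coupled thermostats carry exactly
# the mutual information present in the rooms' heating needs.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).flatten()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_info(joint):
    """I between the row and column variables of a 2D joint probability table."""
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

# Joint distribution over (y1, y2): correlated heating needs of the two rooms.
p_rooms = np.array([[0.4, 0.1],
                    [0.1, 0.4]])

# Thermostat i is on iff room i needs heat: x_i = y_i, so H(y_i|x_i) = 0 and
# the joint over (x1, x2) is the same table.
p_thermostats = p_rooms.copy()

assert abs(mutual_info(p_thermostats) - mutual_info(p_rooms)) < 1e-12
```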
With Definition 3, we have a characterization of Ashby’s law that takes into account the multi-scale structure of a system and its connection with its environment. The goal is then to understand how the properties of the environment constrain the corresponding properties of the system. If an environment has a certain property and it is known that the system matches the environment, what must be true about the system? For the single-scale case of Ashby’s law, the system must have at least as much information as the environment. The complexity profile, described below, generalizes this property to multiple scales. In particular, it allows us to formulate the multi-scale law of requisite variety: In order for a system to match its environment, it must have at least as much complexity as its environment at every scale.

3. Defining a Complexity Profile

The basic version of Ashby’s law states that for a system X to match its environment Y, the overall complexity of X must be greater than or equal to the overall complexity of Y. But, as argued in Section 2.3 of ref. [27], it does not make sense to speak of complexity as a single number; rather, the complexity of a system must depend on the scale at which the system is described. Thus, we wish to generalize the notion of a complexity profile such that the complexity of a system and its environment can be compared at multiple scales.
Definition 4. 
A complexity profile $C_X(n)$ of a system X assigns a particular amount of information to the system at each scale $n \in \mathbb{Z}^+$. For $n > |X|$, we define $C_X(n) = 0$. If we wish to consider each component of the system to be of size l, we can define a continuous version of the complexity profile (see Appendix B) as follows:
$$\tilde{C}_X(s) = C_X(s/l). \quad (1)$$
We wish for a complexity profile to have two additional properties: It should (1) manifest the multi-scale law of requisite variety and (2) obey the sum rule. Each property is defined below, with applications/examples given in Sections 2.5 and 2.4 of ref. [27], respectively.
Definition 5. 
A complexity profile manifests the multi-scale law of requisite variety if for any two systems X and Y, X matching Y (per Definition 3) implies that $C_X(n) \geq C_Y(n)$ for all n.
Definition 6. 
A complexity profile obeys the sum rule if for any system X, $\sum_{n=1}^{\infty} C_X(n) = \sum_{x \in X} H(x)$.
The multi-scale law of requisite variety is important because it allows for the interpretation that a necessary (but not sufficient) condition for a system to effectively interact with its environment must be that it has at least as much complexity as the environment at every scale. The sum rule is important because it captures the intuition that for a system composed of components with the same individual behaviors, there is a tradeoff among the complexities of the system at various scales (since complexity at larger scales requires constraints among the system’s smaller-scale degrees of freedom).
Note that examining measures of multi-scale complexity can never prove that a system matches its environment—just as in the single-scale case, a system having more complexity than its environment by no means guarantees that every system state corresponds to a single environmental state (nor that the state adopted by the system will be appropriate). But examining multi-scale measures of information can prove the impossibility of compatibility. The goal, then, in formulating multi-scale measures is to create more instances in which the impossibility of compatibility can be shown. Using this multi-scale formalism, the system must now possess at least as much complexity as its environment at every scale—not just more complexity than its environment overall. For instance, an army of ants may have more fine-grained complexity than its environment but will be able to perform certain tasks (e.g., moving large objects) only with larger-scale coordination between the ants.

4. A Class of Complexity Profiles

In Section 3, we have defined the term complexity profile and have described general properties that any complexity profile should have. We now describe a specific class of complexity profiles that satisfies these properties. This class of profiles is not the only such class and may not be the best one, but it serves as an instructive example and provides one useful way of characterizing multi-scale complexity.
One way to define a large-scale or coarse-grained description of a system is to allow only a subset of the components of the system to be described. (This coarse-graining scheme is analogous to the decimation approach for implementing the position–space renormalization group in physics.) As a first pass, one might divide the system into n equivalent disjoint subsets and then define the information in the description of the system at scale n to simply be the information in one of the subsets (see, e.g., Figure 5). However, given that the partition into n equivalent subsets may not be possible (either due to heterogeneity in the components or because the system size is not divisible by n), this definition can be generalized by averaging over the information in each of the n subsets.
Example 4. 
Consider a Markov chain $(x_1, x_2, x_3, \ldots)$ (for finite Markov chains of size N, simply let $x_i \equiv 0$ for $i > N$). A set of disjoint, coarse-grained descriptions of the Markov chain at scale n could be
$$\{(x_1, x_{1+n}, x_{1+2n}, x_{1+3n}, \ldots),\ (x_2, x_{2+n}, x_{2+2n}, x_{2+3n}, \ldots),\ \ldots,\ (x_n, x_{2n}, x_{3n}, \ldots)\}.$$
Thus, the information at scale n of the Markov chain could be defined as
$$\frac{1}{n} \sum_{i=1}^{n} H(\{x_j \mid j \equiv i \ (\mathrm{mod}\ n)\}). \quad (2)$$
Note, however, that this sequence of descriptions is not nested and so cannot be used in its entirety in Definitions 8 and 9.
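For concreteness, the scale-n information of Example 4 can be computed by brute force for a short chain. The sketch below is illustrative code under assumed parameters (a symmetric binary chain of length 6 with flip probability 0.1); the function names are ours.

```python
# Brute-force computation of the scale-n information of Example 4.
import itertools
import numpy as np

N = 6           # chain length (assumed)
p_flip = 0.1    # probability of flipping between adjacent steps (assumed)

def chain_prob(states):
    """P(x_1, ..., x_N) for a symmetric binary Markov chain started uniformly."""
    p = 0.5
    for a, b in zip(states, states[1:]):
        p *= p_flip if a != b else 1 - p_flip
    return p

def subset_entropy(indices):
    """H({x_j : j in indices}) in bits, by marginalizing the full joint."""
    marginal = {}
    for states in itertools.product([0, 1], repeat=N):
        key = tuple(states[j] for j in indices)
        marginal[key] = marginal.get(key, 0.0) + chain_prob(states)
    v = np.array([q for q in marginal.values() if q > 0])
    return float(-(v * np.log2(v)).sum())

def scale_n_information(n):
    """(1/n) * sum_i H({x_j : j = i mod n}), the quantity in Example 4."""
    return float(np.mean([subset_entropy(range(i, N, n)) for i in range(n)]))

for n in (1, 2, 3):
    print(n, scale_n_information(n))   # decreases with n for a mixing chain
```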
First, we must define how to successively partition the system. We only allow for nested sequences of partitions so that larger-scale descriptions of the system cannot contain information that smaller-scale descriptions lack. The way in which a system is partitioned defines a sequence of descriptions of the system; different partitioning schemes can be thought of as different nested ontologies with which to create these successively coarser descriptions. This formulation allows for a general framework for describing a system at multiple scales, with successively larger-scale descriptions of a system corresponding to nested subsets of the system that are decreasing in size.

4.1. Definition

We now formally define this class of complexity profiles. To do so, we first build up some notation for defining nested sequences of partitions:
Definition 7. 
Define $P = \{P_i\}_{i=1}^{\infty}$ to be a nested partition sequence of a set X if each $P_i$ is a partition of X, $P_i \leq P_j$ (i.e., $P_i$ is a refinement of $P_j$) whenever $i > j$, and $P_i < P_j$ (i.e., $P_i$ is a strict refinement of $P_j$) whenever $|X| \geq i > j$.
Note that in order for the strict refinement clause of this definition to be satisfied (i.e., for $P_i$ to have more parts than $P_j$ whenever $|X| \geq i > j$), it must be that $P_n$ contains n parts for $n \leq |X|$ and that $P_n = P_{|X|}$ for $n > |X|$, since a partition of X cannot have more than $|X|$ parts.
Example 5. 
Let $X = \{x_1, x_2, x_3, x_4\}$. An example of a nested partition sequence of X is $P = (P_1, P_2, P_3, P_4, P_5, \ldots) = (\{\{x_1, x_2, x_3, x_4\}\},\ \{\{x_1, x_3\}, \{x_2, x_4\}\},\ \{\{x_1, x_3\}, \{x_2\}, \{x_4\}\},\ \{\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}\},\ \{\{x_1\}, \{x_2\}, \{x_3\}, \{x_4\}\}, \ldots)$.
Definition 8. 
Given a nested partition sequence P of a system X, define $\tilde{S}_X^P(n) \equiv n\,S_X^P(n) \equiv \sum_{\chi \in P_n} H(\chi)$ for $n \in \mathbb{Z}^+$.
Note that $\tilde{S}_X^P(n)$ is non-decreasing in n and captures the total (potentially overlapping) information of the system parts, while $S_X^P(n)$ is the average amount of information necessary to describe one of the n parts. Information that is n-fold redundant (i.e., is of scale n) can be counted up to n times in $\tilde{S}_X^P(n)$—it is this fact that motivates the following definition of a complexity profile.
Definition 9. 
Given a nested partition sequence P of a system X, the complexity profile $C_X^P: \mathbb{Z}^+ \to [0, \infty)$ is defined as $C_X^P(n) = \tilde{S}_X^P(n) - \tilde{S}_X^P(n-1)$, with the convention that $\tilde{S}_X^P(0) = 0$.
Remark 2. 
For $n = 1$, $C_X^P(n) = H(X)$. For $n > |X|$, $C_X^P(n) = 0$. And for $1 < n \leq |X|$, $C_X^P(n) = H(A) + H(B) - H(A, B) = I(A; B)$, where A and B are the two subsets of X that are elements of $P_n$ but not of $P_{n-1}$ and where I denotes mutual information. Thus, this complexity profile is very computationally tractable.
Example 6. 
Using the nested partition sequence given in Example 5 of $X = \{x_1, \ldots, x_4\}$, if the $x_i$ are all unbiased bits, $x_1 = x_2$, and $x_1$, $x_3$, and $x_4$ are mutually independent, we have $\tilde{S}_X^P(1) = H(X) = 3$, $\tilde{S}_X^P(2) = H(x_1, x_3) + H(x_2, x_4) = 4$, and $\tilde{S}_X^P(n) = 4$ for $n > 2$. Thus, $C_X^P(1) = 3$, $C_X^P(2) = 1$, and $C_X^P(n) = 0$ for $n > 2$.
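The numbers in Example 6 can be verified directly with a short script (illustrative, self-contained code; components are 0-indexed, so $x_1$ corresponds to index 0):

```python
# Verification of Example 6 using the partition sequence of Example 5.
import itertools
import numpy as np

def joint_states():
    """States of (x1, x2, x3, x4) with x1 = x2 and x1, x3, x4 mutually
    independent unbiased bits; each of the 8 reachable states has P = 1/8."""
    for x1, x3, x4 in itertools.product([0, 1], repeat=3):
        yield (x1, x1, x3, x4), 1 / 8

def subset_entropy(indices):
    """Entropy in bits of the marginal over the given component indices."""
    marginal = {}
    for states, p in joint_states():
        key = tuple(states[i] for i in indices)
        marginal[key] = marginal.get(key, 0.0) + p
    v = np.array([q for q in marginal.values() if q > 0])
    return float(-(v * np.log2(v)).sum())

# The nested partition sequence of Example 5 (P_n = P_4 for all n > 4).
P = [
    [(0, 1, 2, 3)],              # P_1
    [(0, 2), (1, 3)],            # P_2 = {{x1,x3},{x2,x4}}
    [(0, 2), (1,), (3,)],        # P_3
    [(0,), (1,), (2,), (3,)],    # P_4
]

S_tilde = [sum(subset_entropy(part) for part in Pn) for Pn in P]
profile = list(np.diff([0.0] + S_tilde))
print(S_tilde)   # [3.0, 4.0, 4.0, 4.0]
print(profile)   # [3.0, 1.0, 0.0, 0.0] -> C(1) = 3, C(2) = 1, C(n > 2) = 0
```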
Example 7. 
Consider a system X of N molecules, the velocities of which are independently drawn from a Maxwell–Boltzmann distribution for which the temperature T is, itself, a random variable. Consider a nested partitioning scheme in which at each step, the largest remaining part (or, in the case of a tie, one of the largest remaining parts) is divided as equally as possible in two. The resulting complexity profile will then have $C(1) = H(X)$ and $C(n) = H(T)$ for $1 < n \ll N$, since for $n \ll N$, the size of the parts will be large enough that T can be almost precisely determined from any single part. As n approaches N, a measurement of any single part will yield more and more uncertainty regarding the value of T, so $C(n)$ will slowly decay from $H(T)$ to 0. Such a complexity profile captures the fact that at the smallest scale, there is a lot of information related to the microscopic details of each molecule, but at a wide range of larger intermediate scales, the information present is much smaller and roughly constant, arising only from the common large-scale influence that the temperature has across the system.

4.2. The Multi-Scale Law of Requisite Variety and the Sum Rule

This complexity profile roughly captures the notion of redundant information and will satisfy the properties described in Definitions 5 and 6 (as proved below). It is dependent on the particular set of partitions used—a reflection of the fact that there are multiple ways to coarse-grain a system—and, thus, will not capture the redundancies present in an absolute sense, as the complexity profile described in refs. [7,32] does. But that complexity profile, while it does obey the sum rule, does not manifest the multi-scale law of requisite variety. Thus, while it characterizes the information structure present in a system, it does not allow us to compare a system to its environment in a mathematically rigorous way. The class of complexity profiles considered here allows this comparison by requiring that the system and the environment be partitioned in the same way, breaking permutation symmetry and accounting for the correspondence between system and environmental components.
Theorem 1. 
The multi-scale law of requisite variety. If a system X matches its environment $(Y, f)$, then for all the nested partition sequences P of X, $C_X^P(n) \geq C_Y^{P^f}(n)$ at each scale n, where $P^f$ is the corresponding nested partition sequence of Y (see Definition 10 below).
Proof. 
See Appendix A. □
Definition 10. 
Given a nested partition sequence P of a set X and a bijection $f: Y \to X$, define $P^f$ to be the nested partition sequence of Y such that $\forall n \in \mathbb{Z}^+$ and $\forall y, y' \in Y$, y and $y'$ belong to the same part of $P_n^f$ iff $f(y)$ and $f(y')$ belong to the same part of $P_n$.
One of the advantages of having the complexity profile depend on the partitioning scheme is that Theorem 1 holds for all possible nested partition sequences of the system, assuming its environment is partitioned in the same way. In other words, regardless of how the partitions are used to define the scale, the system must have at least as much complexity as its environment at all scales, so long as the scale is defined in the same way for the system and its environment.
Furthermore, not only must all possible complexity profiles of the system match the corresponding complexity profile of the environment but also all the possible complexity profiles of all the possible subsets of the system must match the corresponding complexity profile of the corresponding subset of the environment, as stated in the following corollary to Theorem 1. This is a powerful statement, since it implies that not only must the system have at least as much complexity as its environment at all scales but also subdivisions within the system must be aligned with the corresponding subdivisions within the environment (see Section 2.6 of ref. [27]).
Corollary 1. 
Subdivision matching. Suppose a system X matches its environment $(Y, f)$. Then, for any subsets $Y' \subseteq Y$ and $X' = f(Y') \subseteq X$, $C_{X'}^P(n) \geq C_{Y'}^{P^f}(n)$ at each scale n for all the nested partition sequences P of $X'$.
Proof. 
Since X matches Y, $X'$ matches $Y'$. Therefore, Theorem 1 applies to $X'$ and $Y'$. □
We now state and prove the sum rule:
Theorem 2. 
Sum rule. For any system X and all the nested partition sequences P of X, $\sum_{n=1}^{\infty} C_X^P(n) = \sum_{x \in X} H(x)$.
Proof. 
$\sum_{n=1}^{\infty} C_X^P(n) = \lim_{n \to \infty} \tilde{S}_X^P(n) - \tilde{S}_X^P(0) = \tilde{S}_X^P(|X|) - 0 = \sum_{x \in X} H(x)$. □
Because the complexity at each scale measures the amount of additional information present when different parts of the system are considered separately, the sum of the complexity across all the scales will simply equal the total information present in the system when each component is considered independently of the rest. Thus, given fixed individual behaviors of the system components, there is a necessary tradeoff between complexity at larger and smaller scales, regardless of which partitioning scheme is used (i.e., regardless of how the scale is defined).
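The sum rule can also be spot-checked numerically. The sketch below (illustrative code; the four-component binary system, the random joint distribution, and the particular nested partition sequence are all assumptions) confirms that the profile sums to the total of the individual component entropies:

```python
# Numerical spot-check of Theorem 2 (the sum rule).
import itertools
import numpy as np

rng = np.random.default_rng(0)
states = list(itertools.product([0, 1], repeat=4))
probs = rng.dirichlet(np.ones(len(states)))    # an arbitrary joint distribution
joint = dict(zip(states, probs))

def H(indices):
    """Entropy in bits of the marginal over the given component indices."""
    marginal = {}
    for s, p in joint.items():
        key = tuple(s[i] for i in indices)
        marginal[key] = marginal.get(key, 0.0) + p
    v = np.array([q for q in marginal.values() if q > 0])
    return float(-(v * np.log2(v)).sum())

# A fixed nested partition sequence of the four components.
partitions = [
    [(0, 1, 2, 3)],
    [(0, 1), (2, 3)],
    [(0,), (1,), (2, 3)],
    [(0,), (1,), (2,), (3,)],
]

S_tilde = [sum(H(part) for part in Pn) for Pn in partitions]
profile = np.diff([0.0] + S_tilde)

# Sum rule: sum_n C(n) equals the sum of the individual component entropies,
# whatever the interdependencies encoded in the joint distribution.
assert abs(profile.sum() - sum(H((i,)) for i in range(4))) < 1e-9
```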

4.3. Choosing from Among the Partitioning Schemes

Because of the dependence on the partitioning scheme, Definition 9 defines a family of complexity profiles. That there is no single complexity profile for this definition can be thought of as a consequence of there being no single way to coarse-grain a system. In other words, implicit in any particular complexity profile of a system is a scheme for describing that system at multiple scales. While there is no such scheme that is “the correct scheme” in an absolute sense, for any particular purpose (and, often, for almost any conceivable purpose), some schemes are far better than others.
But before examining this question, we first consider a strong advantage of the multiplicity of complexity profiles: Theorem 1 applies to all of them. Thus, any complexity profile, regardless of the partitioning scheme, can potentially be used to show a multi-scale complexity mismatch between the system and environment. This is useful when one has information about the probability distributions of the system and environment separately, but not necessarily about the joint probability distribution of the system and environment together, such that one cannot directly determine whether the system matches the environment (since quantities such as $H(y|x)$ would be unknown for any given system component x and environmental component y).
Assuming one knows which system components correspond to which environmental components, one can test for potential incompatibility between the system and environment by considering any nested partition sequence of any subset of the system and the corresponding subset of the environment, as per Theorem 1 and Corollary 1. Thus, a meaningful comparison of the system and environment can be made for a wide variety of complexity profiles, provided the definitions for the system and environment are consistent. In the likely case that the complexity profiles cannot be precisely calculated, this framework thus supports a wide variety of qualitative complexity profiles that one may wish to construct.
When the correspondence between system and environmental components is unknown, there are still ways in which to compare the system and the environment. For instance, if a system X matches its environment Y, then
$$\max_P F(C_X^P) \geq \max_P F(C_Y^P) \quad (3)$$
for all functions F that map complexity profiles onto $\mathbb{R}$ and are non-decreasing in $C(n)$ for all scales n. Thus, finding even a single function F for which Equation (3) does not hold is enough to show that X cannot possibly match Y, regardless of how they may be connected. Other such constructions that are independent of the bijection between X and Y are also possible.
However, although any partitioning scheme can be used to show a mismatch between a system and its environment, not all partitioning schemes are equally good choices for gaining an understanding of the structure of the system. Each part of a partition represents an approximation of the system by describing only that subset of its components, so if the purpose of the complexity profile is to characterize the structure of the system, the partitions should be chosen accordingly. For instance, for a system $\{x_1, x_2, x_3, x_4\}$, where $x_1 = x_2$ and $x_3 = x_4$, partitioning the system into $\{x_1, x_2\}$ and $\{x_3, x_4\}$ does not make sense if the goal is to create a reasonably faithful two-component description of the four-component system.
As a heuristic, successive cuts in a nested partition sequence should cut through random variables with significant mutual information (i.e., significant redundancy), although, of course, a greedy algorithm (i.e., first maximizing complexity at scale 2 and then choosing the next partition to maximize complexity at scale 3, given the constraint that it has to be nested within the previous partition, and so on) may not always match the system’s structure. Nonetheless, this greedy algorithm does at least provide a consistent way to define complexity profiles across various systems such that complexity is decreasing with scale. (Formally, we can define this complexity profile using the nested partition sequence P that maximizes the complexity profile according to a “dictionary ordering”, i.e., that maximizes $\sum_{n=1}^{\infty} M^{-n} C_X^P(n)$ for any $M > C_X^P(1) = H(X)$. However, just because P maximizes the complexity profile for X according to this (or any other) metric does not guarantee that for an environment $(Y, f)$ of X, $P^f$ will maximize the complexity profile of Y according to the same metric.)
Example 8. 
Consider a system $X = \{x_1, x_2, \ldots, x_8\}$ such that $x_1 = x_2 = x_3$, $x_4 = x_5 = x_6$, and $x_7 = x_8$, but otherwise, all the components are mutually independent (Figure 6). Then, intuitively, we expect that $C_X(1) = C_X(2) = H(x_1) + H(x_4) + H(x_7)$, $C_X(3) = H(x_1) + H(x_4)$, and $C_X(n) = 0$ for $n > 3$. A nested partition sequence that gives us this complexity profile is $P = (P_1, P_2, P_3, \ldots)$ with $P_1 = \{X\}$, $P_2 = \{\{x_1, x_4, x_7\}, \{x_2, x_3, x_5, x_6, x_8\}\}$, and $P_3 = \{\{x_1, x_4, x_7\}, \{x_2, x_5, x_8\}, \{x_3, x_6\}\}$, and where it does not matter which subsequent partitions are used, since each part of $P_3$ contains mutually independent random variables.
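The greedy heuristic described above recovers exactly this partition sequence for Example 8. The sketch below is illustrative code (ours, not the authors’); rather than enumerating a joint distribution, it uses the fact that in this system, the entropy of any part is one bit per distinct group $\{x_1, x_2, x_3\}$, $\{x_4, x_5, x_6\}$, $\{x_7, x_8\}$ that the part intersects. Components are 0-indexed.

```python
# Greedy construction of a nested partition sequence for Example 8.
import itertools

GROUP = {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b', 6: 'c', 7: 'c'}

def H(part):
    """Entropy of a part in bits: one unbiased bit per distinct group present."""
    return float(len({GROUP[i] for i in part}))

def best_split(part):
    """Bipartition (A, B) of `part` maximizing I(A;B) = H(A) + H(B) - H(A,B)."""
    best, best_I = None, -1.0
    items = list(part)
    for r in range(1, len(items)):
        for A in itertools.combinations(items, r):
            B = tuple(i for i in items if i not in A)
            I = H(A) + H(B) - H(part)
            if I > best_I:
                best, best_I = (A, B), I
    return best, best_I

sequence = [[tuple(range(8))]]
while len(sequence[-1]) < 8:
    # at each step, split whichever part yields the largest mutual information
    candidates = [(p,) + best_split(p) for p in sequence[-1] if len(p) > 1]
    part, (A, B), gain = max(candidates, key=lambda c: c[2])
    sequence.append([q for q in sequence[-1] if q != part] + [A, B])
    print(gain, sequence[-1])   # gain is C(n) at the newly created scale n

# The first two gains are 3 and 2 bits, matching C(2) = H(x1) + H(x4) + H(x7)
# and C(3) = H(x1) + H(x4) for unbiased bits; all later gains are 0.
```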
Example 9. 
Consider a two-dimensional 4 × 4 grid of random variables in which two variables have nonzero mutual information conditioned on the rest if and only if they are adjacent. One way to partition the grid that respects its structure is given in Figure 7.
Example 10. 
Consider a hierarchy consisting of seven individuals: a leader with two subordinates, each of whom has two subordinates themselves, as depicted in Figure 8. The behavior of each individual is represented by three random variables, each with complexity c. Thus, if examined separately from the rest of the system, the complexity of each individual is $3c$. On the left, everyone completely follows the leader, resulting in a complexity of $3c$ up to scale 7 regardless of the partitioning scheme. On the right, some information is transmitted down the hierarchy, but lower levels are also given some autonomy, resulting in more complexity at smaller scales but less at larger scales (but with the same area under the curve, consistent with the sum rule).

4.4. Combining Subsystems

If a system can be divided into independent subsystems, the complexity profile of the system as a whole can be written as the sum of the complexity profiles of each of the independent subsystems. And if a system can be divided into m subsystems that behave identically, its complexity profile will equal that of any one of the subsystems except with the scale axis stretched by a factor of m. These properties are made precise below.
Theorem 3. 
The additivity of the complexity profiles of superimposed independent systems. Suppose two disjoint systems A and B are independent, i.e., the mutual information $I(A; B) = 0$, and let $C = A \cup B$. Consider any nested partition sequences $P^A$ of A and $P^B$ of B. Then, for all the nested partition sequences $P^C$ that restrict to $P^A$ on $A \subseteq C$ and $P^B$ on $B \subseteq C$,
$$C_C^{P^C}(n) = C_A^{P^A}(n) + C_B^{P^B}(n). \quad (4)$$
In other words, the complexity profiles of independent subsystems add.
Proof. 
This result follows from $P_i^C$ restricting to $P_i^A$ on A and $P_i^B$ on B and the fact that for any subsets $A' \subseteq A$ and $B' \subseteq B$, $H(A' \cup B') = H(A') + H(B')$. □
In order to formulate the second property, we first build up some notation in the following two definitions:
Definition 11. 
For a system X and a positive integer m, let $mX = \bigcup_{i=1}^{m} X_i$, where the $X_i$ are disjoint systems for which there exist bijections $f_i: X_i \to X$ such that $\forall x \in X_i$, $H(x|f_i(x)) = H(f_i(x)|x) = 0$. In other words, mX contains m identical copies of X, such that the behavior of any one copy completely determines the behaviors of all the others.
Definition 12. 
Given a nested partition sequence P of a system X and a positive integer m, define the nested partition sequence mP of the system $mX = \bigcup_{i=1}^{m} X_i$ (with bijections $f_i: X_i \to X$) as follows: For $n \leq m$, $(mP)_n \equiv \{\bigcup_{i=n}^{m} X_i\} \cup \{X_i : i < n\} = \{X_1, X_2, \ldots, X_{n-1}, \bigcup_{i=n}^{m} X_i\}$. For $n \geq m$, define $(mP)_n$ such that it restricts to $P_{n_i}^{f_i}$ (see Definition 10) on each $X_i \subseteq mX$, where $n_i = \lceil n/m \rceil$ if $i \leq (n \bmod m)$ and $n_i = \lfloor n/m \rfloor$ otherwise.
Theorem 4. 
The scale additivity of replicated systems. Let P be any nested partition sequence of a system X. Then,
$$C_{mX}^{mP}(n) = C_X^P(\lceil n/m \rceil). \quad (5)$$
In other words, the effect of including m exact replicas of X is to stretch the scale axis of the complexity profile by a factor of m.
Proof. 
This result follows from Definitions 11 and 12. □
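As a sanity check, Theorem 4 can be verified numerically for a small case. The sketch below (illustrative code; the two-component system of correlated bits and the value m = 2 are assumptions) builds the replicated system and the partition sequence mP of Definition 12 by hand:

```python
# Numerical check of Theorem 4 with m = 2 and a two-component system.
import math
import numpy as np

def subset_entropy(joint, indices):
    """Entropy in bits of the marginal of `joint` over the given indices."""
    marginal = {}
    for s, p in joint.items():
        key = tuple(s[i] for i in indices)
        marginal[key] = marginal.get(key, 0.0) + p
    v = np.array([q for q in marginal.values() if q > 0])
    return float(-(v * np.log2(v)).sum())

def profile(joint, partitions):
    S = [sum(subset_entropy(joint, part) for part in Pn) for Pn in partitions]
    return list(np.diff([0.0] + S))

# X = {x0, x1}: two correlated bits with P(x0 = x1) = 3/4.
X_joint = {(0, 0): 0.375, (1, 1): 0.375, (0, 1): 0.125, (1, 0): 0.125}
P = [[(0, 1)], [(0,), (1,)]]
C_X = profile(X_joint, P)

# 2X = {x0, x1, x2, x3} with x2 = x0 and x3 = x1 (an exact replica of X).
mX_joint = {(a, b, a, b): p for (a, b), p in X_joint.items()}
mP = [
    [(0, 1, 2, 3)],              # (2P)_1
    [(0, 1), (2, 3)],            # (2P)_2: the two copies (n <= m case)
    [(0,), (1,), (2, 3)],        # (2P)_3: P_2 on copy 1, P_1 on copy 2
    [(0,), (1,), (2,), (3,)],    # (2P)_4: P_2 on both copies
]
C_mX = profile(mX_joint, mP)

for n in range(1, 5):            # C_{2X}(n) = C_X(ceil(n/2))
    assert abs(C_mX[n - 1] - C_X[math.ceil(n / 2) - 1]) < 1e-12
```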
Theorems 3 and 4 indicate that for any block-independent system, there exists a nested partition sequence that yields the same complexity profile as that given by the formalism in refs. [7,32]. (The formalism in refs. [7,32] is stated in ref. [32] to be the only such formalism that is a linear combination of entropies of subsets of the system, yields its results for block-independent systems, and is symmetric with respect to permutations of the components. The partition formalism in this paper does not contradict this statement, since any partitioning scheme will break this permutation symmetry for systems with greater than two components.)

5. Discussion

The motivation behind our analysis has been to construct a definition of a complexity profile for multi-component systems that obeys both the sum rule and a multi-scale version of the law of requisite variety. In order to do so, we first had to generalize the law of requisite variety to multi-component systems. We then created a formal definition for a complexity profile and defined two properties—the multi-scale law of requisite variety and the sum rule—that complexity profiles should satisfy. Finally, we constructed a class of examples of complexity profiles and proved that they satisfy these properties. We demonstrated their application to a few simple systems and showed how they behave when independent and dependent subsystems are combined.
This formalism is purely descriptive in that questions of causal influence and mechanism (i.e., what determines the states of each component) are not considered; rather, only the possible states of the system and its environment and correlations among these states are considered. (This approach is analogous to how statistical mechanics does not consider the Newtonian dynamics of individual gas molecules but rather only the probabilities of finding the gas in any given state.) By abstracting out notions of causality and mechanism, this approach allows for an understanding of the space of all the possible system behaviors and for the identification of systems that are doomed to failure, regardless of the mechanism. The way in which system and environmental components are mechanistically linked and the evolution and adaptability of complex systems over time are directions for future work.
More elegant profiles than those presented in Section 4 may exist. One could also imagine complexity profiles that take advantage of some known structure of the systems under consideration; for instance, for systems that can be embedded into $\mathbb{R}^d$, where d is far lower than the number of system components, Fourier methods could be explored. More broadly, the sum rule could be relaxed, allowing for other definitions of multi-scale complexity. Completely eliminating any tradeoff of complexity among scales would likely lead to under-constrained profiles—certainly, smaller-scale complexity must be reduced in order to create larger-scale structure. But one could imagine forms that this tradeoff may take other than $\sum_n C_X(n) = \sum_{x \in X} H(x)$. Even more broadly, other definitions of what it means for a system to match its environment could be considered, as long as some sort of multi-scale law of requisite variety is retained, so that the complexity profiles of multiple systems can be meaningfully compared.
With all that said, the profiles presented here are the first, to the best of our knowledge, to obey both a sum rule and a multi-scale law of requisite variety. At the very least, these formalisms provide a formal grounding that can be used to support conceptual claims that are made using complexity profiles. Our hope is that these formalisms will spur further development in our understanding of the general properties of multi-component systems.

Author Contributions

Conceptualization, A.F.S. and Y.B.-Y.; Methodology, A.F.S.; Formal analysis, A.F.S.; Writing—original draft, A.F.S.; Writing—review & editing, A.F.S. and Y.B.-Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation’s Graduate Research Fellowship (under Grant No. 1122374), the Hertz Foundation, and the Long-Term Future Fund.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable.

Acknowledgments

We thank Alex Zhu and Robi Bhattacharjee for helpful discussions.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs

In order to prove Theorem 1, we first prove the following lemma.
Lemma A1. 
If $H(B_1|A_1) = H(B_2|A_2) = 0$, then $I(A_1; A_2) \geq I(B_1; B_2)$.
Proof. 
$$\begin{aligned} I(A_1; A_2) &= H(A_1) + H(A_2) - H(A_1, A_2) \\ &= H(B_1) + H(A_1|B_1) - H(B_1|A_1) + H(B_2) + H(A_2|B_2) - H(B_2|A_2) \\ &\quad - H(B_1, B_2) - H(A_1, A_2|B_1, B_2) + H(B_1, B_2|A_1, A_2) \\ &= I(B_1; B_2) + H(A_1|B_1) + H(A_2|B_2) - H(A_1, A_2|B_1, B_2) \\ &= I(B_1; B_2) + I(A_1; A_2|B_1, B_2) + I(A_1; B_2|B_1) + I(A_2; B_1|B_2) \\ &\geq I(B_1; B_2), \end{aligned}$$
where the third equality uses the hypothesis $H(B_1|A_1) = H(B_2|A_2) = 0$ (which also implies $H(B_1, B_2|A_1, A_2) = 0$) and the final inequality follows from the non-negativity of (conditional) mutual information. □
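The lemma can also be spot-checked numerically (an illustrative sketch: each $B_i$ is taken to be a deterministic function of $A_i$, which guarantees $H(B_i|A_i) = 0$; the distribution and maps are randomly drawn assumptions):

```python
# Numerical spot-check of Lemma A1.
import itertools
import numpy as np

rng = np.random.default_rng(1)
nA = 4                                               # states per A_i (assumed)
p = rng.dirichlet(np.ones(nA * nA)).reshape(nA, nA)  # random joint over (A1, A2)
g1 = rng.integers(0, 2, size=nA)                     # B1 = g1(A1)
g2 = rng.integers(0, 2, size=nA)                     # B2 = g2(A2)

def entropy(pv):
    pv = np.asarray(pv, dtype=float).flatten()
    pv = pv[pv > 0]
    return float(-(pv * np.log2(pv)).sum())

def mutual_info(joint):
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint)

# Push the joint over (A1, A2) forward to a joint over (B1, B2).
q = np.zeros((2, 2))
for a1, a2 in itertools.product(range(nA), repeat=2):
    q[g1[a1], g2[a2]] += p[a1, a2]

assert mutual_info(p) >= mutual_info(q) - 1e-12      # I(A1;A2) >= I(B1;B2)
```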
We now prove Theorem 1: Multi-scale law of requisite variety. If a system X matches its environment $(Y, f)$, then for all nested partition sequences P of X, $C_X^P(n) \geq C_Y^{P^f}(n)$ at each scale n.
Proof. 
For any collections of random variables $A = \{a_1, \ldots, a_N\}$ and $B = \{b_1, \ldots, b_N\}$, $H(b_i|a_i) = 0$ for all i implies that $0 \leq H(B|A) \leq \sum_i H(b_i|A) \leq \sum_i H(b_i|a_i) = 0$ and thus $H(B|A) = 0$. Using this fact together with Remark 2 and Lemma A1, we obtain $C_X^P(n) \geq C_Y^{P^f}(n)$ for $n \in \mathbb{Z}^+$. Note that X and Y being partitioned in the same way guarantees that for any subset $A \subseteq X$ and the corresponding subset $B \subseteq Y$, $H(B|A) = 0$. □

Appendix B. Continuum Limit

Complexity profiles can also be defined for continuous systems.
Definition A1. 
We define a continuous system X of size L as a sequence of discrete systems $\{X_i\}_{i=1}^{\infty}$ with components of size $l_i \equiv L/|X_i|$ such that $X_i \subseteq X_j$ whenever $i < j$ and $\lim_{i \to \infty} l_i = 0$. Then, the complexity profile for the continuous system X is defined as
$$C_X(s) \equiv \lim_{i \to \infty} \tilde{C}_{X_i}(s) = \lim_{i \to \infty} C_{X_i}(s/l_i), \quad (A1)$$
provided such a limit exists, where $\tilde{C}_{X_i}(s)$ is defined in Equation (1).
Remark A1. 
Note that any discrete system X with complexity profile $C_X(n)$ and components of size l can be considered as a continuous system of size $|X|l$ per Definition A1 by defining the systems $\{X_i\}_{i=1}^{\infty}$ (with components of size $l/i$) such that $X_i = iX$ (Definition 11) and
$$\tilde{C}_{X_i}(s) = \tilde{C}_X(s) = C_X(s/l) = C_{X_i}(is/l) \quad (A2)$$
(see Equation (1)).
Example A1. 
Suppose that the continuous system X of size L is a random continuous function $f(x)$ for $x \in [0, L]$. Define $X_i = \{f(L/2^i), f(2L/2^i), f(3L/2^i), \ldots, f(2^i L/2^i)\}$ so that $X_i$ has $2^i$ components, each of scale $L/2^i$. Then, X can be described by the sequence $\{X_i\}_{i=1}^{\infty}$.
We can extend the class of complexity profiles defined in Section 4 as follows:
For any nested partition sequence P of a discrete system X with components of size l, the complexity profiles in Equation (A2) can be realized by letting $C_X(n) = C_X^P(n)$ and $C_{X_i}(n) = C_{X_i}^{iP}(n)$ (see Definition 12), since by Theorem 4,
$$C_{X_i}^{iP}(is/l) = C_X^P(s/l). \quad (A3)$$
To define a partition-based complexity profile using Equation (A1) for a continuous system X (defined by an infinite sequence of discrete systems $X_1 \subseteq X_2 \subseteq X_3 \subseteq \cdots$, as per Definition A1), a nested partition sequence $P^i$ must be chosen for each $X_i$. Of course, these nested partition sequences must be chosen so that the limit in Equation (A1) exists; for consistency, it can also be required that for each $X_i \subseteq X_j$, each partition $P_n^j$ of $X_j$ restricts to $P_m^i$ on $X_i$ for some $m \leq n$.

References

  1. Gell-Mann, M. What is complexity? In Complexity and Industrial Clusters: Dynamics and Models in Theory and Practice; Springer: Berlin/Heidelberg, Germany, 2002; pp. 13–24. [Google Scholar]
  2. Adami, C. What is complexity? BioEssays 2002, 24, 1085–1094. [Google Scholar] [CrossRef]
  3. Crutchfield, J.P. Between order and chaos. Nat. Phys. 2012, 8, 17–24. [Google Scholar] [CrossRef]
  4. Lineweaver, C.H.; Davies, P.C.; Ruse, M. Complexity and the Arrow of Time; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
  5. Bar-Yam, Y. Dynamics of Complex Systems; Addison-Wesley: Boston, MA, USA, 1997. [Google Scholar]
  6. Ay, N.; Olbrich, E.; Bertschinger, N.; Jost, J. A Unifying Framework for Complexity Measures of Finite Systems; Working Paper 2006-08-028; Santa Fe Institute: Santa Fe, NM, USA, 2006. [Google Scholar]
  7. Allen, B.; Stacey, B.; Bar-Yam, Y. Multiscale information theory and the marginal utility of information. Entropy 2017, 19, 273. [Google Scholar] [CrossRef]
  8. Binder, P.M.; Plazas, J.A. Multiscale analysis of complex systems. Phys. Rev. E 2001, 63, 065203. [Google Scholar] [CrossRef]
  9. Ahmed, M.U.; Mandic, D.P. Multivariate multiscale entropy: A tool for complexity analysis of multichannel data. Phys. Rev. E 2011, 84, 061918. [Google Scholar] [CrossRef]
  10. Wu, S.D.; Wu, C.W.; Lin, S.G.; Wang, C.C.; Lee, K.Y. Time series analysis using composite multiscale entropy. Entropy 2013, 15, 1069–1084. [Google Scholar] [CrossRef]
  11. Humeau-Heurtier, A. The multiscale entropy algorithm and its variants: A review. Entropy 2015, 17, 3110–3123. [Google Scholar] [CrossRef]
  12. Zunino, L.; Soriano, M.C.; Rosso, O.A. Distinguishing chaotic and stochastic dynamics from time series by using a multiscale symbolic approach. Phys. Rev. E 2012, 86, 046210. [Google Scholar] [CrossRef] [PubMed]
  13. He, S.; Li, C.; Sun, K.; Jafari, S. Multivariate multiscale complexity analysis of self-reproducing chaotic systems. Entropy 2018, 20, 556. [Google Scholar] [CrossRef]
  14. Costa, M.; Goldberger, A.L.; Peng, C.K. Multiscale entropy analysis of complex physiologic time series. Phys. Rev. Lett. 2002, 89, 068102. [Google Scholar] [CrossRef] [PubMed]
  15. Costa, M.; Peng, C.K.; Goldberger, A.L.; Hausdorff, J.M. Multiscale entropy analysis of human gait dynamics. Phys. A Stat. Mech. Its Appl. 2003, 330, 53–60. [Google Scholar] [CrossRef] [PubMed]
  16. Costa, M.; Goldberger, A.L.; Peng, C.K. Multiscale entropy analysis of biological signals. Phys. Rev. E 2005, 71, 021906. [Google Scholar] [CrossRef]
  17. Catarino, A.; Churches, O.; Baron-Cohen, S.; Andrade, A.; Ring, H. Atypical EEG complexity in autism spectrum conditions: A multiscale entropy analysis. Clin. Neurophysiol. 2011, 122, 2375–2383. [Google Scholar] [CrossRef]
  18. Liang, S.F.; Kuo, C.E.; Hu, Y.H.; Pan, Y.H.; Wang, Y.H. Automatic stage scoring of single-channel sleep EEG by using multiscale entropy and autoregressive models. IEEE Trans. Instrum. Meas. 2012, 61, 1649–1657. [Google Scholar] [CrossRef]
  19. Wang, J.; Shang, P.; Zhao, X.; Xia, J. Multiscale entropy analysis of traffic time series. Int. J. Mod. Phys. C 2013, 24, 1350006. [Google Scholar] [CrossRef]
  20. Yin, Y.; Shang, P. Multivariate multiscale sample entropy of traffic time series. Nonlinear Dyn. 2016, 86, 479–488. [Google Scholar] [CrossRef]
  21. Martina, E.; Rodriguez, E.; Escarela-Perez, R.; Alvarez-Ramirez, J. Multiscale entropy analysis of crude oil price dynamics. Energy Econ. 2011, 33, 936–947. [Google Scholar] [CrossRef]
  22. Alvarez-Ramirez, J.; Rodriguez, E.; Alvarez, J. A multiscale entropy approach for market efficiency. Int. Rev. Financ. Anal. 2012, 21, 64–69. [Google Scholar] [CrossRef]
  23. Yin, Y.; Shang, P. Weighted multiscale permutation entropy of financial time series. Nonlinear Dyn. 2014, 78, 2921–2939. [Google Scholar] [CrossRef]
  24. Faes, L.; Marinazzo, D.; Stramaglia, S. Multiscale information decomposition: Exact computation for multivariate Gaussian processes. Entropy 2017, 19, 408. [Google Scholar] [CrossRef]
  25. Li, Q.; Zuntao, F. Permutation entropy and statistical complexity quantifier of nonstationarity effect in the vertical velocity records. Phys. Rev. E 2014, 89, 012905. [Google Scholar] [CrossRef]
  26. Murayama, S.; Kinugawa, H.; Tokuda, I.T.; Gotoda, H. Characterization and detection of thermoacoustic combustion oscillations based on statistical complexity and complex-network theory. Phys. Rev. E 2018, 97, 022223. [Google Scholar] [CrossRef]
  27. Siegenfeld, A.F.; Bar-Yam, Y. An introduction to complex systems science and its applications. Complexity 2020, 2020, 6105872. [Google Scholar] [CrossRef]
  28. Anderson, P.W. More is different: Broken symmetry and the nature of the hierarchical structure of science. Science 1972, 177, 393–396. [Google Scholar] [CrossRef]
  29. Kwapień, J.; Drożdż, S. Physical approach to complex systems. Phys. Rep. 2012, 515, 115–226. [Google Scholar] [CrossRef]
  30. Ashby, W.R. An Introduction to Cybernetics; Chapman & Hall Ltd.: Boca Raton, FL, USA, 1961. [Google Scholar]
  31. Norman, J.; Bar-Yam, Y. Special Operations Forces as a Global Immune System. In Evolution, Development and Complexity; Georgiev, G.Y., Smart, J.M., Flores Martinez, C.L., Price, M.E., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2019; pp. 367–379. [Google Scholar]
  32. Bar-Yam, Y. Multiscale complexity/entropy. Adv. Complex Syst. 2004, 7, 47–63. [Google Scholar] [CrossRef]
  33. Bar-Yam, Y. Multiscale variety in complex systems. Complexity 2004, 9, 37–45. [Google Scholar] [CrossRef]
  34. Bar-Yam, Y. A mathematical theory of strong emergence using multiscale variety. Complexity 2004, 9, 15–24. [Google Scholar] [CrossRef]
  35. Rosenkranz, C.; Holten, R. The variety engineering method: Analyzing and designing information flows in organizations. Inf. Syst. E-Bus. Manag. 2011, 9, 11–49. [Google Scholar] [CrossRef]
  36. McKelvey, B.; Lichtenstein, B.B.; Andriani, P. When organisations and ecosystems interact: Toward a law of requisite fractality in firms. Int. J. Complex. Leadersh. Manag. 2012, 2, 104–136. [Google Scholar] [CrossRef]
  37. Gorod, A.; Hallo, L. Towards an Evolving Toolbox in Complex Systems Management. In Proceedings of the 2019 IEEE International Systems Conference (SysCon), Orlando, FL, USA, 8–11 April 2019; pp. 1–6. [Google Scholar] [CrossRef]
  38. Ryan, A. About the bears and the bees: Adaptive responses to asymmetric warfare. In Unifying Themes in Complex Systems: Proceedings of the Sixth International Conference on Complex Systems; Springer: Berlin/Heidelberg, Germany, 2008; pp. 588–595. [Google Scholar]
  39. Galway, D.; Pieris, G.; Fusina, G. Tasking system capabilities modeling using the complexity profile. In Proceedings of the 2011 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), Paris, France, 11–15 April 2011; pp. 102–106. [Google Scholar]
  40. Huang, S.; Siegenfeld, A.F.; Gelman, A. How Democracies Polarize: A Multilevel Perspective. arXiv 2022, arXiv:2211.01249. [Google Scholar]
  41. Alexiou, K. Understanding design as a multiagent coordination process: Distribution, complexity, and emergence. Environ. Plan. B Plan. Des. 2011, 38, 248–266. [Google Scholar] [CrossRef]
  42. Mahmoodi, K.; West, B.J.; Grigolini, P. Complexity matching and requisite variety. arXiv 2018, arXiv:1806.08808. [Google Scholar]
  43. Salas, I.; Abades, S. Social Complex Systems as Multiscale Phenomena: From the Genome to Animal Societies. In Proceedings of the COMPLEXIS, Online, 24–25 April 2021; pp. 100–106. [Google Scholar]
  44. DeRosa, J.K.; McCaughin, L.K. Combined systems engineering and management in the evolution of complex adaptive systems. In Proceedings of the 2007 1st Annual IEEE Systems Conference, Honolulu, HI, USA, 9–13 April 2007; pp. 1–8. [Google Scholar]
  45. Gershenson, C. The sigma profile: A formal tool to study organization and its evolution at multiple scales. Complexity 2011, 16, 37–44. [Google Scholar] [CrossRef]
  46. Stacey, B.C. Multiscale structure in eco-evolutionary dynamics. arXiv 2015, arXiv:1509.02958. [Google Scholar]
  47. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
Figure 1. The average length of description necessary to specify a system’s state depends on its number of possible distinct behaviors. Which behaviors are distinct depends on the scale/level of detail at which the system is being described. At the smallest scale, the system depicted has many possible states, but many distinguishable smaller-scale states can all correspond to a single larger-scale state.
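To make the scale dependence of description length concrete, here is a minimal numerical sketch (ours, not taken from the paper): the fine-grained state takes 16 equally likely values, and a hypothetical 4-to-1 coarse-graining, in which four distinguishable smaller-scale states correspond to a single larger-scale state, halves the number of bits needed to specify the state.

```python
import numpy as np
from collections import Counter

def entropy_bits(samples):
    """Shannon entropy in bits, estimated from a list of observed states."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * np.log2(c / n) for c in counts.values())

rng = np.random.default_rng(0)
# Fine-grained description: 16 equally likely states (4 bits each).
fine = rng.integers(0, 16, size=100_000)
# Coarse-graining: four fine states map to one larger-scale state,
# so only 4 distinct states (2 bits) remain at the larger scale.
coarse = fine // 4

print(entropy_bits(fine.tolist()))    # ~4.0 bits
print(entropy_bits(coarse.tolist()))  # ~2.0 bits
```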
Figure 2. An illustration of Ashby’s law of requisite variety. (a) Because the system has fewer states (i.e., lower complexity) than its environment, it is impossible for the system to have a distinct response to each of the four environmental states. (b) Here, the system is able to have a distinct response to each environmental state; a necessary (but not sufficient) condition for this matching is that the system’s complexity equals or exceeds its environment’s. Image source: ref. [31].
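A brute-force check (our illustration, with hypothetical state labels) of the counting argument in panel (a): when the system has fewer states than its environment, no response policy can assign a distinct response to each environmental state.

```python
from itertools import product

env_states = ["E1", "E2", "E3", "E4"]   # four environmental states
responses  = ["S1", "S2", "S3"]         # only three system states

# Enumerate every possible response policy (environment state -> response).
# A policy gives distinct responses iff no two environmental states share one.
distinct_policies = [
    policy for policy in product(responses, repeat=len(env_states))
    if len(set(policy)) == len(env_states)
]
print(len(distinct_policies))  # 0: by the pigeonhole principle, no policy
                               # with 3 responses distinguishes 4 states
```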
Figure 3. Accounting for components of various sizes. Any system can be described in terms of components of equal size. For instance, two systems $\{\tilde{a}_1, \tilde{a}_2\}$ and $\{\tilde{b}_1, \tilde{b}_2, \tilde{b}_3\}$ that contain components of sizes $2s$, $1.5s$, and $1s$ (where the units of $s$ depend on the notion of size being used) can be reformulated in terms of components $\{a_1, \ldots, a_7\}$ and $\{b_1, \ldots, b_8\}$ that all have size $0.5s$, where $a_1 = a_2 = a_3 = a_4 = \tilde{a}_1$, $a_5 = a_6 = a_7 = \tilde{a}_2$, $b_1 = b_2 = b_3 = b_4 = \tilde{b}_1$, $b_5 = b_6 = \tilde{b}_2$, and $b_7 = b_8 = \tilde{b}_3$.
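This reformulation is mechanical; the following is a minimal sketch (our own helper, with sizes written as strings for exact rational arithmetic) that expands each component into copies of a common unit size, reproducing the counts in the caption.

```python
from fractions import Fraction

def equalize(components, unit):
    """Re-express components of varying sizes as repeated components of a
    common size `unit`; `unit` must evenly divide every component size."""
    out = []
    for name, size in components:
        copies = Fraction(size) / Fraction(unit)
        assert copies.denominator == 1, "unit must divide every size"
        out.extend([name] * int(copies))
    return out

# The two systems from the caption, with sizes in units of s.
system_a = [("a~1", "2"), ("a~2", "1.5")]
system_b = [("b~1", "2"), ("b~2", "1"), ("b~3", "1")]

print(equalize(system_a, "0.5"))  # 4 copies of a~1, 3 of a~2 -> 7 components
print(equalize(system_b, "0.5"))  # 4 of b~1, 2 each of b~2, b~3 -> 8 components
```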
Figure 4. Defining the environmental components. Regardless of the interactions between a system and its environment, environmental components can always be defined such that they have a one-to-one relationship with system components. For instance, suppose that for the system $\{x_1, x_2, x_3\}$ to effectively interact with its environment $\{\tilde{y}_1, \tilde{y}_2, \tilde{y}_3, \tilde{y}_4\}$, $x_1$ must have a distinct response for each possible state of $\tilde{y}_1$ and $\tilde{y}_3$ (i.e., $H(\tilde{y}_1|x_1) = 0$ and $H(\tilde{y}_3|x_1) = 0$), $x_2$ must have a distinct response for each possible state of $\tilde{y}_1$ (i.e., $H(\tilde{y}_1|x_2) = 0$), and $x_3$ must have a distinct response for each possible state of $\tilde{y}_2$, $\tilde{y}_3$, and $\tilde{y}_4$ (i.e., $H(\tilde{y}_2|x_3) = 0$, $H(\tilde{y}_3|x_3) = 0$, and $H(\tilde{y}_4|x_3) = 0$). If we define new environmental components $y_1 \equiv (\tilde{y}_1, \tilde{y}_3)$, $y_2 \equiv \tilde{y}_1$, and $y_3 \equiv (\tilde{y}_2, \tilde{y}_3, \tilde{y}_4)$, we see that $x_1$ must react only to $y_1$, $x_2$ only to $y_2$, and $x_3$ only to $y_3$, since the original constraints are equivalent to the constraints $H(y_i|x_i) = 0$ for $i \in \{1, 2, 3\}$.
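A small numerical check (ours; the binary variables are hypothetical) of the equivalence used here: $H(\tilde{y}_1|x_1) = 0$ and $H(\tilde{y}_3|x_1) = 0$ hold together exactly when $H(y_1|x_1) = 0$ for the composite component $y_1 \equiv (\tilde{y}_1, \tilde{y}_3)$.

```python
import numpy as np
from collections import Counter

def cond_entropy(y, x):
    """Estimate H(y|x) in bits from paired samples of hashable values."""
    n = len(x)
    joint = Counter(zip(x, y))
    marg = Counter(x)
    return -sum((c / n) * np.log2(c / marg[xi])
                for (xi, _), c in joint.items())

rng = np.random.default_rng(1)
y1 = [int(v) for v in rng.integers(0, 2, 10_000)]
y3 = [int(v) for v in rng.integers(0, 2, 10_000)]
# Let the system component x1 track the composite (y1, y3) exactly.
x1 = list(zip(y1, y3))

print(cond_entropy(y1, x1))                 # -> 0 bits: H(y~1 | x1) = 0
print(cond_entropy(y3, x1))                 # -> 0 bits: H(y~3 | x1) = 0
print(cond_entropy(list(zip(y1, y3)), x1))  # -> 0 bits: composite constraint
```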
Figure 5. Larger-scale/coarse-grained descriptions. Consider a system with eight components. The full description (scale 1) consists of all eight components. A coarse-grained description (scale 2) might consist of every other component, which can serve as an approximation for the system as a whole. A further coarse-grained description (scale 4) might consist of every other component of the scale-2 description.
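As a literal rendering of this subsampling (a sketch; the paper does not prescribe code, and keeping the first component of each pair is our arbitrary choice of representative):

```python
def coarse_grain(components, scale):
    """Scale-`scale` description: keep every `scale`-th component as an
    approximation for the system as a whole, as in Figure 5."""
    return components[::scale]

system = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
print(coarse_grain(system, 1))  # full description: all eight components
print(coarse_grain(system, 2))  # ['x1', 'x3', 'x5', 'x7']
print(coarse_grain(system, 4))  # ['x1', 'x5']
```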
Figure 6. (a) The first and second cuts necessary to create the first three partitions discussed in Example 8 are shown, together with (b) the resulting complexity profile (made continuous via Equation (1)) if $H(x) = 1$ for each $x \in X$.
Figure 7. Cuts that create a nested partition sequence for the $4 \times 4$ grid of random variables described in Example 9 are shown. (a) The first three cuts partition the grid into four parts with four components each. (b) Each of the four parts (labeled by $n \in \{1, 2, 3, 4\}$) is then further partitioned. The resulting complexity profile will, of course, depend on the nature of the random variables and their correlations.
Figure 8. The hierarchies discussed in Example 10 are shown, together with the resulting complexity profiles (made continuous via Equation (1)) for $c = 1$. The complexity profile in (c) can be obtained from any nested partition sequence for the hierarchy in (a). The complexity profile in (d) can be obtained from any nested partition sequence whose first three partitions are given by the displayed cuts in the hierarchy in (b).
