Open Access: this article is freely available and re-usable.

*Entropy* **2017**, *19*(4), 181; https://doi.org/10.3390/e19040181

Article

Citizen Science and Topology of Mind: Complexity, Computation and Criticality in Data-Driven Exploration of Open Complex Systems

Sony Computer Science Laboratories, Inc., Takanawa Muse Bldg. 3F, 3-14-13, Higashi Gotanda, Shinagawa-ku, Tokyo 141-0022, Japan

Academic Editor: J. A. Tenreiro Machado

Received: 30 December 2016 / Accepted: 20 April 2017 / Published: 22 April 2017

## Abstract

Recently emerging data-driven citizen sciences need to harness an increasing amount of massive data with varying quality. This paper develops essential theoretical frameworks, example models, and a general definition of complexity measure, and examines its computational complexity for an interactive data-driven citizen science within the context of guided self-organization. We first define a conceptual model that incorporates the quality of observation in terms of accuracy and reproducibility, ranging between subjectivity, inter-subjectivity, and objectivity. Next, we examine the database’s algebraic and topological structure in relation to informational complexity measures, and evaluate its computational complexities with respect to an exhaustive optimization. Conjectures of criticality are obtained on the self-organizing processes of observation and dynamical model development. Example analysis is demonstrated with the use of biodiversity assessment database—the process that inevitably involves human subjectivity for management within open complex systems.

Keywords: inter-subjective objectivity; complexity measure; computational complexity; criticality; citizen science; open complex system

## 1. Introduction

Recent innovation in information and communication technologies (ICT) embedded in real environments is drastically changing the way society interacts with computation. This has been described as the fourth industrial revolution [1]. In particular, ubiquitous sensors and mobile communication tools have led to an increasing capacity for distributed and interactive environmental sensing. These technological supports bring in new effective methodologies to tackle complex self-organising behaviours in social–ecological systems that are difficult to understand with conventional modelling and simulation approaches (e.g., [2,3]). Massive amounts of sparse and heterogeneous data that are based on internal observation from within various collective phenomena call for an extended analytical framework, ranging from objective measurements (e.g., with sensors) to subjective data such as human evaluations and feedback.

Redefining a standard formalization of computation and its complexity that are associated with self-organised citizen science can raise multiple criteria for the evaluation of critical phenomena, spread over the dynamical process of observation, management, and knowledge formation in open complex systems [4,5]. Self-organised criticality appears in various natural and social phenomena, often with scale-free statistical properties [6,7]. They manifest in the power law, which can be reduced to a simple combination of inherent stochastic processes [8], and whose realizations provide proxies of emergent functionality (e.g., [9,10,11]). The large fluctuation of the power law distributes the statistical complexity in multiple scales that cannot be represented by a simple mean value for predictive purposes. The sampling time series from a power-law distribution encounters intermittent shifts of the sample average due to the infinite variance of distribution—even with the upper-bounded power law in the real world (e.g., in the magnitude distribution of earthquakes). This situation addresses a statistical limit of prediction solely by the modelling and simulation of the phenomena, but also presents a positive reason to engage human elements as a practical solution in actual management—especially those involving semantic and cognitive judgements [12,13]. On the technology side, machine learning models have long been attempting to optimize the prediction of unknown stochastic sources, implementing interactive estimation processes to exploit the hidden causal structure from temporal observation sequences (e.g., [14]). Modelling studies of guided self-organization have recently been explored with implementation in robotics, simulated neural networks, and networks of agents, etc. [15]. 
Although most of these achievements are discussed within the predictability of a confined experimental setting, a hybrid system that combines human and computational elements is always a premise of real-world situations, and it has been little exploited except for some prototypical interfaces for the internet of things (e.g., [16]). For cost-effective monitoring and control with restricted resources, guided criticality should be introduced on the user side of the technology, in order to migrate and abstract the decision-making process from computation to human ability [3,4,17].

In particular, in solving global agendas such as sustainability goals, a comprehensive approach is required that should make use of the full potential of self-organisation in coupled social–ecological systems [5,18,19]. These efforts practically take on the engagement of citizens and multi-disciplinary stakeholders as important actors in the data acquisition and the implementation of an interactive management through guided self-organization, as a novel type of collective intelligence in the era of the fourth industrial revolution [3,20,21].

In facing the transition of data-driven citizen science towards the achievement of dynamical control in managing real-world open complex systems, this article raises fundamental theories and example models to support the discussion of complexity, computation, and criticality in the most general form possible. We formalize the basic objectives as follows, each of which is addressed in the section with the corresponding number:

- Section 2: How can we formalize and treat the databases of varying quality from both machine and human observations, which range from subjective bias to objective fact? How can we set up scientific measures that assure compatibility with the principles of accuracy and reproducibility?
- Section 3: How can we generalize the concept of complexity measures in application to the human–computer hybrid systems in citizen science?
- Section 4: What is the nature of computational complexities in actual data processing?
- Section 5: What is the general condition to yield guided self-organization for cost-effective citizen science?

Although these questions are universal in multiple industries, a common basis of understanding the problems and mutual development of ICT infrastructure are still isolated and developed independently in each sector. Throughout the exploration of these topics, this paper attempts to provide a common terminology and establish a theoretical basis for the realisation of a cost-effective citizen science in open complex systems situations. This is becoming increasingly important for solving transdisciplinary problems through the participation of multiple stakeholders in the real world [5].

## 2. Inter-Subjective Objectivity Model

We first consider the expression of the quality of data ranging between human subjectivity and machine objectivity in the general form of database $\mathbb{X}$. As a premise, any information that can be represented in digital computing is compatible with the natural number theory. At the infinite limit of computational memory, the representation of the database extends to general sets on a real data type with countably infinite precision, which accepts the definition of $\sigma $-finite measure in a measure-theoretical formulation. We define the general form of arbitrary database $\mathbb{X}$ as follows:

$$\mathbb{X}={\mathbb{R}}^{n}\times {\mathbb{S}}^{m}\quad (n,m\in \mathbb{N}),$$

where $\mathbb{R}$ is a real data type, ${\mathbb{S}}^{m}$ is the collection of m sets ${\{{S}_{i}\}}_{i=1,2,\cdots ,m}$ of arbitrary symbolic sets ${S}_{i}=\{{s}_{1},{s}_{2},\cdots ,{s}_{{l}_{i}}\}$, with the dimensions n, m, and ${l}_{i}$ as natural numbers $\mathbb{N}$ including 0. Any variable in this article takes the assumption that it can be stored in $\mathbb{X}$. For mathematical simplicity, we hereafter consider the real data type $\mathbb{R}$ as a real number. In practice, ${\mathbb{R}}^{n}$ describes the values of n real variables (such as time, spatial coordinates, probabilities, etc.), and ${\mathbb{S}}^{m}$ represents m discrete sets of symbols (such as the name of variables, occurrence of discrete variables, text data, etc.). Obviously, ${\mathbb{S}}^{m}\subseteq {\mathbb{R}}^{m}$ holds in mathematical simplification, but we separate the notations to distinguish between the quantitative and qualitative variable types.

#### 2.1. Formalization of Subjectivity, Inter-Subjectivity, Subjective–Objective Unity, and Objectivity

Digital data $\mathbb{X}$ from citizen science vary from subjective human perception to objective sensor measurement with a different degree of human-induced bias. Here, the subjectivity and objectivity matter because they influence the accuracy and reproducibility of data that is fundamental to establishing scientific analysis. We formalize the nature of observation variables between the subjectivity, objectivity, and these interactions as follows:

- Subjectivity is the quality of observation that is based on human perception without the substantial support of a machine.
- Inter-Subjectivity is the degree of commonality between the subjectivities of multiple subjects.
- Objectivity is the quality of observation that is based on a machine measurement whose consequence does not depend on the operator’s will.
- Subjective–Objective Unity is the degree of commonality between the subjectivity and objectivity.
- Inter-Subjective Objectivity is the quality of observation that satisfies the coincidence of both inter-subjectivity and subjective–objective unity.

These follow basic concepts in philosophy and social science and are adapted to the situation of data analysis. The concept of subjectivity is commonly used in philosophy as the collection of the perceptions, experiences, expectations, personal or cultural understanding, and beliefs specific to a person, which influences, informs, and is biased towards people’s judgments and evaluations. In contrast, objectivity refers to a view of truth or reality which is free from any individual’s influence [22]. The most simplistic form of inter-subjectivity in social science employs the term in the sense of having a shared definition of an object, or shared subjectivity [23].

The relations between these classifications are shown in Figure 1a. For example, text data written by humans are subjective data whether the fact described is based on an objective phenomenon or not. Sensor logs are objective data, even when measured on a human body, such as heart rate, which could be influenced by subjective thought. When multiple subjects give the same subjective evaluation, such as a rating of web content, the commonality augments the degree of inter-subjectivity, which is often adapted to crowd-sourced data validation (e.g., [24,25]). When a subjective evaluation coincides with an objective measurement, the commonality represents the degree of subjective–objective unity. A highly reproducible subjective–objective unity can provide on-site practical measurement in field science, typical in biodiversity assessment and soil texture analysis (e.g., [25,26]). This is because these plausible subjective–objective unity measures also coincide with high inter-subjectivity after sufficient training, which guarantees the accuracy of on-site application without confirming the accordance with objective measurement each time. When the methodology is highly established with respect to accuracy and reproducibility, it belongs to inter-subjective objectivity, where each subjective and objective measurement converges to the same result. The developmental process of reproducible subjective evaluations that converge with objective measurements is depicted in Figure 1b. By training the subjective–objective unity of each human observer, their inter-subjectivity increases, and the commonality of measurement augments to become a self-organizing loop between the subjective–objective unity and inter-subjectivity through mutual feedback, attaining a higher degree of inter-subjective objectivity.

Note that in a philosophical generalization (e.g., phenomenology), all data are derivatives of subjectivity, because a machine observation is also constructed on human perception in the establishment of the measurement principle, the construction of sensing devices and data processing workflows, and the final interpretation. To avoid a trivial argument that does not affect the reproducibility of the results, we adopt the standpoint that separates subjectivity and objectivity by the degree of human versus machine intervention in the observation outcome. We call this conceptual model the inter-subjective objectivity model.

#### 2.2. Representative Model: Buoy–Anchor–Raft Model

In order to apply the inter-subjective objectivity model to a quantitative framework of actual data processing, we develop a general example model with a more familiar and analogical terminology that is intuitively easier to understand: the buoy–anchor–raft model, as schematically expressed in Figure 2. The definition and correspondence to the inter-subjective objectivity model are given as follows:

- Buoy refers to subjective data that fluctuates on the sea surface, representing subjectivity. Buoy can provide subjective estimates of an observation object lying on the objective sea floor, but the observation is biased by subjective fluctuations.
- Anchor refers to objective data that is fixed on the sea floor representing objectivity, without the influence from the subjective sea surface. Anchors can be connected to buoys, which provide the evaluation of subjective fluctuation with respect to objective machine measurements.
- Raft represents the relationship between buoys, and refers to inter-subjectivity of data without reference to anchors. A buoy can evaluate another buoy using relative difference of fluctuation on a subjective sea surface, and the overall commonality between buoys is represented as the raft. Nevertheless, it is based on an internal observation between buoys without an objective system of units, and is therefore susceptible to a global drift of collective standard.
- Buoy–Anchor connection rope defines the degree of subjective–objective unity. As a buoy’s movement is more controlled by its anchor, higher subjective–objective unity is assured.
- Raft–Anchor connection ropes define the degree of inter-subjective objectivity. In addition to the commonality between buoys represented as a raft, the effects of the global drift from subjective sea surface could be controlled with anchors within a plausible range of error with respect to the objective sea floor.

Concrete examples of the buoy, anchor, and raft in various social systems and scientific domains are given in Table 1. While inter-subjective objectivity is a conceptual framework that classifies the quality of observation, the buoy, anchor, and raft refer to actual constructs of databases implemented with ICT. The terms arose from the developmental process of management systems in open systems science [5], sharing the perspective with the transversal question of the grand challenge of AI research regarding the effective extraction of scientific knowledge out of heterogeneous data of varying quality [27]. Without properly positioning the subjective background of the study, it is often the case that established knowledge with large-scale experiments and statistical analyses is revealed to be false in high-throughput discovery-oriented research, resulting in a null-field with statistically prevailing bias [28]. As shown in Table 1, conceptual problematics for the implementation of ICT in various fields can be mutually characterized with the use of the buoy–anchor–raft model. This means the ICT infrastructure can be applied and shared in a synergistic way across domains, which is beneficial, especially for open-source development advocated in complex systems science [21]. Recent development in the application programming interface for big data integration has increased the support for this challenge, which calls for a general theoretical framework of information processing that the buoy–anchor–raft model can provide (e.g., [29]).

We then consider a mathematical expression of the buoy–anchor–raft model in view of providing a simplified idea of computation with respect to the evaluation of inter-subjective objectivity. Recently emerging contexts of citizen science make use of buoys as important information sources, in contrast to objective science such as traditional physics, which is usually self-contained with anchors. Buoys fluctuate with human subjectivity, which is scientifically called bias. Suppose we cannot directly measure observation objects as anchors. This constraint does not necessarily arise from the observation principle but rather from the resource limitation: For example, a field evaluation of biodiversity mostly depends on human observation because massive DNA barcoding is too costly or even ineffective. So, the accuracy of buoy data should be evaluated with other buoy–anchor connections compatible with observation objects. By defining buoy data $\mathbf{B}\subset \mathbb{X}$ and corresponding measurable anchor data $\mathbf{A}\subset \mathbb{X}$, a buoy–anchor connection $\mathbf{C}$ can be defined as an error function $\mathrm{erf}(\cdot)$ between $\mathbf{A}$ and $\mathbf{B}$:

$$\mathbf{C}:=\mathrm{erf}(\mathbf{B},\mathbf{A}).$$

In case of n observation objects $\mathbf{A}=({a}_{1},{a}_{2},\cdots ,{a}_{n})\in {\mathbb{R}}^{n}$ and $\mathbf{B}=({b}_{1},{b}_{2},\cdots ,{b}_{n})\in {\mathbb{R}}^{n}$ for one observer, a typical example of buoy–anchor connection $c\in \mathbb{R}$ is given with the regularized mean squared error:

$$c=\frac{1}{n}\sum _{i=1}^{n}{\left(\frac{{a}_{i}-{b}_{i}}{{a}_{i}}\right)}^{2}.$$

The regularization makes c accessible to the canonical evaluation of confidence intervals, such as a t-test. As a generalization to m observers, let us describe

$$\mathbf{C}=\left(\begin{array}{c}{c}_{1}\\ {c}_{2}\\ \vdots \\ {c}_{m}\end{array}\right),$$

where

$${c}_{j}=\frac{1}{n}\sum _{i=1}^{n}{\left(\frac{{a}_{ij}-{b}_{ij}}{{a}_{ij}}\right)}^{2},\quad (j=1,2,\cdots ,m),$$

given that

$$\mathbf{A}=\left(\begin{array}{ccc}{a}_{11}& \cdots & {a}_{1m}\\ \vdots & \ddots & \vdots \\ {a}_{n1}& \cdots & {a}_{nm}\end{array}\right),\quad \mathbf{B}=\left(\begin{array}{ccc}{b}_{11}& \cdots & {b}_{1m}\\ \vdots & \ddots & \vdots \\ {b}_{n1}& \cdots & {b}_{nm}\end{array}\right).$$
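As a rough illustration of the per-observer buoy–anchor connection ${c}_{j}$, the following is a minimal sketch assuming NumPy and assuming that anchors and buoys are stored as $n\times m$ arrays (rows are observation objects, columns are observers). The function name `buoy_anchor_connection` and the sample values are hypothetical and not part of the original model description.

```python
import numpy as np

def buoy_anchor_connection(A, B):
    """Return the vector C = (c_1, ..., c_m) of normalized mean squared errors.

    A: anchor data, shape (n, m); B: buoy data, shape (n, m).
    Column j holds the n observations associated with observer j.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape")
    if np.any(A == 0):
        # The error is normalized by the anchor value, so zero anchors are undefined.
        raise ValueError("anchor values must be non-zero")
    relative_error = (A - B) / A
    # Average the squared relative errors over the n objects (axis 0).
    return np.mean(relative_error ** 2, axis=0)

# Hypothetical data: 2 objects, 2 observers, identical anchors for both observers.
A = np.array([[10.0, 10.0],
              [20.0, 20.0]])
B = np.array([[11.0, 15.0],
              [18.0, 10.0]])
C = buoy_anchor_connection(A, B)
print(C)
```

In this made-up example the first observer deviates from the anchors by 10% on each object and the second by 50%, so the second entry of `C` is larger, which corresponds to a weaker subjective–objective unity.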

Next, we consider the raft model. In most social systems, the case-wise precise measurement of anchors is impossible, and we call for the raft of common sense and other social feedback as a premise of plausible judgement. Consider m observers with somehow quantifiable opinions (buoys) on n observation objects. We define the raft matrix $\mathbf{R}$ as follows, as a generalization of buoy data to m observers and n observation objects:

$$\mathbf{R}=\left(\begin{array}{ccc}{r}_{11}& \cdots & {r}_{1m}\\ \vdots & \ddots & \vdots \\ {r}_{n1}& \cdots & {r}_{nm}\end{array}\right),$$

where the raft by definition refers to the commonality contained between these buoys. In a completely equal society where every observer’s opinion is equally respected, we obtain the mean inter-subjective evaluation $\mathbf{E}=({e}_{1},\cdots ,{e}_{n})$ on n objects by averaging over the m observers (the weight vector has m entries, one per column of $\mathbf{R}$):

$$\mathbf{E}:=\left(\begin{array}{c}{e}_{1}\\ \vdots \\ {e}_{n}\end{array}\right)=\left(\begin{array}{ccc}{r}_{11}& \cdots & {r}_{1m}\\ \vdots & \ddots & \vdots \\ {r}_{n1}& \cdots & {r}_{nm}\end{array}\right)\left(\begin{array}{c}1/m\\ \vdots \\ 1/m\end{array}\right).$$

Decision-making based on the evaluation of raft can represent the community’s mean quantifiable opinions, although it is not free from collective bias. It remains only within the framework of inter-subjectivity. For a better evaluation in terms of inter-subjective objectivity, we need to introduce a connection with anchors. Let us introduce a buoy–anchor connection $\mathbf{C}$ from Equation (4), then an example of the inter-subjective objective evaluation ${\mathbf{E}}^{\prime}=({e}_{1}^{\prime},\cdots ,{e}_{n}^{\prime})$ in the sense of raft–anchor connection can be given by:

$${\mathbf{E}}^{\prime}:=\left(\begin{array}{c}{e}_{1}^{\prime}\\ \vdots \\ {e}_{n}^{\prime}\end{array}\right)\propto \left(\begin{array}{ccc}{r}_{11}& \cdots & {r}_{1m}\\ \vdots & \ddots & \vdots \\ {r}_{n1}& \cdots & {r}_{nm}\end{array}\right)[-\log (\mathbf{C})],$$

where

$$[-\log (\mathbf{C})]=\left(\begin{array}{c}-\log ({c}_{1})\\ \vdots \\ -\log ({c}_{m})\end{array}\right).$$

This means that the error function of the buoy–anchor connection is reflected as an entropy that represents the subjective–objective unity of each observer. The opinion of an observer with higher subjective–objective unity is weighted according to the informational scarcity of subjective errors. Such an integrated evaluation, incorporating a scoring system on observers’ quality, is one of the general solutions in web-based citizen science (e.g., [25]).
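The two raft evaluations can be compared numerically. The sketch below assumes NumPy, takes the plain raft evaluation as the average over the m observer columns of $\mathbf{R}$, and weights each observer by $-\log ({c}_{j})$ for the raft–anchor evaluation. Since ${\mathbf{E}}^{\prime}$ is defined only up to proportionality, dividing by the sum of weights is an assumption made here for readability, and it is meaningful only when every ${c}_{j}<1$ so that all weights are positive. Function names and data are hypothetical.

```python
import numpy as np

def raft_mean(R):
    """Plain raft evaluation: equal-weight average over the m observers."""
    R = np.asarray(R, dtype=float)
    return R.mean(axis=1)

def raft_anchor_evaluation(R, C):
    """Raft-anchor evaluation: observers weighted by -log(c_j).

    R: raft matrix, shape (n, m); C: buoy-anchor errors, shape (m,).
    Normalizing by the weight sum is an illustrative assumption (E' is
    defined up to a proportionality constant) and requires 0 < c_j < 1.
    """
    R = np.asarray(R, dtype=float)
    C = np.asarray(C, dtype=float)
    if np.any(C <= 0) or np.any(C >= 1):
        raise ValueError("this sketch assumes 0 < c_j < 1 for every observer")
    weights = -np.log(C)
    return (R @ weights) / weights.sum()

# Hypothetical opinions of 2 observers on 2 objects.
R = np.array([[4.0, 8.0],
              [6.0, 2.0]])
C = np.array([0.01, 0.25])  # observer 1 is closer to the anchors

E = raft_mean(R)
E_prime = raft_anchor_evaluation(R, C)
print(E, E_prime)
```

With these made-up numbers, `E_prime` moves away from the equal-weight mean `E` towards the opinions of the first observer, whose buoy–anchor error is smaller.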

Note that the n objects of observation can also coincide with the m observers themselves. As $\mathbf{C}$ can be independently obtained from $\mathbf{R}$, it can also accept subjective objects of observation where direct anchors do not exist, such as psychological state or the quantification of qualia such as Quality Function Deployment (QFD) [30] and pain scale [31]. In such cases, traditional methods only employ simple raft evaluation $\mathbf{E}$ without anchors, as formalized in Equation (8). In contrast, with the buoy–anchor–raft model, it is possible to relate indirect anchors to other related objectively quantifiable variables, by expanding the database into a more comprehensive system. In either case, this model provides accessibility to the inter-subjective objective evaluation by properly defining the buoy, anchor, and raft and their connections.

The correspondence between the buoy–anchor–raft model and the computational variables developed in the following sections is listed in Table 2.

## 3. Complexity Measures

We consider the generalization of complexity measures with respect to essential information processing in citizen science, based on the inter-subjective objectivity model with buoy–anchor–raft constructs. The concept and definition of complexity vary according to the fields, such as algorithmic complexity, statistical complexity, biological complexity, etc. In this paper, we take a generalized definition of complexity measure as the projection from a system’s variables to one-dimensional quantity, which is composed to express a distinctive characteristic of the system [32]. This includes classical indices mentioned with the context of complexity, as well as various forms of information expressed as numbers in ICT, such as feature dimensions of machine learning.

#### 3.1. Complexity Measure and Search Function

We consider general forms of complexity defined on database $\mathbb{X}$ in relation to the search function. Complexity measures are widely studied in information theory, with the underlying principle to abstract a low-dimensional representative index of useful features for functional characterization of complex systems [32]. Usually, complexity measures defined on n real variables are the epimorphism to the one-dimensional real number line, ${\mathbb{R}}^{n}\mapsto \mathbb{R}$. The general complexity measure for citizen science is therefore the projection of the database to real value index, $\mathbb{X}\mapsto \mathbb{R}$, with the condition that this transformation will provide some utility for the management.

The importance of utility depends on the need for information retrieval in the citizen science process, or the conditions that are practically used in a database search. Indeed, the search function is actually the retrieval of the corresponding data set with respect to a given condition, such that

$${S}_{R}[Q(x)]:=\{x\in \mathbb{X}|Q(x)\},$$

where ${S}_{R}$ stands for the search result on database $\mathbb{X}$ with search query $Q(\cdot)$. For example, $Q(\cdot)$ is an if–then construct that can specify the value range of real variables, or the matching with a specific symbolic sequence, which returns the corresponding data sets into ${S}_{R}$.
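The search function ${S}_{R}$ can be read as a filter over database records. The following is a minimal sketch under the assumption that each record of $\mathbb{X}$ is stored as a tuple of real values together with a tuple of symbols; the `Record` type, the field layout, and the sample species records are hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple

class Record(NamedTuple):
    reals: Tuple[float, ...]    # element of R^n, e.g. (time, latitude)
    symbols: Tuple[str, ...]    # element of S^m, e.g. (species, observer)

Query = Callable[[Record], bool]

def search(database: List[Record], query: Query) -> List[Record]:
    """S_R[Q]: return every record x in the database for which Q(x) holds."""
    return [x for x in database if query(x)]

# Hypothetical biodiversity observations.
database = [
    Record((1.0, 35.6), ("Apis mellifera", "observer_a")),
    Record((2.5, 35.7), ("Pieris rapae", "observer_b")),
    Record((4.0, 35.6), ("Apis mellifera", "observer_b")),
]

def early_honeybee(x: Record) -> bool:
    # If-then style query: value range on a real variable plus a symbol match.
    return x.reals[0] < 3.0 and x.symbols[0] == "Apis mellifera"

result = search(database, early_honeybee)
print(result)
```

Here the query combines a range condition on a real variable with a match on a symbolic variable, which are the two kinds of condition mentioned in the text.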

In order to perform computation such as the calculation of the buoy–anchor–raft model evaluation, the integral I of $\sigma $-finite measure $\mu $ on $\mathbb{X}$ with respect to the condition $Q(\cdot)$ can be defined as follows, with indicator function $\mathbf{1}(\cdot|Q(\cdot))$:

$$I(Q(x)):={\int}_{\mathbb{X}}\mathbf{1}(x|Q(x))\mu (dx),$$

where

$$\mathbf{1}(x|Q(x)):=\left\{\begin{array}{ccc}1& \mathrm{if}& x\in {S}_{R}[Q(x)],\\ 0& \mathrm{if}& x\notin {S}_{R}[Q(x)].\end{array}\right.$$

In the one-dimensional case, $\mu $ can represent either a buoy or an anchor. If we define $\mu :\mathbb{X}\mapsto \mathbb{R}$ as the function of occurrence probability $p(\cdot)$ of $x\subset \mathbb{X}$, such as

$$\mu (x)=-p(x)\log (p(x)),$$

then I coincides with entropy, one of the typical information theoretical complexity measures. $\mu $ can also include joint distribution, such that with ${\mu}^{\prime}$:

$$\begin{array}{ccc}\hfill {\mu}^{\prime}(x,y)& =& {\displaystyle p(x,y)\log \frac{p(x,y)}{p(x)p(y)},}\hfill \\ \hfill x& \ne & y,\hfill \\ \hfill x,y& \in & \mathbb{X},\hfill \end{array}$$

in which case, the mutual information ${I}_{2}$,

$${I}_{2}:={\int}_{\mathbb{X}}\mathbf{1}(x,y|Q(x),Q(y)){\mu}^{\prime}(dx,dy),$$

can incorporate raft, buoy–anchor, and raft–anchor connections.
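For a finite symbolic database, the integrals above reduce to sums over the records selected by the query. The sketch below is an illustration under that discrete assumption: probabilities are given as plain dictionaries, the query acts as the indicator function, and $x$ and $y$ are treated as the values of two different variables. Function names and the toy distributions are hypothetical.

```python
import math

def entropy_measure(p, query=lambda x: True):
    """Discrete analogue of I with mu(x) = -p(x) log p(x), restricted by Q."""
    return sum(-px * math.log(px)
               for x, px in p.items()
               if px > 0 and query(x))

def mutual_information(joint, query=lambda x, y: True):
    """Discrete analogue of I_2 with mu'(x, y) = p(x,y) log[p(x,y)/(p(x)p(y))]."""
    px, py = {}, {}
    for (x, y), pxy in joint.items():
        # Marginal distributions obtained by summing the joint distribution.
        px[x] = px.get(x, 0.0) + pxy
        py[y] = py.get(y, 0.0) + pxy
    return sum(pxy * math.log(pxy / (px[x] * py[y]))
               for (x, y), pxy in joint.items()
               if pxy > 0 and query(x, y))

# Toy distributions (made up for illustration).
uniform = {s: 0.25 for s in "abcd"}
correlated = {(0, 0): 0.5, (1, 1): 0.5}                        # y always equals x
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}   # no dependence

H_full = entropy_measure(uniform)
H_part = entropy_measure(uniform, query=lambda x: x in "ab")
I_corr = mutual_information(correlated)
I_ind = mutual_information(independent)
print(H_full, H_part, I_corr, I_ind)
```

Restricting the query to a subset of symbols lowers the value of the measure, which is the sense in which a search condition $Q$ and a value of $I$ are tied together.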

As a search query, $Q(x)$ provides a value of complexity measure I; we can also inversely use I to specify ${S}_{R}[Q(x)]$. We consider the invertible map ${S}_{R}^{-1}:\{x\in \mathbb{X}|I\}\to \{Q(x)\}$ that generates all possible queries $\{Q(x)\}$ which return the set of x associated with the given value of complexity measure I. For example, we can search the dataset with its entropy higher than a threshold ${I}_{c}$ by setting

$$\begin{array}{c}\hfill \{Q(x)\}:={S}_{R}^{-1}\left[\left\{x\subset \mathbb{X}|{\int}_{x}\mu (dx)>{I}_{c}\right\}\right].\end{array}$$

Nevertheless, complexity measures that specifically define an arbitrary $Q(x)$ are generally not given explicitly. In practice, we usually compare the performance of known complexity measures with respect to the ability to characterize the features on which we focus our analysis. The general task is to invent a novel complexity measure that can exclusively separate patterns in $\mathbb{X}$, given implicitly as $Q(x)$. For that purpose, the following theorem holds:

**Theorem 1.**

For any search condition $Q(x)$, we can construct an exclusively selective complexity measure ${I}^{\prime}$ which can sort out effects from other variables, with the function $G(\cdot):\mathbb{R}\mapsto \{Q(x)\}$, such that

$$Q(x)={S}_{R}^{-1}\left[\{x\in \mathbb{X}|{I}^{\prime}\}\right]=G({I}^{\prime}),$$

$${I}^{\prime}={G}^{-1}(Q(x)).$$

The definition of invertibility of G follows that of ${S}_{R}$.

Proofs of the theorems are given in Appendix A.

The intuitive geometric meaning of the inverse function relationship between complexity measures and search function is shown in Figure 3.

#### 3.2. Observation Commonality as Complexity

Inter-subjective objectivity is based on the commonality among subjectivity, inter-subjectivity, and objectivity. Essential computation is therefore the search for commonality between different observation datasets, whether it be from humans or machines. We consider the observation commonality to be a complexity measure that conforms to inter-subjective objectivity, and analyze its general mathematical structure.

We consider $\sigma $-finite probabilistic measures ${\mu}_{1}$, ${\mu}_{2}$ on measurable database space $(\mathbb{X},\mathcal{B})$, where $\mathcal{B}$ stands for Borel $\sigma $-algebra of $\mathbb{X}$. Then, the convolution $*$ of ${\mu}_{1}$ and ${\mu}_{2}$ is defined as follows:

$$\begin{array}{cc}\hfill {\mu}_{1}*{\mu}_{2}({s}_{i}):=\sum _{j}{\mu}_{2}({s}_{j}){\mu}_{1}({s}_{i-j})& \quad \mathrm{for}\quad {s}_{i}\in \mathcal{B}(\mathbb{S}),\{{s}_{i},{s}_{j},{s}_{i-j}\}\in \mathbb{S},\end{array}$$

$$\begin{array}{cc}\hfill {\mu}_{1}*{\mu}_{2}(\mathbf{x}):={\int}_{\mathbb{R}}{\mu}_{1}(\mathbf{x}-y){\mu}_{2}(dy)& \quad \mathrm{for}\quad \mathbf{x}\in \mathcal{B}(\mathbb{R}),\quad \mathbf{x}-y:=\{x-y|x\in \mathbf{x}\},\end{array}$$

where $\mathcal{B}(\mathbb{S})$ and $\mathcal{B}(\mathbb{R})$ represent $\sigma $-algebra of $\mathbb{S}\subset \mathbb{X}$ and $\mathbb{R}\subset \mathbb{X}$, respectively.

Through appropriate variable transformation, the convolution of probability measures with real type variables (21) can be expressed as follows, as the probability of the sum of the variables [33]:

$$\begin{array}{c}\hfill {\mu}_{1}*{\mu}_{2}(\mathbf{x})={\int}_{\mathbb{R}}{\int}_{\mathbb{R}}\mathbf{1}(x+y|x+y\in \mathbf{x}){\mu}_{1}(dx){\mu}_{2}(dy),\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\mathbf{x}\in \mathcal{B}(\mathbb{R}).\end{array}$$
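The equivalence between the convolution and the probability of the sum can be checked on a small discrete case. This sketch assumes NumPy and assumes that each measure is a probability mass function on the integers 0, 1, 2, ..., stored as an array; the function names and the two sample distributions are hypothetical.

```python
import numpy as np

def convolve_measures(mu1, mu2):
    """Discrete convolution: (mu1 * mu2)(s) = sum_j mu2(j) mu1(s - j)."""
    return np.convolve(mu1, mu2)

def sum_distribution(mu1, mu2):
    """Probability of x + y, accumulated over all pairs (x, y) directly.

    This mirrors the double integral with the indicator of the event x + y = s.
    """
    out = np.zeros(len(mu1) + len(mu2) - 1)
    for x, px in enumerate(mu1):
        for y, py in enumerate(mu2):
            out[x + y] += px * py
    return out

# Two made-up observers on the support {0, 1}.
mu1 = np.array([0.5, 0.5])
mu2 = np.array([0.25, 0.75])

by_convolution = convolve_measures(mu1, mu2)
by_enumeration = sum_distribution(mu1, mu2)
print(by_convolution, by_enumeration)
```

Both routes give the same distribution over the sum, and the result still sums to one because each input measure does.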

By choosing finite sets of $\mathbf{x}$ such as time period, geographic range, and other real type variable range, as well as symbols for $\{{s}_{i}\}$ such as name of observation object, one can define the commonality of observations as a part of the convolution of the probabilities from different observers. The observation ${\mu}_{1}$ and ${\mu}_{2}$ can be of any nature between subjectivity, inter-subjectivity, and objectivity.

We now consider the condition of valid observation with respect to the regularization of probability measure as follows, for a general number of observers $i\in \{1,\cdots ,N\}$:

$$\begin{array}{c}\hfill {\int}_{\mathbb{R}}{\mu}_{i}(dx)=1.\end{array}$$

This means that by expanding the scale of the real type variable to infinity, one can observe its occurrence with probability 1. The same formalization also applies to $\sigma $-finite measure on $(\mathbb{S},\mathcal{B}(\mathbb{S}))$, which is integrated in the formalization with $(\mathbb{R},\mathcal{B}(\mathbb{R}))$.

Next, consider a confined variable range $r\subset \mathbb{R}$ with positive probability measure ${\mu}_{i}(r)>0$. This range can be of any complex form as long as it supports positive measure. In a real situation, this can correspond to an intermittent observation time interval, a scattered geographical range, or another discrete range of the real type variable. We define the rate of observation ${q}_{i}$ by observer i within variable range r as

$${q}_{i}(r):={\int}_{\mathbb{R}}\mathbf{1}(x|x\in r){\mu}_{i}(dx)\le 1,$$

which converges to (23) with $r\to \mathbb{R}$.

The commonality of observation between two observers i, j based on r is expressed as the following convolution confined to r:

$$\begin{array}{ccc}\hfill {\mu}_{i}*{\mu}_{j}({r}_{2})& :=& {\displaystyle {\int}_{\mathbb{R}}{\int}_{\mathbb{R}}\mathbf{1}(x+y|x+y\in {r}_{2};x,y\in r){\mu}_{i}(dx){\mu}_{j}(dy)}\hfill \\ & =& {\displaystyle {\int}_{\mathbb{R}}{\int}_{\mathbb{R}}\mathbf{1}(x+y|x,y\in r){\mu}_{i}(dx){\mu}_{j}(dy)}\hfill \\ & =& {\displaystyle {\int}_{r}{\int}_{r}{\mu}_{i}(dx){\mu}_{j}(dy),}\hfill \\ \hfill {r}_{2}& :=& \{{x}_{1}+{x}_{2}|{x}_{1},{x}_{2}\in r\},\hfill \end{array}$$

which also means taking the sum of joint distributions ${\mu}_{i}\cdot {\mu}_{j}$ between all smallest measurable events in r. The additional condition $x,y\in r$ in $\mathbf{1}(\cdot)$ limits the integral of each variable within r, which includes the formal condition $x+y\in {r}_{2}$. The following generalization holds:

**Theorem**

**2.**

For N independent and valid observations ${\mu}_{i}(r)>0$ $(i=1,\cdots ,N)$ on variable range $r\subset \mathbb{R}$, let

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill {\lambda}_{N}({r}_{N})& :=& {\mu}_{1}*{\mu}_{2}*\cdots *{\mu}_{i}*\cdots *{\mu}_{N}({r}_{N})\hfill \\ & :=& {\displaystyle {\int}_{{\mathbb{R}}^{N}}\mathbf{1}\left({\displaystyle \Lambda \sum _{i=1}^{N}{x}_{i}|\Lambda \sum _{i=1}^{N}{x}_{i}\in {r}_{N};{x}_{i}\in r}\right)\prod _{i}^{\{1,\cdots ,N\}}{\mu}_{i}(d{x}_{i})}\hfill \\ & =& {\displaystyle {\int}_{{\mathbb{R}}^{N}}\mathbf{1}\left({\displaystyle \Lambda \sum _{i=1}^{N}{x}_{i}|{x}_{i}\in r}\right)\prod _{i}^{\{1,\cdots ,N\}}{\mu}_{i}(d{x}_{i}),}\hfill \\ \hfill {r}_{N}& :=& \left\{{\displaystyle \Lambda \sum _{k=1}^{N}{x}_{k}|{x}_{k}\in r}\right\},\hfill \\ \hfill \Lambda & \in & \mathbb{R}\setminus \{0,\pm \infty \},\hfill \end{array}\end{array}$$

where the coefficient $\Lambda $ is a free parameter that remains invariant under the convolution. Then

$$\begin{array}{c}\hfill {\lambda}_{N}({r}_{N})=\prod _{i}^{N}{q}_{i}(r).\end{array}$$

This means that the ${N}^{-1}$-th power of multiple convolution ${\lambda}_{N}({r}_{N})$ represents the geometric mean of N independent valid observation rates. By choosing regularization factor $\Lambda $, ${r}_{N}$ corresponds to the ensemble of possible mean values $\left({\displaystyle \Lambda =\frac{1}{N}}\right)$, integrated values $\left(\Lambda =1\right)$, and other weighted sum of N random samplings from r. The regularization parameter $\Lambda $ can further be generalized to an arbitrary measurable function $\Lambda (\cdot )$ representing commonality characteristics, taking $\sum _{i=1}^{N}{x}_{i}$ as a variable.

With the use of the logarithmic scale, the information of ${\lambda}_{N}({r}_{N})$ is the sum of the information of the individual observations:

$$\begin{array}{c}\hfill -log({\mu}_{1}*{\mu}_{2}*\cdots *{\mu}_{i}*\cdots *{\mu}_{N}({r}_{N}))=\sum _{i}^{N}(-log{\mu}_{i}(r)).\end{array}$$
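As a quick numerical check of Theorem 2 and of the additivity of information, the following minimal sketch represents each observation as a discrete measure on integers (an assumption made only for illustration), confines it to $r$, and convolves the confined measures with $\Lambda =1$. The function names `observation_rate` and `confined_convolution` and the example measures are hypothetical.

```python
from math import prod, log, isclose

def observation_rate(mu, r):
    """q_i(r): mass that the discrete measure mu (dict value -> probability) gives to r."""
    return sum(p for x, p in mu.items() if x in r)

def confined_convolution(measures, r):
    """Mass function of x_1 + ... + x_N with every x_i confined to r (Lambda = 1)."""
    out = {0: 1.0}
    for mu in measures:
        nxt = {}
        for s, ps in out.items():
            for x, px in mu.items():
                if x in r:  # the indicator 1(. | x_i in r)
                    nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        out = nxt
    return out

# Three illustrative observers (each measure sums to 1) and a confined range r.
measures = [
    {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4},
    {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25},
    {1: 0.5, 2: 0.2, 5: 0.3},
]
r = {1, 2}

rates = [observation_rate(mu, r) for mu in measures]
lam = sum(confined_convolution(measures, r).values())  # lambda_N(r_N)
geometric_mean = lam ** (1.0 / len(measures))
information = -log(lam)
```

Here `lam` agrees with the product of the individual rates, and `information` with the sum of the individual $-\log {q}_{i}(r)$, up to floating-point error.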

As a similar property related to geometric mean, note that the following Young’s inequality also holds:

$$\begin{array}{c}\hfill |\Lambda |\cdot ||{\mu}_{1}*{\mu}_{2}*\cdots *{\mu}_{i}*\cdots {\mu}_{N}||\le \prod _{i}^{N}||{\mu}_{i}||,\end{array}$$

where $||\cdot ||$ denotes total variation. This assures us that the variation of the commonality remains within the order of the product of each observation’s variation.

However, it is important to note that as a general property of convolution,

$$\begin{array}{c}\hfill {\displaystyle {\lambda}_{N}(r)\ne \prod _{i}^{N}{q}_{i}(r).}\end{array}$$

The equality only holds in case $r\to \mathbb{R}$ or ${\mu}_{i}({r}_{N})={\mu}_{i}(r)$ for $i=1,\cdots ,N$, without implication for the independence of observations. For the convolution on general subset ${r}_{s}\subseteq {r}_{N}$, the exact definition is given by

$$\begin{array}{c}\hfill \begin{array}{ccc}{\lambda}_{N}({r}_{s})& :=& {\displaystyle {\int}_{{\mathbb{R}}^{N}}\mathbf{1}\left({\displaystyle \Lambda \sum _{i=1}^{N}{x}_{i}|\Lambda \sum _{i=1}^{N}{x}_{i}\in {r}_{s};{x}_{i}\in r}\right){\mu}_{1}(d{x}_{1})\cdots {\mu}_{N}(d{x}_{N}),}\hfill \end{array}\end{array}$$

though it requires direct calculation without relevance to ${q}_{i}(r)$. In order to obtain fast computable form, the following asymptotical generalization holds:

**Theorem**

**3.**

As $N\to \infty $, for $r\subset \mathbb{R}$, ${\mu}_{i}(r)>0$, $i=1,\cdots ,N$ and ${r}_{s}\subseteq {r}_{N}$, ${\lambda}_{N}({r}_{s})$ converges almost everywhere to the following:

$$\begin{array}{c}\hfill \begin{array}{ccc}{\lambda}_{N}({r}_{s})& \to & {\displaystyle {\int}_{{r}_{s}}\mathcal{N}(\Lambda {\nu}_{N},{\Lambda}^{2}{\sigma}_{N}^{2})m(dx)\times \prod _{i}^{N}{q}_{i}(r),}\hfill \end{array}\end{array}$$

where $m(\cdot )$ is the Lebesgue measure on $\mathbb{R}$, and $\mathcal{N}({\nu}_{N},{\sigma}_{N}^{2})$ represents the normal probability density distribution with mean value ${\nu}_{N}$ and variance ${\sigma}_{N}^{2}$ as follows:

$$\begin{array}{c}\hfill \begin{array}{ccc}{\nu}_{N}& :=& {\displaystyle \sum _{i=1}^{N}{\int}_{\mathbb{R}}\mathbf{1}\left(x|x\in r\right)x{\mu}_{i}(dx),}\hfill \\ {\sigma}_{N}^{2}& :=& {\displaystyle \sum _{i=1}^{N}\left({\int}_{\mathbb{R}}\mathbf{1}\left(x|x\in r\right){x}^{2}{\mu}_{i}(dx)-({\nu}_{N}^{2}-2{\nu}_{N})\right)}\hfill \\ & \to & {\displaystyle \sum _{i=1}^{N}\left({\int}_{\mathbb{R}}\mathbf{1}\left(x|x\in r\right){x}^{2}{\mu}_{i}(dx)-{\nu}_{N}^{2}\right).}\hfill \end{array}\end{array}$$

A numerical example of the convolution ${\lambda}_{N}({r}_{N})$ is presented in Figure 4. Theorems 2 and 3 can be directly generalized to ${\mathbb{R}}^{n}$ $(n\in \mathbb{N})$, with $r\subset {\mathbb{R}}^{n}$.
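The difference between ${\lambda}_{N}({r}_{N})$ and the convolution on a subset ${r}_{s}$ can be illustrated with a small discrete case. The sketch below assumes integer-valued discrete measures and $\Lambda =1$, and all function names are hypothetical. It also checks the exact factorization of ${\lambda}_{N}({r}_{s})$ into the product of rates and the probability that the sum falls in ${r}_{s}$ under the measures renormalized on $r$. One way to read Theorem 3 is that the normal integral approximates this second factor.

```python
from math import prod, isclose

def confined_convolution(measures, r):
    """Mass function of the sum x_1 + ... + x_N with every x_i confined to r."""
    out = {0: 1.0}
    for mu in measures:
        nxt = {}
        for s, ps in out.items():
            for x, px in mu.items():
                if x in r:
                    nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        out = nxt
    return out

def mass_on(conv, subset):
    """lambda_N(r_s): mass of the confined convolution on the subset r_s."""
    return sum(p for s, p in conv.items() if s in subset)

def renormalized(mu, r):
    """Measure mu conditioned on r (divided by the rate q_i(r))."""
    q = sum(p for x, p in mu.items() if x in r)
    return {x: p / q for x, p in mu.items() if x in r}

uniform = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
measures = [uniform, uniform]
r = {1, 2}

conv = confined_convolution(measures, r)
r_N = set(conv)                     # all attainable sums, here {2, 3, 4}
product_of_rates = prod(sum(p for x, p in mu.items() if x in r) for mu in measures)

lam_rN = mass_on(conv, r_N)         # equals the product of rates (Theorem 2)
lam_r = mass_on(conv, r)            # mass on r itself, generally different

conditional = confined_convolution([renormalized(mu, r) for mu in measures], r)
factorized = product_of_rates * mass_on(conditional, r)
```

In this example the product of rates is 0.25, while the mass of the convolution on $r$ itself is only 1/16.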

#### 3.3. Topological Structure of Complexity 1: Total Order of Observations

We consider the topological structure of inter-subjective objectivity based on the complexity, defined as the convolution between different observations. As the commonality within inter-subjective objectivity is defined with multiple different observations, the topological ordering based on these complexity measures is possible with $N>2$ observations of any nature.

We consider a commonality space in which each observation dataset is a point and the commonality between two datasets is the distance between the corresponding pair of points. This can be considered as the undirected complete graph with N vertices, with the pair-wise complexity measures as the lengths of its ${}_{N}{C}_{2}$ edges. The general property of Euclidean space allows a complete graph of size N to be embedded in $N-1$ dimensions (e.g., any line between two points is one-dimensional space, and any triangle with three points is two-dimensional surface, etc.), although an additional quantitative restriction such as triangle inequality on each triplet of edges is required. In order to treat an arbitrary set of the complexity measures and yield general characteristics of commonality space, we need to focus not on the actual values of complexity, but on the topological order between them.

Let us first consider the total order between complexity values with $N>2$ observation data contained in N vertices $V:={\{{v}_{i}\}}_{i=1,\cdots ,N}$. One can determine the total order between ${}_{N}{C}_{2}$ edges $E:={\{{e}_{k}\}}_{k=1,\cdots ,{}_{N}{C}_{2}}:=\{{v}_{i},{v}_{j\ne i}\in V\}$ by taking a mean order relationship between each pair of edges by the following algorithm (namely the pair-wise order algorithm):

- For each pair of edges $\{{e}_{i},{e}_{j\ne i}\in E\}$, calculate the order relation ${e}_{i}\le {e}_{j}$ or ${e}_{i}\ge {e}_{j}$ with respect to the given complexity measure as an edge attribute such as length.
- Score each edge ${e}_{i}$ by mapping to integer $z:{e}_{i}\mapsto \mathbb{Z}$ by adding $+1$ if ${e}_{i}\ge {e}_{j\ne i}$ and by adding $-1$ if ${e}_{i}\le {e}_{j\ne i}$, with respect to all other edges ${e}_{j\ne i}$.
- The sorting with the score $\{z({e}_{i})\}$ provides the total order of E.
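The pair-wise order algorithm can be sketched as follows. This is a minimal illustration, assuming that the complexity measure of each edge is given as a number in a dictionary; the function name `pairwise_order` and the example values are hypothetical.

```python
def pairwise_order(lengths):
    """Score every edge against all other edges and sort by the score.

    lengths: dict mapping an edge label to its complexity value (e.g., length).
    Returns (edges sorted by ascending score, dict of scores z(e_i)).
    """
    scores = {}
    for e_i, len_i in lengths.items():
        z = 0
        for e_j, len_j in lengths.items():
            if e_j == e_i:
                continue
            if len_i >= len_j:
                z += 1  # e_i >= e_j adds +1
            if len_i <= len_j:
                z -= 1  # e_i <= e_j adds -1, so a tie contributes 0 in total
        scores[e_i] = z
    ordered = sorted(lengths, key=scores.get)
    return ordered, scores

# Three observations give three edges.
lengths = {("v1", "v2"): 0.30, ("v1", "v3"): 0.10, ("v2", "v3"): 0.20}
ordered, scores = pairwise_order(lengths)
```

Tied edges receive both $+1$ and $-1$ from each other, so their relative order is left to the sorting routine.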

Note that the quantitative difference is completely lost in the case of antisymmetry, $({e}_{i}={e}_{j})\equiv ({e}_{i}\le {e}_{j})\wedge ({e}_{i}\ge {e}_{j})$. We will consider the meaning of this information loss with respect to other compatible sets of observation in Section 3.4.

Next, we consider the topological order of complexity for $N>2$ observations according to the total order of these commonalities. We need here to translate the total order between edges E to that of observations V. This can be obtained by enumerating the ${}_{N}{C}_{3}$ triplets of the $N>2$ vertices and the associated total order of edges with the following algorithm (namely, the triplet order algorithm schematically represented in Figure 5):

- For each triplet of observation ${V}_{i,j,k}:=\{{v}_{i},{v}_{j\ne i},{v}_{k\ne i,j}\in V\}$ and associated edges $\{{e}_{i}:=\{{v}_{i},{v}_{j}\},{e}_{j}:=\{{v}_{j},{v}_{k}\},{e}_{k}:=\{{v}_{k},{v}_{i}\}\}$, update score of each observation by mapping to integer ${z}^{\prime}:{V}_{i,j,k}\mapsto \mathbb{Z}$ with the following six rules:
  - If ${e}_{i}\ge {e}_{j}\ge {e}_{k}$, then ${z}^{\prime}({v}_{i})={z}^{\prime}({v}_{i})-1$, ${z}^{\prime}({v}_{j})={z}^{\prime}({v}_{j})+1$, ${z}^{\prime}({v}_{k})={z}^{\prime}({v}_{k})+0$.
  - If ${e}_{i}\ge {e}_{k}\ge {e}_{j}$, then ${z}^{\prime}({v}_{i})={z}^{\prime}({v}_{i})+1$, ${z}^{\prime}({v}_{j})={z}^{\prime}({v}_{j})-1$, ${z}^{\prime}({v}_{k})={z}^{\prime}({v}_{k})+0$.
  - If ${e}_{j}\ge {e}_{i}\ge {e}_{k}$, then ${z}^{\prime}({v}_{i})={z}^{\prime}({v}_{i})+0$, ${z}^{\prime}({v}_{j})={z}^{\prime}({v}_{j})+1$, ${z}^{\prime}({v}_{k})={z}^{\prime}({v}_{k})-1$.
  - If ${e}_{j}\ge {e}_{k}\ge {e}_{i}$, then ${z}^{\prime}({v}_{i})={z}^{\prime}({v}_{i})+0$, ${z}^{\prime}({v}_{j})={z}^{\prime}({v}_{j})-1$, ${z}^{\prime}({v}_{k})={z}^{\prime}({v}_{k})+1$.
  - If ${e}_{k}\ge {e}_{i}\ge {e}_{j}$, then ${z}^{\prime}({v}_{i})={z}^{\prime}({v}_{i})+1$, ${z}^{\prime}({v}_{j})={z}^{\prime}({v}_{j})+0$, ${z}^{\prime}({v}_{k})={z}^{\prime}({v}_{k})-1$.
  - If ${e}_{k}\ge {e}_{j}\ge {e}_{i}$, then ${z}^{\prime}({v}_{i})={z}^{\prime}({v}_{i})-1$, ${z}^{\prime}({v}_{j})={z}^{\prime}({v}_{j})+0$, ${z}^{\prime}({v}_{k})={z}^{\prime}({v}_{k})+1$.
- The sorting with the score $\{{z}^{\prime}({v}_{i})|i=1,\cdots ,N\}$ provides the total order of V.
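The triplet order algorithm can be sketched in the same way. The table `RULES` below transcribes the six rules, keyed by the edges listed from largest to smallest. The data layout (edge values stored under `frozenset` pairs), the function name `triplet_order`, and the tie-breaking (the first applicable rule in the order i, j, k) are assumptions made for illustration.

```python
from itertools import combinations

# Key: edge labels from largest to smallest; value: updates for (v_i, v_j, v_k).
RULES = {
    ("i", "j", "k"): (-1, +1, 0),
    ("i", "k", "j"): (+1, -1, 0),
    ("j", "i", "k"): (0, +1, -1),
    ("j", "k", "i"): (0, -1, +1),
    ("k", "i", "j"): (+1, 0, -1),
    ("k", "j", "i"): (-1, 0, +1),
}

def triplet_order(vertices, length):
    """Score observations over all triplets and sort them by the score z'.

    length: dict mapping frozenset({u, v}) to the complexity value of that edge.
    """
    score = {v: 0 for v in vertices}
    for v_i, v_j, v_k in combinations(vertices, 3):
        edges = {
            "i": length[frozenset((v_i, v_j))],  # e_i := {v_i, v_j}
            "j": length[frozenset((v_j, v_k))],  # e_j := {v_j, v_k}
            "k": length[frozenset((v_k, v_i))],  # e_k := {v_k, v_i}
        }
        # Stable sort: ties keep the order i, j, k, which selects one valid rule.
        ranking = tuple(sorted(edges, key=edges.get, reverse=True))
        d_i, d_j, d_k = RULES[ranking]
        score[v_i] += d_i
        score[v_j] += d_j
        score[v_k] += d_k
    return sorted(vertices, key=score.get), score

vertices = ["a", "b", "c"]
length = {
    frozenset(("a", "b")): 3.0,
    frozenset(("b", "c")): 2.0,
    frozenset(("c", "a")): 1.0,
}
order, score = triplet_order(vertices, length)
```

Each triplet adds $-1$, $+1$, and $0$ to its three vertices, so the scores always sum to zero.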

The commonality order of V represents the topological structure of collective intelligence in citizen science with respect to inter-subjective objectivity, which corresponds to the topological inclusion relation of the Venn diagram in Figure 1.

#### 3.4. Topological Structure of Complexity 2: Permutation between Total Orders of Observations

We expand the situation to two sets of $N>2$ observations, namely observations I and $I\phantom{\rule{-1.00006pt}{0ex}}I$. This covers, for example, observers I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ observing N objects, or N observers observing two different objects I and $I\phantom{\rule{-1.00006pt}{0ex}}I$. It can also represent the application of two different complexity measures I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ to N observations. For simplicity, we limit the formalization to two sets of $N>2$ observations, but generalization to a greater number of sets is possible.

In the general case, total orders I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ do not necessarily coincide. The relationship between two total orders with N observations can be described with the permutation of N elements (Figure 6a). In order to analyze the permutation between total orders, let ${\mathcal{G}}_{N}$ be the symmetric group of degree N. For $g\in {\mathcal{G}}_{N}$, we define a linear transformation ${L}_{g}:{\mathbb{S}}^{N}\mapsto {\mathbb{S}}^{N}$ by

$${L}_{g}:({v}_{1},\cdots ,{v}_{N})\mapsto ({v}_{g(1)},\cdots ,{v}_{g(N)}),$$

which describes the permutation between commonality orders I and $I\phantom{\rule{-1.00006pt}{0ex}}I$.

We define a subspace ${\mathbb{S}}^{\prime}(g)$ of ${\mathbb{S}}^{N}$ by

$${\mathbb{S}}^{\prime}(g)=\{{v}_{i}\in \mathbb{S}|{v}_{i}\ne {v}_{g(i)}\},$$

which represents the subspace with compromise of total order. By defining its complementary subspace

$${\mathbb{S}}^{\prime \prime}(g)=\{{v}_{i}\in \mathbb{S}|{v}_{i}={v}_{g(i)}\},$$

we obtain the subspace in which there is no compromise, or the complete matching of two commonality orders. The whole commonality space can be divided into ${\mathbb{S}}^{\prime}(g)$ and ${\mathbb{S}}^{\prime \prime}(g)$:

$${\mathbb{S}}^{N}={\mathbb{S}}^{\prime}(g)\times {\mathbb{S}}^{\prime \prime}(g).$$
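The partition by the two defining conditions, ${v}_{i}\ne {v}_{g(i)}$ and ${v}_{i}={v}_{g(i)}$, can be sketched for two concrete total orders. The snippet below is only an illustration: it uses 0-based indices, assumes both orders contain the same labels, and the function names are hypothetical.

```python
def permutation_between(order_1, order_2):
    """Return g (0-based) such that order_2[i] == order_1[g[i]] for every i."""
    position = {v: idx for idx, v in enumerate(order_1)}
    return [position[v] for v in order_2]

def split_by_permutation(order_1, g):
    """Split the observations into those moved by g and those fixed by g."""
    moved = [v for i, v in enumerate(order_1) if g[i] != i]  # v_i != v_g(i)
    fixed = [v for i, v in enumerate(order_1) if g[i] == i]  # v_i == v_g(i)
    return moved, fixed

order_1 = ["a", "b", "c", "d"]
order_2 = ["a", "c", "b", "d"]
g = permutation_between(order_1, order_2)
moved, fixed = split_by_permutation(order_1, g)
```

Here the observations `b` and `c` exchange their ranks between the two orders, while `a` and `d` keep theirs.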

As depicted in Figure 6a,b, the compromise between two commonality orders is expressed as a non-linear folding relationship between them. Making the assumption that the complexity measure is a continuous function, the integrated complexity measure that supports both commonality orders can be expressed as a folded structure (topologically speaking), such as the shape of the letter “N” (also the capital letter of Non-identical), taking the commonality measure of I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ as an affine coordinate: the example with a red dotted line in Figure 6b shows that we can compose an integrated commonality measure by bending the commonality measure $I\phantom{\rule{-1.00006pt}{0ex}}I$ in an “N” shape with respect to that of I kept straight (in “I” shape, for Identical), which resolves the compromise. The “N” shape transformation of commonality measure means to change the topology of commonality order with respect to a permutation $g\in {\mathcal{G}}_{N}$ $(g(i)>g(j),1\le i<j\le N)$, while that of “I” shape represents the identical order $(g(i)<g(j),1\le i<j\le N)$. The non-compromising part of the two commonality orders conserves its order to the projection onto any linear combination of the two commonality measures, which topologically do not require “N” shape folding, but maintain “I” shape matching.

For simplicity, we call the topological compromise between commonality orders the I–N compromise, and we call topologically identical matching I–I matching. Then, I–I matching subspace ${\mathbb{S}}^{\prime}(g)$ can be obtained as the linear combination of commonality measures I and $I\phantom{\rule{-1.00006pt}{0ex}}I$, and the subspace required for the resolution of I–N compromise corresponds to the complementary space ${\mathbb{S}}^{\prime \prime}(g)$ (Figure 6b,c).

We call ${\mathbb{S}}^{\prime}(g)$ an I–I space that consists of I–I dimensions, and ${\mathbb{S}}^{\prime \prime}(g)$ an I–N resolution space that consists of I–N resolution dimensions. The mean commonality order of two commonality orders projected onto I–I space (red solid arrows in Figure 6b,c) can be obtained with the use of the pair-wise order algorithm in Section 3.3, applied not to commonality itself, but to commonality orders. We call this the I–N mean commonality order, since it adopts the mean total order of commonality orders of I and $I\phantom{\rule{-1.00006pt}{0ex}}I$, resolving the I–N compromise. Note that the information lost by antisymmetry of the pair-wise order algorithm does not affect the division of I–I and I–N resolution subspaces. Geometrical representation of the I–N compromise, I–I matching, and these corresponding dimensions, spaces, and the I–N mean commonality order are given in Figure 6.

We finally consider a statistical test on the degree of coincidence (TDC) between 2 commonality orders.

**Theorem**

**4.**

Statistical test on the degree of coincidence (TDC) between two commonality orders:

Given that commonality orders I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ with N observations follow a uniformly random permutation with ${\mathcal{G}}_{N}$ as null hypothesis, the degree of coincidence ${d}_{c}$ between the two commonality orders follows a binomial distribution:

$$\begin{array}{c}\hfill \begin{array}{ccc}{k}_{\mathrm{I}-\mathrm{I}}& :=& \#\{(i,j)|g(i)<g(j),1\le i<j\le N,g\in {\mathcal{G}}_{N}\},\hfill \\ P[{d}_{c}={k}_{\mathrm{I}-\mathrm{I}}]& :=& {}_{M}{C}_{{k}_{\mathrm{I}-\mathrm{I}}}{p}^{{k}_{\mathrm{I}-\mathrm{I}}}{(1-p)}^{M-{k}_{\mathrm{I}-\mathrm{I}}}\hfill \\ & \sim & B(M,p),\hfill \end{array}\end{array}$$

where $B(M,p)$ signifies a binomial distribution of parameters $M={}_{N}{C}_{2}$ and $p=0.5$, ${k}_{\mathrm{I}-\mathrm{I}}$ represents the degree of coincidence as the number of I–I matching, $\#(\cdot )$ returns the size of the set, and $P[\cdot ]$ the probability of the degree of coincidence ${d}_{c}$.
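The quantities in Theorem 4 can be computed directly for a given permutation. The sketch below counts the I–I matching pairs and evaluates the binomial probability with $M={}_{N}{C}_{2}$ and $p=0.5$ as stated in the theorem. The upper-tail sum used as a one-sided test statistic and all function names are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

def coincidence_count(g):
    """k_II: number of pairs i < j whose order is preserved, g(i) < g(j)."""
    return sum(1 for i, j in combinations(range(len(g)), 2) if g[i] < g[j])

def binomial_pmf(k, m, p=0.5):
    """P[d_c = k] under the binomial null B(m, p)."""
    return comb(m, k) * p ** k * (1 - p) ** (m - k)

def upper_tail(k, m, p=0.5):
    """P[d_c >= k] under B(m, p); small values suggest significant matching."""
    return sum(binomial_pmf(x, m, p) for x in range(k, m + 1))

g = [0, 1, 3, 2]            # commonality order II expressed relative to order I
m = comb(len(g), 2)         # M = N choose 2 pairs
k = coincidence_count(g)
p_value = upper_tail(k, m)
```

For this permutation five of the six pairs keep their order, and the tail probability is $7/64$.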

With respect to the buoy–anchor–raft model in Section 2.2, the following correspondence is possible:

- Two observers observing N objects: Commonality orders I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ can correspond to either subjective (buoy) or objective (anchor) observation. The I–N resolution provides integrated commonality measure such as buoy–anchor connection and raft evaluation according to the nature of the observation. TDC provides connections between buoys and/or anchors.
- N observers observing two different objects: The commonality of N observers—whether it be subjective (buoy) or objective (anchor)—are ranked with respect to two different objects I and $I\phantom{\rule{-1.00006pt}{0ex}}I$. The I–N resolution provides a mean ranking of N observers’ commonality upon these observations. TDC provides the reproducibility of commonality among N observers.
- Application of two different complexity measures to N observations: For example, the case of raft–anchor connection where N subjective observers (buoys) are ranked with inter-subjective commonality (raft evaluation) and weighted with two different anchors. The I–N resolution provides mean ranking of N observers’ inter-subjective objectivity, integrating multiple criteria of inter-subjective and objective evaluation. TDC represents statistical dependencies between two complexity measures in response to a given inter-subjective objective measurement. While significant matching between two commonality orders assures the reproducibility based on the coincidence of observation with these measures, non-significance can also be used to quantify complementarity of different evaluations [32].

## 4. Computational Complexity

The computation of complexity measures and commonality orders depends on the exhaustive calculation of combinatorics between observations. The computational complexity of such calculation should also be investigated in terms of topological complexity, in order to yield a general theoretical platform that does not depend on the particularity of the database.

#### 4.1. Topological Complexity of Commonality

First, we investigate topological order of commonality among N observations. Using the convolution as commonality (27), we define the maximum commonality order $O:\mathbb{X}\mapsto \mathbb{N}$ as follows:

$$\begin{array}{c}\hfill O(r\subset \mathbb{X}):=\max \{k\in \{1,\cdots ,N\}|{\lambda}_{k}(r)>0\}.\end{array}$$

The general topological structure of $O(\mathbb{X})$ is depicted in Figure 7.

On the cardinality of $O(\mathbb{X})$, the following holds:

**Theorem**

**5.**

As $\#(\mathbb{X})\to {\aleph}_{0}$, ${}^{\exists}r\subset \mathbb{X}$ such that $\#(\{r|O(r)=\infty \})={\aleph}_{0}$, where ${\aleph}_{0}$ represents aleph-naught.

This means that for any elaborated inter-subjective objectivity, there is always the possibility to develop another different set of observations that attains higher inter-subjective objectivity by increasing the dataset. This structure assures the representation of a paradigm shift in science when sufficient contradicting evidence gains a majority over an old model. For example, minority reports in biology that may lead to novel discoveries in the future can be properly stored and distinguished from erroneous reports as more evidence accumulates [27].

#### 4.2. Algorithmic Complexity

Secondly, we evaluate the computational complexity with respect to the computing time scale. Since data-driven citizen science requires real-time computation in a highly interactive manner with the observation process, the algorithmic complexity of the calculation of complexity measures is an essential limiting factor of performance. As commonality is based on the intersection of multiple observations, its exhaustive computing confronts combinatorial explosion as datasets increase. Although the computation of complexity itself (or the resolution of a search query as a mathematical theorem) is provable and an algorithmic solution can be found, computational resources are another practical issue for real-world implementation, especially in distributed observation.

The computational time scale required for the sorting of a database according to a given utility such as commonality is listed in Table 3. Under a general condition with the observation probability database ${\mathbb{X}}_{N}$ of size N, ${\mathbb{X}}_{N}:=\{{\mu}_{i}(x)|x\in \mathbb{X},i=1,\cdots ,N\}$, maximum complexity lies in the calculation of commonality order based on the intersection of $\lfloor \frac{N}{2}\rfloor$ or $\lceil \frac{N}{2}\rceil$ elements, whose sorting time belongs to factorial order of N. The case with $N=5$ is depicted in Figure 7. This means that an algorithmic burden exists towards the calculation of middle-scale commonality with respect to the data size. As an inter-subjective objectivity successfully increases in citizen science, this peaking of algorithmic complexity in intermediate scale may hinder the effective feedback necessary for guided self-organization.
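The peak at intermediate commonality order can be seen by counting the combinations at each order. The following sketch counts only the number of $k$-element combinations ${}_{N}{C}_{k}$, not the sorting time itself, and the function names are hypothetical.

```python
from math import comb

def subsets_per_order(n):
    """Number of k-element combinations of n observations, for k = 1..n."""
    return {k: comb(n, k) for k in range(1, n + 1)}

def heaviest_orders(n):
    """Commonality orders k at which the number of combinations is largest."""
    counts = subsets_per_order(n)
    peak = max(counts.values())
    return [k for k, c in counts.items() if c == peak]

counts_5 = subsets_per_order(5)
peak_5 = heaviest_orders(5)
```

For $N=5$ the counts are 5, 10, 10, 5, 1, so the burden is largest at orders 2 and 3.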

However, in a practical situation, the actual computation time may remain in polynomial order if effective data size shrinks with respect to the increase of maximum commonality order:

**Theorem**

**6.**

By defining the diminution rate of data combination $\Delta :\mathbb{N}\mapsto \mathbb{R}$ with respect to maximum commonality order $1\le i\le N\in \mathbb{N}$ as

$$\begin{array}{c}\hfill \Delta (i):=\frac{{}_{N}{C}_{dim(\{{\mathbb{X}}_{N}|O(\mathbb{X})=i\})}}{{}_{N}{C}_{dim(\{{\mathbb{X}}_{N}|O({\mathbb{X}}_{N})=i-1\})}}\end{array}$$

the order of its product is upper bounded by the d-th root of maximum computational complexity at ${N}^{\prime}=\lfloor \frac{N}{2}\rfloor$

$$\begin{array}{c}\hfill \prod _{i}^{{N}^{\prime}}\Delta (i)\le \sqrt[d]{\mathcal{O}({N}^{d{N}^{\prime}})},\end{array}$$

where $dim(\cdot )$ returns the size of the database, and $d>0$ represents the polynomial order of the algorithm $\mathcal{O}({N}^{d})$ with respect to the data size N.

From this result, we can conjecture that for ${N}^{\prime \prime}\le {N}^{\prime}$,

$$\begin{array}{c}\hfill \prod _{i}^{{N}^{\prime \prime}}\Delta (i)\le \mathcal{O}({N}^{\frac{c}{d}})\end{array}$$

will assure exhaustive feedback with polynomial response time of degree c. Since the left side is usually based on past calculations of lower maximum commonality orders, we can annotate interactively whether information processing can assure comprehensive feedback. This will add a criterion on the criticality of guided self-organization mediated by computation, which will be explored in Section 5.

Another methodology other than exhaustive computing is to implement a local gradient algorithm as a local interaction that leads to a global heuristic solution without top-down control. This can also be achieved with the use of limited maximum commonality order (e.g., $O(\mathbb{X})=k<{N}^{\prime}$), which will keep its computational time within polynomial order $\mathcal{O}({N}^{dk})$.
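Limiting the maximum commonality order can be sketched as follows. The example assumes that each observation is reduced to a set of discretized cells and that commonality is positive exactly when the intersection is non-empty. This is a simplification of the measure-based definition, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def commonality_up_to(supports, k_max):
    """Non-empty intersections of up to k_max observations.

    supports: list of sets, one per observation.
    Returns a dict mapping index tuples to their common elements.
    """
    found = {}
    for k in range(1, k_max + 1):
        for combo in combinations(range(len(supports)), k):
            common = set.intersection(*(supports[i] for i in combo))
            if common:
                found[combo] = common
    return found

def combinations_examined(n, k_max):
    """Number of index tuples visited, a polynomial in n for fixed k_max."""
    return sum(comb(n, k) for k in range(1, k_max + 1))

supports = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {6}]
found = commonality_up_to(supports, k_max=2)
max_order_found = max(len(combo) for combo in found)
```

With `k_max=2` only 10 of the 15 non-empty index subsets of four observations are examined; raising `k_max` to 3 additionally reveals the triple intersection `{3}`.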

#### 4.3. Big Data Integration

Thirdly, we consider the computational complexity required for big data integration. As open data becomes increasingly available in citizen science, integration of massive databases from different resources has become one of the most important data processing methods. The conversion of different databases through the application programming interface is a basic protocol when the database is distributed over multiple servers.

The computation required in big data integration is the extensive calculation of commonality in the direct product of multiple databases. For simplicity, we consider the integration of two databases ${\mathbb{X}}_{N}$ and ${\mathbb{X}}_{M}$, with size N and $M\in \mathbb{N}$, ${\mathbb{X}}_{N}:=\{{\mu}_{i}(x)|x\in \mathbb{X},i=1,\cdots ,N\}$, ${\mathbb{X}}_{M}:=\{{\mu}_{i}(x)|x\in \mathbb{X},i=1,\cdots ,M\}$, respectively. A joint distribution between subsets of ${\mathbb{X}}_{N}$ and ${\mathbb{X}}_{M}$ needs to be determined with respect to common parameters in order to obtain an integrated database including the calculation of up to $(N+M)$-th order of commonality, such as order-wise correlations [32]. Exhaustive computing follows the argument in Section 4.2, giving the extension of Theorem 6:

**Theorem**

**7.**

Given the diminution rate of data combination ${\Delta}^{\prime}:{\mathbb{N}}^{2}\mapsto \mathbb{R}$, with respect to maximum commonality order $1\le i\le N\in \mathbb{N}$ and $1\le j\le M\in \mathbb{N}$, during the integration of two databases ${\mathbb{X}}_{N}$ and ${\mathbb{X}}_{M}$, respectively, as

$$\begin{array}{c}\hfill {\displaystyle {\Delta}^{\prime}(i,j):=\frac{{}_{N}{C}_{dim(\{{\mathbb{X}}_{N}|O(\mathbb{X})=i\})}}{{}_{N}{C}_{dim(\{{\mathbb{X}}_{N}|O(\mathbb{X})=i-1\})}}\cdot \frac{{}_{M}{C}_{dim(\{{\mathbb{X}}_{M}|O(\mathbb{X})=j\})}}{{}_{M}{C}_{dim(\{{\mathbb{X}}_{M}|O(\mathbb{X})=j-1\})}},}\end{array}$$

the order of its product is upper-bounded by the d-th root of maximum computational complexity at ${N}^{\prime}=\lfloor \frac{N}{2}\rfloor $ and ${M}^{\prime}=\lfloor \frac{M}{2}\rfloor $

$$\begin{array}{c}\hfill {\displaystyle \prod _{(i,j)}^{({N}^{\prime},{M}^{\prime})}{\Delta}^{\prime}(i,j)\le \sqrt[d]{\mathcal{O}({[{N}^{{N}^{\prime}}{M}^{{M}^{\prime}}]}^{d})},}\end{array}$$

where $d>0$ represents the polynomial order of the algorithm $\mathcal{O}({[{N}^{{N}^{\prime}}{M}^{{M}^{\prime}}]}^{d})$ with respect to the data size N and M.

In this formalization, computational complexity of database integration also confronts combinatorial explosion with respect to data size. Similarly to (42), we then explore a practical condition that effective maximum commonality order can be treated with polynomial time of degree $c>0$, such that

$$\begin{array}{c}\hfill \mathcal{O}({[{N}^{{N}^{\prime}}{M}^{{M}^{\prime}}]}^{d})\le \mathcal{O}({[N+M]}^{c}).\end{array}$$

For that purpose, we set the uniform sparseness $u$ $(0<u<1)$ of random databases representing the density of combination that supports the existence of commonality at each order,

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill {\displaystyle \frac{{}_{N}{C}_{dim(\{{\mathbb{X}}_{N}|O(\mathbb{X})=k\})}}{{}_{N}{C}_{k}}}& {\displaystyle =\frac{{}_{M}{C}_{dim(\{{\mathbb{X}}_{M}|O(\mathbb{X})=k\})}}{{}_{M}{C}_{k}}}\hfill & =u\hfill \\ \hfill \phantom{\rule{4.pt}{0ex}}\mathrm{for}\phantom{\rule{4.pt}{0ex}}k& =1,\cdots ,{N}^{\prime}\phantom{\rule{4.pt}{0ex}}\mathrm{or}\phantom{\rule{4.pt}{0ex}}{M}^{\prime},\hfill \end{array}\end{array}$$

which maintains the diminution rate of data combination $\Delta $ (40) and ${\Delta}^{\prime}$ (43) invariant under the definition. With respect to the total size of the database after integration $L=N+M$, the following holds:

**Theorem**

**8.**

As $L\to \infty $ in random data (46), the mean condition of (45) for all $\{N,M|N+M=L\}$ converges to the following inequality, which represents polynomial time constraints on computational complexity for exhaustive calculation of newly emerging commonality order within data size L:

$$\begin{array}{c}\hfill u\le \mathcal{O}(f*f(L)),\end{array}$$

where

$$\begin{array}{c}\hfill f(x):=\frac{{L}^{\frac{c}{4d}}{x}^{-\frac{L}{8}}}{\sqrt{L}},\end{array}$$

and * signifies the discrete convolution (20):

$$\begin{array}{c}\hfill f*f(L):=\sum _{N=1}^{L-1}f(N)f(L-N).\end{array}$$

Numerical observation of the proof is given in Figure 8.
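The right-hand side of Theorem 8 can be evaluated directly from the definitions of $f$ and $f*f$. The sketch below is a plain transcription with hypothetical function names. Because the inequality is stated in $\mathcal{O}(\cdot )$ terms, the returned value indicates only how the bound scales, not a sharp threshold on $u$.

```python
def f(x, L, c, d):
    """f(x) = L^(c / (4 d)) * x^(-L / 8) / sqrt(L)."""
    return L ** (c / (4 * d)) * x ** (-L / 8) / L ** 0.5

def sparseness_bound(L, c, d):
    """Discrete convolution f*f(L) = sum_{N=1}^{L-1} f(N) f(L - N)."""
    return sum(f(n, L, c, d) * f(L - n, L, c, d) for n in range(1, L))

# Illustrative parameters: total size L, target degree c, algorithmic order d.
bound = sparseness_bound(L=8, c=1, d=1)
```

For large $L$ the factor ${x}^{-L/8}$ underflows to zero in floating point for most $x$, so a logarithmic formulation would be preferable in practice.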

This signifies that the convolution of the power function of each database’s size serves as the complexity measure of big data integration with respect to computational complexity. This provides the condition of data sparseness u such that exhaustive calculation of all newly generated commonality orders within size L can be treated with polynomial time order c under algorithmic constraint d. As the inequality indicates, the sparser the data, the more easily joint commonality can be calculated.

## 5. Conjectures on Guided Self-Organization

With effective feedback by computation, citizen science dynamics is expected to converge to a critical state where the objective is collectively optimized through the mutual increase of inter-subjective objectivity. However, several aspects may intervene in the resulting self-organized state, for which we need theoretical interpretation. In this section, generally important aspects are exemplified in relation to self-organized criticality.

#### 5.1. Criticality by Limitation

The accuracy and reproducibility of observation is a primary factor that defines the consequent resolution of information represented in a database. Computational complexity also gives constraint on the speed of information processing for prediction. Several limiting factors may generically arise, such as:

- Limitation by principle: Deterministic chaos inherent in a natural system does not allow for long-term prediction, because the tiniest observation error of the present state will develop in exponential order [35]. Short-term validity of meteorological prediction is a typical example.
- Limitation by computational complexity: As explored in Section 4.2, extensive feedback based on exhaustive computing is often impossible with respect to available computing resources. The resolution of feedback may include time delay or incomplete optimization. Spatial-temporal scale of the forecast also sets the constraint as a general trade-off between prediction accuracy and computational resources. The finer the forecast granularity is, the more costly the calculation becomes, but the more likely it is to realize an accurate long-term prediction.

These limitations fundamentally regulate the order of significant digits in the prediction process, at the edge of resulting precision where the accuracy reaches criticality. The whole dynamics is also confined by the criticality of the observed phenomenon itself, by which observers’ behaviour is influenced.

#### 5.2. Criticality by Successful Learning

The motivation of citizen science is not necessarily the construction of versatile artificial intelligence, but the integration and augmentation of human capacity as well [4,12,13]. Successfulness of citizen science can also be defined in terms of information transition from machine to human, on which criticality is assumed to appear.

Let us consider the case when successful learning mediated by computation transferred an effective prediction model into human cognitive capacity. We take an example with Bayesian estimation, which is also a general model of our brain function [36]. General formulation of Bayesian estimation updates the parameter of hypothesized prior probability $P(A)$ with respect to the observed data $P(B)$, and provides an estimation of posterior probability $P({A}^{\prime}|B)$ given by Bayes’ theorem:

$$\begin{array}{c}\hfill P({A}^{\prime}|B):=\frac{P(B|A)P(A)}{P(B)},\end{array}$$

where $P(B|A)$ is considered as likelihood function, which updates $P(A)$ to $P({A}^{\prime}|B)$.

We now consider that the prior probability $P(A)$—or the model of prediction—depends on the process of computation C and human decision D. As human decision is supported by computation,

$$\begin{array}{c}\hfill P(A):=P(D|C).\end{array}$$

This formalization corresponds to Bayesian hierarchical modelling, where computation C provides hyperparameter of human decision D as prior distribution:

$$\begin{array}{c}\hfill \begin{array}{ccc}P(D,C|B)& :=& {\displaystyle \frac{P(B|D)P(D,C)}{P(B)},}\hfill \\ & :=& {\displaystyle \frac{P(B|D)P(D|C)P(C)}{P(B)}.}\hfill \end{array}\end{array}$$
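The hierarchical update can be illustrated with finite state spaces. In the sketch below, computation C and human decision D each take two values, and the probability tables are invented numbers used only to show the arithmetic; the function name is hypothetical.

```python
def hierarchical_posterior(p_c, p_d_given_c, likelihood):
    """Joint posterior P(D, C | B) for discrete C and D.

    p_c[c]            : P(C = c), the hyperprior from computation
    p_d_given_c[c][d] : P(D = d | C = c), the prior on human decision
    likelihood[d]     : P(B | D = d) for the observed data B
    Returns a nested list posterior[c][d].
    """
    unnormalized = [
        [likelihood[d] * p_d_given_c[c][d] * p_c[c] for d in range(len(likelihood))]
        for c in range(len(p_c))
    ]
    evidence = sum(sum(row) for row in unnormalized)  # P(B)
    return [[v / evidence for v in row] for row in unnormalized]

p_c = [0.5, 0.5]
p_d_given_c = [[0.9, 0.1], [0.2, 0.8]]
likelihood = [0.7, 0.3]
posterior = hierarchical_posterior(p_c, p_d_given_c, likelihood)
```

The normalizing constant is obtained by summing the numerator over all pairs of C and D, which corresponds to $P(B)$ in the formula.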

When a human has successfully acquired the model represented in the computational model,

$$\begin{array}{c}\hfill P(D|C)\approx P(D)\end{array}$$

as independent identical distribution, and

$$\begin{array}{c}\hfill P(D)\sim P(C)\end{array}$$

as independent and informationally homologous distribution.

This criticality qualitatively corresponds to the saturation stage of Markov chain Monte Carlo method (MCMC) in the optimization of a hierarchical model (52), where hyperparameter and parameter converge to independent stable distributions. Therefore, by monitoring the dependency of machine–human interaction with respect to the actual predictability, one can suggest whether the computation model or human observation should change, or if the actual phenomenon is in transition:

- When the actual prediction accuracy is high and human–machine interaction is high, this indicates successful modelling of the observed phenomenon with the use of computation.
- When the actual prediction accuracy is high and human–machine interaction is low, the human has achieved a successful understanding of the phenomenon with less dependency on a machine.
- When the actual prediction accuracy is low and human–machine interaction is high, the computational capacity may not be sufficient to treat the phenomenon effectively. Alternatively, the observed phenomenon might be in a dynamical transition that requires the effective computational model to be changed.
- When the actual prediction accuracy is low and human–machine interaction is low, more human effort needs to be engaged in both actual observation and the utilization of the machine interface.
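The four cases above can be sketched as a small monitoring routine. This is a minimal illustration, not the procedure used in this article: it assumes that the human–machine dependency is measured as the mutual information of a joint table $P(D,C)$, which vanishes exactly when $P(D|C)=P(D)$, and the function names and both thresholds are hypothetical.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a joint probability table P(D, C).

    It is zero exactly when D and C are independent, i.e. P(D|C) = P(D).
    """
    p_d = [sum(row) for row in joint]          # marginal P(D)
    p_c = [sum(col) for col in zip(*joint)]    # marginal P(C)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (p_d[i] * p_c[j]))
    return mi

def diagnose(accuracy, dependency, acc_threshold=0.8, dep_threshold=0.05):
    """Map (prediction accuracy, human-machine dependency) to one of the four cases.

    Both thresholds are hypothetical and would need calibration on real data.
    """
    high_acc = accuracy >= acc_threshold
    high_dep = dependency >= dep_threshold
    if high_acc and high_dep:
        return "successful modelling with computation"
    if high_acc and not high_dep:
        return "model acquired by the human"
    if not high_acc and high_dep:
        return "insufficient computation or phenomenon in transition"
    return "more human engagement needed"

# Example: a fully dependent joint table versus an independent one.
dependent = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
print(diagnose(0.9, mutual_information(dependent)))
print(diagnose(0.9, mutual_information(independent)))
```

Mutual information is only one possible dependency measure; any statistic that vanishes under $P(D|C)\approx P(D)$ could take its place.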

#### 5.3. Criticality by Guided Optimization

The actual management task of citizen science is often firmly related to the sustainability of a social–ecological system, where the achievement of robustness and resilience is an important criterion of criticality [3,5]. A universally robust model with respect to an arbitrary variable cost function is canonically given by uniform distribution, which is commonly adopted as a prior of Bayesian estimation and random search algorithm [14]. It is also widely prevalent in biological phenomena, as the survival rate depends on the geometric mean of evolutionary fitness, which is maximized with uniformity in space, time, and statistical configuration [32,37].

On the other hand, a short-term management goal is usually biased by a given objective. How to reconcile short-term local efficiency and long-term global sustainability is a crucial issue for guided self-organization of management in citizen science.

In order to optimize the balance between different spatio-temporal scales, information geometry can provide a theoretical compromise in terms of informational complexity. Suppose the actual distribution of variable $X\subset \mathbb{X}$ is given by ${P}_{a}(X)$, a short-term management goal as ${P}_{s}(X)$, and idealized long-term robust distribution as ${P}_{l}(X)$. In many natural systems, the uniformity of ${P}_{l}(X)$ supporting robustness as the result of self-organization is expressed with entropy maximization principle under parameter constraints such as resource availability and energy flux level [38].

For simplicity, take as an example Shannon’s diversity index ${H}^{\prime}$ defined on a discrete distribution $P(X)$ over symbols $X=\{{s}_{0},{s}_{1},\cdots ,{s}_{n}\}$, such as the frequencies of n species in a biodiversity observation:

$$\begin{array}{c}\hfill {H}^{\prime}:=-\sum _{i=0}^{n}P({s}_{i})logP({s}_{i}),\end{array}$$

where ${s}_{0}$ represents the non-occurrence of any species. $P(X)$ and ${H}^{\prime}$ could be either buoy or anchor. Note that ${H}^{\prime}$ can be generalized to mutual information ${H}_{2}^{\prime}$ to express raft, buoy–anchor, and raft–anchor connections,

$$\begin{array}{c}\hfill {H}_{2}^{\prime}:=\sum _{i,j}{P}_{2}({s}_{i},{s}_{j})log\frac{{P}_{2}({s}_{i},{s}_{j})}{P({s}_{i})P({s}_{j})},\end{array}$$

where ${P}_{2}(\cdot ,\cdot )$ denotes the joint distribution on $X\times X$.

By maximizing ${H}^{\prime}$, we can determine the most diverse distribution ${P}_{l}$ as

$$\begin{array}{c}\hfill {P}_{l}({s}_{i})=\frac{1}{n+1},\end{array}$$

which represents the most robust ecosystem under the assumption that every species, including the gap, is equally valuable in terms of ecosystem function in a randomly changing environment.
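As a minimal numerical sketch of the diversity index, the following computes ${H}^{\prime}$ for a given distribution and checks that the uniform distribution on $n+1$ symbols attains the maximum $\log (n+1)$. The function names and the example frequencies are illustrative assumptions, not data from this study.

```python
import math

def shannon_diversity(p):
    """Shannon diversity H' = -sum_i p_i log p_i (natural log, 0 log 0 := 0)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def most_diverse(n):
    """Uniform distribution P_l over the n + 1 symbols s_0, ..., s_n."""
    return [1.0 / (n + 1)] * (n + 1)

# Hypothetical observed frequencies for s_0 (no occurrence) and three species.
p_a = [0.7, 0.1, 0.1, 0.1]
p_l = most_diverse(3)

print(shannon_diversity(p_a))   # diversity of the observed distribution
print(shannon_diversity(p_l))   # equals log(4), the maximum for four symbols
```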

With respect to the short-term management goal, both ${H}^{\prime}({P}_{a})<{H}^{\prime}({P}_{s})$ and ${H}^{\prime}({P}_{a})>{H}^{\prime}({P}_{s})$ could occur. However, a general relationship between biodiversity and ecosystem functions imposes ${H}^{\prime}({P}_{a})<{H}^{\prime}({P}_{s})$, meaning a net positive impact on biodiversity and good management in terms of sustainability. ${H}^{\prime}$ can be generalized to complexity measure ${G}^{-1}$ in Section 3.1 with respect to the commonality $\lambda $ in Section 3.2, which will be detailed in Section 7.

Expressed as an exponential family, $P(X)$ can be parameterized as a statistical manifold based on the canonical setting of information geometry, with the dual-flat coordinates $\Theta =\{{\theta}_{i}|i=1,\cdots ,n\}$ and $H=\{{\eta}_{i}|i=1,\cdots ,n\}$, with potential functions $\varphi $ and $\psi $, respectively, based on the Fisher information metric g and connection coefficients ${\Gamma}^{(\alpha )}$ [39,40]:

$$\begin{array}{c}\hfill \begin{array}{ccc}P(X,\Theta )& =& exp\left[{\displaystyle C(X)+\sum _{i=1}^{n}{\theta}_{i}{f}_{i}(X)-\psi (\Theta )}\right],\hfill \\ {\displaystyle \frac{\partial}{\partial {\theta}_{i}}\psi}& =& {\eta}_{i},\hfill \\ {\displaystyle \frac{\partial}{\partial {\eta}_{i}}\varphi}& =& {\theta}_{i},\hfill \\ \varphi (H)& =& {\displaystyle \sum _{i=1}^{n}{\theta}_{i}{\eta}_{i}-\psi (\Theta ),}\hfill \end{array}\end{array}$$

under the correspondence of the following transformation for discrete distribution,

$$\begin{array}{c}\hfill \begin{array}{ccc}C(X)& =& 0,\hfill \\ {f}_{i}(X)& =& \mathbf{1}(X|X={s}_{i}),\hfill \\ \psi (\Theta )& =& -logP({s}_{0}),\hfill \\ {\theta}_{i}& =& {\displaystyle log\frac{P({s}_{i})}{P({s}_{0})},}\hfill \\ {\eta}_{i}& =& E[{f}_{i}(X)]=P({s}_{i}).\hfill \end{array}\end{array}$$
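The coordinate transformation for the discrete case can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names and the example distribution are hypothetical, and $\psi (\Theta )=-\log P({s}_{0})$ is rewritten as $\log (1+{\sum}_{i}{e}^{{\theta}_{i}})$, which follows from the normalization of $P$. It verifies $\partial \psi /\partial {\theta}_{i}={\eta}_{i}$ by finite differences, and that the dual potential $\varphi $ equals the negative of ${H}^{\prime}$.

```python
import math

def to_theta(p):
    """e-coordinates: theta_i = log(P(s_i) / P(s_0)) for i = 1..n."""
    return [math.log(q / p[0]) for q in p[1:]]

def psi(theta):
    """Potential psi(Theta) = -log P(s_0) = log(1 + sum_i exp(theta_i))."""
    return math.log(1.0 + sum(math.exp(t) for t in theta))

def from_theta(theta):
    """Recover the full distribution (P(s_0), ..., P(s_n)) from theta."""
    z = 1.0 + sum(math.exp(t) for t in theta)
    return [1.0 / z] + [math.exp(t) / z for t in theta]

def eta_numeric(theta, h=1e-6):
    """m-coordinates obtained as d psi / d theta_i by central differences."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((psi(up) - psi(down)) / (2.0 * h))
    return grad

p = [0.4, 0.3, 0.2, 0.1]          # hypothetical distribution, all entries positive
theta = to_theta(p)
eta = p[1:]                        # eta_i = P(s_i)
phi = sum(t * e for t, e in zip(theta, eta)) - psi(theta)   # Legendre transform

print(eta_numeric(theta))          # numerically close to eta
print(phi)                         # equals sum_i P(s_i) log P(s_i), i.e. -H'
```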

The elements of Fisher information metric $g=\left({g}_{ij}\right)$ are given with respect to the dual coordinates,

$$\begin{array}{c}\hfill \begin{array}{ccccc}{g}_{ij}& =& {\displaystyle \frac{\partial}{\partial {\theta}_{i}}\frac{\partial}{\partial {\theta}_{j}}\psi (\Theta )}& =& {\displaystyle \frac{\partial {\eta}_{j}}{\partial {\theta}_{i}},}\\ {g}_{ij}^{inv}& =& {\displaystyle \frac{\partial}{\partial {\eta}_{i}}\frac{\partial}{\partial {\eta}_{j}}\varphi (H)}& =& {\displaystyle \frac{\partial {\theta}_{j}}{\partial {\eta}_{i}},}\end{array}\end{array}$$

where $\left({g}_{ij}^{inv}\right)$ is the inverse matrix of $({g}_{ij})$. This relation defines $\Theta $ and H as the dual coordinate systems orthogonal to each other with respect to g. The $\alpha $-connection coefficients ${\Gamma}^{(\alpha )}=\left({\Gamma}_{ij;k}^{(\alpha )}\right)$ $(i,j,k\in \{1,\cdots ,n\})$ with respect to a real number $\alpha $ are given by the Fisher information metric as

$$\begin{array}{c}\hfill \begin{array}{c}{\Gamma}_{ij;k}^{(\alpha )}={\displaystyle \frac{1}{2}\left(\frac{\partial}{\partial {\theta}_{i}}{g}_{jk}+\frac{\partial}{\partial {\theta}_{j}}{g}_{ik}+\frac{\partial}{\partial {\theta}_{k}}{g}_{ij}-\alpha E\left[\frac{\partial}{\partial {\theta}_{i}}logP(X)\frac{\partial}{\partial {\theta}_{j}}logP(X)\frac{\partial}{\partial {\theta}_{k}}logP(X)\right]\right),}\end{array}\end{array}$$

where $E[\cdot ]$ is the mean value function. The values $\alpha =1$ and $-1$ are essential in information geometry, which define the e- and m-flat connections, respectively, in terms of the invariance of tangent space under the covariant differential ${\nabla}^{(\alpha )}$ on arbitrary coordinates $\{{\xi}_{i}\}(i=1,\cdots ,n)$ of the statistical manifold:

$$\begin{array}{c}\hfill {\nabla}_{\frac{\partial}{\partial {\xi}_{i}}}^{(\alpha )}\frac{\partial}{\partial {\xi}_{j}}=\sum _{k=1}^{n}{\Gamma}_{ij;k}^{(\alpha )}\frac{\partial}{\partial {\xi}_{k}},\end{array}$$

where ${\Gamma}_{ij;k}^{(1)}=0$ for ${\xi}_{i}={\theta}_{i}$, and ${\Gamma}_{ij;k}^{(-1)}=0$ for ${\xi}_{i}={\eta}_{i}$. For example, the model $P(X;\Theta )$ is e-flat with respect to the coordinates $\Theta $, and m-flat with respect to the coordinates H. ${\nabla}^{(\pm 1)}$ is called the dual-flat connection of the statistical manifold. The concept of flatness defined by these connections further extends to the concept of geometric parallel and geodesic. As an autoparallel submanifold with respect to the connection, e- and m-flat geodesic $\Theta (w)$ and $H(w)$ between two distributions ${P}_{1}(X)$ and ${P}_{2}(X)$ are defined as follows with one-dimensional parameter w:

$$\begin{array}{c}\hfill \Theta (w)=w\Theta ({P}_{1}(X))+(1-w)\Theta ({P}_{2}(X)),\end{array}$$

$$\begin{array}{c}\hfill H(w)=wH({P}_{1}(X))+(1-w)H({P}_{2}(X)).\end{array}$$

The unique ${\nabla}^{(\alpha )}$-divergence ${D}^{(\alpha )}({P}_{1}(X):{P}_{2}(X))$ that satisfies $D({P}_{1}(X):{P}_{2}(X))\ge 0$ and $D({P}_{1}(X):{P}_{2}(X))=0$⇔${P}_{1}(X)={P}_{2}(X)$, and that remains invariant under possible transformations of the dual-flat coordinates with the connections ${\nabla}^{(\pm \alpha )}$ is given by

$$\begin{array}{c}\hfill {D}^{(\alpha )}({P}_{1}(X):{P}_{2}(X))=\Psi ({P}_{1}(X))+\Phi ({P}_{2}(X))-\sum _{i=1}^{n}{\theta}_{i}({P}_{1}(X)){\eta}_{i}({P}_{2}(X)),\end{array}$$

whose dual divergence coincides with Kullback–Leibler divergence in the case of $\alpha =1$,

$$\begin{array}{c}\hfill {\displaystyle {D}^{(1)}({P}_{1}(X):{P}_{2}(X))={D}^{(-1)}({P}_{2}(X):{P}_{1}(X))=\sum _{X}{P}_{2}(X)log\frac{{P}_{2}(X)}{{P}_{1}(X)}.}\end{array}$$

From the Pythagorean relation and the projection theorem of Kullback–Leibler divergence on the dual-flat statistical manifold [39] (p. 63), the following holds:

**Theorem 9.**

Let $({\Theta}_{a},{H}_{a}),({\Theta}_{s},{H}_{s})$, and $({\Theta}_{l},{H}_{l})$ be the dual-flat coordinates of ${P}_{a}(X),{P}_{s}(X)$, and ${P}_{l}(X)$, respectively, with the canonical definition of e- and m-flat dual connections. We define the optimal distribution ${P}_{o}(X)$ with coordinates $({\Theta}_{o},{H}_{o})$ on m-flat geodesic between ${P}_{a}(X)$ and ${P}_{l}(X)$ with parameter $w\in \mathbb{R}$ as

$$\begin{array}{c}\hfill \begin{array}{ccc}{H}_{o}& :=& w{H}_{a}+(1-w){H}_{l}.\hfill \end{array}\end{array}$$

By optimizing ${H}_{o}$ with orthogonal projection of e-flat geodesic from ${\Theta}_{s}$ to ${\Theta}_{o}$ as

$$\begin{array}{c}\hfill w=\underset{w}{argmin}({D}^{m}({P}_{o}:{P}_{s}))=\underset{w}{argmin}({D}^{e}({P}_{s}:{P}_{o})),\end{array}$$

the following Pythagorean relations hold:

$$\begin{array}{c}\hfill \begin{array}{ccc}{D}^{m}({P}_{a}:{P}_{s})& =& {D}^{m}({P}_{a}:{P}_{o})+{D}^{m}({P}_{o}:{P}_{s}),\hfill \\ {D}^{m}({P}_{l}:{P}_{s})& =& {D}^{m}({P}_{l}:{P}_{o})+{D}^{m}({P}_{o}:{P}_{s}).\hfill \end{array}\end{array}$$

where ${D}^{m}(\cdot :\cdot )$ and ${D}^{e}(\cdot :\cdot )$ are Kullback–Leibler divergence and its dual divergence, respectively,

$$\begin{array}{c}\hfill {\displaystyle {D}^{m}({P}_{o}:{P}_{s}):={D}^{e}({P}_{s}:{P}_{o}):=\sum _{i=0}^{n}{P}_{o}({s}_{i})log\frac{{P}_{o}({s}_{i})}{{P}_{s}({s}_{i})}.}\end{array}$$

Figure 9 shows the geometrical structure of this theorem. In this case, supposing ${H}^{\prime}({P}_{a})<{H}^{\prime}({P}_{s})<{H}^{\prime}({P}_{l})$ as effectiveness of complexity measure ${H}^{\prime}$ for management, we want to find the optimal distribution of biodiversity ${P}_{o}$ balancing between ${P}_{s}$ and ${P}_{l}$ with respect to actual distribution ${P}_{a}$, such that

$$\begin{array}{c}\hfill {H}^{\prime}({P}_{a})<{H}^{\prime}({P}_{s})<{H}^{\prime}({P}_{o})<{H}^{\prime}({P}_{l}),\end{array}$$

based on statistical dependencies between variables that can be orthogonally separated with Pythagorean relation. As a result, ${P}_{o}$ provides the optimized distribution with respect to minimum informational discrepancy from the short-term objective to the ideal transition towards the long-term most diverse state. The meaning of major components of Kullback–Leibler divergence to be used as a guide of self-organization is listed as follows:

- ${D}^{m}({P}_{a}:{P}_{o})$: Discrepancy between actual distribution and optimum portfolio strategy that orthogonally decomposes and attempts to achieve a balance between short-term management objective and long-term sustainability.
- ${D}^{m}({P}_{a}:{P}_{s})$: Target risk of short-term management objective.
- ${D}^{m}({P}_{o}:{P}_{s})={D}^{e}({P}_{s}:{P}_{o})$: Buffering element of robustness trade-off between short-term management objective and long-term sustainability.
- ${D}^{m}({P}_{l}:{P}_{o})$: Potential risk of optimum portfolio w.r.t. long-term sustainability.
- ${D}^{m}({P}_{l}:{P}_{s})$: Potential risk of short-term management objective w.r.t. long-term sustainability.
- ${D}^{m}({P}_{l}:{P}_{a})$, ${D}^{m}({P}_{a}:{P}_{l})$: Potential risk of actual distribution w.r.t. long-term sustainability.
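The projection in Theorem 9 can be illustrated numerically. The sketch below is an assumption-laden example rather than the computation used in this study: the three distributions are hypothetical, ${P}_{s}$ is assumed strictly positive, ${P}_{a}\ne {P}_{l}$, and the minimizer is found by bisection on the derivative of ${D}^{m}({P}_{o}:{P}_{s})$ along the m-flat geodesic, which is convex in $w$. It then evaluates the residuals of both Pythagorean relations.

```python
import math

def kl(p, q):
    """D^m(p:q) = sum_i p_i log(p_i / q_i), with 0 log 0 := 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def mixture(w, p_a, p_l):
    """Point on the m-flat geodesic: H_o = w H_a + (1 - w) H_l."""
    return [w * a + (1.0 - w) * l for a, l in zip(p_a, p_l)]

def feasible_interval(p_a, p_l):
    """Open interval of w for which every component of the mixture is positive."""
    lo, hi = -math.inf, math.inf
    for a, l in zip(p_a, p_l):
        d = a - l
        if d > 0:
            lo = max(lo, -l / d)
        elif d < 0:
            hi = min(hi, -l / d)
    return lo, hi

def optimal_weight(p_a, p_s, p_l, iters=200):
    """argmin_w D^m(P_o:P_s) by bisection on its derivative with respect to w."""
    lo, hi = feasible_interval(p_a, p_l)
    margin = 1e-9 * (hi - lo)          # stay strictly inside the simplex
    lo, hi = lo + margin, hi - margin

    def slope(w):
        p_w = mixture(w, p_a, p_l)
        return sum((a - l) * math.log(q / s)
                   for a, l, q, s in zip(p_a, p_l, p_w, p_s))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical actual, short-term, and long-term (uniform) distributions.
p_a = [0.7, 0.1, 0.1, 0.1]
p_s = [0.4, 0.3, 0.2, 0.1]
p_l = [0.25, 0.25, 0.25, 0.25]

w = optimal_weight(p_a, p_s, p_l)
p_o = mixture(w, p_a, p_l)
# Residuals of the two Pythagorean relations; both should be close to zero.
print(kl(p_a, p_s) - kl(p_a, p_o) - kl(p_o, p_s))
print(kl(p_l, p_s) - kl(p_l, p_o) - kl(p_o, p_s))
```

Since $w\in \mathbb{R}$, the minimizer may in general lie outside $[0,1]$; the search interval therefore covers every $w$ for which the mixture remains a valid distribution.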

## 6. Results from Biodiversity Management

We demonstrate the application of the model developed in this article to actual citizen science observation data, taking a biodiversity observation activity supported by interactive database as a typical example [17]. Sample data contain the observation by seven citizen participants on 48 subjective binary indices on species occurrence as buoy data on biological diversity, resulting in 336 samples. On the other hand, a buoy–anchor connection was established separately by objective evaluation of each participant’s ability to detect these species.

Commonality orders among seven observers were obtained for both inter-subjectivity based on the mutual information of buoy data and subjective–objective unity by simply ranking with buoy–anchor connection data. These orders are shown in Figure 10. A binomial test defined in (38) was performed on the comparison between inter-subjective and subjective–objective commonality orders. The random order distribution hypothesis was rejected with respect to $4\%$ significance threshold. The matching was more consistent in a higher order of commonality, which implies the intervention of subjective bias in a lower order. With respect to the conjectures on criticality in Section 5, the results can be interpreted as a significant self-organization process towards criticality with the increase of inter-subjective objectivity.
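The comparison of two commonality orders can be sketched as follows. This is an illustrative reading of the binomial test, not the exact procedure of (38): it assumes a one-sided upper-tail test on the number of matching pairs with $p=0.5$, and the observer labels and orders are hypothetical rather than the data of Figure 10.

```python
import math
from itertools import combinations

def matching_pairs(order_1, order_2):
    """Count pairs of observers ranked in the same relative order by both rankings."""
    rank_1 = {obs: r for r, obs in enumerate(order_1)}
    rank_2 = {obs: r for r, obs in enumerate(order_2)}
    k = 0
    for a, b in combinations(order_1, 2):
        if (rank_1[a] < rank_1[b]) == (rank_2[a] < rank_2[b]):
            k += 1
    return k

def binomial_upper_tail(k, m, p=0.5):
    """P(K >= k) for K ~ Binomial(m, p): one-sided p-value under the random-order null."""
    return sum(math.comb(m, i) * p**i * (1.0 - p)**(m - i) for i in range(k, m + 1))

# Hypothetical orders of seven observers (not the orders reported in this study).
inter_subjective = ["a", "b", "c", "d", "e", "f", "g"]
subjective_objective = ["a", "b", "d", "c", "e", "g", "f"]

m = math.comb(len(inter_subjective), 2)      # M = 21 pairs for N = 7
k = matching_pairs(inter_subjective, subjective_objective)
print(k, m, binomial_upper_tail(k, m))
```

Counting the matching pairs takes $\mathcal{O}({N}^{2})$ comparisons for N observers.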

## 7. Discussion

We have tackled the general situation in data-driven citizen science where scientific accuracy and reproducibility can only be discussed at the intersection of subjectivity, inter-subjectivity, and objectivity. Based on the conceptual definition of inter-subjective objectivity, a general topological structure was characterized with respect to complexity measure, search function, computational complexity, and criticality conditions. The results provide theoretical criteria for the development of information and communication technology in view of effective assistance and guidance of citizen science from a complex systems perspective.

The universality of the developed theory and models lies in the generality of the commonality concept formalized as convolution. In reality, a joint distribution of N variables can be represented as the function of convolution with degree N, which allows for extensive expression of informational complexities [32].

For example, by choosing the time range $T\subset \mathbb{R}$ with positive Lebesgue measure $m(T)>0$, marginal distribution $P(x|T)$ can be expressed as the time integral of probability measure $\mu $, such as

$$\begin{array}{c}\hfill \begin{array}{ccc}P(x|T)& :=& {\displaystyle {\int}_{T}\mu (dt)}\hfill \\ & =& {\displaystyle {\int}_{\mathbb{R}}\mathbf{1}(t|t\in T)\mu (dt)}\hfill \\ & =& q(T),\hfill \end{array}\end{array}$$

according to (24).

On the other hand, joint distribution $P({x}_{1},{x}_{2}|T)$ is also the time integral of the products between each variable’s probability measure ${\mu}_{1}$ and ${\mu}_{2}$, within simultaneous time range ${\{d{T}^{i}\}}_{i=1,\cdots ,n}$:

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill P({x}_{1},{x}_{2}|T)& :=& {\displaystyle \sum _{i}^{n}{\int}_{d{T}^{i}}{\int}_{d{T}^{i}}{\mu}_{1}(d{t}_{1}){\mu}_{2}(d{t}_{2})m(d{T}^{i}),}\hfill \\ \hfill {\displaystyle \bigcup _{i}^{n}d{T}^{i}}& =& T.\hfill \end{array}\end{array}$$

where $m(\cdot )$ is Lebesgue measure on $\mathbb{R}$. As defined in (25),

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill {\displaystyle {\int}_{d{T}^{i}}{\int}_{d{T}^{i}}{\mu}_{1}(d{t}_{1}){\mu}_{2}(d{t}_{2})}& =& {\mu}_{1}*{\mu}_{2}(d{T}_{2}^{i}),\hfill \\ \hfill d{T}_{2}^{i}& :=& \left\{{\displaystyle \sum _{i=1,2}{t}_{i}|{t}_{i}\in d{T}^{i}}\right\},\hfill \end{array}\end{array}$$

which derives the practical form for actual data processing as

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill P({x}_{1},{x}_{2}|T)& :=& {\displaystyle \sum _{i}^{n}{\mu}_{1}*{\mu}_{2}(d{T}_{2}^{i})m(d{T}^{i}).}\hfill \end{array}\end{array}$$

Taking $n\to \infty $, we obtain

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill P({x}_{1},{x}_{2}|T)& :=& {\displaystyle {\int}_{T}{\mu}_{1}*{\mu}_{2}(d{T}_{2})}\hfill \\ & =& {\displaystyle {\int}_{T}{q}_{1}(dT){q}_{2}(dT)}\hfill \\ & =& {\displaystyle {\int}_{T}{\mu}_{1}(dT){\mu}_{2}(dT),}\hfill \end{array}\end{array}$$

the canonical definition of joint distribution with real value resolution of time.

This follows the generalization to N variables with (A5) as

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill P({x}_{1},{x}_{2},\cdots ,{x}_{N}|T)& =& {\displaystyle \sum _{i}^{n}{\lambda}_{N}(d{T}_{N}^{i})m(d{T}^{i}),}\hfill \\ \hfill d{T}_{N}^{i}& :=& \left\{{\displaystyle \Lambda \sum _{j=1}^{N}{t}_{j}|{t}_{j}\in {T}^{i}}\right\},\hfill \\ \hfill {\displaystyle \bigcup _{i=1}^{n}d{T}^{i}}& =& T.\hfill \end{array}\end{array}$$

Taking $n\to \infty $, it converges to

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill P({x}_{1},{x}_{2},\cdots ,{x}_{N}|T)& =& {\displaystyle {\int}_{T}{\lambda}_{N}(d{T}_{N})}\hfill \\ & =& {\displaystyle {\int}_{T}{\mu}_{1}(dT){\mu}_{2}(dT)\cdots {\mu}_{N}(dT).}\hfill \end{array}\end{array}$$

Therefore, based on the commonality as convolution, we derive whole orders of the joint distribution necessary for the calculation of known complexity measures. In a general form, any complexity measure incorporating the information of a joint distribution can be described as the function of convolution ${G}^{-1}(Q(\lambda ))$, following the formalization of Section 3.1.

Commonality order is also accessible to existing algorithms that extract the total order of system elements, such as Dulmage–Mendelsohn decomposition [41] and phylogenetic tree analyses [42]. Although the calculation of joint distributions of all orders out of matrix data generally confronts exponential computational time, total order based on partial combinatorics and statistical testing with known distribution of p-value can provide a quick evaluation of matching on the results from different algorithms. The pair-wise and triplet order algorithms of N observations can be processed with $\mathcal{O}({N}^{2})$ and $\mathcal{O}({N}^{3})$, respectively, similar to the range of most other ranking algorithms based on low-order statistics. The comparison between N total orders of commonality requires only second-degree polynomial time $\mathcal{O}({N}^{2})$ (38). Taking such partial optimization and algorithm-wise comparison of performance into account, as an extensive Bayesian estimator including human of Section 5.2, a deep learning model with the use of massive parallel machine learning can be structurally effective for an interactive recombination of an estimation model based on human feedback [4].

In order to effectively attain criticality in citizen science where knowledge acquisition, transfer, and control are optimized through self-organization, we need to reach a collective intelligence that is distributed in a parallel way both in our subjective mind and in objective reality. The cost of data-driven science sometimes depends on the overly weighted objective measurement for complete modelling, which can also hinder the agility of taking actions, and opportunity of effective interaction through internal observation [3]. As explored in this article, if there exist natural laws extended in our collective intelligence—much like the physical law in objective nature—we may count on such topological structure, and it may be possible to take effective guidance through partial and distributed observation. Such a way to organize collective intelligence among independent and parallel activity producers could be considered as a social–environmental expansion of the “intelligence without representation”, which is based on the direct interface to the world through perception and action, rather than comprehensive representation of knowledge isolated from the environment [43]. Data acquisition needs to generate potentially effective action strategies, or the affordance under global management principles, instead of modelling the phenomena without essential intervention of actors [44]. This can be described as data-affordance science in contrast to exhaustive data-driven science, in which we substantially depend on the emergent topological structure of inter-subjective objectivity to make decisions in real time, represented at the intersection of the human mind, computation, and natural phenomena. The buoy–anchor–raft model developed as a mutual framework can provide a theoretical basis that expands external observation of conventional science to internal observation necessary for the management and knowledge extraction as a data-affordance science [5,27]. 
As a cumulative effect of synergistic efficiency, observation and data processing could diminish within a computable time scale by implicitly augmenting the knowledge representation incorporated into actual action principles. With measurement–action unity as a process of affordance in both data and reality, a cost-effective interface and a human-dependable system could be realized within the framework of internal observation, as a crucial premise for a sustainable solution. The edge of criticality for a successful citizen science—in terms of its nature and resource restriction—could find its limits neither in our internal mind nor external world, but on the topology of these interactions.

## Acknowledgments

This study was funded by Sony Computer Science Laboratories, Inc.

## Conflicts of Interest

The author declares no conflict of interest.

## Appendix A

**Proof of Theorem 1.**

Let us formulate Equation (12) as $I={F}^{-1}(Q(x))$. As I is an epimorphism but not necessarily a monomorphism, its inverse function generally retrieves a larger subset of conditions including $Q(x)$:

$$F(I)\supseteq Q(x).$$

Recursively defining ${Q}^{\prime}(x)$ by specifying the value of I as

$${Q}^{\prime}(x)={S}_{R}^{-1}\left[\{x\in \mathbb{X}|I=const.\}\right],$$

one obtains the inverse function that brings us back exactly to the comprehensive search condition, ${F}^{\prime}(I)={Q}^{\prime}(x)$.

Now, we consider the epimorphism $H:\{Q(x)\}\to \{{Q}^{\prime}(x)\}$ with its right-sided inverse as ${H}^{-1}:\{{Q}^{\prime}(x)\}\to \{Q(x)\}$ and ${H}^{-1}\circ H\circ Q(x)=Q(x)$. We set ${Q}^{\prime}(x)=H\circ Q(x)$, which gives ${F}^{\prime}(I)=H\circ Q(x)$, then ${H}^{-1}\circ {F}^{\prime}(I)=Q(x)$. Next, we consider ${I}^{\prime}$ such that ${F}^{\prime}({I}^{\prime})=Q(x)$. By resolving ${F}^{\prime}\circ {F}^{\prime \prime}(I)={H}^{-1}\circ {F}^{\prime}(I)$ with respect to ${F}^{\prime \prime}:\mathbb{R}\mapsto \mathbb{R}$, we obtain

$${I}^{\prime}={F}^{\prime \prime}(I),$$

then

$$Q(x)={F}^{\prime}\circ {F}^{\prime \prime}(I)={F}^{\prime}({I}^{\prime}),$$

which shows coincidence between ${F}^{\prime}$ and G with exclusively selective complexity measure ${I}^{\prime}$. The exact construction of ${Q}^{\prime},{F}^{\prime},H,$ and ${F}^{\prime \prime}$ depends on the exhaustive computation process, whose computational complexity is characterized in Section 4. ☐

**Proof of Theorem 2.**

From Tonelli’s theorem,

$$\begin{array}{c}\hfill \begin{array}{ccc}{\lambda}_{N}({r}_{N})& :=& {\mu}_{1}*{\mu}_{2}*\cdots *{\mu}_{i}*\cdots *{\mu}_{N}(r)\hfill \\ & =& {\displaystyle {\int}_{\mathbb{R}}\cdots \left({\int}_{\mathbb{R}}\cdots \left({\int}_{\mathbb{R}}\left({\int}_{\mathbb{R}}\mathbf{1}\left({\displaystyle \Lambda \sum _{i=1}^{N}{x}_{i}|{x}_{i}\in r}\right){\mu}_{1}(d{x}_{1})\right){\mu}_{2}(d{x}_{2})\right)\cdots {\mu}_{i}(d{x}_{i})\right)\cdots {\mu}_{N}(d{x}_{N})}\hfill \\ & =& {\displaystyle \left({\int}_{\mathbb{R}}\mathbf{1}\left({x}_{1}|{x}_{1}\in r\right){\mu}_{1}(d{x}_{1})\right)\left({\int}_{\mathbb{R}}\mathbf{1}\left({x}_{2}|{x}_{2}\in r\right){\mu}_{2}(d{x}_{2})\right)\cdots}\hfill \\ & & {\displaystyle \cdots \left({\int}_{\mathbb{R}}\mathbf{1}\left({x}_{i}|{x}_{i}\in r\right){\mu}_{i}(d{x}_{i})\right)\cdots \left({\int}_{\mathbb{R}}\mathbf{1}\left({x}_{N}|{x}_{N}\in r\right){\mu}_{N}(d{x}_{N})\right)}\hfill \\ & =& {\displaystyle \prod _{i}^{N}}{q}_{i}(r).\hfill \end{array}\end{array}$$

☐

**Proof of Theorem 3.**

The central limit theorem with Lindeberg’s condition assures the following convergence as the sampling number ${N}^{\prime}\to \infty $ and the number of distributions $N\to \infty $:

$$\begin{array}{c}\hfill {\displaystyle \sum _{j=1}^{{N}^{\prime}}\sum _{i=1}^{N}\frac{{x}_{ij}}{{N}^{\prime}}\to {\int}_{\mathbb{R}}\mathcal{N}({\nu}_{N}^{\prime},{\sigma}_{N}^{\prime 2})m(dx),}\end{array}$$

where the variables ${x}_{ij}\in {X}_{i}=\{{x}_{i1},\cdots ,{x}_{i{N}^{\prime}}\}$ follow independent distributions $p({X}_{i})$, ${X}_{i}\in \{{X}_{1},\cdots ,{X}_{N}\}$, with finite mean ${\alpha}_{i}^{\prime}=\sum _{j=1}^{{N}^{\prime}}\frac{{x}_{ij}}{{N}^{\prime}}$ and variance ${\beta}_{i}^{\prime 2}=\sum _{j=1}^{{N}^{\prime}}\frac{{x}_{ij}^{2}}{{N}^{\prime}}-{\alpha}_{i}^{\prime 2}$ taken over ${N}^{\prime}$ samples, and

$$\begin{array}{c}\hfill {\nu}_{N}^{\prime}=\sum _{i=1}^{N}{\alpha}_{i}^{\prime},\end{array}$$

$$\begin{array}{c}\hfill {\sigma}_{N}^{\prime 2}=\sum _{i=1}^{N}{\beta}_{i}^{\prime 2}.\end{array}$$

Based on the central limit theorem, we consider the numerical convergence of ${\lambda}_{N}({r}_{N})$ in a way accessible to ${r}_{s}\subseteq {r}_{N}$. The convolution ${\lambda}_{N}({r}_{N})$ represents infinite random sampling of $x\in {r}_{N}:=\left\{{\displaystyle \Lambda \sum _{k=1}^{N}{x}_{k}|{x}_{k}\in r}\right\}$ at the limit of ${N}^{\prime}\to \infty $, from N independent distributions $\{{\mu}_{i}(x)|x\in r,i=1,\cdots ,N\}$ as the population distributions with finite mean ${\alpha}_{i}$ and variance ${\beta}_{i}^{2}$ as follows:

$$\begin{array}{c}\hfill \begin{array}{ccc}{\alpha}_{i}& =& {\displaystyle {\int}_{\mathbb{R}}\mathbf{1}\left(x|x\in r\right)x{\mu}_{i}(dx),}\hfill \\ {\beta}_{i}^{2}& =& {\displaystyle {\int}_{\mathbb{R}}\mathbf{1}\left(x|x\in r\right){x}^{2}{\mu}_{i}(dx)-{\alpha}_{i}^{2},}\hfill \end{array}\end{array}$$

where each mean and variance is bounded within the total variation of r as

$$\begin{array}{c}\hfill inf(r)\le {\alpha}_{i}\le sup(r),\end{array}$$

$$\begin{array}{c}\hfill {\beta}_{i}\le \frac{sup(r)-inf(r)}{2}=\frac{||r||}{2}.\end{array}$$

If ${\mu}_{i}(x\in r)$ are finite measures, we obtain the following from the central limit theorem of independent distributions with finitely bounded mean and variance:

$$\begin{array}{c}\hfill \begin{array}{ccc}{\lambda}_{N}({r}_{N})& \to & {\displaystyle {\int}_{\mathbb{R}}\mathbf{1}\left(x|x\in {r}_{N}\right)\mathcal{N}(\Lambda {\nu}_{N},{\Lambda}^{2}{\sigma}_{N}^{2})m(dx)}\hfill \\ & & {\displaystyle \times {\int}_{{\mathbb{R}}^{N}}\mathbf{1}\left({\displaystyle \Lambda \sum _{i=1}^{N}{x}_{i}|{x}_{i}\in r}\right){\mu}_{1}(d{x}_{1})\cdots {\mu}_{i}(d{x}_{i})\cdots {\mu}_{N}(d{x}_{N})}\hfill \\ & =& {\displaystyle {\int}_{\mathbb{R}}\mathbf{1}\left(x|x\in {r}_{N}\right)\mathcal{N}(\Lambda {\nu}_{N},{\Lambda}^{2}{\sigma}_{N}^{2})m(dx)\times \prod _{i}^{N}{q}_{i}(r),}\hfill \end{array}\end{array}$$

where

$$\begin{array}{c}\hfill \begin{array}{ccc}{\nu}_{N}& =& {\displaystyle \sum _{i=1}^{N}{\alpha}_{i},}\hfill \\ {\sigma}_{N}^{2}& =& {\displaystyle \sum _{i=1}^{N}{\beta}_{i}^{2},}\hfill \end{array}\end{array}$$

which coincides with (33) as $N\to \infty $. In (A12), the term $\prod _{i=1}^{N}{q}_{i}(r)$ serves as the overall normalisation factor, since ${q}_{i}(r)$ is not necessarily normalized as a probability distribution with total probability 1. Since the convolution is replaced by the integral of normal distribution with single variable, by restricting on arbitrary subset ${r}_{s}\subseteq {r}_{N}$, we obtain the theorem (32):

$$\begin{array}{c}\hfill \begin{array}{ccc}{\lambda}_{N}({r}_{s})& \to & {\displaystyle {\int}_{\mathbb{R}}\mathbf{1}\left(x|x\in {r}_{s}\right)\mathcal{N}(\Lambda {\nu}_{N},{\Lambda}^{2}{\sigma}_{N}^{2})m(dx)\times \prod _{i}^{N}{q}_{i}(r)}\hfill \\ & =& {\displaystyle {\int}_{{r}_{s}}\mathcal{N}(\Lambda {\nu}_{N},{\Lambda}^{2}{\sigma}_{N}^{2})m(dx)\times \prod _{i}^{N}{q}_{i}(r).}\hfill \end{array}\end{array}$$

In case ${\mu}_{i}(x\in r)$ includes infinite measures that do not guarantee the above convergence, $\exists x\in r$ such that ${\mu}_{i}(x)=\infty $, though

$$\begin{array}{c}\hfill m(\{x\in r|{\mu}_{i}(x)=\infty \})=0.\end{array}$$

This is because, in the opposite case $m(\{x\in r|{\mu}_{i}(x)=\infty \})>0$, we would have ${\mu}_{i}(r)={q}_{i}(r)=\infty $, which contradicts the definition (24). Since infinite measures could only appear within a countable set of zero Lebesgue measure,

$${\mu}_{i}(\{x\in r|\neg \mathrm{Theorem}3\})=0,$$

which means for almost every $x\in {r}_{s}$, the theorem holds. ☐

**Proof of Theorem 4.**

The null hypothesis can be represented as a random order distribution, in which $M={}_{N}{C}_{2}$ pairs of N observations are susceptible to generating an I–N compromise between I and II. Choose an arbitrary commonality order I and consider the null hypothesis distribution of II.

With respect to an arbitrary pair $(i,j)$ out of N observations, all permutations in ${\mathcal{G}}_{N}$ can be divided into two sets ${\mathcal{H}}_{\mathrm{I}-\mathrm{I}}$ and ${\mathcal{H}}_{\mathrm{I}-\mathrm{N}}$, which correspond to those generating I–I matching and I–N compromises, respectively:

$$\begin{array}{c}\hfill {\mathcal{H}}_{\mathrm{I}-\mathrm{I}}:=\{{g}_{\mathrm{I}-\mathrm{I}}\in {\mathcal{G}}_{N}|{g}_{\mathrm{I}-\mathrm{I}}(i)={i}^{\prime},{g}_{\mathrm{I}-\mathrm{I}}(j)={j}^{\prime},1\le {i}^{\prime}<{j}^{\prime}\le N\},\end{array}$$

$$\begin{array}{c}\hfill {\mathcal{H}}_{\mathrm{I}-\mathrm{N}}:=\{{g}_{\mathrm{I}-\mathrm{N}}\in {\mathcal{G}}_{N}|{g}_{\mathrm{I}-\mathrm{N}}(i)={j}^{\prime},{g}_{\mathrm{I}-\mathrm{N}}(j)={i}^{\prime},1\le {i}^{\prime}<{j}^{\prime}\le N\}.\end{array}$$

Here, ${\mathcal{H}}_{\mathrm{I}-\mathrm{I}}$ and ${\mathcal{H}}_{\mathrm{I}-\mathrm{N}}$ are not groups, but the subsets of the same size,

$$\begin{array}{c}\hfill \#({\mathcal{H}}_{\mathrm{I}-\mathrm{I}})=\#({\mathcal{H}}_{\mathrm{I}-\mathrm{N}})=\frac{1}{2}\#({\mathcal{G}}_{N}),\end{array}$$

because

$$\begin{array}{c}\hfill {\mathcal{H}}_{\mathrm{I}-\mathrm{N}}={g}_{{i}^{\prime}{j}^{\prime}}\circ {\mathcal{H}}_{\mathrm{I}-\mathrm{I}},\end{array}$$

$$\begin{array}{c}\hfill {\mathcal{H}}_{\mathrm{I}-\mathrm{I}}\cup {\mathcal{H}}_{\mathrm{I}-\mathrm{N}}={\mathcal{G}}_{N},\end{array}$$

$$\begin{array}{c}\hfill {\mathcal{H}}_{\mathrm{I}-\mathrm{I}}\cap {\mathcal{H}}_{\mathrm{I}-\mathrm{N}}=\varnothing ,\end{array}$$

where for $k=1,\cdots ,N$,

$$\begin{array}{c}\hfill {g}_{{i}^{\prime}{j}^{\prime}}(k):=\left\{\begin{array}{ccc}{j}^{\prime}& \mathrm{if}& k={i}^{\prime},\\ {i}^{\prime}& \mathrm{if}& k={j}^{\prime},\\ k& \mathrm{else}.\end{array}\right.\end{array}$$

Then, with respect to the random permutation, the probability p that each pair from N observations will be judged as I–I matching is given by:

$$\begin{array}{c}\hfill p=\frac{\#({\mathcal{H}}_{\mathrm{I}-\mathrm{I}})}{\#({\mathcal{G}}_{N})}=0.5,\end{array}$$

which implies that the number of occurrences of I–I matching $({k}_{\mathrm{I}-\mathrm{I}}\ge 1)$ follows a binomial distribution with parameters $M={}_{N}{C}_{2}$ and p.

Note that the binomial distribution can be approximated by a normal distribution for $N\ge 7$ in this case, according to the conditions that the mean satisfies $Mp>5$ and the variance satisfies $Mp(1-p)>5$. ☐
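The null distribution above is simple enough to evaluate exactly. The sketch below, with hypothetical helper names `upper_tail` and `normal_approximation_ok`, computes a one-sided binomial tail and searches for the smallest N that meets the two rule-of-thumb conditions; the count of 15 matching pairs is an arbitrary illustrative value, not a figure taken from the text.

```python
from math import comb

def upper_tail(m, k, p=0.5):
    """P(K >= k) for K ~ Binomial(m, p)."""
    return sum(comb(m, j) * p**j * (1 - p)**(m - j) for j in range(k, m + 1))

def normal_approximation_ok(n, p=0.5):
    """Rule of thumb quoted in the proof: M*p > 5 and M*p*(1-p) > 5 with M = C(n, 2)."""
    m = comb(n, 2)
    return m * p > 5 and m * p * (1 - p) > 5

# Smallest number of observations for which both conditions hold.
smallest_n = next(n for n in range(2, 100) if normal_approximation_ok(n))

# Hypothetical example: 15 I-I matches among the 21 pairs of 7 observations.
tail = upper_tail(comb(7, 2), 15)
print(smallest_n, tail)
```

With $p=0.5$ the variance condition is the binding one, since it requires $M>20$, which is first met at $N=7$ ($M=21$).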

**Proof of Theorem 5.**

Take $n>m\in \mathbb{N}$ and consider the database $\mathbb{X}$ with $\#(\mathbb{X})=n$, which we divide into m observations of $k=\lfloor \frac{n}{m}\rfloor$ elements each, taking their intersections as the commonality structure. Here, $\lfloor \cdot\rfloor$ is the floor function.

As the cardinality of the rational numbers is ${\aleph}_{0}$, any positive common fraction, or equivalently any element of ${\mathbb{N}}^{2}$, can be put into unique correspondence with $\mathbb{N}$. Now, for an arbitrary $k=\lfloor \frac{n}{m}\rfloor$, $\exists {n}^{\prime}$ such that $k\le \lfloor \frac{{n}^{\prime}}{m}\rfloor$ and ${n}^{\prime}$ is a multiple of m (for example, take ${n}^{\prime}=m\lceil \frac{n}{m}\rceil$ with the ceiling function $\lceil \cdot\rceil$). Since $k\in \mathbb{N}$, for simplicity, let us consider the correspondence $n=km$ for any $k,m\in \mathbb{N}$. With the use of Cantor’s pairing function $\langle \cdot,\cdot\rangle :{\mathbb{N}}^{2}\mapsto \mathbb{N}$, we obtain the unique counting natural number $\langle k,m\rangle $ for all pairs of $(k,m)$:

$$\begin{array}{c}\hfill \langle k,m\rangle :=\frac{1}{2}(k+m)(k+m+1)+m.\end{array}$$

As $n=km$ contains permutational symmetry with respect to k and m, the uniqueness does not hold for $\langle k,m\rangle \mapsto n$, though from the inverse function of $\langle \cdot,\cdot\rangle $,

$$\begin{array}{c}\hfill \underset{\langle k,m\rangle \to \infty}{lim}k=\infty ,\end{array}$$

$$\begin{array}{c}\hfill \underset{\langle k,m\rangle \to \infty}{lim}m=\infty ,\end{array}$$

$$\begin{array}{c}\hfill \underset{\langle k,m\rangle \to \infty}{lim}km=\underset{\langle k,m\rangle \to \infty}{lim}n=\infty .\end{array}$$

As $n\to \infty $ is equivalent to either $k\to \infty $ or $m\to \infty $,

$$\begin{array}{c}\hfill \underset{n\to \infty}{lim}\langle k,m\rangle =\infty ,\end{array}$$

which results in

$$\begin{array}{c}\hfill \underset{n\to \infty}{lim}k=\infty ,\end{array}$$

$$\begin{array}{c}\hfill \underset{n\to \infty}{lim}m=\infty .\end{array}$$

Taking $n=\#(\mathbb{X})$, $m=O(r)$, and $k=\#(\{r|O(r)=m\})$ gives the theorem. ☐
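Cantor's pairing function and its inverse, as used in the proof above, can be written in a few lines. This is a minimal sketch that assumes the natural numbers start at 0 (the proof may intend a different starting point, which does not affect uniqueness); the function names are hypothetical.

```python
from math import isqrt

def cantor_pair(k, m):
    """Cantor's pairing <k, m> = (k + m)(k + m + 1)/2 + m for non-negative integers."""
    return (k + m) * (k + m + 1) // 2 + m

def cantor_unpair(z):
    """Inverse of cantor_pair: recover (k, m) from the counting number z."""
    w = (isqrt(8 * z + 1) - 1) // 2   # w = k + m, index of the diagonal containing z
    m = z - w * (w + 1) // 2
    return w - m, m

# Every pair in a 50 x 50 grid receives a distinct counting number.
codes = {cantor_pair(k, m) for k in range(50) for m in range(50)}
print(len(codes))

# The map <k, m> -> n = k * m is not unique: both pairs below give n = 6.
print(cantor_pair(2, 3), cantor_pair(3, 2))
```

The last line illustrates the permutational symmetry mentioned in the proof: the two pairs have different counting numbers but the same product.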

**Proof of Theorem 6.**

From the definition of $\Delta (i)$, when there is no diminution of data or equivalently ${\lambda}_{k}(\mathbb{X})>0$ for all $k\in \left\{1,\cdots ,{N}^{\prime}=\lfloor \frac{N}{2}\rfloor \right\}$,

$$\begin{array}{c}\hfill \begin{array}{ccc}\mathcal{O}\left({\displaystyle \prod _{i}^{{N}^{\prime}}\Delta (i)}\right)& =& {\displaystyle \frac{\mathcal{O}({N}^{2})}{\mathcal{O}(1)}\cdot\frac{\mathcal{O}({N}^{3})}{\mathcal{O}({N}^{2})}\cdot\frac{\cdots}{\mathcal{O}({N}^{3})}\cdots \frac{\mathcal{O}({N}^{{N}^{\prime}-1})}{\cdots}\frac{\mathcal{O}({N}^{{N}^{\prime}})}{\mathcal{O}({N}^{{N}^{\prime}-1})}}\hfill \\ & =& \mathcal{O}({N}^{{N}^{\prime}})\hfill \\ & =& \sqrt[d]{\mathcal{O}({N}^{d{N}^{\prime}})}.\hfill \end{array}\end{array}$$

As the product decreases monotonically when any of its elements decreases, the above relation gives the upper bound. The worst-case sorting time of N elements is typically $\mathcal{O}({N}^{2})$, i.e., $d=2$, and the result generalizes to algorithms of polynomial order $d>0$. ☐
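As a sanity check on the order $\mathcal{O}({N}^{{N}^{\prime}})$, one may compare it with the exact number of combinations ${}_{N}{C}_{k}$, which peaks at $k=\lfloor \frac{N}{2}\rfloor $. The sketch below is an illustration only: the helper name `max_combinations` is hypothetical, and the sorting cost with exponent d is not modelled.

```python
from math import comb

def max_combinations(n):
    """Return (k_max, count): the smallest subset size maximizing C(n, k), and that maximum."""
    counts = [comb(n, k) for k in range(n + 1)]
    k_max = max(range(n + 1), key=counts.__getitem__)
    return k_max, counts[k_max]

for n in (4, 7, 10, 20):
    k_max, count = max_combinations(n)
    bound = n ** (n // 2)          # the N^{N'} term with N' = floor(N/2)
    print(n, k_max, count, bound, count <= bound)
```

The exact count always stays below ${N}^{{N}^{\prime}}$, since ${}_{N}{C}_{k}\le {N}^{k}/k!$, so the asymptotic order is a safe upper bound rather than a tight estimate.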

**Proof of Theorem 7.**

From

$$\begin{array}{c}\hfill {\Delta}^{\prime}(i,j)=\Delta (i)\Delta (j),\end{array}$$

we directly obtain

$$\begin{array}{c}\hfill \begin{array}{ccc}\mathcal{O}\left({\displaystyle \prod _{(i,j)}^{({N}^{\prime},{M}^{\prime})}{\Delta}^{\prime}(i,j)}\right)& =& \mathcal{O}\left({\displaystyle \prod _{i}^{{N}^{\prime}}\Delta (i)}\right)\mathcal{O}\left({\displaystyle \prod _{j}^{{M}^{\prime}}\Delta (j)}\right)\hfill \\ & =& \mathcal{O}({N}^{{N}^{\prime}}{M}^{{M}^{\prime}})\hfill \\ & =& \sqrt[d]{\mathcal{O}({[{N}^{{N}^{\prime}}{M}^{{M}^{\prime}}]}^{d})}.\hfill \end{array}\end{array}$$

☐

**Proof of Theorem 8.**

The condition (45) can be translated into the following with respect to the data sparseness u:

$$\begin{array}{c}\hfill \mathcal{O}({[{}_{N}{C}_{dim(\{{\mathbb{X}}_{N}|O(\mathbb{X})={N}^{\prime}\})}{}_{M}{C}_{dim(\{{\mathbb{X}}_{M}|O(\mathbb{X})={M}^{\prime}\})}]}^{d})\le \mathcal{O}({[N+M]}^{c}),\end{array}$$

$$\begin{array}{c}\hfill \mathcal{O}({[u{}_{N}{C}_{{N}^{\prime}}\cdot u{}_{M}{C}_{{M}^{\prime}}]}^{d})\le \mathcal{O}({[N+M]}^{c}),\end{array}$$

$$\begin{array}{c}\hfill \mathcal{O}({[{u}^{2}{N}^{{N}^{\prime}}{M}^{{M}^{\prime}}]}^{d})\le \mathcal{O}({[N+M]}^{c}).\end{array}$$

Expressing the order of computational time on both sides of the formula without $\mathcal{O}(\cdot)$ for simplicity,

$$\begin{array}{ccc}\hfill {[{u}^{2}{N}^{{N}^{\prime}}{M}^{{M}^{\prime}}]}^{d}& \le & {[N+M]}^{c},\hfill \end{array}$$

and taking the logarithm with $L=N+M$,

$$\begin{array}{c}\hfill d[log({u}^{2}{N}^{{N}^{\prime}}{M}^{{M}^{\prime}})]\le clogL,\end{array}$$

$$\begin{array}{c}\hfill \frac{1}{2}\sum _{k}^{\{N,M\}}\lfloor \frac{k}{2}\rfloor logk\le \frac{c}{2d}logL-logu.\end{array}$$

We consider the application of Chebyshev’s sum inequality on the left side, which holds because $\lfloor \frac{k}{2}\rfloor $ and $logk$ are both non-decreasing in k, such that

$$\begin{array}{c}\hfill \frac{1}{2}\sum _{k}^{\{N,M\}}\lfloor \frac{k}{2}\rfloor \cdot\frac{1}{2}\sum _{k}^{\{N,M\}}logk\le \frac{1}{2}\sum _{k}^{\{N,M\}}\lfloor \frac{k}{2}\rfloor logk.\end{array}$$
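The two-term inequality above can be verified numerically for any pair $(N,M)$. The following sketch evaluates both sides directly; the function name `chebyshev_sides` is a hypothetical helper and the sample pairs are arbitrary.

```python
from math import log

def chebyshev_sides(n, m):
    """Return (lhs, rhs) of the two-term inequality with a_k = floor(k/2), b_k = log k, k in {n, m}."""
    ks = (n, m)
    a = [k // 2 for k in ks]
    b = [log(k) for k in ks]
    lhs = (0.5 * sum(a)) * (0.5 * sum(b))            # product of the two means
    rhs = 0.5 * sum(x * y for x, y in zip(a, b))     # mean of the products
    return lhs, rhs

for n, m in [(1, 9), (3, 7), (5, 5), (10, 90), (500, 500)]:
    lhs, rhs = chebyshev_sides(n, m)
    print(n, m, lhs, rhs, lhs <= rhs + 1e-12)
```

For two terms the gap between the sides equals $\frac{1}{4}({a}_{N}-{a}_{M})({b}_{N}-{b}_{M})$, which is non-negative whenever both sequences are ordered the same way and vanishes at $N=M$.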

Since $\lfloor \frac{k}{2}\rfloor /\frac{k}{2}\to 1$ as $k\to \infty $ and removing constant coefficient $\frac{1}{2}$, evaluation of the asymptotic behaviour of (A41) can be derived essentially for the left side from ${f}_{l}(N,M)$ and the right side from ${f}_{r}(N,M)$ defined as follows,

$$\begin{array}{c}\hfill \begin{array}{ccc}{f}_{l}(N,M)& :=& (N+M)(logN+logM),\hfill \\ {f}_{r}(N,M)& :=& NlogN+MlogM,\hfill \end{array}\end{array}$$

with which (A41) is described as

$$\begin{array}{c}\hfill \frac{1}{2}{f}_{l}(N,M)\le {f}_{r}(N,M).\end{array}$$

In the regime $N,M\to \infty $, which becomes dominant as $L\to \infty $,

$$\begin{array}{c}\hfill \begin{array}{ccc}{f}_{l}(N,M)& \le & \mathcal{O}((N+M)log(N+M)),\hfill \\ {f}_{r}(N,M)& \le & \mathcal{O}((N+M)log(N+M)),\hfill \end{array}\end{array}$$

since $\exists D>0$, $\exists C>0$ such that, for $N,M\ge D$, ${f}_{l}(N,M),{f}_{r}(N,M)\le C\cdot [(N+M)log(N+M)]$. This condition holds with $D\ge 1$, $C\ge 2$ for both ${f}_{l}(N,M)$ and ${f}_{r}(N,M)$. Note that although an explicit inequality between ${f}_{l}(N,M)$ and ${f}_{r}(N,M)$ exists in (A43), these converge to the same asymptotic order $\mathcal{O}(LlogL)$ for all N and M, because as $L\to \infty $,

$$\begin{array}{c}\hfill \begin{array}{ccccc}1& \le & {\displaystyle \frac{{f}_{r}(N,M)}{{f}_{l}(N,M)}}& \le & 2,\\ {\displaystyle \frac{1}{2}}& \le & {\displaystyle \frac{{f}_{l}(N,M)}{LlogL}}& \le & 1,\end{array}\end{array}$$

and

$$\begin{array}{c}\hfill {\displaystyle \frac{{f}_{r}(N,M)}{LlogL}\to 1,}\end{array}$$

which remain within the range of multiplication by a constant. The relations (A45) and (A46) can be proved by examining the minimum and maximum values of $\frac{{f}_{r}(N,M)}{{f}_{l}(N,M)}$, $\frac{{f}_{l}(N,M)}{LlogL}$, and $\frac{{f}_{r}(N,M)}{LlogL}$. Considering the range $1\le N\le \frac{L}{2}$, which suffices by the symmetry between N and M ($M=L-N$), we derive the following monotonicity conditions with respect to N,

$$\begin{array}{c}\hfill {\displaystyle \frac{\partial {f}_{l}(N,M)}{\partial N}\ge 0,}\end{array}$$

$$\begin{array}{c}\hfill {\displaystyle \frac{\partial {f}_{r}(N,M)}{\partial N}\le 0,}\end{array}$$

from which we obtain the minimum value of $\frac{{f}_{r}(N,M)}{{f}_{l}(N,M)}$ at $N=\frac{L}{2}$,

$$\begin{array}{c}\hfill {\displaystyle \frac{{f}_{r}\left({\displaystyle \frac{L}{2},\frac{L}{2}}\right)}{{f}_{l}\left({\displaystyle \frac{L}{2},\frac{L}{2}}\right)}=1,}\end{array}$$

the maximum value of $\frac{{f}_{r}(N,M)}{{f}_{l}(N,M)}$ at $N=1$,

$$\begin{array}{c}\hfill {\displaystyle \frac{{f}_{r}(1,L-1)}{{f}_{l}(1,L-1)}=2\times \frac{L-1}{L}\to 2,}\end{array}$$

the minimum value of $\frac{{f}_{l}(N,M)}{LlogL}$ at $N=1$,

$$\begin{array}{c}\hfill {\displaystyle \frac{{f}_{l}(1,L-1)}{LlogL}=\frac{1}{2}\frac{log(L-1)}{logL}\to \frac{1}{2},}\end{array}$$

the maximum value of $\frac{{f}_{l}(N,M)}{LlogL}$ at $N=\frac{L}{2}$,

$$\begin{array}{c}\hfill {\displaystyle \frac{{f}_{l}\left({\displaystyle \frac{L}{2},\frac{L}{2}}\right)}{LlogL}=\frac{logL-log2}{logL}\to 1,}\end{array}$$

the minimum value of $\frac{{f}_{r}(N,M)}{LlogL}$ at $N=\frac{L}{2}$,

$$\begin{array}{c}\hfill {\displaystyle \frac{{f}_{r}\left({\displaystyle \frac{L}{2},\frac{L}{2}}\right)}{LlogL}=\frac{logL-log2}{logL}\to 1,}\end{array}$$

and the maximum value of $\frac{{f}_{r}(N,M)}{LlogL}$ at $N=1$,

$$\begin{array}{c}\hfill {\displaystyle \frac{{f}_{r}(1,L-1)}{LlogL}=\frac{L-1}{L}\frac{log(L-1)}{logL}\to 1,}\end{array}$$

with the associated convergence as $L\to \infty $. Numerical observation of the convergence between ${f}_{l}(N,M)$, ${f}_{r}(N,M)$, and $LlogL$ is given in Figure 8a.
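The ratio bounds can be explored numerically by scanning $1\le N\le \frac{L}{2}$ with $M=L-N$. The sketch below makes one assumption that should be read with care: the left-hand quantity is taken as $\frac{1}{2}{f}_{l}(N,M)$, i.e., the left side of the inequality $\frac{1}{2}{f}_{l}\le {f}_{r}$, since the extreme values quoted above (for example $2\times \frac{L-1}{L}$ at $N=1$) are reproduced with that normalization. The function names are hypothetical.

```python
from math import log

def f_l(n, m):
    return (n + m) * (log(n) + log(m))

def f_r(n, m):
    return n * log(n) + m * log(m)

def ratio_ranges(total):
    """Scan N = 1 .. total // 2 with M = total - N; return (min, max) of the three ratios."""
    scale = total * log(total)
    r_rl, r_l, r_r = [], [], []
    for n in range(1, total // 2 + 1):
        m = total - n
        half_left = 0.5 * f_l(n, m)   # assumed normalization: left side of (1/2) f_l <= f_r
        right = f_r(n, m)
        r_rl.append(right / half_left)
        r_l.append(half_left / scale)
        r_r.append(right / scale)
    return (min(r_rl), max(r_rl)), (min(r_l), max(r_l)), (min(r_r), max(r_r))

for total in (10, 100, 1000):
    print(total, ratio_ranges(total))
```

Under this normalization the first ratio stays within $[1,2]$ and the other two stay within $[\frac{1}{2},1]$ for every L tested, in line with the constant-factor argument.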

As both sides of (A43) converge to the same asymptotic behaviour $\mathcal{O}((N+M)log(N+M))$, we apply the left side of Chebyshev’s inequality, ${f}_{l}(N,M)$, to (A40), which gives the asymptotic relation

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill {\displaystyle \frac{1}{8}{f}_{l}(N,M)}& \le & {\displaystyle \frac{c}{2d}logL-logu,}\hfill \\ \hfill u& \le & {L}^{\frac{c}{2d}}{N}^{-\frac{L}{8}}{(L-N)}^{-\frac{L}{8}},\hfill \end{array}\end{array}$$

where the coefficient $\frac{1}{8}$ is derived from the relation (A41), including the effect of the transformations ${N}^{\prime}=\lfloor \frac{N}{2}\rfloor $ and ${M}^{\prime}=\lfloor \frac{M}{2}\rfloor $. As $L\to \infty $ and taking the sum over N, it converges to the theorem:

$$\begin{array}{c}\hfill \begin{array}{ccc}\hfill Lu& \le & {\displaystyle \sum _{N=1}^{L-1}{L}^{\frac{c}{2d}}{N}^{-\frac{L}{8}}{(L-N)}^{-\frac{L}{8}},}\hfill \\ \hfill u& \le & {\displaystyle f*f(L).}\hfill \end{array}\end{array}$$

Numerical observation of the proof is given in Figure 8b. ☐
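The summed bound on u becomes extremely small as L grows, so a direct floating-point evaluation underflows. The sketch below evaluates the logarithm of $\frac{1}{L}\sum _{N=1}^{L-1}{L}^{\frac{c}{2d}}{N}^{-\frac{L}{8}}{(L-N)}^{-\frac{L}{8}}$ with a log-sum-exp step; the function name is hypothetical, and the defaults $c=d=2$ follow the values stated in the caption of Figure 8.

```python
from math import exp, log

def log_sparseness_bound(total, c=2.0, d=2.0):
    """Natural log of (1/L) * sum_{N=1}^{L-1} L^(c/2d) * N^(-L/8) * (L-N)^(-L/8)."""
    log_terms = [
        (c / (2 * d)) * log(total) - (total / 8) * (log(n) + log(total - n))
        for n in range(1, total)
    ]
    peak = max(log_terms)
    # log-sum-exp: factor out the largest term so the remaining sum is at least 1
    log_sum = peak + log(sum(exp(t - peak) for t in log_terms))
    return log_sum - log(total)

for total in (10, 100, 1000):
    print(total, log_sparseness_bound(total))
```

Working in log space keeps the result finite even where the individual terms are far below the smallest representable double, as happens for $L={10}^{3}$.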

**Proof of Theorem 9.**

We consider the $\Theta $ coordinates of ${P}_{o}(X)$ as ${\Theta}_{o}$, which constitutes the e-flat geodesic $\Theta (w)=\{{\theta}_{i}(w)\}$ between ${P}_{s}(X)$ and ${P}_{o}(X)$ as

$$\begin{array}{c}\hfill \Theta (w):=w{\Theta}_{s}+(1-w){\Theta}_{o}.\end{array}$$

The tangent vector ${T}^{e}$ of the e-geodesic is expressed as

$$\begin{array}{c}\hfill {\displaystyle {T}^{e}=\sum _{i=1}^{n}\frac{d}{dw}{\theta}_{i}(w)\frac{\partial}{\partial {\theta}_{i}}=\sum _{i=1}^{n}\{{\theta}_{i}({P}_{s}(X))-{\theta}_{i}({P}_{o}(X))\}\frac{\partial}{\partial {\theta}_{i}},}\end{array}$$

and the tangent vector ${T}^{m}$ of the m-geodesic ${H}_{o}$ as

$$\begin{array}{c}\hfill {\displaystyle {T}^{m}=\sum _{i=1}^{n}\frac{d}{dw}{\eta}_{i}({P}_{o}(X))\frac{\partial}{\partial {\eta}_{i}}=\sum _{i=1}^{n}\{{\eta}_{i}({P}_{a}(X))-{\eta}_{i}({P}_{l}(X))\}\frac{\partial}{\partial {\eta}_{i}}.}\end{array}$$

Then the inner product $<{T}^{e},{T}^{m}>$ of these tangent vectors at ${P}_{o}(X)$ is expressed as

$$\begin{array}{c}\hfill <{T}^{e},{T}^{m}>=\sum _{i=1}^{n}\{{\theta}_{i}({P}_{s}(X))-{\theta}_{i}({P}_{o}(X))\}\{{\eta}_{i}({P}_{a}(X))-{\eta}_{i}({P}_{l}(X))\},\end{array}$$

since from the duality of the coordinates in (60),

$$\begin{array}{c}\hfill {\displaystyle \langle \frac{\partial}{\partial {\theta}_{i}},\frac{\partial}{\partial {\eta}_{j}}\rangle =\left\{\begin{array}{ccc}1& \phantom{\rule{4.pt}{0ex}}\mathrm{if}\phantom{\rule{4.pt}{0ex}}& i=j,\hfill \\ 0& \phantom{\rule{4.pt}{0ex}}\mathrm{else}.\phantom{\rule{4.pt}{0ex}}\end{array}\right.}\end{array}$$

As ${P}_{a}(X)$, ${P}_{l}(X)$, and ${P}_{o}(X)$ are aligned on the m-geodesic, the relation (A60) can be translated to

$$\begin{array}{c}\hfill \begin{array}{ccc}<{T}^{e},{T}^{m}>& =& {\displaystyle \sum _{i=1}^{n}\{{\theta}_{i}({P}_{s}(X))-{\theta}_{i}({P}_{o}(X))\}\{{\eta}_{i}({P}_{a}(X))-{\eta}_{i}({P}_{o}(X))\}\cdot {C}_{1},}\hfill \\ <{T}^{e},{T}^{m}>& =& {\displaystyle \sum _{i=1}^{n}\{{\theta}_{i}({P}_{s}(X))-{\theta}_{i}({P}_{o}(X))\}\{{\eta}_{i}({P}_{l}(X))-{\eta}_{i}({P}_{o}(X))\}\cdot {C}_{2},}\hfill \end{array}\end{array}$$

with some constants ${C}_{1}$ and ${C}_{2}$.

Now, from the definition of ${\nabla}^{(\alpha )}$-divergence (65) and its dual divergence (66), the Pythagorean relations between Kullback–Leibler divergences are expressed as

$$\begin{array}{c}\hfill \begin{array}{cc}& {D}^{m}({P}_{a}:{P}_{o})+{D}^{m}({P}_{o}:{P}_{s})-{D}^{m}({P}_{a}:{P}_{s})\hfill \\ =& {D}^{e}({P}_{o}:{P}_{a})+{D}^{e}({P}_{s}:{P}_{o})-{D}^{e}({P}_{s}:{P}_{a})\hfill \\ =& {\displaystyle \sum _{i=1}^{n}\{{\theta}_{i}({P}_{s}(X))-{\theta}_{i}({P}_{o}(X))\}\{{\eta}_{i}({P}_{a}(X))-{\eta}_{i}({P}_{o}(X))\}\cdot (-1)}\hfill \\ =& {\displaystyle -\frac{1}{{C}_{1}}<{T}^{e},{T}^{m}>,}\hfill \\ & {D}^{m}({P}_{l}:{P}_{o})+{D}^{m}({P}_{o}:{P}_{s})-{D}^{m}({P}_{l}:{P}_{s})\hfill \\ =& {D}^{e}({P}_{o}:{P}_{l})+{D}^{e}({P}_{s}:{P}_{o})-{D}^{e}({P}_{s}:{P}_{l})\hfill \\ =& {\displaystyle \sum _{i=1}^{n}\{{\theta}_{i}({P}_{s}(X))-{\theta}_{i}({P}_{o}(X))\}\{{\eta}_{i}({P}_{l}(X))-{\eta}_{i}({P}_{o}(X))\}\cdot (-1)}\hfill \\ =& {\displaystyle -\frac{1}{{C}_{2}}<{T}^{e},{T}^{m}>.}\hfill \end{array}\end{array}$$

When orthogonality holds between the e- and m-geodesics, $<{T}^{e},{T}^{m}>=0$ in (A62), which proves the Pythagorean relations from (A63).

Finally, we prove that ${P}_{o}(X)$ satisfies the minimum condition (68). By considering ${P}_{{o}^{\prime}}(X)$ with a parameter ${w}^{\prime}\ne w$ as

$$\begin{array}{c}\hfill {H}_{{o}^{\prime}}:={w}^{\prime}{H}_{a}+(1-{w}^{\prime}){H}_{l},\end{array}$$

we obtain the Pythagorean relation

$$\begin{array}{c}\hfill \begin{array}{ccc}{D}^{m}({P}_{{o}^{\prime}}:{P}_{s})& =& {D}^{m}({P}_{{o}^{\prime}}:{P}_{o})+{D}^{m}({P}_{o}:{P}_{s}),\hfill \\ {D}^{e}({P}_{s}:{P}_{{o}^{\prime}})& =& {D}^{e}({P}_{o}:{P}_{{o}^{\prime}})+{D}^{e}({P}_{s}:{P}_{o}).\hfill \end{array}\end{array}$$

Since ${D}^{m}({P}_{{o}^{\prime}}:{P}_{o})={D}^{e}({P}_{o}:{P}_{{o}^{\prime}})\ge 0$ from the definition of divergence, ${D}^{m}({P}_{{o}^{\prime}}:{P}_{s})\ge {D}^{m}({P}_{o}:{P}_{s})$ and ${D}^{e}({P}_{s}:{P}_{{o}^{\prime}})\ge {D}^{e}({P}_{s}:{P}_{o})$ hold, which means ${P}_{o}(X)$ is a stationary point giving the minimum value with respect to ${D}^{m}(\cdot:{P}_{s})={D}^{e}({P}_{s}:\cdot)$, on the m-geodesic between ${P}_{a}(X)$ and ${P}_{l}(X)$. Note that the theorem also holds when $\sum _{X}P(X)$ takes arbitrary finite values other than 1. ☐
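The Pythagorean relation along the m-geodesic can be illustrated on small discrete distributions. The sketch below rests on assumptions that are not stated in this appendix: ${D}^{m}(P:Q)$ is identified with the Kullback–Leibler divergence $\sum _{X}P\,log(P/Q)$, the m-geodesic is the mixture line $w{P}_{a}+(1-w){P}_{l}$, and the three example distributions are arbitrary values chosen so that the minimizer lies strictly between the end points. All function names are hypothetical.

```python
from math import log

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i * log(p_i / q_i), assuming q_i > 0."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mixture(w, p_a, p_l):
    """Point w * p_a + (1 - w) * p_l on the mixture line between p_a and p_l."""
    return [w * a + (1 - w) * l for a, l in zip(p_a, p_l)]

def project(p_a, p_l, p_s, iterations=200):
    """Bisection on d/dw KL(mixture(w) || p_s); the slope is increasing in w by convexity.

    Assumes the slope is negative at w = 0 and positive at w = 1.
    """
    def slope(w):
        q = mixture(w, p_a, p_l)
        return sum((a - l) * log(qi / si) for a, l, qi, si in zip(p_a, p_l, q, p_s))
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_a = [0.7, 0.2, 0.1]   # illustrative "actual" distribution
p_l = [0.1, 0.3, 0.6]   # illustrative "long-term" distribution
p_s = [0.3, 0.4, 0.3]   # illustrative "short-term" distribution

w_o = project(p_a, p_l, p_s)
p_o = mixture(w_o, p_a, p_l)
for w_prime in (0.0, 0.3, 1.0):
    p_other = mixture(w_prime, p_a, p_l)
    gap = kl(p_other, p_s) - kl(p_other, p_o) - kl(p_o, p_s)
    print(w_prime, gap)
```

At the stationary point the printed gaps vanish up to rounding error, so every other point on the mixture line is at least as far from ${P}_{s}$ as ${P}_{o}$ in this divergence.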

## References

1. Schwab, K. The Fourth Industrial Revolution; Crown Business: New York, NY, USA, 2017.
2. Nature’s Notebook. Available online: https://www.usanpn.org/natures_notebook (accessed on 21 April 2017).
3. Funabashi, M.; Hanappe, P.; Isozaki, T.; Maes, A.M.; Sasaki, T.; Steels, L.; Yoshida, K. Foundation of CS-DC e-Laboratory: Open Systems Exploration for Ecosystems Leveraging. In First Complex Systems Digital Campus World E-Conference 2015, Springer Proceedings in Complexity; Springer International Publishing Switzerland: Cham, Switzerland, 2017; pp. 351–374.
4. Funabashi, M. Open Systems Exploration: An Example with Ecosystems Management. In First Complex Systems Digital Campus World E-Conference 2015, Springer Proceedings in Complexity; Springer International Publishing Switzerland: Cham, Switzerland, 2017; pp. 223–243.
5. Tokoro, M. Open Systems Science: A Challenge to Open Systems Problems. In First Complex Systems Digital Campus World E-Conference 2015, Springer Proceedings in Complexity; Springer International Publishing Switzerland: Cham, Switzerland, 2017; pp. 213–221.
6. Bak, P. How Nature Works: The Science of Self-Organized Criticality; Copernicus: New York, NY, USA, 1996.
7. Jensen, H.J. Self-Organized Criticality; Cambridge University Press: Cambridge, UK, 1998.
8. Takayasu, H.; Sato, A.; Takayasu, A. Stable Infinite Variance Fluctuations in Randomly Amplified Langevin Systems. Phys. Rev. Lett. **1997**, 79, 966.
9. Scanlon, T.M.; Caylor, K.K.; Levin, S.A.; Rodriguez-Iturbe, I. Positive feedbacks promote power-law clustering of Kalahari vegetation. Nature **2007**, 449, 209–212.
10. Gabaix, X. Power Laws in Economics: An Introduction. J. Econ. Perspect. **2016**, 30, 185–206.
11. Alves, L.G.A.; Ribeiro, H.V.; Lenzi, E.K.; Mendes, R.S. Empirical analysis on the connection between power-law distributions and allometries for urban indicators. Phys. A **2014**, 409, 175–182.
12. Michelucci, P.; Dickinson, J.L. The power of crowds. Science **2016**, 351, 32–33.
13. Hanappe, P.; Dunlop, R.; Maes, A.; Steels, L.; Duval, N. Agroecology: A Fertile Field for Human Computation. Hum. Comput. **2016**, 1, 1–9.
14. Scott, S.L. A modern Bayesian look at the multi-armed bandit. Appl. Stoch. Models Bus. Ind. **2010**, 26, 639–658.
15. Prokopenko, M. Guided Self-Organization: Inception; Springer: Berlin/Heidelberg, Germany, 2014.
16. Rekimoto, J.; Nagao, K. The World through the Computer: Computer Augmented Interaction with Real World Environments. In Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology (UIST’95), Pittsburgh, PA, USA, 15–17 November 1995; pp. 29–36.
17. Funabashi, M. IT-Mediated Development of Sustainable Agriculture Systems: Toward a Data-Driven Citizen Science. J. Inf. Technol. Appl. Educ. **2013**, 2, 179–182.
18. Aichi Biodiversity Targets. Available online: https://www.cbd.int/sp/targets/ (accessed on 21 April 2017).
19. Funabashi, M. Synecological farming: Theoretical foundation on biodiversity responses of plant communities. Plant Biotechnol. **2016**, 33, 213–234.
20. Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal **2007**, 69, 211–221.
21. ISC-PIF (Institut des Systèmes Complexes, Paris Île-de-France). French Roadmap for Complex Systems. ISC-PIF, 2009. Available online: http://cnsc.unistra.fr/uploads/media/FeuilleDeRouteNationaleSC09.pdf (accessed on 21 April 2017).
22. Solomon, R.C. Subjectivity. In Oxford Companion to Philosophy; Honderich, T., Ed.; Oxford University Press: Oxford, UK, 2005; p. 900.
23. Gillespie, A.; Cornish, F. Intersubjectivity: Towards a Dialogical Analysis. J. Theory Soc. Behav. **2009**, 40, 19–46.
24. Galaxy Zoo. Available online: https://www.galaxyzoo.org/ (accessed on 21 April 2017).
25. iNaturalist. Available online: http://www.inaturalist.org/ (accessed on 21 April 2017).
26. Rowell, D.L. Soil Science: Methods & Applications; Wiley: New York, NY, USA, 1994.
27. Kitano, H. Artificial Intelligence to Win the Nobel Prize and Beyond: Creating the Engine for Scientific Discovery. AI Mag. **2016**, 37, 39–50.
28. Ioannidis, J.P. Why most published research findings are false. PLoS Med. **2005**, 2, e124.
29. Linked Data. Available online: http://linkeddata.org (accessed on 21 April 2017).
30. Akao, Y. QFD: Quality Function Deployment—Integrating Customer Requirements into Product Design; Productivity Press: New York, NY, USA, 2004.
31. Hawker, G.A.; Mian, S.; Kendzerska, T.; French, M. Measures of adult pain: Visual Analog Scale for Pain (VAS Pain), Numeric Rating Scale for Pain (NRS Pain), McGill Pain Questionnaire (MPQ), Short-Form McGill Pain Questionnaire (SF-MPQ), Chronic Pain Grade Scale (CPGS), Short Form-36 Bodily Pain Scale (SF-36 BPS), and Measure of Intermittent and Constant Osteoarthritis Pain (ICOAP). Arthritis Care Res. **2011**, 63, 240–252.
32. Funabashi, M. Network Decomposition and Complexity Measures: An Information Geometrical Approach. Entropy **2014**, 16, 4132–4167.
33. Rudin, W. Fourier Analysis on Groups; Interscience Tracts in Pure and Applied Mathematics, No. 12; Wiley: New York, NY, USA, 1962.
34. Symmetrical 5-Set Venn Diagram. Available online: https://commons.wikimedia.org/wiki/File:Symmetrical_5-set_Venn_diagram.svg (accessed on 21 April 2017).
35. Funabashi, M. Synthetic Modeling of Autonomous Learning with a Chaotic Neural Network. Int. J. Bifurc. Chaos **2015**, 25, 1550054.
36. Doya, K.; Ishii, S.; Pouget, A.; Rao, R.P.N. Bayesian Brain: Probabilistic Approaches to Neural Coding; The MIT Press: Cambridge, MA, USA, 2007.
37. Yoshimura, J.; Clark, C.W. Individual adaptations in stochastic environments. Evol. Ecol. **1991**, 5, 173–192.
38. Harte, J. Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics; Oxford University Press: Oxford, UK, 2011.
39. Amari, S.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2007.
40. Rao, C.R. Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. **1945**, 37, 81–91.
41. Murota, K. Matrices and Matroids for Systems Analysis; Springer: Berlin, Germany, 2000.
42. Roy, S.S.; Dasgupta, R.; Bagchi, A. A Review on Phylogenetic Analysis: A Journey through Modern Era. Comput. Mol. Biosci. **2014**, 4, 39–45.
43. Brooks, R.A. Intelligence without representation. Artif. Intell. **1991**, 47, 139–159.
44. Gibson, J.J. The Ecological Approach to Visual Perception; Houghton Mifflin: Boston, MA, USA, 1979.

**Figure 1.** Schematic representation of the inter-subjective objectivity model. (**a**) Relations between two subjectivities A and B, objectivity, inter-subjectivity between A and B, subjective–objective unity for A and B, and inter-subjective objectivity are depicted as inclusion relations between the corresponding sets. (**b**) Development of inter-subjective objectivity as effective measurements of citizen science. As the inter-subjectivity increases along with the training of subjective–objective unity and inter-subjective feedbacks, the accuracy and reproducibility of measurement based on subjectivity can be assured by the convergence to inter-subjective objectivity.

**Figure 2.** Schematic representation of the buoy–anchor–raft model. Buoy, raft, anchor, and connection rope refer to subjectivity, inter-subjectivity, objectivity, and subjective–objective unity, respectively. Concrete real-world examples are given in Table 1.

**Figure 3.** Schematic representation of complexity measures as non-linear feature space and search function as its inverse functions. (**a**) Utility characteristics of a complex system, or complexity measure in general terms, is expressed with a complex configuration in parameter space. Parameters can also represent other complexity measures. (**b**) Complexity measures transform parameter space into non-linear feature space, which provides easier interpretation by sorting the order of a given utility. The inverse functions of complexity measures therefore correspond to search functions with respect to the search condition on utility.

**Figure 4.** Numerical example of convolution ${\lambda}_{N}({r}_{N})$. For two kinds of probability measure ${\mu}_{1}$ (green distribution) and ${\mu}_{2}$ (blue distribution) on $r\subset \mathbb{R}$ (supported by black rug), the convolution ${\lambda}_{N}({r}_{N})$ with $N=2,4,8,16$ are shown with different colors based on random sampling of $600,000\times N$ points from $\frac{N}{2}$ pairs of ${\mu}_{1}$ and ${\mu}_{2}$. The case of $\Lambda =1$ is simulated, which shows the canonical convergence towards normal distribution following the central limit theorem with ${\sigma}_{N}\to \sqrt{N}\sigma $, where $\sigma =\frac{1}{2}({\beta}_{1}+{\beta}_{2})$ as defined in (33) and (A13). For simplicity, ${\nu}_{N}$ is adjusted to 0 by the symmetric selection of ${\mu}_{1}$, ${\mu}_{2}$, and r.

**Figure 5.** Schematic representation of the triplet order algorithm that calculates the total order of three observations with respect to the complexity defined on the pair-wise commonality between them. Three observations A, B, and C are expressed as vertices of a triangle in a two-dimensional surface, whose edge lengths A–B, B–C, and A–C represent the commonality of each vertex pair. For simplicity, the triangles are projected as regular triangles, but the actual edge lengths generally differ, which provides the total order of edges. The six case statements of the algorithm are shown separately. Given the total order between the edges in blue magnitude relation, the corresponding total order of observations is depicted with orange axes at the side of each triangle. Orange axes superimposed with triangles signify that by orthogonally projecting the vertices onto them, the total order of vertices is obtained, whose generalization is developed in Section 3.4. This holds for arbitrary three positive values of edge length without the constraint of triangular inequality, by considering appropriate projection of the triangles to a non-Euclidean surface.

**Figure 6.** Integration of two commonality orders. (**a**) The correspondence between commonality orders I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ (orange arrows) can be described as the permutation between N observations (black circles), providing the topology of I–I matching (green dotted line) and I–N compromise (blue dotted line); (**b**) Affine space with respect to the commonality orders I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ as coordinate system (orange arrows) for the resolution of I–N compromise. The I–N mean commonality order (red solid arrow) can be calculated from the pair-wise order algorithm (Section 3.3) applied on the commonality orders I and $I\phantom{\rule{-1.00006pt}{0ex}}I$, which makes the I–I matching identical to the I–I dimension (green arrow) and sets the mean order to I–N compromise. One I–N resolution dimension is required to resolve one I–N compromise (blue arrow). The implicit structure of the integrated commonality order with continuity assumption takes a complex form reflecting I–N compromises (red dotted arrow as an example), which corresponds to the complex utility configuration in Figure 3a; (**c**) The general case with an arbitrary number of I–N compromises. Total commonality space of $N-1$ dimensions is divided between I–N resolution dimensions (blue arrows) and I–I dimensions (green arrow), between which I–N mean commonality order can be defined (red arrow). $k<N$ axes of I–N resolution dimensions are required to resolve k I–N compromises (blue arrows). Taking the I–I dimensions and I–N resolution dimensions as affine coordinates, the integrated commonality order is projected onto the I–N mean commonality order as a simplest sorted order of utility, which corresponds to Figure 3b.

**Figure 7.** Topological hierarchy of commonality between observations. For example, five observations A, B, C, D, E are depicted with correspondence to the commonality order of each topological subset. The Venn diagram on the left represents the commonality structure within observation probability database ${\mathbb{X}}_{5}$ on variable $\mathbb{X}$ ($N=5$ in Section 4.2), where coincident observation is superimposed. The maximum commonality order is the projection between these topological subsets to the natural number $\mathbb{N}$ in right axis, describing the number of matching observations. Venn diagram cited from [34].

**Figure 8.** Numerical observation of the proof of Theorem 8. (**a**) Chebyshev’s inequality (A41) and asymptotic convergence to $\mathcal{O}(LlogL)$ (A44) with respect to $N,M\ge 1$ $(N+M=L)$, $L=10,{10}^{2},{10}^{3}$. Y-axis is plotted with log scale. The equality in (A41) is given at $N=M=\frac{L}{2}$; (**b**) Behaviour of $f(N)\sqrt{L}$, $f(M)\sqrt{L}$, and $f(N)f(M)L$ with respect to $L=10,{10}^{2},{10}^{3}$. For visibility, the Y-axis scale is given as ${log}^{2}({Y}^{-1})$ that represents smaller Y value to the bottom, and Y-axis label shows the value of $-logY$. The surface below the solid line $f(N)f(M)L$ represents the convolution multiplied by L, $f*f(L)L$. The mean value of solid line $f(N)f(M)L$ therefore corresponds to the upper limit of u that satisfies the polynomial constraint (45) with respect to given L. $c=d=2$ were used for the simulation.

**Figure 9.** Information geometrical optimization of diversity strategy portfolio with respect to actual distribution ${P}_{a}$, short-term management objective ${P}_{s}$, and long-term sustainability ${P}_{l}$. On a dual-flat statistical manifold based on Fisher information metric, each distribution is represented as a point (black circles). The m-geodesic is depicted with a blue line, while the e-geodesic is shown with a red line, which orthogonally cross at the optimized strategy ${P}_{o}$. Topological correspondence between complexity measure ${H}^{\prime}$ (aligned on left orange arrow) and diversity strategy portfolio (${P}_{a},{P}_{s},{P}_{l}$ and ${P}_{o}$) is shown with dotted lines with respect to the magnitude relation.

**Figure 10.** Results of inter-subjective and subjective–objective commonality orders in citizen observation of biodiversity. Seven people represented with numerical ID are aligned with commonality orders (**a**) based on inter-subjectivity; and (**b**) based on subjective–objective unity, which showed a 3.92% residual error probability regarding the rejection of the random order distribution hypothesis with respect to the binomial test (38).

**Table 1.** Examples of buoy, raft, and anchor in various social systems and scientific domains. The examples are not comprehensive; they are a partial list of typical data that have recently become increasingly available to the public.

| | Economy | Judiciary | Biodiversity Record | Medical Treatment |
|---|---|---|---|---|
| Buoy | Demand, satisfaction | Sense of justice, guilt | Visual identification of species | Pain, psychological state |
| Raft | Price, exchange rate | Law, court decision | Identification with voting | Diagnosis, prescription |
| Anchor | Goods abundance | Evidential matter | DNA sequences | Physiological markers |

| Section Number | 2.2 | 3.1 | 3.2 | 3.3 | 3.4 | 4 | 5 |
|---|---|---|---|---|---|---|---|
| Buoy | $\mathbf{B}$ | $\mu (\cdot)$, $I(\cdot)$ | ${\mu}_{i}(\cdot)$, ${q}_{i}(\cdot)$ | Data contained in vertices V | Com. order I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ between N objects | Observations A, B, C, D, E | $P(\cdot)$, ${P}_{a}(\cdot)$, ${P}_{s}(\cdot)$, ${P}_{l}(\cdot)$, ${P}_{o}(\cdot)$, ${H}^{\prime}$ |
| Anchor | $\mathbf{A}$ | | | | | | |
| Raft | $\mathbf{R}$, $\mathbf{E}$ | ${\mu}^{\prime}(\cdot,\cdot)$, ${I}_{2}(\cdot,\cdot)$ | ${\lambda}_{N}(\cdot)$ | Edge attribute of E | Com. order I and $I\phantom{\rule{-1.00006pt}{0ex}}I$ b/w N observers, TDC, I–I and I–N res. dim. | $O(\cdot)$ | ${H}_{2}^{\prime}$, ${D}^{m}(\cdot:\cdot)$, ${D}^{e}(\cdot:\cdot)$ |
| Buoy–Anchor | $\mathbf{C}$ | | | | | | |
| Raft–Anchor | ${\mathbf{E}}^{\prime}$ | | | | | | |

**Table 3.** Algorithmic complexity for the calculation of commonality orders. With respect to the maximum commonality order in (39), the exhaustive number of combinations for an observation probability database ${\mathbb{X}}_{N}$ of size N and the time scale required for sorting the commonality measure are shown. Sorting time is based on the worst-case performance of canonical algorithms such as bubble sort and quick sort (polynomial degree $d=2$). $\mathcal{O}(\cdot)$ denotes Landau's asymptotic notation. $O(\mathbb{X})=\lfloor \frac{N}{2}\rfloor $ and $\lceil \frac{N}{2}\rceil $ require the maximum calculation and sorting time. Note that the total computation time is upper-bounded by the sorting process ($d=2$) rather than by the combinatorics of commonality ($d=1$), though the calculation time of each commonality, such as convolution, should be further considered in an actual implementation.

| Maximum Commonality Order $\mathit{O}(\mathbb{X})$ | Number of Combinations | Sorting Time ($\mathit{d}=2$) |
|---|---|---|
| 2 | ${}_{N}{C}_{2}$ | $\mathcal{O}({({}_{N}{C}_{2})}^{2})=\mathcal{O}({N}^{4})$ |
| 3 | ${}_{N}{C}_{3}$ | $\mathcal{O}({({}_{N}{C}_{3})}^{2})=\mathcal{O}({N}^{6})$ |
| ⋮ | ⋮ | ⋮ |
| $\lfloor \frac{N}{2}\rfloor$ or $\lceil \frac{N}{2}\rceil$ | ${}_{N}{C}_{\lfloor \frac{N}{2}\rfloor}$=${}_{N}{C}_{\lceil \frac{N}{2}\rceil}$ | $\mathcal{O}\left({\left({}_{N}{C}_{\lfloor \frac{N}{2}\rfloor}\right)}^{2}\right)=\mathcal{O}\left({\left({}_{N}{C}_{\lceil \frac{N}{2}\rceil}\right)}^{2}\right)=\mathcal{O}\left({N}^{2\cdot\lfloor \frac{N}{2}\rfloor}\right)$ |
| ⋮ | ⋮ | ⋮ |
| N | ${}_{N}{C}_{N}=1$ | $\mathcal{O}({({}_{N}{C}_{N})}^{2})=\mathcal{O}(1)$ |

© 2017 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).