Part–Whole Relations: New Insights about the Dynamics of Complex Geochemical Riverine Systems

: Nature is often characterized by systems that are far from thermodynamic equilibrium, and rivers are not an exception for the Earth’s critical zone. When the chemical composition of stream waters is investigated, it emerges that riverine systems behave as complex systems. This means that the compositions have properties that depend on the integrity of the whole (i.e., the composition with all the chemical constituents), properties that arise thanks to the innumerable nonlinear interactions between the elements of the composition. The presence of interconnections indicates that the properties of the whole cannot be fully understood by examining the parts of the system in isolation. In this work, we propose investigating the complexity of riverine chemistry by using the CoDA (Compositional Data Analysis) methodology and the performance of the perturbation operator in the simplex geometry. With riverine bicarbonate considered as a key component of regional and global biogeochemical cycles and Ca 2+ considered as mostly related to the weathering of carbonatic rocks, perturbations were calculated for subsequent couples of compositions after ranking the data for increasing values of the log-ratio ln(Ca 2+ /HCO 3− ). Numerical values were analyzed by using robust principal component analysis and non-parametric correlations between compositional parts (heat map) associated with distributional and multifractal methods. The results indicate that HCO 3− , Ca 2+ , Mg 2+ and Sr 2+ are more resilient, thus contributing to compositional changes for all the values of ln(Ca 2+ /HCO 3− ) to a lesser degree with respect to the other chemical elements/components. Moreover, the complementary cumulative distribution function of all the sequences tracing the compositional change and the nonlinear relationship between the Q-th moment versus the scaling exponents for each of them indicate the presence of multifractal variability, thus revealing scaling properties of the fluctuations.


Introduction
The relations between the whole and its parts have fascinated human beings throughout time, starting from the observation of nature (e.g., spirals of seashells, the form of flowers, proportions of the human body) to mathematical studies (e.g., the Fibonacci series, where the ratio of two subsequent terms tends to a constant irrational value equal to 1.618, the golden section, inversely proportional to 0.618) [1].
Part-whole relations are fundamental to governing the dynamics of a complex system and to moving from linear to nonlinear regimes. This occurs since the whole is not only given by the sum of the constituting parts, but it is affected by the nature of their complex interactions [2,3].
Compositional data are a typical example of relations between the whole (the composition) and the constituting parts (the concentration of single elements or chemical components) [4]. The nature of the link is given by the proportionality of the concentration measure unit as, for example, %, ppm, mg/L, meq/L and so on, so that each value is part (a proportion) of the same whole [5]. It is evident that univariate or bivariate analysis of such entangled data can give a partial idea of the natural processes affecting a given medium. In fact, a composition acts as a whole since all the variables respond to an environmental perturbation, thus giving birth to a new compositional state. The statistical analysis of this type of data should take into account this feature, and the application of multivariate methods is not sufficient if cases continue to move in the Euclidean world [6,7].
It is well known that a D-part composition is a (row) vector = [ 1 , 2 , … , ], where all its components are strictly positive real numbers and carry relative information. This means that the information is contained in the ratios between the components of the composition so that the numerical value of each isolate component is practically irrelevant [4]. The consequence is that compositional data pertain to a sample space called simplex S D (note that k depends on the unit of measurement or rescaling): where the Euclidean geometry, on which classical statistics is developed, is not valid [7]. The treatment of compositional data thus requires two possible strategies. The first one, called "stay in the simplex" approach, adopts the Aitchison geometry to work in a so-constrained sample space. The second one is based on the transformation of the data to move cases out from the simplex in the real Euclidean space, and it is known as "working in coordinates" [4,8,9]. In this work, our interest focused on the first approach and, in particular, on the use of the perturbation operator. The latter is one of the basic tools required to give to the simplex a vector space structure, an operator having a strategic role in monitoring changes [4,10]. Application examples of European riverine geochemical systems will illustrate the proposed procedure. The research question is whether the perturbation operator is able to give us information about the dynamic of compositional changes and to indicate similarity among chemical elements under this perspective, thus offering a new geochemical tool of investigation.

Monitoring the Compositional Change
The real space with its (Euclidean) geometry is a linear vector space with a metric structure. It is not a proper framework for compositional data and operations such as the addition of vectors or their multiplication by a constant or scalar value, or properties such as orthogonality or the calculus of the distance between points cannot be correctly performed [8,9]. If an analogous sensible geometry to work with compositional data is required, it is necessary to define basic operations to give a vector space structure to the simplex sample space. Two operations are able to achieve the aim; the first one is perturbation (⨁), which is analogous to addition in real space; the second one is powering (⨀), which is analogous to multiplication by a scalar in real space. The simplex ( , ⨁, ⨀) with perturbation and powering is a vector space, and with the definition of an inner product, norm and distance, it becomes a Euclidean vector space [7][8][9].
Consider two compositions x and y in the simplex sample space S D . The perturbation of x by y, representing the addition of two vectors (shifting operation), is a composition defined as where C is the closure operator given by with closure constant (1, 100, 10 6 , …). The ⊖ perturbation operation (perturbationsubtraction) is obtained by a component-wise division of the elements of the x and y vectors, thus representing the analogous of the difference operation in the ℝ (real) space.
The perturbation operation appears to have interesting features for the investigation of the dynamic of compositional changes. In fact, in several geochemical processes, change may be modeled by perturbation mechanisms when an initial specimen of composition 0 is subjected to a sequence of perturbations 1 , 2 , … , and then reaches its current state [4,10]: so that The perturbation operator could be a strategic tool for the investigation of the complexity of the critical zone (CZ), the near-surface layer of the Earth where the interlink between different reservoirs is at the base of the biogeochemical cycles of the elements. Its role could be particularly interesting in the understanding of compositional changes for riverine systems, where reactions work in thermodynamically open conditions [11].
By considering our proposal, compositional data pertaining to river chemistry are ranked under some physical-chemical hypothesis and the ⊖ perturbation-subtraction calculated between subsequent compositions. The original data set × (n number of cases, D number of variables) is transformed in the ( − 1) × perturbation-subtraction matrix. Each column of the matrix is a compositional signal tracing the contribution of a chemical component to the change. As an alternative, the matrix perturbation-subtraction can be determined by considering the compositional difference from the same composition as a reference (e.g., seawater, spring water or pristine water), or by using time or space to define sequences of observations. After this step, perturbations can be visualized in three different ways: as compositions expressed in percentage (%), as multiplicative (non-closed) factors and as percentages of increase/decrease, the last one being the traditional form of presenting perturbations [12]. Each of these ways is able to give us useful information about the dynamic governing the cycle of the elements, their mobility and the effect of environmental factors on their geochemistry [13].

The Data Set
In order to present a general application of the proposed methodology, stream water data of the FOREGS (Forum of European Geological Survey) repository were chosen. The database has largely contributed to defining standardized methods of sampling, chemical analysis and data management, thus offering a reference framework for the understanding of the biogeochemical cycles of the elements. Details, references and data sets can be found at the website of the project (http://weppi.gtk.fi/publ/foregsatlas/). By considering the stream waters, the main chemical composition (HCO3 − , Cl − , SO4 2− , NO3 − , K + , Na + , Mg 2+ , Ca 2+ , SiO2 and Sr 2+ in mg/L) of 805 cases was studied. Each sample represents a drainage area of 100 km 2 , corresponding to a density of about one sample per 4700 km 2 . Running stream water samples were collected from small second-order drainage basins (<100 km 2 ) at the same site as the active stream sediment. The sampling, whenever possible, was carried out during the winter and early spring months.

Ranking Data and Perturbation Calculus
The FOREGS repository does not contain adequate information to rank data by considering time or any other non-compositional criteria to obtain a sequence of observations. Thus, a certain geochemical choice is needed to order the waters. Since riverine bicarbonate (HCO3 − ) is a key component of regional and global biogeochemical cycles, the matrix could be ordered by considering increasing values of this chemical component. It mainly derives from mineral weathering with carbonic acid and globally represents an important sink for atmospheric CO2 [14,15]. Anthropogenic activities have greatly modified fluxes of elements in land-ocean-atmosphere systems, even though impacts on riverine carbon fluxes remain poorly resolved [15][16][17][18]. Interesting compositional changes are expected to be detected on a European scale and should be able to provide information about the behavior of this chemical component inside the composition it belongs to, enhancing the parts-whole relationships. However, under CoDA, this aim can be achieved by ranking the data by considering the increasing values of the log-ratio ln(Ca 2+ /HCO3 − ). Like HCO3 − , Ca 2+ in river water is almost entirely due to rock weathering, and the sources consist mainly of carbonate rocks containing calcite and dolomite, with a lesser proportion derived from Ca-silicate minerals and a minor amount from sulfate minerals [19]. As reported in the FOREGS Atlas (http://weppi.gtk.fi/publ/foregsatlas/), bicarbonate is the most abundant anion having a dominant role in determining the electrical conductivity. It ranges from <5 to 730 mg/L (excluding an outlier of 1804 mg/L) with a median of 126.4 mg/L. Calcium values in stream water range over three orders of magnitude, from 0.02 to 592 mg/L, with a median value of 40 mg/L, and show a correlation with pH, conductivity and bicarbonate.

Results and Discussion
After having ranked the data matrix for increasing values of the log-ratio ln(Ca 2+ /HCO3 − ), the +1 ⊖ perturbation-subtraction matrix was determined. The perturbation data matrix was expressed in the real space by using the centered log-ratio (clr) transformation to permit further statistical analysis [4]. The clr-transformation represents a D-part composition using D clrcoefficients, and it is defined as where ( ) = (∏ =1 ) 1⁄ stands for the geometric mean. The results for 804 cases are reported in Figure 1. Each subplot is related to the dominance of one chemical component over the geometric mean of the whole composition (according to interpretation of the clr coefficients) ranked by the increasing ln(Ca 2+ /HCO3 − ) values. The first features emerging from the plot inspection are the wide oscillations that characterize the clr-transformed values for the NO3 − perturbation, with an intermittent behavior for the entire range of the increasing values of the log-ratio ln(Ca 2+ /HCO3 − ). This indicates that on a European-scale stream, water compositional changes are governed by the variability of NO3 − and, consequently, by pollution phenomena able to hide other geogenic signals. Nitrate concentrations in large rivers worldwide are intimately related to human population density [20][21][22], and a similar result was found for the river chemistry of the Alpine region and the Tiber River basin in central Italy [11,23]. Thus, this condition appears to also affect running stream water samples collected from the small second-order drainage basins on a European scale, confirming that diffuse pollution is altering the nitrogen cycle on different scales [24]. With the aim of highlighting compositional changes in geogenic signals, the analysis was performed after having eliminated the NO3 − contribution. The obtained results are reported in Figure 2. As we can see, now the sequential signal of the variables is clearer, and all the variables related to the carbonatic cycle (HCO3 − , Ca 2+ , Mg 2+ and mostly Sr 2+ ) show lower fluctuations. In this framework, Cl − is the variable that presents the relative higher fluctuations, followed by SiO2(aq) and K + . Thus, the analysis appears to be able to discriminate two groups of variables: one given by HCO3 − , Ca 2+ , Mg 2+ and mostly Sr 2+ that is more resilient to the carbonate weathering increases, as represented by the log-ratio Ca 2+ /HCO3 − , and another given by Cl − , SO4 2− , K + , Na + and SiO2(aq) that is more sensitive to the environmental changes traced by it. This condition could reflect on a European scale the different responses of carbonate and silicate lithologies to runoff and relief [19]. Monitoring plans to point out important changes, also to be related to climate, should consider the more resilient variables.
The application of the run-test to the clr-transformed perturbation-subtraction data indicates the presence of non-random behavior with respect to the median value (run-test, p < 0.01) for all the sequences. This reveals that the stationarity of compositional changes for all the variables when ln(Ca 2+ /HCO3 − ) increases is not a sustainable hypothesis and that the concentration-discharge relationship has not canceled or obscured the effect of the biogeochemical processes on a European scale. Rates of solute production and/or mobilization that are nearly proportional to water fluxes, on both event and inter-annual time scales, lead to a chemostatic behavior of solute concentrations [25][26][27][28]. This occurs when solute concentrations can vary little relative to changes in discharge, depending on catchment features (e.g., lithology, geomorphology, use of the soil) and chemical properties of the solute (e.g., availability, reactivity, mobility) [29]. However, recently this concept has been revised [30] considering the effect of phenomena working at different scales (groundwater storage and fast chemical reactions versus critical zone evolution driven by climatic change). The compositional changes revealed here for each chemical component could reflect this condition.
Similarities in compositional changes among variables could be better identified by the investigation of the biplot of Figure 3, obtained through a robust principal component analysis [31,32]. The biplot takes into account 65% of the data variability, with a value of about 40% related to the first component (PC1). The biplot confirms the discrimination of the variables in two different groups, as previously reported, also adding SO4 2− to the carbonatic group, and highlights the strong relationship between Ca 2+ and HCO3 − (co-linear vectors) during processes affecting compositional changes. The results are also fully confirmed by the correlation analysis for compositional data based on the non-parametric Spearman correlation coefficient according to the method developed by [33]. The method takes into consideration all relative information of two variables (compositional parts) to that of the remaining variables and constructs orthonormal coordinates, for which standard correlations are computed. This is repeated for all pairs of variables, and the results are visualized by means of the so-called "heat map" in which chemical variables are rearranged according to their similarity using a hierarchical cluster analysis [33]. The results reported in Figure 4 confirm the presence of two distinct associations of chemical components, which are positively correlated in their chemical changes: (i) Na + , Cl − , K + and SiO2(aq) and (ii) Ca 2+ , HCO3 − , Sr 2+ , Mg 2+ and SO4 2− .
When in the evolution of dynamic systems, dissipative processes should not be negligible, and intermittent behavior is a typical result [34,35]. It implies a tendency for a chemical variable to concentrate into small-scale features of high intensity surrounded by extended areas of less intense fluctuations in temporal and spatial domains. Large fluctuations in intermittent processes are not as rare as those in Gaussian processes and contribute to the modification of statistical moments leading to multifractality [36]. A way to check for this behavior is to investigate the shape of the frequency distribution to search for power laws. If X has a power-law distribution, then in a log-log plot of [ ≥ ]-the complementary cumulative distribution function-asymptotically, a straight line will appear. Plotting the cumulative number ≥ versus x in log-log coordinates enables a comparison of the presence of the power-law distribution versus the normal (Gaussian), log-normal or exponential ones that show distinctly curved graphs [37]. The result for the clr-perturbation-subtraction values is reported in Figure 5, with the log-normal model represented by the continuous line [12,13,37]. As we can see, a log-normal model could work for most of the compositional changes, particularly when the values are low and dilution prevails, describing an interaction-dominant dynamic [38]. However, the fragmentation of the curve could reveal the presence of a multifractal pattern and changing scaling relations [36]. In this case, multiplicative interactions are associable with feedback loops across different scales, thereby modifying dynamics and resilience to change.
With the aim of verifying if compositional changes exhibit nonlinear power-law behavior that depends on higher-order moments and scale, an explorative multifractal analysis was performed. If power-law scaling exists for various statistical moments at different scales, the process is multifractal, and the scaling exponents are not a linear function of the moments of the distribution [36,[39][40][41]. The Q-th moment versus the scaling exponents plot is reported in Figure 6 [41][42][43]. Departures from linearity are evaluable for each variable, thus confirming the tendency towards multifractality, although with different patterns [44,45].

Final Thoughts and Conclusions
Geochemical elements are constituents of a composition, and their behavior cannot be analyzed by separating each part from the others. The risk is the loss of the information related to the laws governing the interdependence and the emerging of a complex dynamic. The only way to investigate the joint behavior of all the elements of a composition is to adopt multivariate methods, but this strategy could be insufficient if the Euclidean geometry does not represent the natural support of compositional data. If the "stay in the simplex" approach under CoDA is chosen to investigate compositional data, the perturbation operator is an interesting tool to probe the nature of the chemical change [4,10], adding valuable information about the dynamics of a geochemical system.
A signal related to the compositional change can be obtained after having ranked data by using some criteria and applying the difference perturbation at subsequent couples of compositions. When the compositional changes of the joint distribution of HCO3 − , Cl − , SO4 2− , NO3, K + , Na + , Mg 2+ , Ca 2+ , SiO2 and Sr 2+ (mg/L) from the European stream waters of the FOREGS repository are analyzed, after having ordered the cases for the increasing values of the log-ratio ln(Ca 2+ /HCO3 − ), interesting features emerge that cannot be deduced from a classical non-compositional analysis. The chosen log-ratio is an important index of alkalinity and carbonate weathering.
First of all, our results indicate that N-bearing species, and in particular NO3 − , due to their high variability also in compositional changes, are able to obscure variations in the multivariate geogenic signal [11,23,24]. When the data are cleaned up from this contribution, the plot of the relative fluctuations is clearer, and it is possible to appreciate the contribution of each chemical component in governing the compositional change. The application of the run-test indicates that the sequences are not stationary with respect to the increase of the log-ratio ln(Ca 2+ /HCO3 − ). Consequently, the hypothesis of chemostatic behavior is not sustainable, and the concentration-discharge relationship has not canceled or obscured the effect of the biogeochemical processes on a European scale [25][26][27][28]30]. However, some variables (HCO3 − , Mg 2+ , Ca 2+ and Sr 2+ ) appear to be more resilient to change (lower contribution to compositional changes for the entire range of values of ln Ca 2+ /HCO3 − ) with respect to the others. These variables could be used to construct potential indices for monitoring plans on a long timescale to also trace the effect of climate change on water-rock interactions (early warning signals). Moreover, as revealed by the robust principal component analysis and the heat map of correlations, both applied on the perturbation-subtraction matrix, variables aggregate each other for similar dynamics to HCO3 − , Mg 2+ , Ca 2+ , SO4 2− and Sr 2+ versus Cl − , K + , Na + and SiO2(aq), confirming the different resilience to change previously reported.
The complementary cumulative distribution function of the clr-perturbation-subtraction values suggests that most of the variables move thanks to an interaction-dominant dynamic, particularly for the lower values [38]. However, the nonlinear relationship between the scaling exponents versus the Q-th moments of the distribution indicates multifractality, which is the potential existence of different scaling regimes of the compositional changes at different spatial/temporal scales. These outcomes can be interpreted as a result of the superimposition of several processes acting at multiple spatial scales since sampling covered a reduced time interval. However, the scale-invariant nature of spatial patterns can also be considered an indication of self-organization, underlying the possible role of scale-dependent feedback in geochemical processes. In this perspective, for HCO3 − , Ca 2+ , Mg 2+ and Sr 2+ , more resilient to compositional changes, multifractality could be associated with the capacity to organize a complex network of interactions as an efficient answer to the different environmental factors. This condition could be favored by the rapid dissolution reaction of carbonates with respect to silicates, to the maintenance of near-saturation conditions in river waters and to the diffusion of carbonatic lithotype [14,19]. For the other variables, Cl − , K + , Na + and SiO2(aq), the intermittency in space could reveal heterogeneity and, consequently, a maintained fingerprint of environmental factors as, for example, the difference in the climate of the drainage basin and geology.
In any case, multifractality indicates the presence of dissipative systems working far from the thermodynamic equilibrium since when a field is intermittent, the structure of its energy dissipation is not homogeneous, but it is intermittent too [35,46,47]. This one appears to be a profitable path of research in light of the compositional data analysis approach.

Conflicts of Interest:
The authors declare no conflict of interest.