A Computer Aided Approach for River Styles—Inspired Characterization of Large Basins: A Structured Procedure and Support Tools

: This paper presents a systematic procedure for developing a characterization and classiﬁcation of river reaches inspired by the River Styles Framework, through which insight can be gained about the understanding of river behavior. Our procedure takes advantage of several computer based “tools”, i.e., algorithms implemented in software packages of various types, from ”simple” Excel sheets to sophisticated algorithms in Python language, in general all supported by Geographic Information Systems (GIS). The main potentially useful, existing tools for this speciﬁc aim are discussed here, revealing their strengths and weaknesses. New, complementary or alternative tools that have been developed in the project feeding this paper are presented, which can contribute to the scientiﬁc community and stakeholders of the topic. The main result of our research is a structured and practical guide (a ToolBox Manual) that can support practitioners and researchers wishing to characterize and classify large rivers, based on the River Styles Framework. The main contribution is that this set of ideas, solutions, and tools, makes this type of exercise signiﬁcantly more transparent and at the same time much less subjective. Moreover, the procedure is applicable to large systems and does not require more information than that generally available also in developing or emerging countries.


Introduction
Today, remote sensing offers unprecedented opportunities to analyze rivers, particularly in large basins. Piégay et al. [1] present a very complete, articulated, and up-to-date panorama of the techniques, recent experiences, and emerging research challenges that use or are related to remotely sensed data for geomorphic river characterization and management and, particularly, to investigate past, present, and future fluvial corridor conditions and processes. Most of these very refined and complex techniques and experiences relate to the identification of geomorphic units (e.g., main channel, bars, islands, etc.) or the measure of some features (e.g., topographic, vegetation structure); others are focused on the search for (cause-effect) relationships between natural or anthropogenic factors

•
The Riverscapes consortium https://www.riverscapes.xyz/, a collaboration between fluvial researchers in the United States of America, United Kingdom, New Zealand, Australia, Italy, France, and Spain, particularly with its ToolBox RCAT (http://rcat.riverscapes.xyz/), which includes: the Valley Bottom Extraction Tool (V-BET), the Riparian Vegetation Departure (RVD) tool, the Riparian Condition Assessment (RCA) tool, and the Riparian Recovery Potential (RRP) tool.

•
The Fluvial Corridor ToolBox (FCT), by the group led by Hervé Piégay at the University of Lyon (France); currently, they are developing a version within the QGIS 3.x interface compiling scripts written in Python (https://github.com/tramebleue/fct).

•
The River Meandering Migration Software (RVRMeander; http://www.rvrmeander.org/) of the University of Illinois, University of Pittsburgh, and USDA(U.S. Department Of Agriculture), with applications in particular to the Amazon River by Jorge Abad.
The use of most of these tools, however, is typically quite complicated and delicate. They generally require a Python installation that often generates difficulties. They always require a human refinement of their outputs, which introduces a hidden degree of arbitrariness. Some (such as RCAT) are explicitly designed to be fed with the standard information available in the USA. Others (e.g. the RVRMeander) are explicitly aimed at meandering rivers. Several are in principle interchangeable because they address the same task, although with some differences. No single one, however, embeds all of the functionalities needed to carry out a River Styles analysis. Finally, most of them are simply not applicable in cases characterized by scarce information, including low resolution geographic data (particularly Digital Elevation Models-DEM).
The most applicable tool to support RS analysis, amongst those listed, is certainly the Fluvial Corridor ToolBox. However, its actual usefulness for performing RS analysis is definitely much reduced, for several reasons, with respect to the potentialities that were originally been announced in the FCT Project. On the one hand, the sequence of sub-tools that appears when installing it (within ArcGIS) does not correspond to the actual sequence of use, a quite confusing fact. Additionally, several steps and functionalities are not relevant for RS analysis and thus must be skipped, while, conversely, other key functionalities are not supported (e.g., confinement or RS classification). Another issue is that the computation of sinuosity within the FCT is based on the polyline connecting inflection points of the active bed centerline; in some cases, this may result in a polyline external to the valley bottom (VB), which should not happen. Finally, the key Hubert test for identifying homogenous reaches does not work with non-cardinal indicators, which are nonetheless often involved in RS analysis. Moreover, several "tricks" need to be known; to a novel user, many of these may appear to be unsolvable snags, leading him/her to quit. This paper addresses mainly the community of practitioners by providing a guide through the different elaborations needed when conducting a RS exercise. Its main contribution in this sense is providing a procedure that allows to link all the required steps in a structured, operational form. In such a way, on the one side, the role for expert judgment becomes better defined, so that the outputs are structurally more objective; on the other side, the procedure makes RS characterization-classification viable also for large rivers, by taking advantage of computer-aided tools. The paper may also be of interest to the scientific community, as it addresses some challenging conceptual problems concerning the holistic synthesis of reductionist information that lie at the core of the automated analysis of complex information. We are aware, anyway, that the main added value of this paper can be appreciated by those who know and have adopted (or would like to adopt) the River Styles Framework.
In what follows, we first present the essence of the proposed procedure (details are provided in Appendix A); then, some of the newly developed tools are presented, together with demonstrative applications to the Magdalena River case study (Colombia); finally, conclusions are drawn. Figure 1 presents a summary view of the steps we consider necessary to conduct a RS analysis (Stage I, steps 1 and 2 of the RS Framework), while Table A1 in Appendix A presents a much more detailed picture. For each step, we indicate the tool/procedure suggested; in some cases, several options are proposed, depending on the original information available. The list does not claim to be exhaustive (see also [1,3,4,[11][12][13][14][15] for additional options); rather, its purpose is to give an idea of the complexity of a full RS analysis exercise, and to locate within the process the additional tools we explicitly developed, which are described in detail in the ToolBox Manual [16].

Systematizing a River Styles Inspired Characterization-Classification Exercise
Geosciences 2020, 10, x FOR PEER REVIEW 5 of 27 Figure 1. Scheme of the whole procedure proposed for the River Styles inspired characterization and classification of rivers within a basin (the Procedural Tree is the structured set of attributes adopted in the River Styles Framework (RS) analysis to characterize river reaches; Proforma is the standardized document adopted again in the RS analysis to described in a more detailed fashion each particular River Style found. Additional details are provided in Appendix A). FCT = Fluvial Corridor ToolBox.
It is worth noting that the first main product of this paper is Table A1 in Appendix A, which presents the full procedure we propose.
In the following section, we address just three "tools" amongst those developed and included in Table A1, in order to give an idea of the type of contribution this paper (and the underlying GeoMagda project) brings in. To illustrate some of the concepts and type of results we utilize the Magdalena case study developed in [16,17]. What follows should not be interpreted, however, as the application to the Magdalena of the full procedure; rather, the focus here is using the Magdalena data just to illustrate how these "tools" work (any other data would serve this purpose). A synthesis of Figure 1. Scheme of the whole procedure proposed for the River Styles inspired characterization and classification of rivers within a basin (the Procedural Tree is the structured set of attributes adopted in the River Styles Framework (RS) analysis to characterize river reaches; Proforma is the standardized document adopted again in the RS analysis to described in a more detailed fashion each particular River Style found. Additional details are provided in Appendix A). FCT = Fluvial Corridor ToolBox.
Expert judgment is involved on one hand in defining criteria to be adopted within the various steps (e.g., defining the Procedural Tree; or defining how to distinguish a planform type from another) and, on the other, after the procedure has been applied, in interpreting river behavior. This exercise must be accompanied by an analysis of the river's historical behavior and recent adjustments (not included herein).

of 27
As anticipated, the added value of this procedure can be fully appreciated by those who know and have used (or would like to adopt) the River Styles Framework. The innovations are evident only when analyzing the details, but consist mainly of: (a) Modifying the Procedural Tree, by "locating first what weighs more", in such a way that a hierarchical classification can be obtained that can manage all sub-types generated in particular by the assemblage of the different geomorphological units (which can give rise to a large number of sub styles). (b) Identifying objectively the (variable length) reaches according to the spatial intersection of only the set of intensive-type attributes of the Procedural Tree (i.e., those not depending on the extension of the length of the considered segment) and assessing all the extensive-type attributes as spatial statistics over such reaches. This idea is further explained in [16][17][18]. (c) Relying on a spatial database (called "skeleton") obtained by segmenting the valley bottom in sub-segments along its axis, and determining a value for each relevant attribute in each sub-segment. (d) Specifying a set of tools that ensure the ability to also deal with large river systems. The proposed tools (Appendix A) are a particular choice out of a virtually infinite (or at least quite large) number of alternatives; but having defined at least a tool set that does cover all the demanding steps (elaborations of extensive information and computational steps) brings in an added value as, without them, several snags might prevent a candidate user from applying the RS approach. (e) Proposing a sequence of operations that covers the whole exercise, ensuring a linkage amongst steps and avoiding redundancies.
Concerning aspect (d), the ToolBox Manual developed within the GeoMagda project provides quite detailed instructions on how to solve a number of difficulties; this might be superfluous to the trained GIS expert, but precious to a broader arena of users. It is worth stressing that the tools we propose require just a basic expertise in computer use, particularly GIS and Excel; this has a negative facet, as advantage is not taken of more powerful possibilities, but it also has a positive facet, as broadens the arena of candidate users and to improvements.
It is worth noting that the first main product of this paper is Table A1 in Appendix A, which presents the full procedure we propose.
In the following section, we address just three "tools" amongst those developed and included in Table A1, in order to give an idea of the type of contribution this paper (and the underlying GeoMagda project) brings in. To illustrate some of the concepts and type of results we utilize the Magdalena case study developed in [16,17]. What follows should not be interpreted, however, as the application to the Magdalena of the full procedure; rather, the focus here is using the Magdalena data just to illustrate how these "tools" work (any other data would serve this purpose). A synthesis of the Magdalena case study is available in Nardini et al. [17], where we show the essential outputs obtainable by applying the procedure described herein; details can be found in [16,17]. We warmly suggest that this paper be read together with the case study (and also with Nardini et al. [18] when it be published) because they are indeed highly complementary. From a methodological point of view, Nardini et al. [17] present the essential contents of a River Styles characterization-classification exercise, while here we present the organized collection of technical steps involved.
It is important that the reader not get lost in the following technicalities: the essential message we would like to convey is that the procedure we have developed addresses several points similar to those discussed below, proposing conceptual and operational solutions to overcome any issues, while simultaneously ensuring that all steps are linked in a harmonic fashion.

Confinement
Confinement, as adopted in the RS Framework, expresses the extent to which the river abuts either valley bottom (VB) margin. It is a very meaningful attribute because this condition strongly characterizes the dynamic behavior of the river itself. It should not be confused with "entrenchment", i.e., the ratio between active channel width and VB width [19].
To assess confinement, the objective, relevant base information is the set of contacts between the active channel (or, more generally, the envelope of active channels) and the VB margins. However, to meaningfully characterize a reach in terms of confinement, one has to rely on a statistical, categorical indicator (% of either bank's contact lengths over the corresponding axis length) according to some given convention (according to Brierley and Fryirs,[2]: laterally "unconfined" when total contact is less than 10%; "confined", when is over 85%; "partly confined", when in between) over a given reach (product of a segmentation). Indeed, the result strongly depends on the segmentation adopted, as shown in Figure 2.
Geosciences 2020, 10, x FOR PEER REVIEW 6 of 27 It is important that the reader not get lost in the following technicalities: the essential message we would like to convey is that the procedure we have developed addresses several points similar to those discussed below, proposing conceptual and operational solutions to overcome any issues, while simultaneously ensuring that all steps are linked in a harmonic fashion.

Confinement
Confinement, as adopted in the RS Framework, expresses the extent to which the river abuts either valley bottom (VB) margin. It is a very meaningful attribute because this condition strongly characterizes the dynamic behavior of the river itself. It should not be confused with "entrenchment", i.e., the ratio between active channel width and VB width [19].
To assess confinement, the objective, relevant base information is the set of contacts between the active channel (or, more generally, the envelope of active channels) and the VB margins. However, to meaningfully characterize a reach in terms of confinement, one has to rely on a statistical, categorical indicator (% of either bank's contact lengths over the corresponding axis length) according to some given convention (according to Brierley and Fryirs,[2]: laterally "unconfined" when total contact is less than 10%; "confined", when is over 85%; "partly confined", when in between) over a given reach (product of a segmentation). Indeed, the result strongly depends on the segmentation adopted, as shown in Figure 2. The main novelty of the algorithm we propose to define confinement lies in determining it after reaches are defined. In fact, this position overcomes an intrinsic logical problem of the traditional approach: in the RS framework, reaches displaying a recognizable and meaningful pattern are identified as the output of a process based on assessing the set of attributes included within the RS Procedural Tree; but the first of such attributes is indeed confinement, which requires-as we have shown-that segments (i.e., reaches) be pre-defined.
Additionally, our algorithm determines an explicit statistical confinement indicator as well as another important attribute which is the cause of confinement, once the polylines of the VB margins separated in categories (i.e., valley margin, planform, and infrastructure) are provided. Furthermore, it brings a clear advantage from a practical point of view: it is accessible to anyone able to use ArcGIS (or QGIS) and Excel. The main novelty of the algorithm we propose to define confinement lies in determining it after reaches are defined. In fact, this position overcomes an intrinsic logical problem of the traditional approach: in the RS framework, reaches displaying a recognizable and meaningful pattern are identified as the output of a process based on assessing the set of attributes included within the RS Procedural Tree; but the first of such attributes is indeed confinement, which requires-as we have shown-that segments (i.e., reaches) be pre-defined.
Additionally, our algorithm determines an explicit statistical confinement indicator as well as another important attribute which is the cause of confinement, once the polylines of the VB margins separated in categories (i.e., valley margin, planform, and infrastructure) are provided. Furthermore, it brings a clear advantage from a practical point of view: it is accessible to anyone able to use ArcGIS (or QGIS) and Excel.
Finally, as confinement is determined for any given segmentation, it is possible and meaningful to provide the output at different scales; particularly: (ii) basin scale (macro): indeed, it is possible to adopt a very simple and effective criterion to identify macro segments, e.g., where a significant change of VB width (V) occurs (a fact that can be detected by the Hubert Test in the FCT or simply...visually) and the resulting macro confinement is very meaningful (Figure 3). Finally, as confinement is determined for any given segmentation, it is possible and meaningful to provide the output at different scales; particularly: i) reaches (as usually done in the RS); ii) basin scale (macro): indeed, it is possible to adopt a very simple and effective criterion to identify macro segments, e.g., where a significant change of VB width (V) occurs (a fact that can be detected by the Hubert Test in the FCT or simply...visually) and the resulting macro confinement is very meaningful (Figure 3). As O'Brien et al. [6] have already presented an algorithm, with a corresponding GIS Python software tool, to determine the confinement indicator within the RS context, it is important to recall it and point out the differences with our approach. That algorithm goes as follows: o it determines the contacts-both margins-between active channel and "inflated" (buffer) VB margin through a GIS intersection; As O'Brien et al. [6] have already presented an algorithm, with a corresponding GIS Python software tool, to determine the confinement indicator within the RS context, it is important to recall it and point out the differences with our approach. That algorithm goes as follows: • it determines the contacts-both margins-between active channel and "inflated" (buffer) VB margin through a GIS intersection; • these contacts are projected onto the reference axis with a binary value (0 for no contact; 1 for contact at either bank); • the axis is discretized in "short-segments" (identified by k) of a pre-fixed length; • the algorithm goes on by computing locally (for each slice) the statistical indicator c(k) as the total length of the projected contacts (either banks) falling in it, over the segment length; • finally, it assumes that when the indicator c(k) is 0.50 < c(k) < 0.85 the cause of confinement is the valley margin, while when it is 0.10 < c(k) < 0.50 the cause is a planform geomorphic unit.
O'Brien's algorithm implements a peculiar way to obtain a holistic synthesis, starting from reductionist analytical information associated with short river segments. We found that this may lead to inconsistencies (as shown in Figure 4) or even mistakes (possibly due to our inappropriate use?), when the river is anastomosed with a very wide envelope of the active channels ("AC envelope" in what follows) (see Figure 5). Moreover, the assumption concerning the cause of confinement is evidently arbitrary and, furthermore, cannot include the frequent case of confinement by infrastructure (e.g., railways, roads, levees). Finally, it does not compute an explicit statistical indicator: the classification is obtained only by assigning a color scale. In general, the algorithm produces a somehow hysterical sequence of different confinement categories (in this case: mostly laterally unconfined with two shorter segments partly confined) and is not able to capture a more holistic character (indeed, we would classify the whole stretch as partly confined). Notice in Figure 4 that while the output of the algorithm is reported along the reference axis adopted (colored polyline axis: dark green: unconfined; light green: partly confined planform controlled; orange: partly confined valley margin controlled; red: confined (not detected in this figure along the axis)), to assess confinement one has to look at the blue polygon representing the envelope of active channels (which is wide or very wide where the river is anabranching or anastomosed), and identify sectors where it touches either margin of the VB polygon (grey area). Only the red contacts count because even though other lines seem in contact, visually, from a geographical point of view (GIS) they are not: sometimes, indeed, the difference amounts to a few meters only, but it is still larger than the tolerance adopted in the buffer used by the algorithm.  [6] algorithm, which reports a long river stretch as "confined" (red axis; green represents "laterally unconfined", instead) while in reality it only presents two small contact points (small red dotted circles), one at its beginning (top, right, corresponding to the beginning of the red stretch) and one at its end (bottom, left). The blue area is the envelope of the Active Channels (AC) that in this particular case cover a very wide area as the river is anastomosed. The pale blue-grey area is the valley bottom.
The algorithm we developed in GeoMagda overcomes such difficulties. As already stated, it assumes that the reaches over which the confinement statistical indicator is to be computed are already defined. This is the basic difference with O'Brien's algorithm.
Other differences, which constitute additional advantages, are as follows: o it is accessible to any user of ArcGIS (or QGIS) and Excel; o it computes an explicit statistical confinement indicator; o it determines the cause of confinement, if provided with the polyline of the VB margins separated in categories: valley margin, planform, or infrastructure.
Our algorithm undoubtedly suffers from some limitations: It is designed for isolated rivers (not a river network); the current version is based on the number of discretized short-segments with a contact (the "numerosity"), not on the actual contacts length; and it is not automated, which means that it requires a rather large number of manual (but not expert based) steps that, as a whole, can be quite cumbersome. However, with a sufficiently fine discretization, there is no actual gain in considering the length of the short segments rather than their numerosity; and automatizing such a procedure is perfectly possible.
In Appendix B we provide details of the algorithm, so that it can be replicated step by step; Figure 6 provides a general idea of the first part, related only to confinement.  [6] algorithm, which reports a long river stretch as "confined" (red axis; green represents "laterally unconfined", instead) while in reality it only presents two small contact points (small red dotted circles), one at its beginning (top, right, corresponding to the beginning of the red stretch) and one at its end (bottom, left). The blue area is the envelope of the Active Channels (AC) that in this particular case cover a very wide area as the river is anastomosed. The pale blue-grey area is the valley bottom.
The algorithm we developed in GeoMagda overcomes such difficulties. As already stated, it assumes that the reaches over which the confinement statistical indicator is to be computed are already defined. This is the basic difference with O'Brien's algorithm.
Other differences, which constitute additional advantages, are as follows: • it is accessible to any user of ArcGIS (or QGIS) and Excel; • it computes an explicit statistical confinement indicator; • it determines the cause of confinement, if provided with the polyline of the VB margins separated in categories: valley margin, planform, or infrastructure.
Our algorithm undoubtedly suffers from some limitations: It is designed for isolated rivers (not a river network); the current version is based on the number of discretized short-segments with a contact (the "numerosity"), not on the actual contacts length; and it is not automated, which means that it requires a rather large number of manual (but not expert based) steps that, as a whole, can be quite cumbersome. However, with a sufficiently fine discretization, there is no actual gain in considering the length of the short segments rather than their numerosity; and automatizing such a procedure is perfectly possible.
In Appendix B we provide details of the algorithm, so that it can be replicated step by step; Figure 6 provides a general idea of the first part, related only to confinement. Geosciences 2020, 10, x FOR PEER REVIEW 10 of 27 Figure 6. Steps for calculating Confinement: (1) Intersection between the valley bottom (VB) and a buffer of the Active Channel (AC) to determine contacts; (2) Determination of the correspondence between contacts and segmented VB (by applying mainly the FCT and obtaining the Discretized Geographical Objects (DGO)); (3) Determine the spatial correspondence between VB segments and contacts using the GIS tool "Spatial join"; (4) Determine the Confinement Type by applying a statistical rule using an Excel spreadsheet on the attribute table extracted from the segmented VB shapefile.

A Logical Heuristic "Reductionist Holistic Algorithm" for Mono-Dimensional Variables
We address here the need to obtain a holistic synthesis from reductionist information. The expression "Reductionist  Holistic" refers to the attempt to translate information related to an attribute at the local level (e.g., the width of the active channel or the presence/absence of islands within a short segment of the river) into information that synthesizes, in general, multiple attributes and with a view that spans well beyond the single short-segment. In the particular mono-dimensional applications presented below, the reductionist information is the local presence/absence value (binary case) or else the specific value within a finite set of possibilities (categorical case), while the (2) Determination of the correspondence between contacts and segmented VB (by applying mainly the FCT and obtaining the Discretized Geographical Objects (DGO)); (3) Determine the spatial correspondence between VB segments and contacts using the GIS tool "Spatial join"; (4) Determine the Confinement Type by applying a statistical rule using an Excel spreadsheet on the attribute table extracted from the segmented VB shapefile.

A Logical Heuristic "Reductionist→ Holistic Algorithm" for Mono-Dimensional Variables
We address here the need to obtain a holistic synthesis from reductionist information. The expression "Reductionist → Holistic" refers to the attempt to translate information related to an attribute at the local level (e.g., the width of the active channel or the presence/absence of islands within a short segment of the river) into information that synthesizes, in general, multiple attributes and with a view that spans well beyond the single short-segment. In the particular mono-dimensional applications presented below, the reductionist information is the local presence/absence value (binary case) or else the specific value within a finite set of possibilities (categorical case), while the holistic synthesis is a judgment obtained by looking at a sequence of such reductionist values along the river, identifying homogenous stretches. To this aim, we developed a logical heuristic algorithm, both for the binary and thee categorical cases, whose results are nicely consistent with expert judgment output. We adopt the term "logical heuristic" to stress the fact that the algorithm is based on explicit reasoning (logics), but with some steps that not necessarily lead to an optimum, although still achieving the purpose (at least in our application).

Reductionist → Holistic Algorithms: The Literature Status
For the mono-dimensional problem (only one attribute), a statistical approach seems to be the best option. Parker et al. [20] provide an interesting literature synthesis of statistical algorithms borrowed from geological analysis [21], recalling Davis' classification scheme [22] in (i) "local boundary hunting" and (ii) "global zonation". The former searches for the boundary between two distinct levels of the considered character; e.g., Webster's algorithm [23] and its variants identify the location of "breaks" by sliding a moving window until the difference of variance before and after its midpoint is maximized, while variance is kept at a minimum within each one of the two halves. it's the performance depends on the width of the moving window, a variable which is selected somewhat arbitrarily. The second approach, "global zonation", looks at the whole data set at a glance, trying to identify those subsets of points that are internally most uniform. For instance, Bohling et al.'s [24] cluster algorithm identifies first that location characterized by the minimum change of any possible couple of adjacent points within the dataset. These two adjacent zones with the smallest difference are then combined into one combined zone, which is treated as a single object, characterized by the mean value of the points within the zone, and considered in the next iteration. The process continues, with more and more zones being joined based on their similarity to each other. This method begins with zero within-zone variance, but each iteration results in an increase, until the proportion of total variance explained by the zonation falls below a specified threshold. Other "global zonation" methods, however, like Gill, [25] and Hawkins and Merriam, [26], are better suited for the identification of river reach boundaries because they statistically minimize within reach variation and maximize between reach differences, and are less influenced by local inconsistencies in the data sequence. Gill's method explicitly seeks to identify the zonation that minimizes the variance within each zone (reach) and maximizes the difference between the zones (reaches). Hubert [27] introduced an algorithm which works in a similar manner; although it is possibly the best performing one for this task, its outputs are not exempt from questioning ( Figure 7). Similar conclusions have been found by other researchers (e.g., Martinez-Fernandez et al. [28]).
Furthermore, it must be recognized that all of these approaches have been designed thinking of quantitative, continuous (scalar) variables; when binary variables are concerned instead, they conceptually and practically fail. This is particularly true for Hubert's test, which is considered the key tool to define separate reaches within the Fluvial Corridor ToolBox [29]. This is one of the reasons why we explored a different approach by developing logical heuristic algorithms. is questionable as, for instance, the segment delimited by the two red lines (amongst many others) would certainly be classified by expert judgment as narrower than the one above it (notice that there is no criterion linked to the length, as Hubert's test determined both short and long segments).
For evaluating the performance of algorithms, Parker et al. [20] assume that the higher the proportion of variability R is explained by the reach boundaries, for a given number of reaches, the stronger the performance, and hence the zonation algorithm is best suited to defining functional river reaches. Operationally (what follows has been slightly reformulated and corrected from Parker et al. [20]) R is equal to Equation (1): R = σ 2 zone/(σ 2 zone + σ 2 stretch) (1) where σ 2 zone is a measure of the variance of the reach means about the grand mean μ of the whole sequence, defined as follows Equation (2): (2) where σ 2 stretch, j, m and μj are defined as follows Equation (3): is the th point within zone , μj is the mean of the th zone, is the number of points in the th zone, N = (j=1…nj ) is the total number of points considered amongst all zones, and is the number of zones.
The closer the performance index R gets to 1, the better the output is, because this means that the intra-cluster variances countless (i.e., they are smaller because cluster data are more similar to cluster mean), while the difference amongst clusters (expressed as the variance of the clusters means) increases and becomes the dominant component of the differentiation. Thus, one has clusters internally more homogeneous and more diverse amongst them.
Although very reasonable, this criterion is not suitable for our case. Indeed, as is apparent from what follows, our algorithm tries to "give life" to all cases of "presence" as soon as they form a sequence sufficiently long (actually longer than LM), without leaving "holes" longer than LM, being this a significant length parameter to be set. So, the performance must be assessed on this basis. At the moment, we just considered ourselves satisfied with the visual appreciation. For evaluating the performance of algorithms, Parker et al. [20] assume that the higher the proportion of variability R is explained by the reach boundaries, for a given number of reaches, the stronger the performance, and hence the zonation algorithm is best suited to defining functional river reaches. Operationally (what follows has been slightly reformulated and corrected from Parker et al. [20]) R is equal to Equation (1): where σ 2 zone is a measure of the variance of the reach means about the grand mean µ of the whole sequence, defined as follows Equation (2): where σ 2 stretch , j, m and µ j are defined as follows Equation (3): where x ij is the ith point within zone j, µ j is the mean of the jth zone, n j is the number of points in the jth zone, N = ( j=1 . . . n j ) is the total number of points considered amongst all zones, and m is the number of zones.
The closer the performance index R gets to 1, the better the output is, because this means that the intra-cluster variances countless (i.e., they are smaller because cluster data are more similar to cluster mean), while the difference amongst clusters (expressed as the variance of the clusters means) increases and becomes the dominant component of the differentiation. Thus, one has clusters internally more homogeneous and more diverse amongst them.
Although very reasonable, this criterion is not suitable for our case. Indeed, as is apparent from what follows, our algorithm tries to "give life" to all cases of "presence" as soon as they form a sequence sufficiently long (actually longer than L M ), without leaving "holes" longer than L M , being this a significant length parameter to be set. So, the performance must be assessed on this basis. At the moment, we just considered ourselves satisfied with the visual appreciation.

Reductionist→ Holistic Algorithm: The Binary Case
In this case, the variable of interest is assessed at a number of points along the river in terms of presence/absence (usually corresponding to a relatively dense discretization as presented in Figure A1-Appendix A). Hence, the algorithm presented here operates on a binary variable (0,1) that maps the original variable over a fine discretization of the river axis into segments. We assume that the character of the river is assessed over a stretch of meaningful length, in no case shorter than a stretch with a "minimum significant length" L S . The philosophy interpreted by the proposed algorithm is hence in a few words: "identify those stretches, not shorter than L S , to which a predominant value can be assigned".
The algorithm operates as follows: • choose a significant minimum river length LS, basically according to the usual criterion of LS = 10 ÷ 20 W, where W is the width of the envelope of active channels. This minimum length corresponds to a number K of segments (K = Ls/d, with d length of a discretization segment) and their associated values of the considered variable; • identify the discontinuity points, i.e., locations where the binary indicator (measuring the relevant attribute) changes from 0 to 1 or vice versa, and determine the distance D(-1) from the previous discontinuity (the notation D(-n) denotes the number of points separating previous discontinuity from the segment located n steps before current one); • if the current point is not a discontinuity, then just keep the original value of the variable; if it is a discontinuity, instead, "look forward and backwards" to ascertain whether the discontinuity is due just to a local effect ("absence', i.e., a "hole", or "presence") or it is the beginning of a new reach within the significant horizon, and assign the new value accordingly (see Figure 8).

→
All of this results in a sequence of value for the new binary variable (0,1) "presence" called "Holistic 1". Now, run the following procedure for the newly calculated binary variable Holistic 1: • identify new discontinuities for this new variable; • to each one, apply a forward moving averaging (on the K horizon) operator and, if the result is less than the threshold α, come back to a 0 value (absence), otherwise set it to 1; • in this way, obtain a sequence of values for the new binary variable "presence" called "Holistic 2"; • in this series, filter out all reaches characterized by absence, but shorter than the minimum significant length L S ; • obtain thus a sequence of values for the new binary variable "presence" called "Holistic 3"; • in this series, filter out all reaches characterized instead by presence, but shorter than the minimum significant length.

→
Result is the final sequence of values for the binary variable called "Holistic ok".
This algorithm can be run for a "minimum significant length" L S varying along the river, according to the fluctuating (generally increasing) width of the active channel envelope.
We are aware that this is just a first version that can certainly be improved (some steps are perhaps unnecessary); however, as shown in the next section, it performs quite nicely, at least in our case study.

Binary Case: Application to Magdalena River's Islands
"Presence of islands" is an example of a binary attribute where a manipulation of the originally assessed values (1: presence, 0: absence of an island in each discretization segment) may be of interest if one wants to come up with an identification of reaches "with islands" as distinguished from those "without".
From Figure 9, it can be observed that no reaches shorter than the established significant length LS are left; but also that sometimes (particularly, in the first reach at the bottom of the first stretch at the left) isolated (tiny) islands appear in a stretch labelled as "no islands"; this is an expected result that is consistent with the way the algorithm is set up.

Binary Case: Application to Magdalena River's Islands
"Presence of islands" is an example of a binary attribute where a manipulation of the originally assessed values (1: presence, 0: absence of an island in each discretization segment) may be of interest if one wants to come up with an identification of reaches "with islands" as distinguished from those "without".
From Figure 9, it can be observed that no reaches shorter than the established significant length L S are left; but also that sometimes (particularly, in the first reach at the bottom of the first stretch at the left) isolated (tiny) islands appear in a stretch labelled as "no islands"; this is an expected result that is consistent with the way the algorithm is set up.

Binary Case: Application to Magdalena River's Islands
"Presence of islands" is an example of a binary attribute where a manipulation of the originally assessed values (1: presence, 0: absence of an island in each discretization segment) may be of interest if one wants to come up with an identification of reaches "with islands" as distinguished from those "without".
From Figure 9, it can be observed that no reaches shorter than the established significant length LS are left; but also that sometimes (particularly, in the first reach at the bottom of the first stretch at the left) isolated (tiny) islands appear in a stretch labelled as "no islands"; this is an expected result that is consistent with the way the algorithm is set up.  Figure 9. Results of the application of the binary, logical heuristic Holistic algorithm developed for the Magdalena River, as applied to the attribute "presence of islands". Several consecutive stretches are presented (the river flows from bottom-left to top-right). The small green polygons are the original islands (then transformed into a 0÷1 indicator in the discretized VB, not shown here); the resulting linear attributes along the active bed axis (centerline) takes the pink color, with a thicker line, when the holistic indicator of presence of islands is "yes" and the blue, thinner line, when it is "no". The significant length adopted is Ls = 5 km, shown at the bottom right, as the prevailing river width is W = 500 m (therefore, K = Ls/d = 10, as we assumed d = W).

Reductionist→ Holistic Algorithm: The Categorical Case
A categorical variable assumes discrete values defined over a non-ordinal scale (i.e., it is not possible to rank them in any order). As an example, for the Cauca River (main tributary of the Magdalena) we created a variable denoted "current," related to the type of surface flow (as recognizable from Google Earth imagery; this is a proxy for "character of the bed," a variable that cannot be assessed directly for turbid rivers like these, where the bed is in general not visible). The variable "current" has seven possible categories (and corresponding discrete values on a non-ordinal scale): 0: rapids; 1: corrugated (long riffles that do not end in a pool); 2: pool and riffle; 3: streamed (stretch with visible lines of current and local turbulence, but softer than in previous cases); 4: rippled (softer than streamed); 5: plane surface (probably corresponding to planar bed: runs or glides); 10: not detectable (bad quality of images).
The philosophy for the holistic algorithm in this case is very simple: "do not leave windows shorter than L S and assign to each stretch the prevailing value in it". It can be noted that this algorithm is structurally "rougher" than in the binary case because there is not a clear criterion to choose which category to keep when a stretch is too short and hence we adopted a very simple and practical criterion that, although working acceptably well, could be well substituted by others.
Hence, the proposed algorithm works as follows: Cycle I: • Identify discontinuities (where switches from one category to any other occur).

•
Where no discontinuity is present, the algorithm keeps the previously calculated value (as a result of this holistic algorithm; at the first step it assumes the original value).

•
Where a discontinuity occurs: -"K steps forward prevailing": the algorithm identifies that category which occurs with the highest frequency in the K steps downstream. -"Residual frontal window": analogous, but along just K-D(-1) steps forward (this is a forward moving horizon which progressively narrows down within a fixed window starting from the previous discontinuity; it is modified when a switch occurs to the next discontinuity). Its purpose is to consider what the prevailing value was previous to the current point. -When the prevailing value in the residual forward window coincides with that previous to the last discontinuity → current segment is just a "local hole" and as such the algorithm keeps the previous holistic value. -When the value is "not detectable" (one of the 7 possible categories), the algorithm just keeps it. -If not, the algorithm takes the same value of the prevailing category within the K steps forward.
Cycle II: removing residual segments shorter than K steps. Here the following (arbitrary) criterion is adopted: "just make them uniform with the previous adjacent category". Hence, the algorithm goes like this: -It identifies the discontinuities in the result from Cycle I.
Where no discontinuity is present, it keeps the previous value (the new one, produced by the algorithm; where a discontinuity occurs instead: -If the full distance from the previous discontinuity is smaller than K, the algorithm assigns the same value occurring before the discontinuity. - If not, it keeps the current value. Again, the fitness criterion here is purely visual (Figure 10).  figure): it can be seen that the number of segments has now been significantly reduced (blue labels with larger font), no segment is shorter than the adopted significant length L S , and the reaches that were "engulfed" are those that were originally the shortest.

Conclusions
We can honestly say that the proposed systematic procedure and set of computer-aided tools (the ToolBox, some of whose tools we presented herein) do provide a significant benefit. An objective and systematic identification of reaches (addressed in Nardini et al., [18]), a reliable (although perhaps somehow cumbersome) confinement procedure, the reductionist-holistic tools, and the automatic, GIS based, grouping tool (not described in detail here: see Table A1 of Appendix A) provide all together a substantial support.
We are aware that the procedure reported in Table A1 and the specific algorithms and tools we developed (e.g., the confinement tool) still require a significant manual intervention and sometimes not straightforward exchanges between GIS and Excel; in other words, the proposed procedures are automatable, but not yet fully automated. All this, however, lies just at the operational level; what really matters is that the expert judgment is actually needed only in specific, well identifiable tasks: the definition of criteria like the Procedural Tree; the fine revision of the VB identification; the overall interpretation of the river behavior; and progressively less, the mapping of geomorphic units (but this is very likely to be soon performed by automatic procedures as discussed, for instance, in Piégay et al. [1]).
The conceptual clarifications and operational procedures, tools, and hints provided within the ToolBox and associated Manual can significantly support many practitioners-particularly in the Spanish-speaking world-and can help disseminate the adoption of the River Styles Framework, as well as the use of the Fluvial Corridor ToolBox.  Steps operationally needed to carry out a full River Styles (Stage I, Steps 1 and 2) analysis and corresponding tools/procedures proposed for doing so. Bold italic type items are those we developed "ex-novo" and which are fully described in the ToolBox Manual; two of these-"Confinement tool" and "Reductionist → Holistic algorithm"-have been described in the text above (Section 3.1). Notice that in our application to the Magdalena River Basin, described in [16,17], we adopted a set of official digital ArcGIS shapefiles that allowed us to skip some of the steps related to the delineation of geomorphic elements and units (but not all), but at the same time raised some issues mainly due to resolution and lack of synchronicity with the available Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM) 30 m filled.

Methodology
Procedural Tree and definition of attributes Choose structure suited for the case at hand conserving a river basin (or even better, a regional) perspective  Instrumental

Contact points river-valley margin
Identify key points where the envelope of active riverbed (see below) touches the margin of the effective VB. Geological, planform, and anthropogenic contacts should be considered Manual, expert judgment procedure on GIS supported by DEM info + standard construction of a GIS point SHP (semi-automatic GIS algorithms are an alternative) Useful to assess degree of "constraint" and sediment sources, particularly in laterally unconfined (but constrained) reaches. Active channels' envelope needs to be determined first (see below) Instrumental Contact segments between active riverbed and VB margins Create a thin buffer around the envelope of the active riverbed; then intersect with the effective VB (see below) Manual standard ArcGIS procedures (buffer, intersection) It is a basic information for the confinement attribute. Active channels' envelope must be determined first (see below)

Sediment sources
Identify where there is: a contact point with landslides, fans or terraces; or meanders with external VB elevation higher than internal (so that the erosion-sedimentation balance is positive); or a tributary input

Manual (expert based) or semi-automated GIS procedures
The expert judgment can be synthesized in a logical tree that can then be implemented (even in an Excel spreadsheet) to carry out this task

Excel spreadsheet (GeoMagdaToolBoxManual: "Esqueleto_magda")
A constriction is not a contact point, but a short reach where the VB narrows down significantly, at the local scale, as compared with upstream and downstream. Constrictions are important because they can induce changes in hydraulic and sediment behavior and, consequently, in morphology

Instrumental
Segmented VB Spatial discretization of the VB according to regular reaches along its centerline with a relatively fine step (e.g., 500 m for the Magdalena) and sequenced (i.e., spatially ordered)

FCT: Disaggregation; Sequencing
This is a very important step that supports several subsequent steps. The key output is the "Rank_DGO" field of the obtained  Add mid channel and bank attached bars to the "active channel just water" polygon. NOTICE: The procedure presented here refers to the specific information available for the Magdalena, where an official SHP IGAC 1:100,000 was available for bars ("bancos de arena") but did not discriminate amongst main river and tributaries. It must be noted that this SHP presents some inconsistencies; it was probably built based on different, asynchronous images, possibly by different operators.
ArcGIS procedure: -draft (still without bars) envelope of active channels (see next step); -buffer of this envelope of a max width of lateral bar; -CLIP the original bars SHP (IGAC 2017 100 mil «bancos de arena») with this envelope; -manual check to eliminate bars not belonging to the river of interest; -MERGE with the active river bed; DISSOLVE Care must be put in not including bars belonging to tributaries; there is a degree of ambiguity. A more direct procedure consists of identifying bars directly from satellite images as bare surfaces, based for instance on the SWIR (Short-wavelength infrared) band ( Eliminate the envelope of active riverbed "just water" (which, by construction, includes islands and mid channel bars) from the envelope of active riverbed (CA)

ArcGIS: ERASE
Alternative methods exist; for instance, the GUT tool within the River Scape Community ( [34][35][36]), but they just work with high resolution imagery and altimetry. UAV surveys can provide viable alternative at moderate scales of analysis (e.g., Casado et al. [12] or for a review: Carrivick and Smith, [3]) Geomorphicelements

Mid-channel bars
Eliminate the active riverbed "just water" and island sand bank attached bars from the envelope of active riverbed   Confinement is a statistical indicator defined over an ordinal scale, assessed for each reach, with values: "confined", "partly confined", "laterally unconfined"

RS Attributes
Cause of confinement For each confined or partly confined reach, identifies which is the prevailing cause amongst: "valley", "planform", "works" (anthropogenic)

Manual ArcGIS procedure as detailed in the ToolBox Manual Modulo 3 bis (this paper)
In each reach, several causes may intervene; the prevailing one in terms of lengths involved (  Our procedure is as follows: (i) Identify local confined segments (contact between the AC envelope and VB margins, analogously to O'Brien et al. [6]): create a small buffer of the AC envelope (delta = 10 ÷ 20 m for the Magdalena; this is a parameter to be "calibrated" according to the river at hand); -apply "ERASE tool" of the output polygon with the VB polygon → the output is a SHP of small «contact» polygons (it is possible to transform them into lines, but it is not necessary); -apply "BUFFER tool" of such contact polygons with the same delta (set ArcGIS options as: Side type = FULL and Dissolve = NONE); -add an integer field (CONF); -separate the right from the left part by ERASE using two masks defined from the AC axis: use Right Mask for left part and vice versa.
(ii) Determine a local confinement indicator within the Skeleton, by using the spatial correspondence between contact segments and discretization "slices" obtained for instance with the Fluvial Corridor ToolBox (see Table A1) to this aim: apply "Spatial Joint tool" of the contact segments (polygons), first left and then right, with the effective, segmented VB (and sequenced, i.e., with a spatial order, output of a FCT tool which automatically includes the "Rank_DGO" field) as target; -assign (with Select by attribute and field calculator) 1 if there is no empty intersection (i.e., there is local confinement) or 0, if vice versa; -thus, obtain the segmented VB with the binary 0-1 field for confinement (two polygon SHP for left and right sides). What is useful in this SHP table is the "CONF" columns; -extract the SHP table in an Excel spreadsheet; Sort on the basis of the Rank_DGO field; -apply "PASTE tool" in it these two columns (one at a time) and the Rank_DGO column (the Excel Tool CONFINA can be used for this purpose).
(iii) obtain the final statistical indicator over the selected reaches (polyline SHP "Reaches" with final segmentation chosen to assess confinement); notice that such reaches typically have varying lengths. The statistical indicator can simply be the ratio of the total number of either bank confined slices falling within a reach, over the total number of slices within that reach. To this aim: insert a field "ID_Reach" into the table of the .shp Reaches, with-increasing value; -apply "Spatial Join tool" of the segmented SHP VB with this .shp Reaches (using VB as Target to obtain a complete correspondence of elements); -export the table and correct manually because, once ordered base on the Rank_DGO field, "holes" may appear (i.e., "0" values in the middle of a sequence of a given value corresponding to a long segment: this is caused by the fact that the axis of the AC may not intersect all the elements of the VB, as shown in Figure A1) in the ID_Tramos column: these must be filled. In this way, this instrumental Excel will now contain two fields (say "CONF_sx" and "CONF_dx") with the correspondence of elements between Skeleton (discretization "slices") and reaches; -in the same Excel it is now possible to compute the statistical indicator: The final product is a polyline SHP Reaches including a discrete attribute with (typically) the three categories "confined, partly confined, laterally unconfined".
to obtain a complete correspondence of elements); -export the table and correct manually because, once ordered base on the Rank_DGO field, "holes" may appear (i.e., "0" values in the middle of a sequence of a given value corresponding to a long segment: this is caused by the fact that the axis of the AC may not intersect all the elements of the VB, as shown in Figure A1) in the ID_Tramos column: these must be filled. In this way, this instrumental Excel will now contain two fields (say "CONF_sx" and "CONF_dx") with the correspondence of elements between Skeleton (discretization "slices") and reaches; -in the same Excel it is now possible to compute the statistical indicator: The final product is a polyline SHP Reaches including a discrete attribute with (typically) the three categories "confined, partly confined, laterally unconfined". Figure A1. Example (blue, dotted circle) of a situation where the segmentation of the VB may create an inconsistency with a line SHP (e.g., the AC axis, red line): the line does not touch all segments because some of them do not cross the whole VB polygon. Therefore, the intersection line-segmented VB will not contain the same number of elements; but the identifiers of those remaining will be correct.

Appendix B.2 B-Determining the Cause of Confinement
This part includes two stages (1) and (2) which are described in this section: (1) Build a VB margin polyline SHP with an attribute "Type of margin" (valley, planform, and infrastructure). To this aim: (i) Take the effective VB (already possibly cut by infrastructure and as such with margins in contact with them) and build a polyline SHP with two separate lines (right and left): the ArcGIS Tool "Feature to Line" can be adopted, then cutting the contour line at the beginning and end; add a field "Type of margin". (ii) Determine locations with binding infrastructure (by using polyline SHP of roads, railways, dykes, longitudinal defenses, gabions, batteries of groynes, etc.), that is, where the natural VB has been cut by some infrastructure thus originating an effective VB; to this aim, one possibility is as follows: -buffer of VB with +10 and another buffer with−10 (for the Magdalena); -take the spatial "difference" (ERASE) obtaining a kind of strip; -with this strip get the intersection with line works (CLIP) → obtain a polyline SHP of the interfering works; -make a BUFFER of such SHP; -apply the Analysis Tools/Overlay/IDENTITY Tool between the polyline SHP VB margin and this SHP of interfering works after applying a BUFFER tool to it; -apply the tool MULTIPART TO SINGLE PART to the result; -where there actually is an intersection (Select by Attribute), assign Type = "work" (with the Field calculator). Typically, the output is very fragmented, but this is not a problem as afterwards it is only used to compute a statistical indicator.
(iii) Determine where the margin is "planform" (a polyline SHP of the planforms must be available; a possibility is to build it by identifying surfaces at differential height from a sufficiently precise DEM); to this aim: -apply IDENTITY between the SHP VB margins and the SHP planforms; -then, in editing mode, by Select by attribute select all those items that are not "works" AND with non-empty intersection and assign Type = "planform". To the remaining ones assign type = "valley"; -finally, add a length field and compute it (by Geometry option) and clean the SHP by eliminating all useless fields other than margin (Left, Right) and the type → obtain the polyline SHP VB margins with attribute Cause of confinement.
(2) Produce the information needed to compute the statistical indicator to be associated with each "Reach" of the segmentation adopted for Confinement (as described above). Namely, this means producing a polyline SHP segmented and sequenced exactly as the VB (same elements), including: the identifiers (ID) of the VB slice the ID_Reach -the Type of margin (left and right, separated).
To this aim: (0) apply the BUFFER tool to the polyline SHP of Type of VB margin; (i) split it into a right and a left SHP utilizing the masks already obtained (above); (ii) with each one, apply a SPATIAL JOIN with the segmented VB as target; (iii) extract their Excel tables; ensure the completeness of the corresponding series of VB IDs (by filling where needed the "holes", analogously to what has been shown above); (iv) use the left and right columns ordered according to the Rank_DGO field to compute the statistical indicator that just selects the type of confinement with maximum number of contact slices within each Reach. To this aim, the Excel Tool CONFINA can be used (described in the GeoMagda ToolBox). Figure A2 presents the outputs obtained for the Cauca River (Colombia) with the above described algorithm.
Geosciences 2020, 10, x FOR PEER REVIEW 24 of 27 iv) use the left and right columns ordered according to the Rank_DGO field to compute the statistical indicator that just selects the type of confinement with maximum number of contact slices within each Reach. To this aim, the Excel Tool CONFINA can be used (described in the GeoMagda ToolBox). Figure A2 presents the outputs obtained for the Cauca River (Colombia) with the above described algorithm. Figure A2. Example of output for a stretch of the Cauca River from the proposed confinement algorithm: A) active channel (narrow, light blue polygon), VB (light green polygon) and reaches (identified by colors of the axis segments); B) identification of contacts (small red segments or dots); C) confinement categorical indicator and types of margins of the VB. Notice that several contiguous reaches have the same confinement classification (e.g., 80, 81, bottom left in A, are partly confined; 82, 83 confined), but different type of Cause of confinement (because of different type of VB margin) and hence are displayed in a different color. Figure A2. Example of output for a stretch of the Cauca River from the proposed confinement algorithm: (A) active channel (narrow, light blue polygon), VB (light green polygon) and reaches (identified by colors of the axis segments); (B) identification of contacts (small red segments or dots); (C) confinement categorical indicator and types of margins of the VB. Notice that several contiguous reaches have the same confinement classification (e.g., 80, 81, bottom left in A, are partly confined; 82, 83 confined), but different type of Cause of confinement (because of different type of VB margin) and hence are displayed in a different color.