Object-Based Ensemble Learning for Pan-European Riverscape Units Mapping Based on Copernicus VHR and EU-DEM Data Fusion

: Recent developments in the ﬁelds of geographical object-based image analysis (GEOBIA) and ensemble learning (EL) have led the way to the development of automated processing frameworks suitable to tackle large-scale problems. Mapping riverscape units has been recognized in ﬂuvial remote sensing as an important concern for understanding the macrodynamics of a river system and, if applied at large scales, it can be a powerful tool for monitoring purposes. In this study, the potentiality of GEOBIA and EL algorithms were tested for the mapping of key riverscape units along the main European river network. The Copernicus VHR Image Mosaic and the EU Digital Elevation Model (EU-DEM)—both made available through the Copernicus Land Monitoring Service—were integrated within a hierarchical object-based architecture. In a ﬁrst step, the most well-known EL techniques (bagging, boosting and voting) were tested for the automatic classiﬁcation of water, sediment bars, riparian vegetation and other ﬂoodplain units. Random forest was found to be the best-to-use classiﬁer, and therefore was used in a second phase to classify the entire object-based river network. Finally, an independent validation was performed taking into consideration the polygon area within the accuracy assessment, hence improving the e ﬃ ciency of the classiﬁcation accuracy of the GEOBIA-derived map, both globally and by geographical zone. As a result, we automatically processed almost 2 million square kilometers at a spatial resolution of 2.5 meters, producing a riverscape-units map with a global overall accuracy of 0.915, and with per-class F1 accuracies in the range 0.79–0.97. The obtained results may allow for future studies aimed at quantitative, objective and continuous monitoring of river evolutions and ﬂuvial geomorphological processes at the scale of Europe.


Introduction
Hydromorphological pressures affect 40% of European water bodies, hampering the achievement of their good ecological status [1]. Characterizing river morphology is essential in order to identify alterations and design restoration measures. Traditionally, most river geomorphological survey methodologies rely heavily on field-based methods that require time-and resource-intensive field campaigns, resulting in scattered information and sensitive to operator subjectivity [2][3][4]. Fluvial remote sensing, an emerging discipline joining the river science and remote sensing (RS) [5], now offers new possibilities for monitoring river processes at high spatial and temporal resolution, thanks to the exploitation of objective, repeatable and continuous information alongside the entire river network [6].
Unprecedented possibilities are available nowadays for river science and management, from the local, to the regional and, eventually, continental scales.
An essential aspect in river science-both for fluvial processes assessment and for river management purposes-is the delineation of the main riverscape units that characterize the fluvial landscape in the vicinity and within the so-called active river channel [7]. The active channel is defined in literature as the low-flow channel, plus adjacent sediment bars at the edges of perennial, terrestrial vegetation, usually subjected to erosion and deposition processes and by vegetation encroachment [8][9][10]. Mapping riverscape units is important for the understanding of the macrodynamics of a river system and if repeated through time it can be a powerful tool for different purposes, such as the assessment of evolutionary trajectories of river reaches [11], stream fragmentation [12] or human pressures [13,14], just to mention a few.
In the literature, few researchers have attempted to map these key units at the continental scale in an automated/semi-automated way through the use of RS data. Spada et al. [11] analyzed the main Albanian rivers over a 40-year period, extracting the river centerlines, as well as active and wet channel width using multisource satellite data, such as Landsat and Sentinel-2. Bertrand et al. [15] mapped some riverscape units and channel types based on RGB orthophotos for the Drôme catchment (France), using a geographical object based image analysis (GEOBIA) technique. Belletti et al. [16] manually digitized channel boundaries and associated riverscape units on available orthophotos, with the aim of analyzing historical changes in channel morphology for a selection of reaches. In the work of Demarchi et al. [17] the potentiality of GEOBIA and Machine Learning (ML) classification were integrated for the automatic delineation of main riverscape units at the sub-meter resolution (40 cm) for 1200 km of river length of the main river systems of Piedmont Region (Italy). The RS classification map was then used in support of a quantitative river type classification and characterization of fluvial forms across the region. The utility of such type of objective and continuous mapping was also demonstrated by the subsequent work of Bizzi et al. [18], in which a consistent assessment at the regional scale of human-induced 50-100-year channel changes was provided for the first time, thanks to the exploitation of the RS-based riverscape units map.
Over the last two decades, GEOBIA has emerged as a new paradigm in RS analysis, thanks to its capability of boosting classification performances as compared to per-pixel analysis, especially when processing very high-resolution (VHR) multi-spectral data [19][20][21][22][23][24][25]. Research interests have shifted from the development of theoretical foundations toward their implementation in a wide variety of real-world applications [26]. The integration of supervised ML techniques into GEOBIA has helped in reaching such achievements [27]. Ensemble learning (EL) methods have been reported to be the most influential development within ML in the past decade [28,29]. They are able to combine multiple classification models into one final improved result, thanks to different training techniques, namely bagging, boosting or voting. Several studies can be found in literature nowadays where different EL algorithms, mainly random forest, are used for the classification of image objects in a broad range of applications [30][31][32][33]. However, most of these studies, as reviewed in [27], are focused on study areas of less than 300 ha (3 km 2 ), with a spatial resolution of images between 0 and 2 m.
The main objective of this paper is hence to test the capability of GEOBIA combined with EL algorithms in automatizing the segmentation and classification procedures at the big-data scale, for the purpose of mapping the main riverscape units of the main European river network. A hierarchical object-based architecture, developed in a previous study by Demarchi et al. [34] and based on airborne sub-meter orthophotos and LiDAR data, is adapted in the present work for the fusion of two continental-scale datasets: the Copernicus VHR Image Mosaic and the EU Digital Elevation Model (EU-DEM)-both made available through the Copernicus Land Monitoring Service (https: //land.copernicus.eu/). Such types of open source VHR satellite data are becoming nowadays more and more available, therefore an adaptation of the previous methodology to a similar spectral and topographic dataset configuration is strongly envisaged. All river segments of Europe with basin area > 5000 km 2 were selected and the two-level segmentation procedure was implemented for the overlapping 700 image tiles of 50 km*50 km spread around Europe, with the aim of integrating the topographic data with the VHR data in an automated way. After collecting numerous training and validation reference objects by visual inspection, different EL classifiers were trained and tested for the mapping of the main riverscape units: water, sediment bars, riparian vegetation and other floodplain units. Random forest (RF), extra trees (ET), gradient tree boosting (GTB) and extreme gradient boosting (XGB) were tested and finally the different predictions were combined within a voting classifier (VC), one of the simplest ways to combine different ML techniques [35]. In this way, we could attempt to generalize a best-to-use classifier for the object-based classification of riverscape units. The best classifier was then used to predict the label class of all objects produced by GEOBIA at the European scale. Finally, the riverscape units map was validated with an independent set of samples, which resulted in a validated product covering almost 2 million square kilometers at the resolution of 2.5 m. The obtained results allow to shed lights on the usefulness of the Copernicus VHR and EU-DEM products for mapping the main landcovers describing fluvial landscapes at the European scale.

Remote Sensing Data and Areas of Interest
The focus of this work was on the main river systems of Europe with a basin size larger than a set threshold of 5000 km 2 . This threshold was set to exclude those river systems that would not properly be distinguishable at the scale of EU-DEM and VHR datasets, respectively 25 and 2.5 m. River segments were selected from the catchment characterization model (CCM2) database and divided in geographical zones, according to [36]. Figure 1 shows the Copernicus VHR Image Mosaic, with the selected zones of analysis and the relative image tiles selected for the processing (see also Section 3.1). Selected river segments analyzed in this work overlaid on the EU-DEM layer are instead showed in Figure 2.   The Copernicus VHR Image Mosaic ( Figure 1) is a seamless mosaic of very high resolution (VHR), based mainly on SPOT-5/6 and Formosat-2 data. It is available through the Copernicus Land Monitoring Service (https://land.copernicus.eu/) and it was developed to cover various land applications and emergency response services at EU level (e.g., Urban Atlas, Natura 2000 sites, etc.). It covers a surface of 7.3 million square kilometers at a spatial resolution of 2.5 meters and it comprises three spectral bands: red, (0.665 μm), green (0.560 μm) and blue (0.490 μm). The SPOT-5 images have a panchromatic geometry, resulting from the merging of two separate images, one in panchromatic mode at 2.5 meters, and the other one in three-band multi-spectral mode at 10-meter spatial resolution. A histogram equalization was applied for best visual impression [37] and a cloud-free optimization resulted in a cloud coverage below 5% per NUTS unit (Nomenclature of territorial units for statistics, [38]).
The Digital Elevation Model over Europe (EU-DEM) ( Figure 2) is a digital surface model (DSM) of the EEA39 countries, representing the first surface as illuminated by the sensors, a hybrid product based on SRTM and ASTER GDEM data, fused by a weighted, averaging approach. It was downloaded from the Copernicus Land Monitoring Service website as a mosaic product at 25-m spatial resolution. The final study area extent of this work is the result from the overlap between the Copernicus VHR imagery, EU-DEM and river network with basin area > 5000 km 2 .
The aim of this work is to map the main hydromorphological-significant landcover types that characterize fluvial landscapes in Europe, within the selected area of interest discussed above. The Copernicus VHR Image Mosaic ( Figure 1) is a seamless mosaic of very high resolution (VHR), based mainly on SPOT-5/6 and Formosat-2 data. It is available through the Copernicus Land Monitoring Service (https://land.copernicus.eu/) and it was developed to cover various land applications and emergency response services at EU level (e.g., Urban Atlas, Natura 2000 sites, etc.). It covers a surface of 7.3 million square kilometers at a spatial resolution of 2.5 meters and it comprises three spectral bands: red, (0.665 µm), green (0.560 µm) and blue (0.490 µm). The SPOT-5 images have a panchromatic geometry, resulting from the merging of two separate images, one in panchromatic mode at 2.5 m, and the other one in three-band multi-spectral mode at 10-meter spatial resolution. A histogram equalization was applied for best visual impression [37] and a cloud-free optimization resulted in a cloud coverage below 5% per NUTS unit (Nomenclature of territorial units for statistics, [38]).
The Digital Elevation Model over Europe (EU-DEM) ( Figure 2) is a digital surface model (DSM) of the EEA39 countries, representing the first surface as illuminated by the sensors, a hybrid product based on SRTM and ASTER GDEM data, fused by a weighted, averaging approach. It was downloaded from the Copernicus Land Monitoring Service website as a mosaic product at 25-m spatial resolution. The final study area extent of this work is the result from the overlap between the Copernicus VHR imagery, EU-DEM and river network with basin area > 5000 km 2 .
The aim of this work is to map the main hydromorphological-significant landcover types that characterize fluvial landscapes in Europe, within the selected area of interest discussed above. Considering the spatial resolutions of both dataset-25 m for the EU-DEM and 2.5 m for the VHR-we decided to target our classification problem to the following classes: water, sediment bars, riparian vegetation and other floodplain units (OFU). The major challenge of mapping these riverscape units automatically using RS data lies in distinguishing sediment bars and riparian vegetation units from those landscape units that can be found within the floodplain (e.g., OFU class) and that have similar spectral characteristics. For example, sediment bars are spectrally comparable with crop fields left fallow due to crop rotation practices or to gravel rural roads or urban settlements that can be found in the floodplain. In fact, discerning between spectrally similar land cover classes such as urban areas and bare soil fields is a huge challenge, even for RS data operating at high spatial and spectral resolution (e.g., hyperspectral data) [39,40]. One of the objectives of this study, is to test the capability of the Copernicus VHR dataset in this context, which comes with a very limited spectral configuration, in particular without the near-infrared spectral band.
In addition, European rivers are characterized by highly diverse and dynamic landforms. Figure 3 pictures some examples of the investigated river sections for this paper. Most representative river types are present in the European landscape, such as meandering, sinuous, wandering and braided, making the GEOBIA segmentation and EL classification an even more challenging problem.
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 26 Considering the spatial resolutions of both dataset-25 meters for the EU-DEM and 2.5 meters for the VHR-we decided to target our classification problem to the following classes: water, sediment bars, riparian vegetation and other floodplain units (OFU). The major challenge of mapping these riverscape units automatically using RS data lies in distinguishing sediment bars and riparian vegetation units from those landscape units that can be found within the floodplain (e.g., OFU class) and that have similar spectral characteristics. For example, sediment bars are spectrally comparable with crop fields left fallow due to crop rotation practices or to gravel rural roads or urban settlements that can be found in the floodplain. In fact, discerning between spectrally similar land cover classes such as urban areas and bare soil fields is a huge challenge, even for RS data operating at high spatial and spectral resolution (e.g., hyperspectral data) [39,40]. One of the objectives of this study, is to test the capability of the Copernicus VHR dataset in this context, which comes with a very limited spectral configuration, in particular without the near-infrared spectral band.

Data Pre-Processing and Organization
The different sources of data used in this work and their related processes are summarized in Figure 4. The Copernicus VHR dataset was delivered in the form of tiles of 50 km*50 km. All tiles covering the investigated rivers were selected and organized in a structured database, grouped according to the CCM2 zone subdivision described in Figure 1. In Table 1 the number of tiles and the corresponding processed areas in square kilometers are showed for each zone of analysis. In addition, European rivers are characterized by highly diverse and dynamic landforms. Figure  3 pictures some examples of the investigated river sections for this paper. Most representative river types are present in the European landscape, such as meandering, sinuous, wandering and braided, making the GEOBIA segmentation and EL classification an even more challenging problem.

Data Pre-Processing and Organization
The different sources of data used in this work and their related processes are summarized in Figure 4. The Copernicus VHR dataset was delivered in the form of tiles of 50 km*50 km. All tiles covering the investigated rivers were selected and organized in a structured database, grouped according to the CCM2 zone subdivision described in Figure 1. In Table 1 the number of tiles and the corresponding processed areas in square kilometers are showed for each zone of analysis.  This structure was used to create different project files under the eCognition Developer 9 ® software, which was employed for the first part of the processing: slope calculation and the two-level hierarchical object-based segmentation (explained in Section 3.2). The mosaicked EU-DEM ( Figure 2) was clipped according to the Copernicus VHR tile extents and cataloged in the same structured  This structure was used to create different project files under the eCognition Developer 9 ® software, which was employed for the first part of the processing: slope calculation and the two-level hierarchical object-based segmentation (explained in Section 3.2). The mosaicked EU-DEM ( Figure 2) was clipped according to the Copernicus VHR tile extents and cataloged in the same structured database, using a homogeneous naming convention, so to facilitate the automatic reading procedure when building the different eCognition's projects. The selected river segments, together with the EU-DEM, were used as input into the "fluvial corridor" toolbox described in Roux et al. [41], useful for the generation of the Valley Bottom (VB) and the detrended digital terrain model (DDTM). The VB, described by Alber and Piégay 2011 [42] as the modern alluvial floodplain, is an important fluvial unit in the geomorphological characterization of stream networks, representing the deposition zone of alluvium, including both riverbed and floodplain areas [43]. All further processing in this study will focus exclusively within the boundaries delineated by the VB shapefile, ignoring everything falling outside. The DDTM represents the elevation of all floodplain pixels with respect to the river channel and it has been proven in both [34] and [17] to be a powerful tool for properly identifying those riverscape units that could present similar spectral characteristics, but different topographical features. The "fluvial corridor" was run for each zone separately. The resulting DDTM rasters ( Figure 4) were then clipped and renamed into the same tile grid scheme as used for the EU-EUDEM and VHR database (as explained above). As a result of the pre-processing step, the following layers were automatically loaded in the eCognition's projects (Figure 4), created for each zone of analysis ( Figure 1 VB shapefile

Two-Level Hierarchical Object-Based Segmentation and Reference Dataset
The geographic, object-based image analysis (GEOBIA) consists in grouping connected pixels with similar spectral characteristics into meaningful image objects, with a similar approach as humans conceptually organize the landscape in order to comprehend it [20]. In this way, a broad range of features can be used to describe individual objects and classification performances can be enhanced [44]. GEOBIA techniques have been applied mainly to VHR imagery, in order to boost their limited spectral range (3-4 bands) [19]. In the previous works of Demarchi et al. 2016 [34] and Demarchi et al. 2017 [17], GEOBIA has proven its potentiality of integrating the spectral bands of VHR imagery with the topographical information coming from simultaneous airborne sub-meter orthophotos and LiDAR data for mapping the riverscape units for the main river network of the Piedmont region (Italy), at the spatial scale of 40 cm.
The riverscape unit segmentation and classification methodology implemented in this study is an adaptation of Demarchi et al. 2017 [17], applied in this case to the main river network of Europe (as explained in Section 2.1), and using only the three RGB spectral bands of the Copernicus VHR dataset. The first step of any GEOBIA procedure is the generation of objects, through the so-called segmentation step. In this study we adopted a two-level hierarchical approach, which has been proven to be an efficient method to well integrate the topographic coarser-resolution information with the higher spatial resolution spectral bands [17,34]. The first level segmentation was generated using a multi-resolution algorithm [44] in eCognition software using the slope layer as a unique input (layer 5 of Figure 4). Hence, specific morphological features of the active river channel, characterized by similar slope values, may be distinguishable. Within these topographically homogenous features a second, finer sub-level segmentation was then produced by using the 3 spectral layers available: Red, Green and Blue (layers 1,2,3 of Figure 4). Performing the hierarchical two-level segmentation proved to facilitate the distinction between the main landcover classes found in the active part of the river channel (for example, sediment bars) from those areas of the floodplain that are characterized by very similar spectral characteristics (for example, arable fields) [34]. For the two-level segmentations the following parameters were used: scale parameter 40, shape coefficient 0.1 and compactness coefficient 0.5. Using the same scale parameter for two sources of data with different spatial resolution, resulted in the generation of smaller objects in the level 2 segmentation, based on spectral differences. The Level 2 segmentation was used for the next step of classification. Therefore, for each object, the mean and standard deviation of the layers 1-6 ( Figure 4, RGB bands, EU-DEM, slope and DDTM) were computed and used as input into the classification models. Later, many Level-2 objects were labeled into the Remote Sens. 2020, 12, 1222 8 of 25 investigated classes (water, sediment bars, riparian vegetation and OFU) based on visual inspections, in order to create a vast reference database for a proper training and validation of the classification models. A spatial random sampling technique was adopted as much as possible.

Ensemble Learning Classification and Validation
Ensemble Learning (EL) classification techniques, based on the combination of multiple classifiers, have proven to be among the most powerful techniques in analyzing RS data, in particular to improve classification results when processing multisource data [45][46][47][48]. EL consists in individually training diverse classifiers and then combine them with various techniques, such as bagging, boosting or voting, to generate the final prediction. The bootstrap aggregation method, also called Bagging, builds an ensemble of learners using the same training algorithm but training each learner with several subsets of data (bags) that are randomly selected with replacement, meaning that the same sample can be selected in different subsets [49]. The output of each model is then combined in a final voting system. The bagging technique is at the base of the well-known random forest (RF) algorithm, which builds multiple decision trees using this technique [50]. The high number of trees and the low correlation between each other's is the powerfulness of RF, resulting in its massive implementation in several disciplines, among which RS classification and regression [51][52][53][54][55]. Extremely randomized trees or extra trees (ET) are reported to be a new advancement of RF and therefore were also selected for this study [56].
On the other hand, the boosting technique is an ensemble method in which the prediction power is improved by iteratively training a sequence of weak models, each compensating the weaknesses of its predecessors [57,58]. This is realized by adding miss-classified points to the next learner with a higher weight so that the next classifier will pay extra attention to classify them correctly. New weak learners are added sequentially so to focus the training on more difficult patterns. The final predictions are made by majority vote of the weak learners' predictions, weighted by their individual accuracy. Among these algorithms, the gradient tree boosting (GTB) [59][60][61] and the extreme gradient boosting (XGB) algorithms [62][63][64] have proven to be among the most effective from the ensemble family and are known for outstanding performances and state-of-the-art results in many research areas. In particular, the XGB was reported to be faster, more robust to noise, class imbalance (a problem in our case study) and exhibiting promising performances on classification tasks of RS data, outperforming various benchmark classifiers [65][66][67].
One of the aims of this paper is to shed lights on the proper EL algorithm to be used for object-based classification of riverscape units at the pan-European scale. Therefore, we tested several among the most commonly used EL algorithms: RF and ET, based on the Bagging technique and GTB and XGB based on the boosting technique and then we finally combined the different predictions with a voting classifier (VC), one of the simplest ways to combine different ML techniques [35]. The classification itself was approached in two steps under the Jupyter Notebook using the scikit-learn python library [68]. First, the different EL algorithms were trained and compared with the aim of identifying the most performing one, based on cross-validation metrics. Then, in the second step, the best selected algorithm was run onto the selected river network at pan-European scale. A post-classification, quality control operation was performed in QGIS (Figure 4), in order to visually check and remove major big errors. In the last final step, an independent validation was performed zone by zone and globally for the whole Europe in Jupyter Notebook, with the aim of assessing the real quality of the produced pan-European riverscape units map.
Remote Sens. 2020, 12, 1222 9 of 25 The assessment of GEOBIA products has specific issues as compared to standard pixel-based classification results. In recent years, new GEOBIA accuracy assessment methods are emerging, highlighting the drawbacks of point-based assessment methods, which consider each polygon as an individual sampling unit, without taking into account its area [27]. However, the variable polygon size is a major concern in assessing a GEOBIA-derived map. In fact, classification accuracy depends also on the polygon area. Adding the polygon area within the accuracy assessment could help improving the efficiency of the classification accuracy of a GEOBIA-derived map [69]. Therefore, in this paper we tested both point-based and area-based accuracy assessments, and computed accuracy indices both zone by zone and for the whole zones together (global assessment).

Pre-Processing and GEOBIA Segmentation Results
An example of the "fluvial corridor" toolbox outputs (Section 3.1, Figure 4), used for calculating the VB shapefile and the DDTM raster layer, is presented in Figure 5. The first one depicts the river floodplain, within which all analysis performed in this study were focused on, while the second one represents the elevation of any floodplain pixel with respect to the water channel. In Figure 6, the spatial distribution of the resulting VB shapefiles for all zones analyzed are displayed.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 26 individual sampling unit, without taking into account its area [27]. However, the variable polygon size is a major concern in assessing a GEOBIA-derived map. In fact, classification accuracy depends also on the polygon area. Adding the polygon area within the accuracy assessment could help improving the efficiency of the classification accuracy of a GEOBIA-derived map [69]. Therefore, in this paper we tested both point-based and area-based accuracy assessments, and computed accuracy indices both zone by zone and for the whole zones together (global assessment).

Pre-Processing and GEOBIA Segmentation Results
An example of the "fluvial corridor" toolbox outputs (Section 3.1, Figure 4), used for calculating the VB shapefile and the DDTM raster layer, is presented in Figure 5. The first one depicts the river floodplain, within which all analysis performed in this study were focused on, while the second one represents the elevation of any floodplain pixel with respect to the water channel. In Figure 6, the spatial distribution of the resulting VB shapefiles for all zones analyzed are displayed. The two-level hierarchical segmentation processing was the most time demanding effort of the whole implementation methodology, requiring a total of about 25 days of processing time. The processing was performed on a local machine, using an Intel® Xeon® CPU @ 3.30 GHz with 16 GB of RAM (Intel Polska, Warsaw). The resulting time of the segmentation process divided by zone, is reported in Table 2.  The two-level hierarchical segmentation processing was the most time demanding effort of the whole implementation methodology, requiring a total of about 25 days of processing time. The processing was performed on a local machine, using an Intel ® Xeon ® CPU @ 3.30 GHz with 16 GB of RAM (Intel Polska, Warsaw). The resulting time of the segmentation process divided by zone, is reported in Table 2. Results of the two-level, object-based segmentation (Figure 7) show that the adopted GEOBIA segmentation is well representing the different landscape features that can be found within different types of river floodplain around Europe. The next step was the collection of different reference samples to be used for building the EL algorithms (Section 3.1, Figure 4) and for the validation of the final classification map. Therefore, several objects were labeled into the investigated classes (water, sediment bars, riparian vegetation and OFU), using a spatial random sampling approach, as showed in Figure 8.
The selected image tiles where reference samples were collected is plotted in Figure 6 in green boxes. As we can see, a quite good geographical distribution of reference samples was realized, well depicting the high heterogeneity of the analyzed landcover classes around Europe. This was an extensive time-demanding operation, which resulted in a high number of reference objects (as showed in Table 3), to be used for training/validation purposes. The variable number of objects indicated in Table 3 is mainly determined by the occurrence of such landcover types in naturally different river landscapes through Europe and by the fact that OFU is representing a much higher variability of landcover types, as compared to other classes. For each sample, the mean and standard  Results of the two-level, object-based segmentation (Figure 7) show that the adopted GEOBIA segmentation is well representing the different landscape features that can be found within different types of river floodplain around Europe. The next step was the collection of different reference samples to be used for building the EL algorithms (Section 3.1, Figure 4) and for the validation of the final classification map. Therefore, several objects were labeled into the investigated classes (water, sediment bars, riparian vegetation and OFU), using a spatial random sampling approach, as showed in Figure 8.
Remote Sens. 2020, 12, x FOR PEER REVIEW 12 of 26 deviation of all input layers (RGB bands, DEM, slope and DDTM) were computed in eCognition software and exported, so to be read into the Jupyter Notebook in the next following step: training and validation of the EL algorithms (Section 3.1, Figure 4).    deviation of all input layers (RGB bands, DEM, slope and DDTM) were computed in eCognition software and exported, so to be read into the Jupyter Notebook in the next following step: training and validation of the EL algorithms (Section 3.1, Figure 4).    The selected image tiles where reference samples were collected is plotted in Figure 6 in green boxes. As we can see, a quite good geographical distribution of reference samples was realized, well depicting the high heterogeneity of the analyzed landcover classes around Europe. This was an extensive time-demanding operation, which resulted in a high number of reference objects (as showed in Table 3), to be used for training/validation purposes. The variable number of objects indicated in Table 3 is mainly determined by the occurrence of such landcover types in naturally different river landscapes through Europe and by the fact that OFU is representing a much higher variability of landcover types, as compared to other classes. For each sample, the mean and standard deviation of all input layers (RGB bands, DEM, slope and DDTM) were computed in eCognition software and exported, so to be read into the Jupyter Notebook in the next following step: training and validation of the EL algorithms (Section 3.1, Figure 4).

Ensemble Learning Modelling Results
For each of the selected EL algorithm, random forest (RF), extra tree (ET), gradient tree boosting (GTB), extreme gradient boost (XGB) and voting classifier (VC), the splitting between training/validation polygons was done automatically on a 50/50 random basis, and repeated for 100 different runs, so to have 100 different classification results for each classifier. In this way a much higher statistical distribution of results could be used to compare classifiers performances. The results of such comparison are plotted in Figure 9. The p-values resulting from the "independent samples t-test" are also plotted. A p-value less than 0.05 (typically ≤ 0.05) indicates that there is a statistically significant difference between the two results, while on the contrary for a p-value higher than 0.05, there is not a statistically significant difference [70].

Ensemble Learning Modelling Results
For each of the selected EL algorithm, random forest (RF), extra tree (ET), gradient tree boosting (GTB), extreme gradient boost (XGB) and voting classifier (VC), the splitting between training/validation polygons was done automatically on a 50/50 random basis, and repeated for 100 different runs, so to have 100 different classification results for each classifier. In this way a much higher statistical distribution of results could be used to compare classifiers performances. The results of such comparison are plotted in Figure 9. The ƿ-values resulting from the "independent samples ttest" are also plotted. A ƿ-value less than 0.05 (typically ≤ 0.05) indicates that there is a statistically significant difference between the two results, while on the contrary for a ƿ-value higher than 0.05, there is not a statistically significant difference [70].
The RF classifier outperforms significantly any other tested classifier, generating the highest mean Overall Accuracy (OA) of 0.893. The ET classifier, another bagging technique, which has been reported to be an enhancement of RF [56], only reached a mean OA of 0.868. From the boosting ensemble classifiers, GTB scored very similarly with a mean OA of 0. 867, while with XGB a mean OA of 0.863 was obtained. Finally, the VC has produced a mean OA of 0.881. All results have proven to be statistically significant to each other's, due to the very low ƿ-value. The GTB is the classifier producing the highest standard deviation of OA, as compared to other tested classifiers, meaning it's the one being the most sensible to different training/validation samples selection. Figure 10 shows the features of importance selected by both the RF, ET and GTB, the best scoring classifiers. In both RF and GTB, the Mean DDTM is the most used input feature by the classifiers, followed by the mean RGB and DEM values. Mean and standard deviation of Slope are also rather important features.  The RF classifier outperforms significantly any other tested classifier, generating the highest mean Overall Accuracy (OA) of 0.893. The ET classifier, another bagging technique, which has been reported to be an enhancement of RF [56], only reached a mean OA of 0.868. From the boosting ensemble classifiers, GTB scored very similarly with a mean OA of 0. 867, while with XGB a mean OA of 0.863 was obtained. Finally, the VC has produced a mean OA of 0.881. All results have proven to be statistically significant to each other's, due to the very low p-value. The GTB is the classifier producing the highest standard deviation of OA, as compared to other tested classifiers, meaning it's the one being the most sensible to different training/validation samples selection. Figure 10 shows the features of importance selected by both the RF, ET and GTB, the best scoring classifiers. In both RF and GTB, the Mean DDTM is the most used input feature by the classifiers, followed by the mean RGB and DEM values. Mean and standard deviation of Slope are also rather important features.
Remote Sens. 2020, 12, x FOR PEER REVIEW 14 of 26 Figure 10. Features of importance for RF, GTB and ET respectively.

Validation of the Riverscape Units Map at Pan-European Scale
Due to the outstanding performance of the RF classifier as compared to other Ensemble methods, it was retained and used for the classification of the entire dataset, with the aim of producing the final riverscape units map for the whole Europe. The resulting processing time of RF classifier within the different processed zones is plotted in Table 4. With an average of 5-6 minutes it was possible to classify one tile of 50 * 50 km, which resulted in a total of almost three days (around 66 hours) for classifying the entire dataset, covering almost 2 million km 2 of land.
Once the final riverscape units map was produced, the next step was a quality control operation performed in QGIS (Figure 4, Section 3.1), in order to visually check and remove major big errors. After that, two different validation assessments were performed using an independent set of reference samples, which were left aside and not used during the EL training/validation steps. As explained in Section 3.3, point-based and area-based accuracy indices were computed both globally and zone by zone. The point-based normalized confusion matrix results are plotted in Figure 11, while the area-based ones are plotted in Figure 12. For ease of reading, the comparison of OA values for the point-and area-based assessment results are also plotted in Figure 13. Corresponding class codes are reported in Table 3.

Validation of the Riverscape Units Map at Pan-European Scale
Due to the outstanding performance of the RF classifier as compared to other Ensemble methods, it was retained and used for the classification of the entire dataset, with the aim of producing the final riverscape units map for the whole Europe. The resulting processing time of RF classifier within the different processed zones is plotted in Table 4. With an average of 5-6 minutes it was possible to classify one tile of 50 * 50 km, which resulted in a total of almost three days (around 66 hours) for classifying the entire dataset, covering almost 2 million km 2 of land. Once the final riverscape units map was produced, the next step was a quality control operation performed in QGIS (Figure 4, Section 3.1), in order to visually check and remove major big errors. After that, two different validation assessments were performed using an independent set of reference samples, which were left aside and not used during the EL training/validation steps. As explained in Section 3.3, point-based and area-based accuracy indices were computed both globally and zone by zone. The point-based normalized confusion matrix results are plotted in Figure 11, while the area-based ones are plotted in Figure 12. For ease of reading, the comparison of OA values for the point-and area-based assessment results are also plotted in Figure 13. Corresponding class codes are reported in Table 3.   Table 3. Remote Sens. 2020, 12, x FOR PEER REVIEW 17 of 26 Figure 12. Normalized confusion matrices for the area-base validation of the RF riverscape units map, zone by zone and global assessment. Corresponding class codes are reported in Table 3.   Table 3.
Remote Sens. 2020, 12, x FOR PEER REVIEW 17 of 26 Figure 12. Normalized confusion matrices for the area-base validation of the RF riverscape units map, zone by zone and global assessment. Corresponding class codes are reported in Table 3.  Considering the global assessment computed by analyzing all zones together, the point-based accuracy is OA = 0.893, while the area-based accuracy is OA = 0.915. The PA of each class, showed in the diagonals of the confusion matrices, is higher for the area-based assessment than for the point-based one. For the area-based assessment, all classes have a PA higher than 0.75: OFU = 0.94, RV = 0.82, SB = 0.77 and W = 0.96.
The zonal assessment can provide useful information on the classification performances in different geographical areas. In most zones analyzed, except zone 2004, the area-based accuracies generate a rather significant higher OA as compared to the point-based assessment. For the point-based assessment, most OA values are above 0.875. On the other hand, for the area-based assessment, most OA values except zone 2004, are above 0.90.
From the above discussed normalized confusion matrices, producer accuracies (PA) were extracted for each riverscape units' class and plotted in Figure 14 Some examples of the classification map obtained for different river types found around Europe, are plotted in Figure 15. We can see that in most cases the water and OFU classes are well classified, while the classification of RF and SB classes is sometimes not completely correct. For example, in zone 2002 (Po' river) an agricultural field has been classified as the SB class. In other cases, some RV objects were labeled as OFU class (zone 2007, zone 2009).
However, the F1 scores computed for each class at the global scale, representing per-class performances based on both producer's and user's accuracies, reveal that for all riverscape units we can achieve quite reliable results, with values ranging from 0.79 up to 0.97, if we consider the areabased assessment (Figure 16). Some examples of the classification map obtained for different river types found around Europe, are plotted in Figure 15. We can see that in most cases the water and OFU classes are well classified, while the classification of RF and SB classes is sometimes not completely correct. For example, in zone 2002 (Po' river) an agricultural field has been classified as the SB class. In other cases, some RV objects were labeled as OFU class (zone 2007, zone 2009). Some examples of the classification map obtained for different river types found around Europe, are plotted in Figure 15. We can see that in most cases the water and OFU classes are well classified, while the classification of RF and SB classes is sometimes not completely correct. For example, in zone 2002 (Po' river) an agricultural field has been classified as the SB class. In other cases, some RV objects were labeled as OFU class (zone 2007, zone 2009).
However, the F1 scores computed for each class at the global scale, representing per-class performances based on both producer's and user's accuracies, reveal that for all riverscape units we can achieve quite reliable results, with values ranging from 0.79 up to 0.97, if we consider the areabased assessment (Figure 16).  However, the F1 scores computed for each class at the global scale, representing per-class performances based on both producer's and user's accuracies, reveal that for all riverscape units we can achieve quite reliable results, with values ranging from 0.79 up to 0.97, if we consider the area-based assessment ( Figure 16).
Remote Sens. 2020, 12, x FOR PEER REVIEW 19 of 26 Figure 16. Examples of the classification map for different river types found around Europe.

Advances and Limitations of GEOBIA and EL for Mapping Riverscape Units at Pan-European Scale
One of the objectives of this paper was to test and compare the performances of different EL algorithms for mapping riverscape units. Because of the large area covered in this study, a high within-class heterogeneity of reference samples was remarked. To cope with this, 100 different training/validation selections were performed and therefore 100 different classifications were run. In this way, results can be less influenced by an individual sample selection and at the same time more reliable, due to a higher statistical distribution of results used to compare classifiers performances.

Advances and Limitations of GEOBIA and EL for Mapping Riverscape Units at Pan-European Scale
One of the objectives of this paper was to test and compare the performances of different EL algorithms for mapping riverscape units. Because of the large area covered in this study, a high within-class heterogeneity of reference samples was remarked. To cope with this, 100 different training/validation selections were performed and therefore 100 different classifications were run. In this way, results can be less influenced by an individual sample selection and at the same time more reliable, due to a higher statistical distribution of results used to compare classifiers performances.
The major challenge consisted in the distinction, with a limited number of features, of sediment bars and riparian vegetation units from those landscape units that can be found within the floodplain (i.e., OFU class) that could have highly similar spectral characteristics. Classifying some of these landcover types in automatic way can be a very challenging task (e.g., sediment bars from bare soil fields), even with richer hyperspectral data [39,40]. In this study, only three RGB layers were used, even the infrared band was not accessible from the Copernicus VHR product. Integrating this limited spectral information with a topographic layer (EU-DEM), by developing a hierarchical two-level object-based segmentation, and coupling it with powerful EL algorithms proved to be a robust solution in addressing such kind of mapping problems with limited spectral information. When comparing the OA of the various EL results, we found that RF is significantly outperforming other algorithms, highlighting the potentiality of the bagging technique against boosting and voting in handling such a challenging big-data classification problem. The analysis of the features of importance revealed the usefulness of the topographic information, in particular of the DDTM layer, representing the elevation of any floodplain pixel with respect to the water channel. In line with previously obtained results [34], this information was a key input in mapping riverscape units, and probably it would have not been possible to reach such a satisfiable level of classification accuracy without it.
The rather high RF results obtained from the 100 models (mean OA = 0.893) were subsequently confirmed at the stage of validation of the riverscape unit map at pan-European scale, based on a good geographical distribution of reference samples around the whole Europe and resulting in a global area-based OA of 0.915. The implemented methodology performed well in diverse geographical and topographical contexts across the whole Europe, from alpine valleys to lowland alluvial plains, however only river sections with basin size area larger than 5000 km 2 were considered for analysis, due to the spatial resolution of the analyzed datasets. If higher spatial resolution datasets are to be used in future, smaller river sections could be considered as well and the proposed methodology accordingly adapted, like it has already been done previously in other studies [34].
Comparing the point-and area-based assessment methods for the different zones revealed the influence of polygon's area on the assessment method-as pointed out by other studies in literature [27,69]. The fact that area-based OA values are higher in most of the zones means that in the final map, there are several correctly classified polygons covering large areas, and that wrongly classified polygons are generally covering small areas, because when the polygon's area is taken into account the accuracy is mostly increased. Otherwise, the overall accuracy will be reduced if the opposite would be true. This is particularly the case for zone 2007, where OA is increased from 0.876 up to 0.949 when considering the area of the polygons. The opposite occurs in zone 2004: when considering the area of the polygons, the accuracy is reduced from 0.912 to 0.858, meaning that in this case wrongly classified polygons cover larger areas than correctly classified polygons. However, looking at the producer's accuracy (PA) of individual classes we noticed an opposite behave: an increase when going from point-based to area-based assessment for classes RV and SB, indicating that, on the contrary, there are more small polygons wrongly classified than big polygons correctly classified for both classes, causing the PA increase when the polygon's area is considered. At the same time, we noticed that the opposite occurs for the OFU class: a PA decrease when comparing point-to area-based assessment. In fact, we observed that the wrongly classified OFU polygons cover noticeably big areas which determine a decrease both in the individual class-PA and in the OA, when comparing point-based to area-based values. This analysis underline once more the drawback of the point-based assessment which considers each polygon as an individual sampling unit and the misleading information that can be generated by not taking into account polygon's area within an object-based assessment.
The main source of error noticed in the final map was mostly related to the mixture of classes RV and SB with the OFU class. During the quality control and post-processing phase, we noticed that some misclassification may occur in particularly cloudy regions or in rivers with rather small channel width. However, when considering the polygon's area, most of the class PA are above 0.77; the only problems were found for the RV class in zones 2002 (OA = 0.56) and 2009 (PA = 0.65), and for the SB class in zone 2009 (OA = 0.39). Considering that zone 2009 is rather small, and that it represents only the 3% of the entire mapped area, we can assert that classification results are overall more than acceptable, also demonstrated by high per-class F1 values ranging from 0.79 up to 0.97.

Insights and Future Perspectives on the Applications of the Riverscape Units Map at Pan-European Scale
Mapping riverscape units has never been undertaken at the pan-European scale and represents, in this respect, a novelty that could enable continental spatial and statistical analysis in an objective, continuous and quantitative manner. Several hydromorphological indicators may be extracted and computed along the main river channels of Europe in a spatially continuous way (e.g., confinement, sinuosity, water channel percentage, floodplain-channel connectivity, etc.), which could offer a novel set of tools both for hydromorphologists and river managers to be exploited for several purposes, enriching traditional river characterization and classification practices. In the work of Demarchi et al. 2017 [17], it was demonstrated how a similar remote sensing-based dataset could be exploited for automated river type classification at the regional scale. Afterwards, a combination of different indicators extracted from the remote-sensing map was first used to qualitatively detect process-based anomalies due to human pressures [17] and lately was exploited as a source of quantitative and objective information by Bizzi et al. 2019 [18] to describe historical channel changes occurred at the regional scale and therefore put the basis for a robust assessment of large-scale past and future channel trajectories. The analysis presented here could be replicated as well at the level of individual river basin districts, arguably providing a cost-effective way to monitor the evolution of river landscapes and to analyze catchment scale effects of human impacts, enormously magnifying the capacity of data gauging compared to traditional field surveys and visual image interpretation. Besides, indicators based on the frequency and combination of riverscape features could be used as predictors of ecological status and pressures on rivers in a similar way as shown by Grizzetti et al. 2017 [13].
If a similar analysis were performed to sequential RS acquisitions in the future, the time evolution of hydromorphological parameters could be seamlessly, quantitatively measured along the main river networks of Europe. This will open the way for multi-scale and objective methodological frameworks able to characterize river conditions and monitoring riverscape unit changes, an invaluable resource for discerning the typology and magnitude of continental scales fluvial geomorphological processes, of which the riverscape units represent a signature in time. It will likely form the basis to start questioning established ideas in fluvial geomorphology, possibly moving towards a fully process-based frameworks, as envisaged by some authors few years ago [4,71]. Finally, these novel tools will be an indispensable source of information also for river managers to set restoration targets, foster the design of large-scale cost-effective rehabilitation plans and assessing their effectiveness.

Conclusions
The main objective of this paper was to test the capability of geographical object-based image analysis (GEOBIA) and ensemble learning (EL) algorithms for the automatic classification of riverscape units for the main European river network, based on Copernicus VHR imagery (2.5 m) and EU-DEM (25 m) datasets. For this purpose, a hierarchical GEOBIA methodology developed in a previous work and based on similar dataset [34], was adapted and implemented in the present study. Different EL algorithms, representing the most common bagging, boosting and voting techniques, were compared before running the classification on the whole European dataset and validating its quality with different assessment methods. The conclusions we can draw from the results of this work are as follows: • GEOBIA is a powerful analysis approach allowing at the same time efficient automation and integration of multi-source, multi-resolution satellite data. In this case, the hierarchical object-based segmentation has proven to be a sound and robust technique for combining spectral and topographical information of different spatial resolutions and hence enhancing the capability of low spectral resolution datasets; • Overall, the area-based assessment was a preferred method to validate the quality of an object-based map, such as the riverscape units map, improving the reliability of the classification accuracy metrics. Not taking into account polygon's area can generate misleading information within an object-based assessment; • Random forest proved to be the most efficient classifier among other well-known classifiers tested in this work: extra tree (ET), gradient tree boosting (GTB), extreme gradient boost (XGB) and voting classifier (VC); • The detrended digital terrain model (DDTM), calculated in GIS and representing the height of floodplain pixels with respect to the water channel, proved to be the most important and required feature to classify the investigated classes; The Copernicus VHR layer-although developed as a visual seamless mosaic from pan-sharpened SPOT5 data at 10-m spatial resolution, and with missing near-infrared band-still proved to be a useful layer for automated image analysis and classification if exploited in the proper way, and in combination with other sources of data; • The produced riverscape units map at pan-European scale was a novel product not existing so far, representing a notable source of information for forthcoming studies aimed at fluvial geomorphological processes monitoring at the continental scale. If a similar mapping were applied in the future to sequential RS observations, it could be possible to generate an archive of spatial and topographical riverscape units' characteristics, measured in an objective and quantitative way, through time and continuously along the main European river networks. Such information could help advance scientific understanding of fluvial geomorphology, while providing tools for river managers to design large-scale cost-effective rehabilitation plans and assess their effectiveness.