Article

Integration of High-Accuracy Geospatial Data and Machine Learning Approaches for Soil Erosion Susceptibility Mapping in the Mediterranean Region: A Case Study of the Macta Basin, Algeria

1
Water Resources and Sustainable Development Laboratory, Department of Geology, Faculty of Earth Sciences, Badji Mokhtar—Annaba University, P.O. Box 12, Annaba 23000, Algeria
2
Department of Hydraulics, Laboratoire de Recherche des Sciences de L’eau, National Polytechnic School, 10 Rue des Frères OUDEK, El Harrach, Algiers 16200, Algeria
3
Department of Hydroscience and Engineering, Faculty of Civil Engineering, University of Zagreb, Kaciceva 26, HR-10000 Zagreb, Croatia
4
Department of Agrochemistry and Environment, University Miguel Hernández of Elche, 03202 Elche, Spain
*
Authors to whom correspondence should be addressed.
Sustainability 2023, 15(13), 10388; https://doi.org/10.3390/su151310388
Submission received: 5 May 2023 / Revised: 21 June 2023 / Accepted: 27 June 2023 / Published: 30 June 2023

Abstract

Erosion can have a negative impact on agricultural sustainability and grazing lands in the Mediterranean area, especially in northern Algeria. It is useful to map the spatial occurrence of erosion and identify susceptible erodible areas on a large scale. The main objective of this research was to compare the performance of four machine learning techniques, namely Categorical boosting, Adaptive boosting, Convolutional Neural Network, and stacking ensemble models, in predicting the occurrence of erosion in the Macta basin, northwestern Algeria. Several climatologic, morphologic, hydrological, and geological factors based on multi-source data were elaborated in a GIS environment to determine the erosion factors in the studied area. The conditioning factors, encompassing rainfall erosivity, slope, aspect, elevation, LULC, topographic wetness index, distance from river, distance from roads, clay mineral ratio, lithology, and geology, were derived via the integration of topographic attributes and remote sensing data, including Landsat 8 and Sentinel 2, within a GIS framework. The inventory map of soil erosion was created by integrating data from the global positioning system to locate erosion sites, conducting extensive field surveys, and analyzing satellite images obtained from Google Earth through visual interpretation. The dataset was divided randomly into two sets, with 60% for training and calibrating and 40% for testing the models. Statistical metrics including sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve were used to assess the validity of the proposed models. The results revealed that machine learning and deep learning, as well as stacking ensemble techniques, showed outstanding performance, with accuracy over 98%, sensitivity of 0.98, and specificity of 0.98.
Policy makers and local authorities can utilize the predicted erosion susceptibility maps to promote sustainable use of water and soil conservation and safeguard agricultural activities against potential damage.

1. Introduction

Water erosion is defined as the process in which surface runoff forms channels, establishing a dominant flow zone, and removes the soil from these restricted areas to great depths over short periods of time [1,2,3]. Aggressive soil erosion can lead to the formation of deep gullies [4,5]; gully erosion contributes to soil loss rates ranging from 10 to 94 percent of the total sediment output resulting from water-induced erosion. Soil erosion causes important land degradation and significant damage to agricultural lands on a large scale as well as to construction sites such as bridges, roads, and villages [6,7,8,9,10]. Gully erosion is generated by several factors, mainly rainfall caused by extreme climatic events such as thunderstorms or heavy rains over short periods [6,11]. Besides runoff, other factors contribute to soil erosion, such as vegetation, soil properties, subsurface flow [11], overland water flow [12], and wind [13,14]. Anthropogenic activities such as unsustainable agricultural practices [15,16,17], deforestation [18,19], and road construction [20,21] can also contribute to increased soil erosion rates.
Recently, gully erosion has attracted growing interest because of its negative impact on the environment. To study this phenomenon, various factors should be considered, including topographical factors (e.g., elevation, slope, and aspect), hydrological factors (e.g., rainfall, distance to river, and stream density), soil characteristics (e.g., soil type and structure), geological factors (e.g., lithology), and environmental factors (e.g., distance to road) [4,22].
According to the literature cited in [23], the development of gully erosion inventory maps has been investigated using a range of mathematical models, including both bivariate and multivariate approaches. In addition, the use of GIS and remote sensing has been found to be effective in this context. The statistical techniques used include empirical models such as USLE, RUSLE, and MUSLE, which were designed to estimate long-term average annual soil loss caused by water erosion from specific field slopes in a range of land-use applications and management systems (i.e., crops, rangeland, recreational areas, etc.) [24,25,26,27].
Despite the established efficacy of the methods, estimating soil loss and sediment discharge is a time-consuming process, which is complicated by the influence of multiple factors [28,29]. Thus, in the last decade some researchers have adopted a variety of mathematical, machine learning, and data-mining approaches that have since been built to map and analyze gullying and associated processes [4,29,30,31,32,33].
Many scientists describe a number of GIS-based models that have been utilized for erosion susceptibility mapping, including the frequency ratio model [34,35], weights of evidence [29,34], linear and logistic regression [31,36,37], and the analytical hierarchy process [38,39].
Avand et al. compared a random forest and a K-nearest neighbor classifier for gully erosion susceptibility mapping in the Hobaturck watershed in Iran [40]. The study was carried out on 242 gully erosion locations and 12 conditioning factors. The ROC-AUC results indicated that the random forest algorithm performed better than the K-nearest neighbor. Rainfall, altitude, and distance from the river were identified as the most influential parameters in mapping gully erosion susceptibility in the study. These findings highlight the significant role of these factors in shaping the spatial distribution and intensity of gully erosion. Saha et al. studied the vulnerability to gully erosion using MLP, MLF bagging, and ML bagging methods in the Hinzolo river basin in India [41]. The study indicated that the use of a hybrid method improved the accuracy of MLP models.
These findings indicate that elevation has the strongest influence on gully erosion susceptibility, followed by rainfall and NDVI. On the other hand, geology, soil type, and sediment transportation index (STI) were found to have relatively less influence on gully erosion susceptibility. Arabameri et al. [42] evaluated the accuracy of a hybrid artificial intelligence model in mapping gully erosion susceptibility based on 18 conditioning factors in the Kohpayeh-Sagazi river basin in Iran. The results indicated that the hybrid GE-XG boost model performed better than the other benchmark solutions. According to their results, the factors with the highest information gain ratios were soil depth, soil type, TWI, lithology, and NDVI, while elevation, plan curvature, slope, TPI, and drainage density had moderate information gain ratios.
Another work was carried out by Yang et al. [43] to investigate gully erosion mapping in complex terrain in the Mizhigou watershed, China, using the random forest (RF), gradient boosted decision tree (GBDT), and extreme gradient boosting (XGBoost) algorithms, either separately or combined with the statistical weight of evidence (WoE) model. In this work, 14 conditioning factors were considered for mapping erosion-susceptible areas. The area under the curve (AUC) values of the different models were higher than 0.925, indicating high predictive ability; it was also shown that the AUC values for the machine learning regression methods were higher without the WoE model. The XGBoost algorithm performed better than RF and GBDT, and the main factors for gully mapping were slope gradient, land use, and altitude. Goetz et al. [44] compared statistical and machine learning models for regional-scale landslide susceptibility modeling in Lower Austria. They used spatial K-fold cross-validation and variable importance assessment to evaluate the models. Random forest and bundling classification techniques exhibited the best predictive performances, with overall median estimated AUROC differences ranging from 2.9 to 8.9 percentage points. Slope angle, surface roughness, and plan curvature consistently emerged as highly influential variables. This evaluation framework offers valuable guidance for selecting appropriate modeling techniques for landslide susceptibility mapping.
These results provide valuable insights into the key factors driving gully erosion and can guide future research and management strategies. However, it is important to note that different machine learning models can yield varying results in terms of feature importance. To address this gap and improve the accuracy of mapping areas susceptible to soil erosion, the application of a stacking method can be explored. By integrating multiple machine learning models and considering their respective feature importance, the stacking method has the potential to enhance the accuracy of mapping and prediction for susceptible areas prone to soil erosion. Further research in this direction can contribute to the development of more robust and reliable erosion susceptibility models.
Different studies worldwide showed the high erosion rate in semi-arid regions, which shows the importance of this phenomenon and its impact on water resources and land development (Table 1).
In Algeria, water erosion poses a significant threat to the country’s agricultural productivity, leading to soil loss, depletion of fertilizers, and nutrient degradation. Moreover, the decline in water reserves in Algerian dams is a major concern due to sedimentation caused by erosion in watersheds and accumulation in reservoirs. Bathymetric surveys conducted by the National Agency for Dams and Transfers (ANBT 2005) on 31 dams revealed an average loss of 980 million cubic meters in storage capacity, equivalent to approximately 13% of the initial capacity. This research holds implications for land management, environmental planning, and decision-making processes in Mediterranean regions, while offering the potential to identify high-risk erosion areas and implement targeted control measures. It can also support policymakers in formulating sustainable land-use policies to mitigate soil erosion and promote effective land management practices.
The objectives of this study were: (i) to map erosion-prone areas in the Macta basin, northwestern Algeria; (ii) to investigate and compare the feature importance of several machine learning and deep learning techniques, while highlighting the significance of using stacking ensemble techniques to improve soil erosion mapping; and (iii) to evaluate the maximum set of conditioning factors that significantly control the erosion phenomenon and to elaborate on the minimum set of factors needed to avoid over-fitting problems that could occur in the modeling of soil erosion.

2. Materials and Methods

2.1. Studied Area

The study area comprises the basin of Macta, which includes the wilayas of Mascara, Sidi Bel Abbes, Mostaganem, Tlemcen, Oran, and Saida. It is located between latitudes 34°34′ and 35°79′ N and between longitudes 1°06′ W and 0°56′ E. The Macta basin is bounded by the Mediterranean Sea to the north; by the Tighenif plain and the Saida mountains to the east; by the highlands of Ras El Ma and the lowlands of Maalif to the south; and by the plain of Telagh, the mountains of Tessala, and the mountains of Tlemcen to the west (Figure 1). The area covers 14,458 km², and the perimeter is 717 km. It has a semi-arid climate [50]. It is composed of two tributaries of Mediterranean rivers, the Mekerra wadi to the west and the El Hammam wadi to the east. The topography and altitude vary in such a way that soil erosion occurs in most of the area. For this study, a total of 400 points (200 erosion locations and 200 non-erosion locations) were randomly selected in the studied area.
The analysis of annual average precipitation recorded at rainfall stations in the Macta watershed, from 1980 to 2015, reveals variations in precipitation distribution across the entire basin. The precipitation values range from 289 mm to 486 mm, with an average of approximately 378 mm. The central area of the basin, characterized by higher altitudes, receives higher rainfall compared to the lower areas in the south and north, as altitude increases relative to sea level.
The spatial distribution of land use in the Macta basin exhibits several categories of land occupation. Grasslands dominate the basin, covering 8749 km², which represents 60.51% of the total surface area. Croplands also occupy a significant portion, covering 4376 km², accounting for 30.27% of the basin. Forest formations cover approximately 4% of the total area and are primarily found in regions with moderately rugged terrain. The remaining portion of the basin comprises unproductive lands such as rocky areas, bare lands, water bodies, and urbanized zones (Table 2).
The main soil types in the Macta basin are Calcisols, Luvisols, Vertisols, and Leptosols according to the latest version of the World Reference Base for soil resource (WRB). These soil types occupy a substantial area, accounting for 42.31%, 27.99%, 14.26%, and 11.23% of the total surface area, respectively. Cambisols and Kastanozems, on the other hand, constitute only 3.28% of the basin’s area. Therefore, it can be concluded that the majority of the soil in this watershed exhibits moderate resistance to water erosion (Table 3).

2.2. Machine Learning Methods

The methods used in this study are based on artificial intelligence (AI). The techniques used for modeling gully erosion include Adaptive boosting (AdaBoost), Categorical boosting (CatBoost), Convolutional Neural Network (CNN), the stacking method, and geospatial data processing.
  • AdaBoost is a machine learning technique initiated by Freund and Schapire [51]; many algorithms are derived from AdaBoost, either for classification or for regression [52,53]. The AdaBoost algorithm is an iterative approach that seeks to construct a robust classifier through the combination of weak learners generated in prior iterations. The algorithm modifies the learning pattern in accordance with the error returned by the weak learners, with the ultimate goal of achieving a final hypothesis that exhibits low error relative to a given distribution [51,54].
  • CatBoost is a recent gradient boosting method based on decision trees [55]; its distinguishing characteristics are that it requires less training data than other models and handles different data formats [56]. The CatBoost model generates random permutations of the dataset and gradients to inform the selection of an optimal tree structure, thereby enhancing the robustness of the algorithm and mitigating over-fitting [57].
  • CNN is a type of deep learning architecture that imitates the natural visual perception of living beings [58]. CNN comprises several layers, including the convolutional layer, non-linearity layer, pooling layer, and fully connected layer. While the convolutional and fully connected layers are parameterized, the pooling and non-linearity layers are not. Among the various forms of artificial neural networks, CNN is particularly remarkable [59]. As reported in the literature, the name “Convolutional Neural Network” (CNN) is derived from the mathematical operation of convolution, which involves the multiplication of matrices [60].
  • The stacking method was implemented in this study to improve the performance of the developed predictive model. In ensemble learning methods such as stacking, a meta-model is used to combine the predictions generated by several base models [61]. Stacking, which is also referred to as stacked generalization, is a widely used ensemble learning technique that combines multiple base models to improve prediction accuracy. Here, three different algorithms were used as base models: CNN, a powerful deep learning architecture able to capture spatial features from input data, and CatBoost and AdaBoost, which combine weak learners to create a strong learner. Categorical boosting is specifically designed for categorical data, while Adaptive boosting is a general-purpose method that can be used for both categorical and numerical data.
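The reweighting idea behind AdaBoost described in the list above can be illustrated with a minimal sketch. It assumes decision stumps (single-threshold rules) as weak learners and labels coded as +1 (erosion) and -1 (non-erosion). The one-feature toy dataset and all function names are hypothetical; the study does not describe its implementation at this level of detail.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Rule: predict `sign` when the feature exceeds thr, else -sign.
                preds = [sign if row[j] > thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def adaboost_fit(X, y, n_rounds=3):
    """Fit an ensemble of weighted stumps; y must contain +1 / -1 labels."""
    n = len(X)
    w = [1.0 / n] * n                      # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # guard against log(0) for a perfect stump
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature, labels that no single stump can separate.
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost_fit(X, y, n_rounds=3)
predictions = [adaboost_predict(ensemble, row) for row in X]
```

No single stump classifies this toy set correctly, but the weighted vote of three stumps does, which is the sense in which weak learners are combined into a stronger one.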

2.3. GIS and Geospatial Data Processing

In this section, we provide a detailed description of the GIS environment and geospatial data preprocessing methods employed in our research (Figure 2). The utilization of GIS allowed us to effectively elaborate and process the collected data, which encompassed various climatological, morphological, hydrological, and geological factors. The integration of these factors within the GIS framework facilitated the generation of thematic maps that visually represented their spatial distribution across the study area.
Slope and aspect were calculated in the GIS environment using a Digital Elevation Model (DEM). The DEM was first imported into the GIS, and the “Slope” and “Aspect” tools were then used to derive information about the steepness and orientation of the terrain. The topographic wetness index (TWI) was calculated using rasters of flow accumulation and slope. TWI provided information on landscape wetness based on topographic characteristics and supported hydrological assessments.
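As a rough illustration of these terrain derivatives, the sketch below computes slope from a small DEM grid with central differences and TWI with the commonly used definition TWI = ln(a / tan β), where a is the specific catchment area and β the slope. This is a simplified stand-in under stated assumptions: GIS slope tools usually apply a 3 × 3 kernel, the source does not state the exact TWI formula it used, and the function names and the `min_slope` floor are hypothetical.

```python
import math

def slope_radians(dem, cell_size):
    """Slope (radians) for interior cells by central differences.

    Border cells are left as None because they lack a full neighborhood.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.atan(math.hypot(dz_dx, dz_dy))
    return out

def twi(flow_acc_cells, slope_rad, cell_size, min_slope=1e-3):
    """Topographic wetness index ln(a / tan(beta)) for one cell.

    flow_acc_cells: number of upslope cells draining into the cell.
    min_slope: assumed floor that avoids division by zero on flat cells.
    """
    # Specific catchment area: contributing area divided by the cell width.
    a = (flow_acc_cells + 1) * cell_size
    return math.log(a / math.tan(max(slope_rad, min_slope)))

# Hypothetical 3 x 3 DEM (metres) rising 30 m per 30 m cell towards the east.
dem = [[30.0 * c for c in range(3)] for _ in range(3)]
slopes = slope_radians(dem, cell_size=30.0)
center_slope_deg = math.degrees(slopes[1][1])
center_twi = twi(flow_acc_cells=0, slope_rad=slopes[1][1], cell_size=30.0)
```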
The modified Fournier index values were calculated using the equation of (71) as explained in the Section titled ‘selection of variables’.
The distance from a river and the distance from roads were calculated by uploading the rivers and roads in the studied area and then calculating the distance of each pixel point to these targets using Euclidean distance measurement. The results can be visualized and used for various geospatial analyses, providing valuable information about proximity to rivers and roads separately.
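The Euclidean distance raster described above can be sketched as a brute-force search over a small grid. The function name and the binary mask layout are assumptions made for illustration; GIS packages use far more efficient distance-transform algorithms on full-size rasters.

```python
import math

def distance_to_features(mask, cell_size):
    """Distance from every cell to the nearest feature cell (river or road).

    mask: 2D list where truthy cells mark the rasterized feature.
    Returns a 2D list of distances in the units of cell_size.
    """
    targets = [(r, c) for r, row in enumerate(mask)
               for c, value in enumerate(row) if value]
    if not targets:
        raise ValueError("mask contains no feature cells")
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in targets) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(mask)]

# Hypothetical 3 x 3 raster with a single river cell in the top-left corner.
river_mask = [[1, 0, 0],
              [0, 0, 0],
              [0, 0, 0]]
dist = distance_to_features(river_mask, cell_size=30.0)
```

Running the same function once on a river mask and once on a road mask yields the two separate proximity layers.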
The calculation of the clay mineral ratio was carried out on the selected spectral bands of Landsat 8 (Table 4). The Clay Index formula, defined as (Band 7 − Band 5)/(Band 7 + Band 5), is applied to quantify clay mineral abundance in the selected area.
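The band arithmetic stated above is straightforward to express per pixel. The sketch below applies the formula exactly as written in the text, (Band 7 − Band 5)/(Band 7 + Band 5); the function names, the handling of a zero denominator, and the reflectance values are assumptions for illustration only.

```python
def clay_index(band7, band5):
    """Normalized difference of Band 7 and Band 5 for a single pixel.

    Returns None when both bands are zero (assumed no-data handling).
    """
    denom = band7 + band5
    if denom == 0:
        return None
    return (band7 - band5) / denom

def clay_index_raster(band7_grid, band5_grid):
    """Apply clay_index cell by cell to two equally sized 2D lists."""
    return [[clay_index(b7, b5) for b7, b5 in zip(row7, row5)]
            for row7, row5 in zip(band7_grid, band5_grid)]

# Hypothetical reflectance values for a 1 x 2 raster.
index = clay_index_raster([[0.2, 0.0]], [[0.3, 0.0]])
```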
The Raster Calculator in GIS software was a powerful tool used for performing mathematical operations on raster layers. It provided the capability to create new raster layers by applying various mathematical expressions or formulas to existing raster layers.
To ensure the reliability and accuracy of our analysis, we adopted a comprehensive approach to geospatial data preprocessing. The initial step involved the extraction of relevant factors using an inventory map that classified sites as either land degradation or non-degradation. This inventory map provided the foundation for gathering essential information for subsequent analysis.
Following factor extraction, we conducted a pre-treatment analysis of the statistical data. This involved applying appropriate statistical techniques to evaluate and preprocess the extracted values. The goal was to ensure data quality, consistency, and suitability for further modeling.
Subsequently, we employed classification modeling techniques to develop models capable of predicting erosion occurrence based on the identified factors. Machine learning algorithms were integrated within the GIS environment to facilitate this modeling process. By leveraging the power of machine learning, we aimed to capture the complex relationships and patterns between the factors and erosion susceptibility (Figure 2).
To assess the performance of the developed models, we utilized performance criteria as outlined in our methodology. These criteria allowed us to evaluate the accuracy and reliability of the models in predicting erosion susceptibility within the study area. The performance evaluation provided valuable insights into the strengths and limitations of the models and their applicability for practical use.
Furthermore, we employed feature importance analysis to determine the relative significance of each factor in contributing to the erosion susceptibility models. This analysis allowed us to prioritize and weigh the importance of different factors in the final susceptibility maps generated by the models. By identifying the most influential factors, we aimed to enhance the accuracy and effectiveness of the models’ predictions.
The integration of GIS and machine learning techniques in our research enabled us to leverage the spatial data and develop models that accurately predicted erosion susceptibility. This combination facilitated a comprehensive analysis of the study area, providing valuable insights into the factors influencing erosion occurrence. The incorporation of GIS and machine learning techniques showcased the potential for their synergy in addressing complex environmental issues and supporting informed decision-making processes.

2.4. Model Evaluation

The assessment of model accuracy in this study involved the evaluation of both goodness-of-fit, which reflects how well the model fits the calibration subset, and predictive performance, which measures the model’s ability to accurately predict the validation subset. To quantify model performance, we employed the area under the curve and receiver operating characteristic (AUC-ROC) metrics. As reported by Williams [62], a confusion matrix was generated to compare the final model’s predictions with the actual outcomes of the observations (Table 4). The actual observations were represented in the rows of the matrix, while the columns corresponded to the model’s predictions, and the cell counts indicated the numbers of observations for each variable.
As stated in the literature [63], a confusion matrix is typically a square matrix of size n × n that is used to evaluate the performance of a classifier by comparing its predicted and actual classifications. Here, n represents the number of different classes. For instance, a confusion matrix for binary classification with n = 2 typically has four entries, each with a specific meaning as shown in Table 5:
Accuracy is a metric that measures the overall performance of a classifier and indicates the fraction of total samples that are correctly classified [56]. The formula to calculate accuracy (τ) and the error (ε) are as follows:
τ = (a + d)/(a + b + c + d),
ε = (b + c)/(a + b + c + d),
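A small sketch of these metrics follows, together with the sensitivity and specificity reported in the abstract. It assumes that a and d in the formulas above are the correctly classified counts (written here as true positives and true negatives) and that b and c are the two kinds of misclassification; the counts in the example are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, error, sensitivity, and specificity from a 2 x 2 confusion matrix.

    tp/tn: correctly classified erosion / non-erosion points.
    fn/fp: erosion points missed / non-erosion points flagged as erosion.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # tau = (a + d) / (a + b + c + d)
        "error": (fn + fp) / total,         # epsilon = (b + c) / (a + b + c + d)
        "sensitivity": tp / (tp + fn),      # share of erosion points detected
        "specificity": tn / (tn + fp),      # share of non-erosion points detected
    }

# Hypothetical counts for a 160-point test set.
metrics = classification_metrics(tp=70, fn=10, fp=5, tn=75)
```

By construction accuracy and error sum to one, which is a quick sanity check on any reported pair of values.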
This study used ROC curve analysis as another statistical technique to evaluate the goodness-of-fit and prediction performance of each model [64]. The ROC curve shape provides an indication of the accuracy of a model, where a curve closer to the upper left corner (AUC = 1) represents higher accuracy, while a curve closer to 0.5 indicates model inaccuracy [65]. According to AUC values, the predictive performance was classified as acceptable for AUC ≥ 0.7, excellent for AUC ≥ 0.8, and outstanding for AUC ≥ 0.9 [66]. To assess the robustness of the models, the positive and negative calibration and validation datasets with optimal pixel size were changed three times [33].
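The AUC can be computed without drawing the curve, as the probability that a randomly chosen erosion point receives a higher score than a randomly chosen non-erosion point. The sketch below uses this pairwise formulation and applies the thresholds quoted above; the function names, the "below acceptable" label, and the scores are assumptions for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for erosion, 0 for non-erosion. Ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_class(auc):
    """Map an AUC value to the qualitative classes quoted in the text."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"   # label assumed; the text defines no lower class

# Hypothetical susceptibility scores for six test points.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of points, which is fine for a few hundred samples but would be replaced by a rank-based formula on larger sets.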

2.5. Selection of Variables

In order to select the most relevant variables for the classification model, a feature selection process was carried out considering 19 variables (elevation, aspect, slope, curvature, hill-shade, stream density, distance from rivers, distance from roads, modified Fournier index, NDVI, topographic wetness index, topographic roughness index, sediment transport index, stream power index, LULC, soil type, geology, lithology, and clay mineral ratio). To avoid the double effect of the same factors on the modeling, the NDVI and sediment transport index (STI) variables were eliminated based on their high correlation with clay mineral ratio and stream power index (Table 6). Then, the top 11 variables were selected using feature importance scores obtained from a CNN (see Table 6). The CNN was trained on the input data and was able to learn the relevant features through a series of convolutional and pooling layers. The feature importance scores were then calculated by evaluating the impact of each feature on the model’s accuracy. The 11 variables with the highest feature importance scores were retained for the final classification model. This approach ensures that only the most informative variables are included in the model for predicting soil erosion in the Macta basin.
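The correlation screening step can be sketched as follows. The Pearson coefficient and the 0.9 cut-off are assumptions, since the text does not state which correlation measure or threshold was used, and the feature values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.9):
    """Keep features in dictionary order, skipping any that are highly
    correlated (|r| > threshold) with a feature already kept."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical values at five sample points; "sti" nearly duplicates "spi".
features = {
    "spi": [1, 2, 3, 4, 5],
    "sti": [2, 4, 6, 8, 10.5],
    "slope": [5, 1, 4, 2, 3],
}
kept = drop_correlated(features)
```

Because earlier features take priority, the order of the dictionary decides which member of a correlated pair survives.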
To assess the susceptibility of the given area to erosion, a series of parameters and their relationship to the studied phenomena must be considered. It is worth noting that there is no conventional method for the selection of erosion conditioning factors. These vary from place to place depending on the study area and data availability.
In this work, a total of eleven conditioning factors are selected and mapped by using the Geographic Information System (GIS). The selected parameters are defined as follows:
Slope is a parameter that represents the degree of topographic change. Slope and water flow velocity are strongly related: a higher slope increases the surface runoff velocity and, therefore, the risk of land erosion. The slope map was determined using GIS. Mild slopes (<7°) are located in the extreme south, north, and east of the region, whereas the steepest slopes (>12°) occur in the center of the basin. The south-west region is characterized by moderate slope values (7–12°) (Figure 3).
Aspect is defined as the slope orientation. In our case, the slope directions occur irregularly in the basin, which means that there is no privileged direction in any part of the basin. The Telagh highlands, Tessala Mountains, and Tlemcen mountains in the west of the basin all have slopes that are directed towards the north-east and towards the south-east. The Ghriss plain region and the central massif of the basin are characterized mainly by slopes oriented towards the north-west.
The elevation map shows that the studied region topography is divided into three sections. The first one (elevation < 260 m) is located in low coastal plains in the north. The second section (260 m < elevation < 700 m) is in the center (Figure 3). The last section (elevation > 700 m) is located in the south, where we find the Tlemcen mountains (1412 m in Djebel Ouargla) and Dhaya Mountains (1455 m in Djebel Mezioud).
Land use/land cover (LULC) have been identified as factors that can impact runoff and soil loss [67,68]. Changes in land use and land cover can have a significant impact on erosion-prone areas, as they affect various hydrological processes such as infiltration, evaporation, evapotranspiration, and runoff. This can either accelerate or decelerate the erosion process in watersheds. In this study, the LULC map consists of several thematic maps, including water bodies, trees, flooded vegetation, crops, built-up areas, bare ground, snow/ice, and rangeland (Figure 3).
Topographic wetness index (TWI) is used to evaluate the impact of the topography on the hydrological process. The TWI map shows that low TWI values (3.018–8.4) are present in the central and south-east regions of the basin; therefore, a weak humidity is present in the two regions. Moderate TWI values ranging from 8.4 to 10.84 are present in north, south, and south-west of the basin, indicating the presence of an average humidity (Figure 4). A high humidity is detected in the outlet of and along the course of waterways due to the presence of runoff. The TWI values at this region varied from 10.84 to 25.20.
Modified Fournier index (MFI) has been demonstrated as a crucial factor in accurately estimating the R factor in areas that experience high-intensity rainfall events, which is essential for assessing the risk of soil erosion in the context of future changes in land use and climate, particularly under the Revised Universal Soil Loss Equation (RUSLE) framework [69]. The erosivity index in the current study was determined by utilizing average precipitation data collected from the national agency for hydraulic resources for one hundred and six (106) meteorological stations located in the Macta basin (Table A1). Analytical equations were employed to evaluate R factors based on the amount of rainfall. R values were calculated using the equations described by [70]. The calculated modified Fournier index ranges from 28 to 61, with the highest values located in the middle part of the basin from east to west (Figure 4). The lowest values of the index are found in the southern part of the Macta basin, where a semi-arid climate prevails.
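For reference, the modified Fournier index is commonly defined as the sum of squared monthly precipitation totals divided by the annual total, MFI = Σ pᵢ² / P. The sketch below implements that common definition; the text does not spell out the exact equation applied in the study, so the formula and the function name should be read as assumptions, and the rainfall values are invented.

```python
def modified_fournier_index(monthly_precip_mm):
    """MFI = sum(p_i^2) / P for twelve monthly precipitation totals (mm)."""
    if len(monthly_precip_mm) != 12:
        raise ValueError("expected twelve monthly values")
    annual = sum(monthly_precip_mm)
    if annual == 0:
        return 0.0                      # no rainfall, no erosivity
    return sum(p * p for p in monthly_precip_mm) / annual

# Two hypothetical stations with the same annual total of 360 mm.
mfi_uniform = modified_fournier_index([30.0] * 12)
mfi_concentrated = modified_fournier_index([360.0] + [0.0] * 11)
```

The two stations receive the same annual rainfall, yet the index is twelve times larger when all of it falls in one month, which is why the index is used as a proxy for rainfall aggressiveness.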
Distance from roads: the proximity of a location to roads plays a crucial role in determining its susceptibility to erosion. This is because roads can hinder the absorption of water into the ground, leading to an increase in surface runoff and erosion-prone areas. As a result, areas situated closer to roads are more vulnerable to erosion due to reduced infiltration rates and faster runoff.
Distance from river can significantly impact the severity and extent of water-induced soil erosion. The distance from the river is considered by many researchers as a key factor in assessing erosion risk. When a location is situated close to a river, it becomes more vulnerable to soil erosion due to the increased water flow volume and velocity, which accelerate the process of erosion by flash floods. As a result, areas located in close proximity to a river are at a higher risk of experiencing soil erosion.
Lithology represents the geological composition of the region, characterized by a variety of Quaternary formations with different lithological properties (Figure 5). The degrees of rock compaction and alteration, as well as the occurrence of fractures (diaclases) and joints in the subsurface or exposed rock, have a significant influence on the recharge of fractured aquifers [71,72,73]. The Wadi El Hammam basin is characterized by a diverse geology, with Quaternary formations dominating the region. The Ghriss–Mascara plain, which is drained by Wadi Ain Fekane, is mainly composed of detrital formations such as marl-clay and sandy-clay with gravel passages. The region’s massifs are made up of carbonate rocks, including Cretaceous limestone in the Tessala and Beni-Chougrane mountains and limestone and/or Jurassic dolomites in the Tlemcen and Dhaya mountains. In the western zone of the region, horsts and grabens oriented ENE-WSW are present, extending from the Tlemcen Mountains to the Traras massif, with large normal faults bounding the compartments [74].
Geology of the basin of Macta occupies the western part of the Tellian Atlas, encompassing a multitude of geological formations ranging from the primary age to the Quaternary, with marl and limestone facies predominating (Figure 5). Quaternary and Plio-Quaternary formations occupy depressions in the north and northeast as well as the hollows of valleys. The majority of Pliocene formations crop out to the west, while Upper Jurassic and Lower Cretaceous geological formations are found in the center, south, and southeast. The geological formations of the primary age appear in the south-eastern part, consisting mainly of schists and quartzites.
Clay Mineral Ratio (CMR) in soil has a significant impact on soil erosion. The presence of clay minerals in the soil increases its ability to retain water and reduce soil erosion (Figure 5). A higher clay mineral ratio in soil can enhance its ability to resist erosion by slowing down the rate of water infiltration, thus minimizing the velocity of surface runoff. On the other hand, soils with a lower clay mineral ratio are more prone to erosion as they tend to have a higher rate of water infiltration and faster surface runoff velocity [75,76].
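This section does not state how the clay mineral ratio layer was computed. A common remote sensing definition, assumed here, is the ratio of the two shortwave infrared bands (SWIR1 / SWIR2, which correspond to bands 6 and 7 on Landsat 8 OLI). The sketch below applies that ratio cell by cell; the function name, the `nodata` handling, and the reflectance values are hypothetical.

```python
def clay_mineral_ratio(swir1, swir2, nodata=None):
    """Cell-by-cell SWIR1 / SWIR2 ratio for two equally shaped 2-D grids.

    Cells where SWIR2 is zero are set to `nodata` instead of dividing by zero.
    """
    ratio = []
    for row1, row2 in zip(swir1, swir2):
        ratio.append([a / b if b else nodata for a, b in zip(row1, row2)])
    return ratio


# Hypothetical surface reflectance values for a 2 x 2 window.
swir1 = [[0.375, 0.30], [0.50, 0.20]]
swir2 = [[0.250, 0.30], [0.25, 0.00]]

cmr = clay_mineral_ratio(swir1, swir2)
print(cmr)
```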

3. Results

The CatBoost, AdaBoost, CNN, and stacking ensemble methods were trained on 60% of the data (the training sample) and evaluated on the remaining 40% (the validation sample). The ROC-AUC curves for each method are presented in Figure 6. The susceptibility values were then classified into five levels of risk (very low, low, moderate, high, and very high) using the quantile classification method, as previously described in the literature [44]. The resulting risk map is shown in Figure 7.
The hybrid method based on the stacking ensemble technique, which combines CNN, AdaBoost, and CatBoost, achieved the highest area under the curve (AUC), with a score of 98% (Figure 6). The individual models, CNN, AdaBoost, and CatBoost, achieved AUC values of 97%, 96%, and 94%, respectively (Table 7). These results demonstrate that machine learning and deep learning techniques can be used effectively to predict erosion-prone areas.
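Quantile classification assigns an equal number of pixels to each class, which is why the class shares reported in the Discussion are all close to 20%. The sketch below is a minimal illustration under assumptions: the helper names `quantile_breaks` and `classify` and the susceptibility scores are hypothetical, and five classes are used because the Discussion reports five.

```python
def quantile_breaks(values, n_classes):
    """Upper limits of the first n_classes - 1 equal-count classes."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes - 1] for k in range(1, n_classes)]


def classify(value, breaks):
    """0-based class index; 0 is the lowest-susceptibility class."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks)


# Hypothetical susceptibility scores for 100 pixels (0.01 ... 1.00).
scores = [i / 100 for i in range(1, 101)]
labels = ["very low", "low", "moderate", "high", "very high"]

breaks = quantile_breaks(scores, n_classes=len(labels))
counts = {label: 0 for label in labels}
for score in scores:
    counts[labels[classify(score, breaks)]] += 1

print(breaks)
print(counts)
```

Because the breaks depend on the distribution of predicted scores, the same score can fall into different classes for different models.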
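The validation metrics used in this study (sensitivity, specificity, accuracy, and AUC) can be computed as sketched below. This is an illustrative standard-library implementation, not the code used by the authors: the labels, scores, and 0.5 threshold are hypothetical, and the AUC is obtained from its rank interpretation (the probability that a randomly chosen erosion pixel scores higher than a randomly chosen non-erosion pixel).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly;
    tied pairs count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels (1 = erosion) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)
auc = roc_auc(y_true, scores)

print(sensitivity, specificity, accuracy, auc)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.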
The AdaBoost model was employed to model erosion occurrence in selected pixels based on the influencing factors. The obtained results (Table 6) demonstrate that the topographic wetness index (TWI) has the highest impact on the erosion modeling in the Macta basin, with an influence percentage of 16%. This is followed by aspect, distance from river, clay mineral ratio, and modified Fournier index, with percentage influences of 14%, 13%, 12%, and 11%, respectively. In contrast, elevation, geology, distance from roads, slope, lithology, and land use/land cover had a lesser influence on the erosion modeling process in the studied region.
The CatBoost model exhibited the lowest performance of the four models, although its results were still considered outstanding. The obtained results (Table 6) indicate that slope is the most influential parameter associated with erosion in this model, with an importance value of 60%. This is followed by land use/land cover, modified Fournier index, distance from roads, and distance from river, with importance values of 9.17%, 5.36%, 4.39%, and 4.18%, respectively.
The CNN exhibited the best performance among the individual models, which may be attributed to its greater complexity and its capacity to capture the phenomenon. The obtained results demonstrate that slope is the most influential parameter associated with erosion, with an importance value of 49.6%. This is followed by land use/land cover (LULC), lithology, and topographic wetness index (TWI), with importance values of 13.82%, 12.91%, and 11.17%, respectively. The remaining factors were of minor importance, with values between 0% and 3.5%.
The stacking ensemble method is of great interest for combining multiple algorithms to model complex phenomena. The obtained results indicate that stacking improves performance and outperforms the individual models. Feature importance for the stacking ensemble was calculated by aggregating the feature importance scores of all base models. One approach to achieve this is to average the importance score of each feature across the base models while weighting each model by its performance. The factors were ranked in the following order of importance: slope, topographic wetness index (TWI), LULC, lithology, aspect, distance from river, modified Fournier index, elevation, clay mineral ratio, distance from road, and geology.
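The performance-weighted averaging of feature importance described above can be sketched as follows. This is one possible reading of that approach, not a reproduction of the authors' calculation. The weights are the AUC values reported in the Results, the importance scores are hypothetical placeholders limited to three factors, and the function name `weighted_importance` is an assumption.

```python
def weighted_importance(importances, weights):
    """Performance-weighted mean importance per feature.

    importances: {model: {feature: score}}
    weights:     {model: performance score, for example AUC}
    Each model's scores are rescaled to sum to 1 before weighting, so that
    models reporting importance on different scales contribute comparably.
    """
    total_weight = sum(weights[model] for model in importances)
    combined = {}
    for model, scores in importances.items():
        scale = sum(scores.values())
        for feature, score in scores.items():
            share = weights[model] * (score / scale) / total_weight
            combined[feature] = combined.get(feature, 0.0) + share
    # Sort from most to least important.
    return dict(sorted(combined.items(), key=lambda item: item[1], reverse=True))


# AUC values taken from the Results section.
auc = {"CNN": 0.97, "AdaBoost": 0.96, "CatBoost": 0.94}

# Hypothetical importance scores, for illustration only.
toy_importances = {
    "CNN": {"slope": 50.0, "TWI": 30.0, "aspect": 20.0},
    "AdaBoost": {"slope": 20.0, "TWI": 45.0, "aspect": 35.0},
    "CatBoost": {"slope": 60.0, "TWI": 25.0, "aspect": 15.0},
}

ranking = weighted_importance(toy_importances, auc)
for feature, value in ranking.items():
    print(f"{feature}: {value:.3f}")
```

Because the three AUC values are close to one another, the weighted mean in this setting differs little from a simple average of the rescaled scores.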

4. Discussion

The Macta basin located in north-western Algeria has been experiencing a significant issue with erosion, resulting in substantial soil loss and impeding the implementation of sustainable land management practices. As a result, it is crucial to identify areas of high vulnerability using the most effective modeling techniques to enable the implementation of appropriate soil and water conservation measures.
To accurately estimate erosion in this area and identify the most appropriate model, we applied four artificial intelligence models (CNN, CatBoost, AdaBoost, and stacking). Our research also aimed to determine the primary factors contributing to gully erosion, as this phenomenon is influenced by various factors. Our analysis is based on the hybridization of the models used, combining ensemble methods and deep learning techniques. It identified slope as a crucial factor, as expected; TWI, LULC, and lithology were the other dominant factors in mapping areas susceptible to erosion. Although the importance of individual variables differed between models, the performance of all the models remained outstanding. These results are consistent with those of previous studies [22,39,40].
According to the results obtained from the stacking ensemble model shown in Figure 7, it can be inferred that the very high susceptibility level covers 19.97% of the total area of the basin, while 20.04% of the area is categorized as having a high susceptibility, which is mainly located in grasslands. Moreover, 20.74% of the total area faces moderate susceptibility, 19.86% faces low susceptibility, and 19.38% faces very low susceptibility.
The CNN and CatBoost models (Figure 7) classified 19.60% and 19.55% of the area as very high susceptibility, 20.23% and 20.44% as high susceptibility, 20.42% and 19.71% as moderate susceptibility, 20.09% and 20.99% as low susceptibility, and 19.66% and 19.31% as very low susceptibility, respectively. The CNN and CatBoost models exhibited a highly similar distribution of erosion-prone area across the different LULC classes (Table 8).
The AdaBoost model (Figure 7) classified 19.23% of the area as very high susceptibility, 20.01% as high susceptibility, 20.41% as moderate susceptibility, 20.66% as low susceptibility, and 19.69% as very low susceptibility.
Based on these results, roughly 60% of the basin is classified as having very low to moderate susceptibility according to the stacking ensemble, CNN, AdaBoost, and CatBoost models, whereas around 40% of the area falls into the high and very high susceptibility classes. The results of this study are consistent with the findings of Taye et al. [77], who reported significantly higher seasonal runoff coefficient values for grasslands compared to croplands. This aligns with our observation that high to very high susceptibility areas exhibit similar patterns, indicating the potential influence of land cover on soil erosion dynamics (Table 8).
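The grouped shares quoted above follow directly from the per-class percentages reported for each model. The short check below sums them; the percentages are taken from the preceding paragraphs, while the dictionary layout and the helper name `grouped_shares` are illustrative choices.

```python
# Per-class area shares (% of the basin) as reported in the Discussion.
shares = {
    "Stacking": {"very high": 19.97, "high": 20.04, "moderate": 20.74,
                 "low": 19.86, "very low": 19.38},
    "CNN": {"very high": 19.60, "high": 20.23, "moderate": 20.42,
            "low": 20.09, "very low": 19.66},
    "CatBoost": {"very high": 19.55, "high": 20.44, "moderate": 19.71,
                 "low": 20.99, "very low": 19.31},
    "AdaBoost": {"very high": 19.23, "high": 20.01, "moderate": 20.41,
                 "low": 20.66, "very low": 19.69},
}


def grouped_shares(model_shares):
    """Return (very low to moderate, high to very high) shares in percent."""
    lower = sum(model_shares[k] for k in ("very low", "low", "moderate"))
    upper = sum(model_shares[k] for k in ("high", "very high"))
    return round(lower, 2), round(upper, 2)


for model, model_shares in shares.items():
    lower, upper = grouped_shares(model_shares)
    print(f"{model}: very low to moderate {lower}%, high to very high {upper}%")
```

All four models give a split very close to 60/40, as expected from a five-class quantile classification.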
Each model ranks the influencing factors differently, which implies that a model can reach high accuracy even when the relative importance of the factors is not precisely determined. To better understand this phenomenon, more sophisticated feature selection algorithms, which search for associations between the studied phenomenon and the influencing factors, should be used.
An erosion susceptibility map is a valuable tool for mitigating the risks of water-induced soil erosion. Areas identified as having high or very high susceptibility that have not yet experienced erosion present conditions favorable for its development. These areas are particularly vulnerable to soil erosion, which underscores the importance of including them in the erosion susceptibility map and of promoting sustainable practices, both in agriculture and forestry, to preserve soil quality.
Moreover, mapping erosion-susceptible areas with AI techniques reveals differences in how the models interpret the phenomenon. The results indicate that the AdaBoost model relies on different influencing factors than the CNN and CatBoost models, yet still achieves an overall accuracy higher than 94%.
To further improve the accuracy, a stacking method was employed, which combines the previous machine learning techniques in a hybrid model. This stacking approach yielded exceptionally high results, with an accuracy reaching 99% (Table 7). By leveraging the strengths of multiple models, the stacking method enhances the predictive power and reliability of the erosion susceptibility mapping process.
Overall, the integration of AI techniques and the application of the stacking method have proven to be effective in accurately identifying erosion-prone areas and providing valuable insights for soil conservation and management strategies.

5. Conclusions

This research showcases the significant potential of machine learning and deep learning tools, as well as meta-models, in improving the identification, visualization, and interpretation of areas susceptible to erosion. By focusing on the Macta basin in Algeria, this study successfully developed soil erosion susceptibility maps using four distinct machine learning algorithms: CNN, Adaptive boosting, Categorical boosting (CatBoost), and stacking ensemble methods.
The findings shed light on the factors influencing erosion-prone areas in the Macta basin. AdaBoost identified TWI, aspect, distance from river, and clay mineral ratio as key parameters in mapping erosion-prone areas. CatBoost highlighted slope, LULC, modified Fournier index, distance from road, and distance from river as the most influential factors. The CNN model emphasized slope, LULC, lithology, and TWI as critical factors in mapping erosion-susceptible areas.
Moreover, the utilization of stacking ensemble methods demonstrated exceptional accuracy and significantly improved the prediction and mapping of erosion-susceptible areas. By combining predictions from multiple base models, the stacking ensemble approach provided a more robust and reliable estimation of erosion susceptibility. This hybrid model leveraged the strengths of individual machine learning algorithms while effectively mitigating their weaknesses, resulting in a highly accurate and comprehensive assessment of erosion risks.
The reliable erosion susceptibility maps generated in this study serve as invaluable tools for decision-makers and government officials involved in erosion risk management. The integration of machine learning and deep learning techniques, along with the stacking ensemble method, offers a promising approach to better delineate, visualize, and interpret erosion-prone areas.
Moving forward, future research should focus on refining machine learning algorithms and ensemble methods in erosion modeling to mitigate water-induced soil erosion and enhance sustainable land use planning. An important direction for future studies is the analysis of erosion susceptibility over longer time periods using climatic models. This approach will provide valuable insights into the long-term impact of climate change and enable proactive measures to ensure the sustainability of soil resources. By integrating climate projections, erosion modeling can support decision-making and protect soil resources in watershed management and conservation, promoting sustainable practices. The combination of machine learning, ensemble methods, and climate projections has the potential to enhance our understanding of erosion dynamics and guide effective prevention and mitigation strategies, thereby ensuring the long-term sustainability of land use and ecosystem preservation.

Author Contributions

Conceptualization: S.E.T. and H.B. (Hamza Bouguerra); Methodology: S.E.T., H.B. (Hamza Bouguerra) and S.B.; Software: S.E.T., H.B. (Hamza Bouguerra), H.B. (Hamza Bouchehed) and Y.H.; Validation: S.E.T., A.A. and H.B. (Hamza Bouchehed); Formal analysis: H.B. (Hamza Bouguerra), H.B. (Hamza Bouchehed), Y.H. and A.A.; Investigation: S.E.T. and H.B. (Hamza Bouguerra); Resources: S.E.T., H.B. (Hamza Bouguerra) and N.A.; Writing—original draft preparation: S.E.T., G.G. and J.N.-P.; Writing—review and editing: J.N.-P., G.G. and S.E.T.; Supervision: S.E.T., S.B., H.B. (Hamza Bouguerra) and J.N.-P.; Project administration: H.B. (Hamza Bouguerra) and S.E.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Ministry of Higher Education and Scientific Research (MESRS—Algeria).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to express our sincere gratitude to Hamza Atoui, from the department of electronics, Badji Mokhtar Annaba University, Algeria.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Modified Fournier index results for the 106 meteorological stations located in the Macta basin.
| No. | Station code | Station name | Lambert X (m) | Lambert Y (m) | MFI (mm) |
|---|---|---|---|---|---|
| 1 | 110102 | RAS EL MA | 177,450 | 139,500 | 29.68 |
| 2 | 110201 | SIDI ALI BEN YOUB | 186,550 | 192,200 | 44.18 |
| 3 | 110202 | MOULAY SLISSENE MF | 181,200 | 171,550 | 41.79 |
| 4 | 110203 | EL HACAIBA | 183,500 | 161,650 | 35.78 |
| 5 | 110208 | SLISSENE CENTRE | 183,650 | 174,650 | 42.56 |
| 6 | 110209 | TAMFOUSSET | 192,900 | 183,350 | 35.93 |
| 7 | 110305 | SIDI BEL ABBES | 194,250 | 214,150 | 46.33 |
| 8 | 110306 | SIDI BRAHIM | 203,230 | 222,480 | 46.87 |
| 9 | 110307 | BEN BADIS | 170,850 | 190,800 | 46.72 |
| 10 | 110308 | SIDI ALI BOUSSIDI | 178,250 | 206,150 | 46.49 |
| 11 | 110309 | HASSI DAHO | 204,800 | 204,100 | 44.76 |
| 12 | 110310 | LAMTAR | 181,400 | 203,000 | 45.96 |
| 13 | 110311 | SIDI KHALED | 188,500 | 207,500 | 45.24 |
| 14 | 110312 | MOSTEFA BEN BRAHIM | 221,700 | 214,740 | 49.03 |
| 15 | 110313 | TESSALA | 184,500 | 222,050 | 46.25 |
| 16 | 110314 | AIN TRID | 193,000 | 226,000 | 46.28 |
| 17 | 110315 | AIN EL BERD | 208,400 | 234,300 | 48.62 |
| 18 | 110317 | HASSI ZEHANA | 172,700 | 198,200 | 46.66 |
| 19 | 110318 | SIDI LAHCENE | 191,200 | 212,900 | 45.19 |
| 20 | 110319 | CAID BELARBI | 212,900 | 210,700 | 46.07 |
| 21 | 110322 | TABIA | 186,800 | 196,700 | 44.52 |
| 22 | 110328 | SULLY | 201,500 | 206,400 | 44.61 |
| 23 | 110329 | LES TREMBLES | 204,800 | 227,260 | 47.28 |
| 24 | 110334 | CHETOUANE | 175,300 | 191,250 | 45.96 |
| 25 | 110401 | BOUDJEBAA (Dar Esba) | 226,200 | 233,000 | 49.16 |
| 26 | 110402 | CHEURFAS Bge | 232,100 | 238,300 | 49.47 |
| 27 | 110501 | MERINE | 216,300 | 170,500 | 28.31 |
| 28 | 110502 | TELAGH | 200,650 | 170,150 | 32.76 |
| 29 | 110503 | TEGHALINET | 203,450 | 181,600 | 39.19 |
| 30 | 110504 | TENIRA | 205,500 | 196,250 | 42.29 |
| 31 | 110505 | EL HADJIRA | 199,400 | 195,600 | 41.72 |
| 32 | 110507 | FERME CHABRIER | 194,800 | 190,450 | 41.13 |
| 33 | 110509 | SIDI AHMED | 204,050 | 190,050 | 42.08 |
| 34 | 110510 | DOMAINE ZERROUKI | 204,650 | 185,000 | 40.34 |
| 35 | 110514 | AIN CHAFIA | 210,700 | 185,250 | 38.36 |
| 36 | 110602 | OUED SEFFIOUN | 221,150 | 201,100 | 46.84 |
| 37 | 110603 | AIN FRASS | 237,750 | 215,000 | 51.73 |
| 38 | 110605 | HASSI EL ABD | 226,750 | 189,200 | 43.40 |
| 39 | 110701 | TOUAZIZINE M.F. (Dhaya) | 191,150 | 155,200 | 32.55 |
| 40 | 110702 | DOUAHILA | 228,700 | 155,350 | 29.51 |
| 41 | 110703 | TOUAZIZINE (Dhaya) | 196,300 | 157,450 | 30.36 |
| 42 | 110802 | DAOUD YOUB | 234,500 | 185,000 | 43.25 |
| 43 | 110902 | HASSI AYOUN MF | 241,750 | 161,250 | 29.18 |
| 44 | 110903 | DOUI THABET | 252,100 | 181,700 | 34.21 |
| 45 | 110904 | BOU EL FERID | 245,730 | 169,150 | 31.44 |
| 46 | 111002 | FERME EL HARIG | 245,590 | 192,450 | 44.10 |
| 47 | 111102 | MEFTAH SIDI BOUBEKEUR | 259,500 | 195,750 | 42.94 |
| 48 | 111103 | AIN EL HADJAR | 266,500 | 165,200 | 31.70 |
| 49 | 111105 | SID AMAR | 263,850 | 195,100 | 41.28 |
| 50 | 111106 | KILOMETRE 50 | 268,450 | 192,000 | 38.31 |
| 51 | 111112 | HAMMAM RABI | 270,400 | 184,500 | 36.44 |
| 52 | 111113 | DJEBEL KAROUS | 264,700 | 181,200 | 33.87 |
| 53 | 111114 | REBAHIA FERME 917 | 272,600 | 180,500 | 34.55 |
| 54 | 111120 | FERME DU SYNDICAT | 263,700 | 165,500 | 31.14 |
| 55 | 111128 | AIN ZERGA FERME | 273,900 | 176,400 | 33.58 |
| 56 | 111130 | SAIDA ANRH | 266,750 | 174,400 | 33.80 |
| 57 | 111201 | OUED TARIA | 262,350 | 204,850 | 45.95 |
| 58 | 111202 | OUM EL DJIRANE | 283,000 | 173,400 | 34.24 |
| 59 | 111203 | AIN BALLOUL | 296,850 | 190,550 | 38.90 |
| 60 | 111204 | AIN TIFRIT | 290,050 | 182,450 | 36.80 |
| 61 | 111205 | AIN SOLTANE | 281,400 | 188,400 | 37.44 |
| 62 | 111208 | SIDI MIMOUN | 289,100 | 196,100 | 39.76 |
| 63 | 111209 | BLED EL BEIDA | 283,300 | 183,100 | 35.98 |
| 64 | 111210 | TAMESNA | 295,600 | 174,500 | 35.58 |
| 65 | 111211 | SIDI BEN KADOUR MF | 291,500 | 164,100 | 33.64 |
| 66 | 111213 | EL HAZEM | 272,200 | 168,600 | 32.20 |
| 67 | 111215 | BOUCHERID MOHAMED | 276,750 | 172,600 | 32.81 |
| 68 | 111217 | BENIANE | 275,000 | 203,150 | 44.52 |
| 69 | 111219 | HASNA Dne BOUCHIKHI | 277,350 | 194,550 | 39.84 |
| 70 | 111401 | MAOUSSA | 277,300 | 233,920 | 58.17 |
| 71 | 111402 | FROHA | 266,100 | 226,000 | 54.25 |
| 72 | 111404 | AOUF M.F. | 287,150 | 211,800 | 45.55 |
| 73 | 111405 | MATEMORE | 273,970 | 228,350 | 53.66 |
| 74 | 111407 | TIGHENNIF | 285,100 | 237,900 | 55.75 |
| 75 | 111408 | KHAOUILA | 282,150 | 243,100 | 60.20 |
| 76 | 111409 | AIN FARES | 277,500 | 245,100 | 60.20 |
| 77 | 111413 | TIZI | 261,500 | 227,800 | 54.71 |
| 78 | 111414 | SIDI KADA | 285,900 | 228,300 | 51.87 |
| 79 | 111415 | AIN FEKAN MN | 255,600 | 217,200 | 52.53 |
| 80 | 111416 | SIDI ALI KERROUCHA | 290,100 | 214,600 | 45.66 |
| 81 | 111418 | NESMOTH M.F. | 289,250 | 219,700 | 49.02 |
| 82 | 111422 | MASCARA Pedo. | 271,400 | 232,600 | 55.37 |
| 83 | 111424 | GHRISS | 269,200 | 219,800 | 51.34 |
| 84 | 111502 | SAHOUET OUIZERT | 247,620 | 215,800 | 50.51 |
| 85 | 111503 | BOU HANIFIA Bge | 249,000 | 223,600 | 50.38 |
| 86 | 111508 | SFISSEF | 233,750 | 218,800 | 53.65 |
| 87 | 111509 | HACINE | 255,550 | 243,500 | 50.20 |
| 88 | 111512 | FERGOUG | 259,100 | 250,150 | 49.13 |
| 89 | 111513 | BOUHNIFIA MN | 250,200 | 227,700 | 50.34 |
| 90 | 111517 | MOHAMMADIA SAEF | 261,750 | 257,370 | 41.49 |
| 91 | 111601 | MACTA | 245,450 | 279,700 | 41.09 |
| 92 | 111603 | SIG | 237,720 | 252,000 | 45.20 |
| 93 | 111604 | OGGAZ | 232,200 | 255,800 | 43.18 |
| 94 | 111605 | BOU HENNI | 247,500 | 255,400 | 42.44 |
| 95 | 111606 | FORNAKA | 250,850 | 278,500 | 41.85 |
| 96 | 111607 | SAMOURIA | 265,950 | 261,200 | 43.27 |
| 97 | 111608 | EL GHOMRI | 274,000 | 268,000 | 40.85 |
| 98 | 111609 | BOUGHIRAT | 278,000 | 275,000 | 41.30 |
| 99 | 111610 | MOCTA DOUZ | 251,250 | 260,200 | 41.41 |
| 100 | 111611 | FERME BLANCHE | 256,800 | 265,350 | 40.73 |
| 101 | 111612 | BLED TAOURIA | 277,000 | 284,600 | 44.64 |
| 102 | 111614 | AIN MOUISSY | 260,300 | 281,500 | 42.87 |
| 103 | 111615 | FORNAKA | 254,950 | 275,500 | 40.59 |
| 104 | 111616 | MARAIS DE SIRAT | 269,300 | 275,600 | 39.50 |
| 105 | 111617 | FERME ASSORAIN | 281,250 | 291,850 | 48.03 |
| 106 | 111618 | SOUAFFLIOS | 285,200 | 285,650 | 50.88 |
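For reference, the modified Fournier index reported in Table A1 is commonly defined as MFI = Σ p_i² / P, where p_i is the mean precipitation of month i and P is the mean annual precipitation. The sketch below applies that definition. The function name and the monthly values are hypothetical and do not correspond to any station in the table.

```python
def modified_fournier_index(monthly_precip_mm):
    """MFI = sum(p_i ** 2) / P for 12 monthly totals p_i and annual total P."""
    if len(monthly_precip_mm) != 12:
        raise ValueError("expected 12 monthly precipitation values")
    annual = sum(monthly_precip_mm)
    if annual == 0:
        return 0.0
    return sum(p * p for p in monthly_precip_mm) / annual


# Hypothetical monthly means (mm) with a wet winter and a dry summer.
seasonal = [60.0, 50.0, 45.0, 35.0, 25.0, 5.0, 1.0, 2.0, 15.0, 35.0, 55.0, 60.0]
# The same kind of check with rain spread evenly over the year.
uniform = [10.0] * 12

mfi_seasonal = modified_fournier_index(seasonal)
mfi_uniform = modified_fournier_index(uniform)
print(round(mfi_seasonal, 2), mfi_uniform)
```

For a given annual total, the index is lowest when rainfall is evenly distributed and rises as precipitation concentrates in fewer months.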

References

  1. Kettler, T.A.; Doran, J.W.; Gilbert, T.L. Simplified method for soil particle-size determination to accompany soil-quality analyses. Soil Sci. Soc. Am. J. 2001, 65, 849–852.
  2. Bouguerra, H.; Bouanani, A.; Khanchoul, K.; Derdous, O.; Tachi, S.E. Mapping erosion prone areas in the Bouhamdane watershed (Algeria) using the Revised Universal Soil Loss Equation through GIS. J. Water Land Dev. 2017, 32, 13–23.
  3. Tachi, S.E.; Bouguerra, H.; Derdous, O.; Djabri, L.; Benmamar, S. Estimating suspended sediment concentration at different time scales in Northeastern Algeria. Appl. Water Sci. 2020, 10, 118.
  4. Arabameri, A.; Pradhan, B.; Rezaei, K.; Sohrabi, M.; Kalantari, Z. GIS-based landslide susceptibility mapping using numerical risk factor bivariate model and its ensemble with linear multivariate regression and boosted regression tree algorithms. J. Mt. Sci. 2019, 16, 595–618.
  5. Poesen, J.W.; Vandaele, K.; Van Wesemael, B. Contribution of gully erosion to sediment production on cultivated lands and rangelands. IAHS Publ.-Ser. Proc. Rep.-Int. Assoc. Hydrol. Sci. 1996, 236, 251–266.
  6. Jahantigh, M.; Pessarakli, M. Causes and effects of gully erosion on agricultural lands and the environment. Commun. Soil Sci. Plant Anal. 2011, 42, 2250–2255.
  7. Bouguerra, H.; Tachi, S.E.; Derdous, O.; Bouanani, A.; Khanchoul, K. Suspended sediment discharge modeling during flood events using two different artificial neural network algorithms. Acta Geophys. 2019, 67, 1649–1660.
  8. Bouhadeb, C.E.; Menani, M.R.; Bouguerra, H.; Derdous, O. Assessing soil loss using GIS based RUSLE methodology. Case of the BouNamoussa watershed–North-East of Algeria. J. Water Land Dev. 2018, 36, 27–35.
  9. Liu, H.; Zhang, T.; Liu, B.; Liu, G.; Wilson, G.V. Effects of gully erosion and gully filling on soil depth and crop production in the black soil region, northeast China. Environ. Earth Sci. 2013, 68, 1723–1732.
  10. Poesen, J.; Nachtergaele, J.; Verstraeten, G.; Valentin, C. Gully erosion and environmental change: Importance and research needs. Catena 2003, 50, 91–133.
  11. Lal, R. Restoring land degraded by gully erosion in the tropics. Soil Restor. 1992, 17, 123–152.
  12. Xiong, X.; Zhang, K.; Chen, X.; Shi, H.; Luo, Z.; Wu, C. Sources and distribution of microplastics in China’s largest inland lake–Qinghai Lake. Environ. Pollut. 2018, 235, 899–906.
  13. Skidmore, E.L. Soil loss tolerance. Determ. Soil Loss Toler. 1982, 45, 87–93.
  14. Duniway, M.C.; Pfennigwerth, A.A.; Fick, S.E.; Nauman, T.W.; Belnap, J.; Barger, N.N. Wind erosion and dust from US drylands: A review of causes, consequences, and solutions in a changing world. Ecosphere 2019, 10, e02650.
  15. Tarolli, P.; Preti, F.; Romano, N. Terraced landscapes: From an old best practice to a potential hazard for soil degradation due to land abandonment. Anthropocene 2014, 6, 10–25.
  16. Hou, G.; Bi, H.; Huo, Y.; Wei, X.; Zhu, Y.; Wang, X.; Liao, W. Determining the optimal vegetation coverage for controlling soil erosion in Cynodon dactylon grassland in North China. J. Clean. Prod. 2020, 244, 118771.
  17. Mostazo, P.; Asensio-Amador, C.; Asensio, C. Soil Erosion Modeling and Monitoring. Agriculture 2023, 13, 447.
  18. Gholami, V. The influence of deforestation on runoff generation and soil erosion (Case study: Kasilian Watershed). J. For. Sci. 2013, 59, 272–278.
  19. Kouassi, J.L.; Gyau, A.; Diby, L.; Bene, Y.; Kouamé, C. Assessing land use and land cover change and farmers’ perceptions of deforestation and land degradation in South-West Côte d’Ivoire, West Africa. Land 2021, 10, 429.
  20. Nyssen, J.; Poesen, J.; Moeyersons, J.; Luyten, E.; Veyret-Picot, M.; Deckers, J.; Haile, M.; Govers, G. Impact of road building on gully erosion risk: A case study from the northern Ethiopian highlands. Earth Surf. Process. Landf. J. Br. Geomorphol. Res. Group 2002, 27, 1267–1283.
  21. Frick, W.F.; Kingston, T.; Flanders, J. A review of the major threats and challenges to global bat conservation. Ann. N. Y. Acad. Sci. 2020, 1469, 5–25.
  22. Mosavi, A.; Sajedi-Hosseini, F.; Choubin, B.; Taromideh, F.; Rahi, G.; Dineva, A.A. Susceptibility mapping of soil water erosion using machine learning models. Water 2020, 12, 1995.
  23. Saha, S.; Sarkar, R.; Thapa, G.; Roy, J. Modeling gully erosion susceptibility in Phuentsholing, Bhutan using deep learning and basic machine learning algorithms. Environ. Earth Sci. 2021, 80, 295.
  24. Renard, K.G.; Freimund, J.R. Using monthly precipitation data to estimate the R-factor in the revised USLE. J. Hydrol. 1994, 157, 287–306.
  25. Wischmeier, W.H.; Smith, D.D. Predicting rainfall erosion losses from cropland east of the rocky mountains: Guide for selection of practices for soil and water conservation. In Agriculture Handbook; United States Department of Agriculture: Washington, DC, USA, 1965; Volume 282, p. 58.
  26. Wischmeier, W.H.; Smith, D.D. Predicting rainfall erosion losses: A guide to conservation planning. In Agriculture Handbook; United States Department of Agriculture: Washington, DC, USA, 1978; Volume 537.
  27. Almagro, A.; Thomé, T.C.; Colman, C.B.; Pereira, R.B.; Junior, J.M.; Rodrigues, D.B.B.; Oliveira, P.T.S. Improving cover and management factor (C-factor) estimation using remote sensing approaches for tropical regions. Int. Soil Water Conserv. Res. 2019, 7, 325–334.
  28. Hudson, N.W. Instrumentation for studies of the erosive power of rainfall. In Erosion and Sediment Transport Measurement (Proceedings of the Florence Symposium); IAHS Publication: Florence, Italy, 1981; pp. 383–390.
  29. Gayen, A.; Saha, S. Application of weights-of-evidence (WoE) and evidential belief function (EBF) models for the delineation of soil erosion vulnerable zones: A study on Pathro river basin, Jharkhand, India. Model. Earth Syst. Environ. 2017, 3, 1123–1139.
  30. Gayen, A.; Pourghasemi, H.R.; Saha, S.; Keesstra, S.; Bai, S. Gully erosion susceptibility assessment and management of hazard-prone areas in India using different machine learning algorithms. Sci. Total Environ. 2019, 668, 124–138.
  31. Cama, M.; Schillaci, C.; Kropáček, J.; Hochschild, V.; Bosino, A.; Märker, M. A probabilistic assessment of soil erosion susceptibility in a head catchment of the Jemma Basin, Ethiopian Highlands. Geosciences 2020, 10, 248.
  32. Zakerinejad, R.; Märker, M. Prediction of Gully erosion susceptibilities using detailed terrain analysis and maximum entropy modeling: A case study in the Mazayejan Plain, Southwest Iran. Geogr. Fis. Din. Quat. 2014, 37, 67–76.
  33. Conoscenti, C.; Agnesi, V.; Cama, M.; Caraballo-Arias, N.A.; Rotigliano, E. Assessment of gully erosion susceptibility using multivariate adaptive regression splines and accounting for terrain connectivity. Land Degrad. Dev. 2018, 29, 724–736.
  34. Rahmati, O.; Pourghasemi, H.R.; Melesse, A.M. Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: A case study at Mehran Region, Iran. Catena 2016, 137, 360–372.
  35. Zabihi, M.; Mirchooli, F.; Motevalli, A.; Darvishan, A.K.; Pourghasemi, H.R.; Zakeri, M.A.; Sadighi, F. Spatial modelling of gully erosion in Mazandaran Province, northern Iran. Catena 2018, 161, 1–13.
  36. Chaplot, V.; Le Brozec, E.C.; Silvera, N.; Valentin, C. Spatial and temporal assessment of linear erosion in catchments under sloping lands of northern Laos. Catena 2005, 63, 167–184.
  37. Conoscenti, C.; Angileri, S.; Cappadonia, C.; Rotigliano, E.; Agnesi, V.; Märker, M. Gully erosion susceptibility assessment by means of GIS-based logistic regression: A case of Sicily (Italy). Geomorphology 2014, 204, 399–411.
  38. Hembram, T.K.; Saha, S. Prioritization of sub-watersheds for soil erosion based on morphometric attributes using fuzzy AHP and compound factor in Jainti River basin, Jharkhand, Eastern India. Environ. Dev. Sustain. 2020, 22, 1241–1268.
  39. Bouamrane, A.; Bouamrane, A.; Abida, H. Water erosion hazard distribution under a Semi-arid climate Condition: Case of Mellah Watershed, North-eastern Algeria. Geoderma 2021, 403, 115381.
  40. Avand, M.; Janizadeh, S.; Naghibi, S.A.; Pourghasemi, H.R.; Khosrobeigi Bozchaloei, S.; Blaschke, T. A comparative assessment of random forest and k-nearest neighbor classifiers for gully erosion susceptibility mapping. Water 2019, 11, 2076.
  41. Saha, S.; Roy, J.; Arabameri, A.; Blaschke, T.; Tien Bui, D. Machine learning-based gully erosion susceptibility mapping: A case study of Eastern India. Sensors 2020, 20, 1313.
  42. Arabameri, A.; Chandra Pal, S.; Costache, R.; Saha, A.; Rezaie, F.; Seyed Danesh, A.; Pradhan, B.; Lee, S.; Hoang, N.D. Prediction of gully erosion susceptibility mapping using novel ensemble machine learning algorithms. Geomat. Nat. Hazards Risk 2021, 12, 469–498.
  43. Yang, A.; Wang, C.; Pang, G.; Long, Y.; Wang, L.; Cruse, R.M.; Yang, Q. Gully erosion susceptibility mapping in highly complex terrain using machine learning models. ISPRS Int. J. Geo-Inf. 2021, 10, 680.
  44. Goetz, J.N.; Brenning, A.; Petschko, H.; Leopold, P. Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling. Comput. Geosci. 2015, 81, 1–11.
  45. Jebari, S.; Berndtsson, R.; Bahri, A.; Boufaroua, M. Spatial soil loss risk and reservoir siltation in semi-arid Tunisia. Hydrol. Sci. J.–J. Sci. Hydrol. 2010, 55, 121–137.
  46. Balasubramani, K.; Veena, M.; Kumaraswamy, K.; Saravanabavan, V. Estimation of soil erosion in a semi-arid watershed of Tamil Nadu (India) using revised universal soil loss equation (rusle) model through GIS. Model. Earth Syst. Environ. 2015, 1, 10.
  47. Mohapatra, R. Application of revised universal soil loss equation model for assessment of soil erosion and prioritization of ravine infested sub basins of a semi-arid river system in India. Model. Earth Syst. Environ. 2022, 8, 4883–4896. [Google Scholar] [CrossRef]
  48. Falcão, C.J.L.M.; Duarte, S.M.D.A.; Da Silva Veloso, A. Estimating potential soil sheet Erosion in a Brazilian semiarid county using USLE, GIS, and remote sensing data. Environ. Monit. Assess. 2020, 192, 47. [Google Scholar] [CrossRef]
  49. Bouzeria, H.; Tachi, S.E.; Bouguerra, H.; Derdous, O.; Benmamar, S. Evaluating the Effect of Land Use Land Cover Changes on Soil Loss Distribution in the Seybouse Basin, Northeastern Algeria. Dokl. Earth Sci. 2023, 510, 335–348. [Google Scholar] [CrossRef]
  50. Derdous, O.; Tachi, S.E.; Bouguerra, H. Spatial distribution and evaluation of aridity indices in Northern Algeria. Arid Land Res. Manag. 2021, 35, 1–14. [Google Scholar] [CrossRef]
  51. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef] [Green Version]
  52. Mathanker, S.K.; Weckler, P.R.; Bowser, T.J.; Wang, N.; Maness, N.O. AdaBoost classifiers for pecan defect classification. Comput. Electron. Agric. 2011, 77, 60–68. [Google Scholar] [CrossRef]
  53. Solomatine, D.P.; Shrestha, D.L. AdaBoost. RT: A boosting algorithm for regression problems. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, Budapest, Hungary, 25–29 July 2004; Volume 2, pp. 1163–1168. [Google Scholar]
  54. Margineantu, D.D.; Dietterich, T.G. Pruning adaptive boosting. ICML 1997, 97, 211–218. [Google Scholar]
  55. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 1–11. Available online: https://proceedings.neurips.cc/paper_files/paper/2018/file/14491b756b3a51daac41c24863285549-Paper.pdf (accessed on 4 May 2023).
  56. Jabeur, S.B.; Gharib, C.; Mefteh-Wali, S.; Arfi, W.B. CatBoost model and artificial intelligence techniques for corporate failure prediction. Technol. Forecast. Soc. Change 2021, 166, 120658.
  57. Dorogush, A.V.; Ershov, V.; Gulin, A. CatBoost: Gradient boosting with categorical features support. arXiv 2018, arXiv:1810.11363.
  58. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
  59. O’Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458.
  60. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology, Antalya, Turkey, 21–23 August 2017; pp. 1–6.
  61. Nguyen, K.A.; Chen, W.; Lin, B.S.; Seeboonruang, U. Comparison of ensemble machine learning methods for soil erosion pin measurements. ISPRS Int. J. Geo-Inf. 2021, 10, 42.
  62. Williams, B.K. Adaptive management of natural resources—Framework and issues. J. Environ. Manag. 2011, 92, 1346–1353.
  63. Visa, S.; Ramsay, B.; Ralescu, A.L.; Van Der Knaap, E. Confusion matrix-based feature selection. MAICS 2011, 710, 120–127.
  64. Lasko, T.A.; Bhagwat, J.G.; Zou, K.H.; Ohno-Machado, L. The use of receiver operating characteristic curves in biomedical informatics. J. Biomed. Inform. 2005, 38, 404–415.
  65. Akgün, A.; Türk, N. Mapping erosion susceptibility by a multivariate statistical method: A case study from the Ayvalık region, NW Turkey. Comput. Geosci. 2011, 37, 1515–1524.
  66. Hosmer, D.W.; Lemeshow, S. Goodness of fit tests for the multiple logistic regression model. Commun. Stat.-Theory Methods 1980, 9, 1043–1069.
  67. Mitchell, J.F.B.; Manabe, S.; Meleshko, V.; Tokioka, T. Equilibrium climate change and its implications for the future. Clim. Chang. IPCC Sci. Assess. 1990, 131, 172.
  68. Nunes, A.N.; De Almeida, A.C.; Coelho, C.O. Impacts of land use and cover type on runoff and soil erosion in a marginal area of Portugal. Appl. Geogr. 2011, 31, 687–699.
  69. Abdelsamie, E.A.; Abdellatif, M.A.; Hassan, F.O.; El Baroudy, A.A.; Mohamed, E.S.; Kucher, D.E.; Shokr, M.S. Integration of RUSLE Model, Remote Sensing and GIS Techniques for Assessing Soil Erosion Hazards in Arid Zones. Agriculture 2022, 13, 35.
  70. Arnoldus, J.M.J. Methodology used to determine the maximum potential average annual soil loss due to sheet and rill erosion in Morocco. Food Agric. Organ. Soils Bull. 1977, 34, 39–51.
  71. Krishnamurthy, J.; Venkatesa Kumar, N.; Jayaraman, V.; Manivel, M. An approach to demarcate ground water potential zones through remote sensing and a geographical information system. Int. J. Remote Sens. 1996, 17, 1867–1884.
  72. Krishnamurthy, J.; Mani, A.; Jayaraman, V.; Manivel, M. Groundwater resources development in hard rock terrain-an approach using remote sensing and GIS techniques. Int. J. Appl. Earth Obs. Geoinf. 2000, 2, 204–215.
  73. Bhuiyan, C.; Singh, R.P.; Flügel, W.A. Modelling of ground water recharge-potential in the hard-rock Aravalli terrain, India: A GIS approach. Environ. Earth Sci. 2009, 59, 929–938.
  74. Maizi, D.; Boufekane, A.; AitOuali, K.; Aoudia, M. Identification of potential area of recharge using geospatial and multi-criteria decision analysis in the Macta watershed (Western Algeria). Arab. J. Geosci. 2020, 13, 127.
  75. Sabins, F.F. Remote sensing for mineral exploration. Ore Geol. Rev. 1999, 14, 157–183.
  76. Van der Meer, F.D.; Van der Werff, H.M.; Van Ruitenbeek, F.J.; Hecker, C.A.; Bakker, W.H.; Noomen, M.F.; Van Der Meijde, M.; Carranza, E.J.M.; De Smeth, J.B.; Woldai, T. Multi-and hyperspectral geologic remote sensing: A review. Int. J. Appl. Earth Obs. Geoinf. 2012, 14, 112–128.
  77. Taye, G.; Poesen, J.; Wesemael, B.V.; Vanmaercke, M.; Teka, D.; Deckers, J.; Goosse, T.; Maetens, W.; Nyssen, J.; Hallet, V.; et al. Effects of land use, slope gradient, and soil and water conservation structures on runoff and soil loss in semi-arid Northern Ethiopia. Phys. Geogr. 2013, 34, 236–259.
Figure 1. Location map of the study area.
Figure 2. Flowchart of the methodology for erosion susceptibility mapping.
Figure 3. Conditioning factors: (a) elevation, (b) slope, (c) aspect, and (d) land use/land cover.
Figure 4. Conditioning factors: (a) distance from river, (b) distance from roads, (c) modified Fournier index, and (d) topographic wetness index.
Figure 5. Conditioning factors: (a) clay mineral ratio, (b) lithology, and (c) geology.
Figure 6. ROC-AUC curves.
Figure 7. Erosion susceptibility maps using (a) AdaBoost, (b) CatBoost, (c) CNN, and (d) stacking ensemble.

Table 1. Statistics of erosion rate in semi-arid regions around the world.

| Area | Method | Average rate (t ha⁻¹ year⁻¹) | Maximum rate (t ha⁻¹ year⁻¹) | Reference |
|---|---|---|---|---|
| Tunisian Dorsal, Tunisia | Reservoir siltation measurement | 14.5 | 36.4 | [45] |
| Andipatti Taluk, India | RUSLE | 5.26 | 95.54 | [46] |
| Madhya Pradesh, India | RUSLE | 6.42 | 179.9 | [47] |
| Machados County, Brazil | USLE | 8.11 | above 20 | [48] |
| Seybouse basin, Algeria | RUSLE | 13 (20 y) | over 50 | [49] |

Table 2. Distribution of land use/land cover area in the Macta basin.

| LULC Class | Area (km²) | Area (%) |
|---|---|---|
| Grasslands | 8749 | 60.51 |
| Croplands | 4376 | 30.27 |
| Forest | 607 | 4.20 |
| Urbanization | 432 | 2.99 |
| Bare lands | 280 | 1.94 |
| Water Bodies | 13 | 0.09 |
| Total | 14,458 | 100 |

Table 3. Distribution of soil types in the Macta basin.

| Soil Types | Area (km²) | Area (%) |
|---|---|---|
| Calcisols | 6116.75 | 42.307 |
| Luvisols | 4047.23 | 27.993 |
| Vertisols | 2062.43 | 14.265 |
| Leptosols | 1623.92 | 11.232 |
| Cambisols | 325.16 | 2.249 |
| Kastanozems | 149.21 | 1.032 |
| Phaeozems | 51.76 | 0.358 |
| Regosols | 50.89 | 0.352 |
| Fluvisols | 15.61 | 0.108 |
| Acrisols | 14.46 | 0.100 |
| Solonchaks | 0.58 | 0.004 |
| Total | 14,458 | 100 |

Table 4. Geospatial data sources.

| Parameter | Source | Link | Spatial Resolution | Temporal Periods |
|---|---|---|---|---|
| MFI | National Agency for Hydraulic Resources (ANRH) | - | - | 1980–2015 |
| Soil Class | Soil Grids | https://soilgrids.org/ (accessed on 4 May 2022) | 190 m | 2016 |
| LULC | Esri Sentinel-2 | https://livingatlas.arcgis.com/landcover/ (accessed on 4 May 2022) | 10 m | 2022 |
| DEM | USGS Earth Explorer | https://earthexplorer.usgs.gov/ (accessed on 4 May 2022) | 1 Arc-Second | 2014 |
| Satellite Imagery | Landsat 8 OLI/TIRS | https://earthexplorer.usgs.gov/ (accessed on 4 May 2022) | 30 m | 05/2022 |
| Topographic and Geologic Maps | National Institute of Cartography | - | 1/50,000 | - |

Table 5. Example of a confusion matrix for n = 2.

| | Predicted Negative | Predicted Positive |
|---|---|---|
| Actual negative | a | b |
| Actual positive | c | d |

where a = number of correct negative predictions, b = number of incorrect positive predictions, c = number of incorrect negative predictions, d = number of correct positive predictions.

Table 6. Feature selection using Convolutional Neural Network (CNN).

| Ranking | Features | Correlation | Importance (%) |
|---|---|---|---|
| 1 | Slope | | 59.65 |
| 2 | LULC | | 5.31 |
| 3 | Lithology | | 4.36 |
| 4 | TWI | | 3.68 |
| 5 | MFI | | 3.55 |
| 6 | Geology | | 3.00 |
| 7 | D_F_Roads | | 2.49 |
| 8 | CMR | | 2.25 |
| 9 | D_F_Rivers | | 1.94 |
| 10 | Elevation | | 1.89 |
| 11 | Aspect | | 1.87 |
| 12 | Stream_Den | | 1.64 |
| 13 | HillShade | | 1.63 |
| 14 | Soil_Type | | 1.59 |
| 15 | NDVI | CMR = 76% | 1.44 |
| 16 | TRI | | 1.40 |
| 17 | Curvature | | 1.07 |
| 18 | SPI | STI = 82% | 0.81 |
| 19 | STI | | 0.43 |

Table 7. Evaluation metrics of the machine learning models.

| Statistics | CatBoost | AdaBoost | CNN | Stacking |
|---|---|---|---|---|
| TP | 82 | 79 | 69 | 78 |
| TN | 64 | 62 | 72 | 78 |
| FP | 4 | 9 | 4 | 2 |
| FN | 10 | 10 | 15 | 2 |
| Sensitivity | 0.89 | 0.89 | 0.82 | 0.98 |
| Specificity | 0.94 | 0.87 | 0.95 | 0.98 |
| F1 score | 0.92 | 0.89 | 0.88 | 0.98 |
| Recall | 0.89 | 0.89 | 0.82 | 0.98 |
| Precision | 0.95 | 0.90 | 0.95 | 0.98 |
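
The ratio rows of Table 7 can be recomputed from its four count rows using the standard confusion-matrix definitions (in the notation of Table 5: a = TN, b = FP, c = FN, d = TP). The following is a minimal sketch of that check, not the authors' code; the names `Counts` and `TABLE_7` are illustrative assumptions, and the accuracy formula is the usual (TP + TN) / total, which Table 7 itself does not list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts (hypothetical helper; Table 5 letters in comments)."""
    tp: int  # d: correct positive predictions
    tn: int  # a: correct negative predictions
    fp: int  # b: incorrect positive predictions
    fn: int  # c: incorrect negative predictions

    def sensitivity(self) -> float:
        # Also reported as recall: share of actual positives that were found.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of actual negatives that were correctly rejected.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of positive predictions that were correct.
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        # Harmonic mean of precision and recall, written in count form.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def accuracy(self) -> float:
        # Standard definition; not a row of Table 7.
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)


# Counts copied from the TP, TN, FP, and FN rows of Table 7.
TABLE_7 = {
    "CatBoost": Counts(tp=82, tn=64, fp=4, fn=10),
    "AdaBoost": Counts(tp=79, tn=62, fp=9, fn=10),
    "CNN": Counts(tp=69, tn=72, fp=4, fn=15),
    "Stacking": Counts(tp=78, tn=78, fp=2, fn=2),
}

for name, c in TABLE_7.items():
    print(
        f"{name:9s} sensitivity={c.sensitivity():.3f} "
        f"specificity={c.specificity():.3f} precision={c.precision():.3f} "
        f"f1={c.f1():.3f} accuracy={c.accuracy():.3f}"
    )
```

Comparing the printed values with the Sensitivity, Specificity, F1 score, and Precision rows is a quick consistency check on the counts; three decimals are printed so that rounding to the table's two decimals can be done by eye.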
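
Table 8. Statistics of erosion-prone area in different land use/land cover classes.

| Model | Risk class | Grasslands | Croplands | Forest | Urbanization | Bare Lands | Water Bodies | Total |
|---|---|---|---|---|---|---|---|---|
| AdaBoost | Very Low Risk | 5.93 | 11.48 | 0.39 | 1.24 | 0.60 | 0.05 | 19.69 |
| AdaBoost | Low Risk | 10.65 | 7.90 | 0.74 | 0.90 | 0.47 | 0.01 | 20.66 |
| AdaBoost | Moderate Risk | 13.33 | 5.26 | 0.95 | 0.48 | 0.37 | 0.01 | 20.41 |
| AdaBoost | High Risk | 14.83 | 3.60 | 1.05 | 0.23 | 0.30 | 0.00 | 20.01 |
| AdaBoost | Very High Risk | 15.87 | 2.00 | 1.05 | 0.12 | 0.19 | 0.00 | 19.23 |
| CatBoost | Very Low Risk | 2.86 | 14.57 | 0.33 | 1.24 | 0.29 | 0.03 | 19.31 |
| CatBoost | Low Risk | 9.45 | 9.14 | 0.78 | 1.01 | 0.60 | 0.02 | 20.99 |
| CatBoost | Moderate Risk | 14.42 | 3.53 | 0.91 | 0.38 | 0.46 | 0.01 | 19.71 |
| CatBoost | High Risk | 16.73 | 2.13 | 1.10 | 0.21 | 0.27 | 0.01 | 20.44 |
| CatBoost | Very High Risk | 17.15 | 0.88 | 1.07 | 0.12 | 0.33 | 0.00 | 19.55 |
| CNN | Very Low Risk | 2.62 | 15.19 | 0.28 | 1.29 | 0.24 | 0.03 | 19.66 |
| CNN | Low Risk | 8.94 | 8.57 | 0.99 | 0.95 | 0.61 | 0.02 | 20.09 |
| CNN | Moderate Risk | 14.53 | 3.89 | 1.08 | 0.42 | 0.48 | 0.01 | 20.42 |
| CNN | High Risk | 17.00 | 1.80 | 0.93 | 0.20 | 0.29 | 0.01 | 20.23 |
| CNN | Very High Risk | 17.50 | 0.79 | 0.89 | 0.10 | 0.32 | 0.01 | 19.60 |
| Stacking | Very Low Risk | 3.17 | 14.17 | 0.28 | 1.32 | 0.40 | 0.04 | 19.38 |
| Stacking | Low Risk | 8.96 | 8.67 | 0.77 | 0.95 | 0.50 | 0.02 | 19.86 |
| Stacking | Moderate Risk | 14.37 | 4.41 | 1.10 | 0.41 | 0.45 | 0.01 | 20.74 |
| Stacking | High Risk | 16.40 | 2.11 | 1.05 | 0.19 | 0.29 | 0.01 | 20.04 |
| Stacking | Very High Risk | 17.70 | 0.90 | 0.98 | 0.10 | 0.29 | 0.00 | 19.97 |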
MDPI and ACS Style

Bouguerra, H.; Tachi, S.E.; Bouchehed, H.; Gilja, G.; Aloui, N.; Hasnaoui, Y.; Aliche, A.; Benmamar, S.; Navarro-Pedreño, J. Integration of High-Accuracy Geospatial Data and Machine Learning Approaches for Soil Erosion Susceptibility Mapping in the Mediterranean Region: A Case Study of the Macta Basin, Algeria. Sustainability 2023, 15, 10388. https://doi.org/10.3390/su151310388