Soil Erosion Type and Risk Identification from the Perspective of Directed Weighted Complex Network

Ping Tu; Qianqian Zhou; Meng Qi

doi:10.3390/su15031939

¹ Key Laboratory of Spatial Data Mining & Information Sharing, Ministry of Education, Fuzhou 350108, China

² The Academy of Digital China (Fujian), Fuzhou University, Fuzhou 350108, China

³ College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(3), 1939; https://doi.org/10.3390/su15031939


Abstract

Identifying the geographic distribution and erosion risks of various soil erosion regions is a critical input to the implementation of extensive and effective land protection planning. To obtain more accurate and sufficient erosion information on a large scope, this paper introduces complex network theory to quantitatively simulate the topographic spatial structure and topological relationships of the erosion area. The watershed was selected as the basic study unit, and the directed weighted complex network (DWCN) of each watershed was constructed from DEM data. The directed weighted complex network factors (DWCNFs) of each watershed were calculated from the DWCN. After combining the DWCNFs with existing soil erosion effective factors (SEEFs), the soil erosion types and risks of sample areas in the Chinese Loess Plateau were identified by the random forest model. The results show that in both typical and atypical sample areas, identification with the combined DWCNFs and SEEFs performed better than identification with only the DWCNF or SEEF dataset. This suggests that quantitatively describing the spatial structure and topological relationships of the watershed from the perspective of a complex network contributes to obtaining more accurate soil erosion information. The DWCNFs of structural entropy, betweenness centrality, and degree centrality were of high importance and can reliably and effectively identify the types and risks of soil erosion, thus providing a broader factor reference for relevant research. The method proposed in this paper of vectorizing terrain into complex network structures also offers a novel perspective for geological research under complex terrain conditions.

Keywords: soil erosion; directed weighted complex network; directed weighted complex network factor; digital elevation model; soil erosion effective factor

1. Introduction

Soil erosion is a global problem that seriously threatens soil and water resources [1,2,3]. Its destructive impact may be inconspicuous in the short term but cannot be ignored from a long-term perspective [4]. Failure to adopt appropriate measures in time to control soil erosion may cause various adverse consequences, such as decreased soil fertility and crop production [5,6], muddy floods and siltation of reservoirs [7], loss of vital ecosystem services [8], and associated economic costs [9]. These environmental and socioeconomic consequences not only result in severe property and land productivity losses [10] but also hamper efforts to achieve food security and improve livelihoods [11,12,13].

Given the harmful effects of soil erosion, managing and reducing these consequences is essential for sustainable development goals. Based on this, the study of soil erosion risk has attracted worldwide attention for decades. With the rise in computing power and geographical information system (GIS) capabilities, the highly automated process-based model is widely applied in soil erosion assessment [14].

Consequently, machine learning-based models are gaining popularity for automatic modeling given their superior predictive capacity for modeling nonlinear correlations between GIS data [15,16,17]. For instance, Conoscenti et al. [18] evaluated the gully erosion sensitivity of Sicily using the logistic regression model. Pourghasemi and Kerle [19] proposed an integrated model of an artificial neural network (ANN) and a support vector machine (SVM), which achieved agreeable effects in gully erosion sensitivity mapping (i.e., AUC train = 0.897 and AUC test = 0.879). Rehman et al. [16] conducted a novel intelligent modeling of the k-value of sandy soil by employing various machine learning-based algorithms on a large dataset. These models have the advantages of high precision, high automation, and high calculation speed.

Although existing studies have achieved outstanding results in soil erosion sensitivity and risk assessment, most still focus on small-scope assessments of land plots or watersheds [6,20,21,22,23,24,25,26]. The scarcity of reliable large-scope erosion assessments has therefore prompted researchers to carry out such studies [11,13,27,28]. Recently, Borrelli et al. [29] provided a pan-European wind erosion assessment that delineates the spatial patterns of land susceptibility. Quinton et al. [30] assessed the impact of soil erosion on carbon, nitrogen, and phosphorus cycles based on a global dataset. Such studies provide a more detailed perspective on the spatial pattern of large-scale soil erosion and illustrate the need for large-scope assessment [31,32].

However, the majority of existing studies emphasize the assessment and prediction of the sensitivity and risk of a single erosion type, while little attention has been paid to the identification of multiple erosion types and risks. The ongoing scarcity of complete and detailed soil erosion type and risk information on a large scope represents a major challenge to national or regional planning toward reducing the soil erosion threat, and it is also likely to weaken the willingness of international organizations and policymakers to invest in decisions that contribute to alleviating soil erosion [33]. Therefore, measuring the geographical distribution of various soil erosion types on a large scope and determining the key areas most vulnerable to erosion are critical inputs to the implementation of extensive and effective land protection planning.

To obtain soil erosion information on a large scope, the first step is to choose the basic study unit. Previous small-scope studies generally select equal-size grids as the basic study unit, calculate multiple terrain derivatives and erosion factors based on neighborhood window analysis, quantify erosion characteristics, and then employ unsupervised classification or machine learning models for sensitivity and risk prediction [19,22,29,34]. It should be noted that the scope of grid-based observations is limited, so they contribute little to accurately measuring and quantifying the risk of soil erosion on a large regional scope [35]. In addition, the measurement model is sensitive to the resolution (i.e., grid size) [36]. TOPMODEL [37] and MIKE-SHE [38] have achieved acceptable experimental results by setting the grid to a strictly equal size. However, Vázquez et al. [39] pointed out that the model achieves better performance at a grid size of 600 m than at 300 m or 1200 m. Garosi et al. [40] obtained five datasets with pixel sizes ranging from 2 m to 30 m for a comparative study and found that the model performed best when the grid size was 10 m. These results illustrate that the grid size setting for various regions is an unavoidable and crucial issue in grid-based research. Given this, a feasible alternative is to select a map unit as the basic unit, such as the plot, field, or watershed [36,41]. Arabameri et al. [42] applied the GWR model to segment the study area into homogeneous map units with spatial autocorrelation and confirmed that appropriate map units contributed to model accuracy. Compared with other map units, the watershed has evident geographical meaning in the surface form and is a snapshot of geological development [43,44,45]. Zhao et al. [46] also showed that the watershed-based recognition strategy performed better than the object-based strategy.

However, the scarcity of erosion data is also a decisive problem that prevents the identification of erosion types on a broad scope [11,35], and selecting watersheds as the basic study unit cannot by itself solve it. Although land use, lithology, topographic wetness index, rainfall, sediment concentration, and soil texture are effective factors in assessing the sensitivity and risk level of soil erosion [40,47,48,49,50,51,52], the occurrence and development mechanisms of soil erosion are diverse, and its spatial distribution characteristics vary considerably [53]. This suggests that determining erosion risk over a large scope only from the topographic/surface features represented by the traditional soil erosion effective factors (SEEFs) leaves clear room for improvement. Considering that surface runoff and gully form have a profound impact on soil erosion [54], soil erosion is also strongly correlated with the spatial structure of the topography. Given this, to improve the prediction or identification accuracy of erosion types and risks, it is necessary to deeply understand and quantitatively express the spatial structure and topological relationships of the terrain in the area where the soil erosion is located.

To solve this problem, this paper introduces the complex network method. As a mature method for simulating natural and social phenomena, its main advantage is that it emphasizes the relationship between structural characteristics and dynamics [55,56]. Its application is growing rapidly and has attracted the attention of researchers in the geosciences, with fruitful results [57,58,59,60]. Notably, each watershed has a distinguishable spatial structure, clear topological relationships, and a well-defined composition, which makes it suitable for complex network construction.

In sum, to obtain accurate and sufficient soil erosion information (i.e., risk and type) on a large scope, this paper selected the Chinese Loess Plateau as the study area and the watershed as the basic study unit. Each watershed was regarded as a directed weighted complex network (DWCN), and the topographic elements in the watershed were regarded as complex network elements: the selected gully feature nodes as the network nodes, the runoff as the edges, the runoff direction as the edge direction, and the elevation difference between nodes as the edge weight. Based on this, the corresponding directed weighted complex network factors (DWCNFs) of each watershed were calculated, combined with the traditional soil erosion effective factors (SEEFs), and input as datasets into different machine learning models (i.e., artificial neural network, light gradient boosting machine, random forest, and extreme gradient boosting). The identification accuracy of the different factor datasets was then compared using evaluation metrics. Unlike existing methods, which require extensive and sufficient SEEFs to obtain soil erosion information, the DWCN proposed in this paper was constructed only from DEM data, which provides a broader factor selection for soil erosion research and improves data availability.

2. Materials and Methodology

2.1. Materials

2.1.1. Study Area

For the identification of erosion types, a study area or plot of limited scope can hardly cover typical sample areas of multiple soil erosion types (i.e., water erosion, wind erosion, and freeze-thaw erosion), so it cannot provide sufficient training and validation samples. For this reason, this paper chose the Loess Plateau as the large-scope study area. It has the highest soil erosion rate in the world, with over a billion tons of soil washed into the Yellow River annually [61]. This paper also chose the watershed as the basic study unit. The distinguishable spatial structure and diverse composition of the watershed are the major reasons for its selection, and the watershed can also reflect the various state changes that occur during the development and evolution of a soil erosion area.

2.1.2. Soil Erosion Type Mapping

The soil erosion inventory map (SEIM) is a 1:500,000 scale map that includes information on the erosion risks and types of the Loess Plateau and was digitized from the results of a field survey. The map mainly distinguishes three erosion types (water erosion, wind erosion, and freeze-thaw erosion), and the risk is classified into three levels: high, medium, and low.

According to the SEIM, soil erosion samples of various types (i.e., water, wind, and freeze-thaw) and various risks (i.e., low, medium, and high) were selected (in Table 1). Given the concentrated and small-scale spatial distribution of freeze-thaw erosion in the Loess Plateau, three types of freeze-thaw erosion samples with a sample size of 80 for low, medium, and high risk were selected. The specific spatial location of the sample distribution of the Loess Plateau is shown in Figure 1.

Table 1. Soil erosion typical sample.

Figure 1. Spatial location and sample distribution of the Loess Plateau. (Note: the site with the stark colorful mark is a typical sample site, which obtained the soil erosion information for the SEIM in advance, and the test site with the black dot mark is the atypical sample site selected at random, which also obtained the information from SEIM).

2.1.3. Soil Erosion Effective Factors

Selecting reasonable and appropriate soil erosion effective factors (SEEFs) is a critical input to erosion risk and type identification and strongly influences the identification accuracy. According to existing research [44,45,46,47,48], this paper selects 13 SEEFs for which data are available: aspect, slope, surface roughness, plan curvature, profile curvature, surface cutting depth (SCD), normalized difference vegetation index (NDVI), clay content, silt content, sand content, soil type, land use type, and annual mean rainfall. The descriptions and sources of these data are given in Table 2. Note that these factors are stored in grid (raster) format. To obtain the value of each erosion factor for each watershed, the zonal statistics tool provided by ArcPy was employed. For the categorical factors, land use type and soil type, the value that occurs most frequently in each watershed was used.
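The per-watershed aggregation described above can be illustrated with a minimal pure-Python sketch. It is not the ArcPy tool used in the paper: the function names `zonal_mean` and `zonal_majority`, the choice of the mean as the statistic for continuous factors, and the tiny rasters are all hypothetical and serve only to show the idea of a zonal mean for a continuous factor and a zonal majority for a categorical one.

```python
from collections import Counter, defaultdict


def zonal_mean(zones, values):
    """Mean of a continuous raster within each zone (here, each watershed)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zones, values):
        for zone_id, value in zip(zone_row, value_row):
            sums[zone_id] += value
            counts[zone_id] += 1
    return {zone_id: sums[zone_id] / counts[zone_id] for zone_id in sums}


def zonal_majority(zones, classes):
    """Most frequent class of a categorical raster within each zone."""
    tallies = defaultdict(Counter)
    for zone_row, class_row in zip(zones, classes):
        for zone_id, class_id in zip(zone_row, class_row):
            tallies[zone_id][class_id] += 1
    return {zone_id: tally.most_common(1)[0][0]
            for zone_id, tally in tallies.items()}


# Hypothetical 3 x 4 rasters: watershed IDs, slope (degrees), land use codes.
watershed_ids = [[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [1, 2, 2, 2]]
slope = [[10.0, 12.0, 30.0, 28.0],
         [14.0, 16.0, 26.0, 32.0],
         [8.0, 34.0, 30.0, 30.0]]
land_use = [[3, 3, 5, 5],
            [3, 4, 5, 6],
            [3, 5, 5, 6]]

mean_slope = zonal_mean(watershed_ids, slope)             # continuous factor
major_land_use = zonal_majority(watershed_ids, land_use)  # categorical factor
```

Each watershed then contributes one row of factor values to the dataset, regardless of how many grid cells it contains.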

Table 2. Data description and source.

2.2. Methodology

The flowchart of the methodology is presented in Figure 2. As shown, the flowchart can be divided into four steps:

(1) Watershed polygon extraction;
(2) Calculation of DWCNFs;
(3) Calculation of SEEFs;
(4) Soil erosion type and risk identification based on SEEFs and DWCNFs.

Figure 2. Flowchart employed in this paper.

2.2.1. Watershed Polygon Extraction

From the perspective of geomorphology, a geomorphic area can be regarded as a whole system consisting of a large number of fundamental watershed units. Watershed delineation can be performed using a hydrological approach based on watershed critical area thresholds. Because the watershed is the basic study unit in this paper, determining the minimum critical area of the watershed is critical to the soil erosion identification results. When the watershed area is excessively large, it may contain multiple soil erosion types, and such mixed erosion sample areas may introduce identification errors. Given this, the area of each watershed is controlled at 10 km² in this paper, because studies have shown that the topographic index of the Loess Plateau tends to stabilize beyond a threshold of 10 km² [56]. Then, the watershed polygons were extracted by hydrological operations such as depression filling, flow direction calculation, and flow accumulation calculation, as shown in Figure 2.

2.2.2. DWCNFs Calculation

Other research that employs a large number of erosion factors to obtain soil erosion information shows promise but can be difficult to scale across a large scope given the poor availability of the required data [57]. To overcome this problem, a DWCN was applied in this paper, because the DWCN and its factors are derived only from DEM data, which reduces the data requirement. In addition, this paper abstracted the morphology of the watershed, mainly gullies, into a series of points, lines, and surfaces, and used a DWCN to represent it. Therefore, a DWCN can also quantitatively measure the topological structure and spatial relationships of topographic elements within the watershed.

The construction process of the DWCN and the calculation of its factors for each watershed can be divided into three steps: (1) delineation of gullies in each watershed; (2) DWCN construction; (3) calculation of DWCNFs.

Delineation of Gullies in Each Watershed

After depression filling, flow direction calculation, and flow accumulation calculation on the DEM data, the watersheds were extracted according to the equal-area criterion. Then, the gully terrain feature points and gully lines of each watershed were extracted. The extraction process is as follows: (1) The mean change point method was used to calculate the optimal grid threshold of each watershed, and this value was then used to extract the gully lines. (2) The Strahler method [58] was used to classify the gully lines. (3) The topological relationship between the intersection points on the gully lines was determined, and the gully feature nodes were extracted (i.e., the gully confluence nodes, the gully source nodes, and the gully outlet node).

Among them, the determination of the optimal grid threshold in Step (1) is crucial to the construction of the DWCN, because it directly determines the sparsity of the network edges and the node density. As Figure 3 shows, the mean change point method was employed to compare gully line sparsity against the grid threshold, thus determining the optimal grid threshold of the gully. The calculation steps of the mean change point method were as follows:

Figure 3. Mean change point difference of grid threshold.

The sequence $T$ of gully line density was calculated under different grid thresholds (50, 100, 150, …, 1000), $T = (T_{1}, T_{2}, \dots, T_{N})$, where $N$ is the number of grid thresholds.

A new sequence $X_{i}$ was obtained from the logarithm of $T$, and then the mean value $\bar{X}$ of $X_{i}$ and the discrete statistic $S$ were calculated. The equation (Equation (1)) is as follows:

S = \sum_{i = 1}^{N} {(X_{i} - \bar{X})}^{2}

(1)

The sequence $X_{i}$ was divided into $X_{k_{1}} = (X_{1}, X_{2}, \dots, X_{k - 1})$ and $X_{k_{2}} = (X_{k}, X_{k + 1}, \dots, X_{N})$ according to the value of $k$ ($N > k > 1$), and then the new discrete statistic $S_{k}$ was calculated. The equation is as follows:

S_{k} = \sum_{i = 1}^{k - 1} {(X_{i} - \bar{X}_{k_{1}})}^{2} + \sum_{i = k}^{N} {(X_{i} - \bar{X}_{k_{2}})}^{2}

(2)

In Equation (2), $\bar{X}_{k_{1}}$ and $\bar{X}_{k_{2}}$ represent the mean values of sequences $X_{k_{1}}$ and $X_{k_{2}}$. The difference between $S$ and $S_{k}$ is the mean change point difference. When the difference is the largest, the corresponding grid threshold is optimal, because it retains the topographic features of the watershed to the greatest extent. As shown in Figure 3, the difference was the largest when the grid threshold was 250; thus, 250 was selected as the optimal grid threshold to extract the gully lines.
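The mean change point calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `mean_change_point`, the density values, and the convention of reporting the threshold of the first element of the second segment ($X_{k}$) as the change point are assumptions made for the example.

```python
import math


def mean_change_point(thresholds, densities):
    """Return (best threshold, {threshold: S - S_k}) per Equations (1)-(2)."""
    x = [math.log(d) for d in densities]            # X_i = ln(T_i)
    n = len(x)
    mean_all = sum(x) / n
    s_total = sum((v - mean_all) ** 2 for v in x)   # Equation (1)

    differences = {}
    for k in range(2, n):                           # 1-based split index, N > k > 1
        head, tail = x[:k - 1], x[k - 1:]           # X_1..X_{k-1} and X_k..X_N
        mean_head = sum(head) / len(head)
        mean_tail = sum(tail) / len(tail)
        s_k = (sum((v - mean_head) ** 2 for v in head)
               + sum((v - mean_tail) ** 2 for v in tail))   # Equation (2)
        # Assumption: the split k is reported as the threshold belonging to X_k.
        differences[thresholds[k - 1]] = s_total - s_k

    best = max(differences, key=differences.get)
    return best, differences


# Hypothetical gully line densities for six grid thresholds.
thresholds = [50, 100, 150, 200, 250, 300]
densities = [9.0, 8.5, 8.0, 2.0, 1.8, 1.6]
best_threshold, differences = mean_change_point(thresholds, densities)
```

In this toy series the density drops sharply between the third and fourth thresholds, so the largest value of $S - S_{k}$ occurs at the split that separates those two groups.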

To show the effect of gully line extraction with different thresholds, the undersized and oversized thresholds were visualized in Figure 4. When the number of grids was 50 (Figure 4c), the gully lines were excessively dense, and when the number was 1000 (Figure 4a), the gully lines were excessively sparse. When the number was 250, the extraction effect of the gully lines in the watershed was appropriate (Figure 4b).

Figure 4. Gully lines under different thresholds. (Note: the oversize threshold is 1000, the undersize threshold is 50, and the optimal threshold is 250.) (a) Oversize threshold. (b) Optimal threshold. (c) Undersize threshold.

DWCN Construction

The DWCN contains four elements: node, edge, direction, and weight, which were extracted from the topographic elements in the watershed. The specific steps are as follows:

Step 1: point extraction. The number, location, and relationship (e.g., topology and connectivity) of the gully feature points determine the geometric form and runoff trend of the watershed. Therefore, as shown in Figure 5a, the gully feature points (i.e., gully confluence nodes, gully source points, and the gully outlet point) were selected as the network nodes of the DWCN. Suppose there were $i$ gully confluence nodes, $j - 1$ gully source points, and a gully outlet point in $K$ network nodes. Consider $K$ as a set of network nodes $V$, which can be expressed as $V = (V_{c}, V_{s}, V_{k}) = (V_{c_{1}}, V_{c_{2}}, \dots, V_{s_{i - 1}}, V_{s_{i}}, V_{s_{i + 1}}, \dots, V_{s_{j}}, V_{k})$, where $V_{c}$, $V_{s}$, and $V_{k}$ represent the gully confluence nodes, gully source points, and gully outlet point, respectively.

Figure 5. Flow chart of DWCN construction.

Step 2: edge and direction extraction. As shown in Figure 5b, the gully lines passing through the gully feature points were abstracted as the edges of the DWCN, and the runoff direction of each edge was abstracted as its direction. The edges can be denoted by $ED = (ED_{1}, ED_{2}, \dots, ED_{m})$, $ED \in \mathbb{R}^{2 \times m}$. $ED$ contains three direction expressions: from the gully source point to the gully confluence node, $ED_{i} = (V_{s_{i}}, V_{c_{j}})$; from the gully confluence node to the gully confluence node, $ED_{i + 1} = (V_{c_{j - 1}}, V_{c_{j}})$; and from the gully confluence node to the gully outlet point, $ED_{m} = (V_{c_{j}}, V_{k})$.

Step 3: edge weight extraction. $ED_{i}$ is the direction vector containing two gully feature nodes; thus, the elevation difference between the two gully feature points in $ED_{i}$ was used as the weight of the edge. The weight set $W$ can be denoted by $W = (W_{11}, W_{12}, \dots, W_{ij})$, $W_{ij} > 0$, $i, j \in V$.

Step 4: DWCN construction. The DWCN for each watershed can be expressed as $DWCN = (V, ED, W)$, where $V$ is the gully feature node set, $ED$ is the directed edge set, and $W$ is the weight set.
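The four construction steps can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' code: the class name `DWCN`, its methods, the node identifiers, and the elevations are hypothetical, and a real workflow would derive the nodes and edges from the extracted gully lines rather than typing them in.

```python
from dataclasses import dataclass, field


@dataclass
class DWCN:
    """Directed weighted network of gully feature nodes for one watershed."""
    nodes: dict = field(default_factory=dict)  # node id -> (kind, elevation in m)
    edges: dict = field(default_factory=dict)  # (upstream, downstream) -> weight

    def add_node(self, node_id, kind, elevation):
        # kind is "source", "confluence", or "outlet" (Step 1).
        self.nodes[node_id] = (kind, elevation)

    def add_edge(self, upstream, downstream):
        # Direction follows the runoff (Step 2); the weight is the elevation
        # difference between the two end nodes (Step 3) and must be positive.
        drop = self.nodes[upstream][1] - self.nodes[downstream][1]
        if drop <= 0:
            raise ValueError("runoff must flow from higher to lower elevation")
        self.edges[(upstream, downstream)] = drop


# Hypothetical watershed: three sources, two confluences, one outlet (Step 4).
network = DWCN()
for node_id, kind, elevation in [
    ("s1", "source", 1200.0), ("s2", "source", 1180.0), ("s3", "source", 1150.0),
    ("c1", "confluence", 1100.0), ("c2", "confluence", 1050.0),
    ("k", "outlet", 1000.0),
]:
    network.add_node(node_id, kind, elevation)

for upstream, downstream in [("s1", "c1"), ("s2", "c1"), ("c1", "c2"),
                             ("s3", "c2"), ("c2", "k")]:
    network.add_edge(upstream, downstream)
```

Storing the edge as an ordered pair keeps the runoff direction explicit, and computing the weight from the node elevations guarantees that $W_{ij} > 0$ for every stored edge.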

DWCNF Calculation

The calculation formula of nine DWCNFs selected in this paper is shown in Table 3, and the selection basis of the DWCNFs is as follows: Node density and edge density can measure the sparsity of the node and edge. Structural entropy was used to measure the evolution degree of soil erosion for each watershed. The more it tends to the later evolution stage, the more stable the watershed system is, and the smaller the entropy value is [53]. Complex networks use centrality to estimate the importance of nodes in the network [59]. Thus, node degree centrality, betweenness centrality, and closeness centrality were selected.

Table 3. Factors multi-collinearity and decision.

Among them, the degree centrality represents the tightness between the gully feature nodes. The betweenness centrality measures a node's ability to control the transmission of information (i.e., flow accumulation). The closeness centrality measures the accessibility of a node to other nodes, while the edge betweenness measures the importance of edges to the network. The assortativity coefficient [60] examines the tendency of nodes with similar values to connect to each other, while the average neighbor degree [61] calculates the average degree of the neighborhood of each node. The formulas for node density (ND) and edge density (ED) are as follows:

N D = \frac{k}{A}

(3)

E D = \frac{m}{A}

(4)

In Equations (3) and (4), $k$ is the total number of nodes, $A$ is the watershed area, and $m$ is the total number of edges. The formula for degree centrality ($C_{D}$) is as follows:

\begin{array}{l} D (V_{i}) = \sum_{n, m \in V} d_{n m} \\ C_{D} (V) = \frac{1}{k - 1} \sum_{i = 1}^{k} D (V_{i}) \end{array}

(5)

In Equation (5), when node $V_{n}$ and node $V_{m}$ are connected, the value of $d_{nm}$ is 1; otherwise, it is 0. $V$ is the set of nodes, and $k$ is the number of nodes in $V$. $D(V_{i})$ represents the degree of node $i$, that is, the number of nodes connected to node $i$. The formula for structural entropy (SE) is as follows:

\begin{array}{l} H_{i} = \frac{D (V_{i})}{\sum_{i = 1}^{k} D (V_{i})} \\ S E = - \sum_{i = 1}^{k} H_{i} \ln (H_{i}) \end{array}

(6)

In Equation (6), $H_{i}$ represents the importance of node $i$, and $D(V_{i})$ has the same meaning as in Equation (5). The formula for closeness centrality (CC) is as follows:

C C = \sum_{i = 1}^{k} \frac{k - 1}{\sum_{j = 1}^{k} P_{i j}}

(7)

In Equation (7), $P_{ij}$ is the minimum distance between node $i$ and node $j$. The formula for betweenness centrality (BC) is as follows:

\begin{array}{l} B C (i) = \sum_{n}^{k} \sum_{m}^{k} \frac{P N_{n m} (i) / P N_{n m}}{(k - 1) (k - 2) / 2}, n \neq m \neq k, n < k \\ B C = \sum_{i = 1}^{k} B C (i) \end{array}

(8)

In Equation (8), $PN_{nm}(i)$ is the number of shortest paths between nodes $n$ and $m$ that pass through node $i$, and $PN_{nm}$ is the number of shortest paths between nodes $n$ and $m$. The formula for the assortativity coefficient ($r$) is as follows:

r = \frac{k^{- 1} \sum D (V_{i}) D (V_{j}) - {| k^{- 1} \sum D (V_{i}) + D (V_{j}) |}^{2}}{k^{- 1} \sum \frac{1}{2} (D (V_{i}) + D (V_{j})) - {| k^{- 1} \sum \frac{1}{2} (D (V_{i}) + D (V_{j})) |}^{2}}

(9)

In Equation (9), $k$ is the total number of edges. When $r > 0$, the DWCN is assortative, and when $r < 0$, it is disassortative. $D(V_{i})$ has the same meaning as in Equation (5). The formula for the average neighbor degree ($ND_{A}$) is as follows:

\begin{array}{l} N D_{A} (i) = \frac{1}{| V (i) |} \sum_{j \in V, j \neq i} w_{i j} D (V_{i}) \\ N D_{A} = \frac{1}{k} \sum_{i = 1}^{k} N D (i) \end{array}

(10)

In Equation (10), $w_{ij}$ is the weight between nodes $i$ and $j$, and $ND(i)$ is the neighbor degree of node $i$. The formula for edge betweenness (EB) is as follows:

E B = \frac{1}{k} \sum_{i = 1}^{k} \sum_{n, m \in V (n \neq m)}^{k} \frac{P N_{n m} (i)}{P N_{n m}}

(11)

In Equation (11), $k$ is the total number of edges, and $PN_{nm}(i)$ has the same meaning as in Equation (8).
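As a worked illustration of the simpler factors, the sketch below evaluates node density, edge density, degree centrality, and structural entropy (Equations (3)–(6)) on a small network. It follows the formulas as printed, with the degree of a node taken as the number of edges touching it regardless of direction; the function name `dwcn_factors`, the example network, and the 10 km² area are assumptions for the example only.

```python
import math


def dwcn_factors(nodes, edges, area_km2):
    """Evaluate Equations (3)-(6) for one watershed network."""
    k = len(nodes)                      # number of nodes
    m = len(edges)                      # number of edges

    # D(V_i): number of edges touching node i, ignoring direction.
    degree = {node: 0 for node in nodes}
    for upstream, downstream in edges:
        degree[upstream] += 1
        degree[downstream] += 1
    total_degree = sum(degree.values())

    # H_i = D(V_i) / sum of degrees; isolated nodes contribute nothing to SE.
    shares = [d / total_degree for d in degree.values() if d > 0]

    return {
        "node_density": k / area_km2,                                    # Eq. (3)
        "edge_density": m / area_km2,                                    # Eq. (4)
        "degree_centrality": total_degree / (k - 1),                     # Eq. (5)
        "structural_entropy": -sum(h * math.log(h) for h in shares),     # Eq. (6)
    }


# Hypothetical watershed: three sources, two confluences, one outlet.
nodes = ["s1", "s2", "s3", "c1", "c2", "k"]
edges = [("s1", "c1"), ("s2", "c1"), ("c1", "c2"), ("s3", "c2"), ("c2", "k")]
factors = dwcn_factors(nodes, edges, area_km2=10.0)
```

Here the two confluence nodes each have degree 3 and the remaining nodes degree 1, so the degree shares are 0.3 and 0.1, and the entropy is lower than it would be for a network with perfectly uniform degrees.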

2.2.3. Soil Erosion Type and Risk Identification Based on SEEFs and DWCNFs

Multicollinearity Diagnostics

Multicollinearity refers to a lack of independence among factors, that is, strong associations between them, which can confuse the analysis and make the identification results unstable. To address this problem, the collinearity between erosion factors was assessed with the variance inflation factor (VIF) and tolerance (TOL). If the VIF was greater than 10 and the TOL was less than 0.100, the factor was rejected; otherwise, it was retained [40,62,63].
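The screening rule can be illustrated with NumPy. The sketch relies on a standard identity: the VIF of factor $j$, $1 / (1 - R_{j}^{2})$, equals the $j$-th diagonal element of the inverse of the factors' correlation matrix, and TOL is its reciprocal. The function name `vif_and_tolerance` and the synthetic factors are assumptions; the paper does not state which software was used for this step.

```python
import numpy as np


def vif_and_tolerance(factors):
    """factors: array of shape (n_samples, n_factors). Returns (VIF, TOL)."""
    corr = np.corrcoef(factors, rowvar=False)
    vif = np.diag(np.linalg.inv(corr))   # VIF_j = 1 / (1 - R_j^2)
    return vif, 1.0 / vif                # TOL_j = 1 / VIF_j


# Synthetic factors: x3 is almost exactly x1 + x2, x4 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.01 * rng.normal(size=500)
x4 = rng.normal(size=500)
factors = np.column_stack([x1, x2, x3, x4])

vif, tol = vif_and_tolerance(factors)
rejected = (vif > 10) & (tol < 0.1)      # rejection rule described in the text
```

Because TOL is the reciprocal of VIF, the two conditions are equivalent at these cutoffs; the three mutually dependent columns are flagged while the independent one is kept.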

Machine Learning Method

Given the comprehensive effects of topographic features, soil textures, climate conditions, and other factors on soil erosion, simulating and calculating the nonlinear relationship between these factors and soil erosion is a critical input to soil erosion type and risk identification. To solve this problem, various machine learning methods were selected, which can also eliminate the subjective factors caused by the manual weighting, thus reflecting the real situation of the evaluated objects to a great extent.

According to existing research [31,60,61,62,63], this paper selected four models (i.e., artificial neural network, light gradient boosting machine, random forest, and extreme gradient boosting), which are suitable for the quantitative simulation of soil erosion risk. The DWCNFs, the SEEFs, and the combination of the two factor sets were input into the four machine learning methods to identify erosion types and risks, and their identification performance was compared through various evaluation metrics. To make the evaluation of the machine learning methods more robust, four-fold cross-validation was conducted in the training process.

(1): Artificial Neural Network (ANN)

The network layer of an ANN adopts error forward propagation, and all network layers are fully connected. The network structure of an ANN includes an input layer, hidden layer, and output layer.

(2): Light Gradient Boosting Machine (LGBM)

LGBM is a novel gradient boosting decision tree framework, which has the advantages of high efficiency, fast training speed, and distributed support. Given its promising performance in machine learning tasks such as ranking, classification, and regression, it has been widely employed in different academic fields.

(3): Random Forest (RF)

RF is a method based on classification trees. The method uses bootstrapping to draw multiple samples from the original samples, fits a decision tree to each, and then combines the predictions of the decision trees to obtain the final classification result through voting.

(4): Extreme Gradient Boosting algorithm (XGBoost)

The central idea of the XGBoost is to continuously split the features and grow into several trees. Each generated tree is a new function to fit the last residual. Finally, the calculated values of each leaf node are added to the final prediction result.
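The four-fold cross-validation mentioned above applies to all four models and can be sketched independently of any particular library. The generator name `k_fold_indices`, the shuffling seed, and the unstratified split are assumptions for illustration; the sample count of 644 is the training set size reported in Section 3.2.

```python
import random


def k_fold_indices(n_samples, n_folds=4, seed=0):
    """Yield (training indices, validation indices) for each fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation


splits = list(k_fold_indices(644, n_folds=4))
```

Each sample is used for validation exactly once and for training three times, so a model's score can be averaged over the four folds when comparing hyperparameter settings.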

Evaluation Metrics

The confusion matrix is a matrix to measure the prediction of the classification problem model. It is one of the most fundamental and intuitive methods to show the performance of the algorithm [59]. In the confusion matrix, true positive (TP) represents predicting a positive class as a positive class; true negative (TN) means to predict a negative class as a negative class; false positive (FP) means to predict a negative class as a positive class; false negative (FN) means that the positive class is predicted to be negative. The confusion matrix can be extended to the following evaluation indicators:

Precision refers to the proportion of correct classification among all positive results predicted by the model. The formula is as follows:

\mathrm{precision} = \frac{TP}{TP + FP}

(12)

Recall refers to the proportion of correctly classified samples among all samples that are actually positive. The formula is as follows:

\mathrm{recall} = \frac{TP}{TP + FN}

(13)

The F1 score is the harmonic mean of precision and recall and is a comprehensive index that balances the two metrics. The formula is as follows:

F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}

(14)

Accuracy refers to the probability that the classified result is consistent with the actual results. The formula is as follows:

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(15)

The Kappa coefficient is a metric used to test whether the predicted results of the model are consistent with the actual results. The formula is as follows:

\mathrm{Kappa} = \frac{P \sum_{i = 1}^{n} T_{i i} - \sum_{i = 1}^{n} T_{i +} T_{+ i}}{P^{2} - \sum_{i = 1}^{n} T_{i +} T_{+ i}}

(16)

In Equations (12)–(16), TP, FP, TN, and FN are the entries of the confusion matrix. $P$ is the total number of samples, $n$ is the total number of types, $T_{ii}$ is the number of samples correctly classified (i.e., the diagonal elements of the confusion matrix), $T_{i+}$ is the sum of the elements in row $i$, and $T_{+i}$ is the sum of the elements in column $i$.
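The metrics in Equations (12)–(16) can be computed directly from a multi-class confusion matrix, as in the following sketch. The function name `classification_metrics`, the convention that rows hold the true class and columns the predicted class, and the 3 × 3 example matrix are assumptions for illustration; precision, recall, and F1 are computed per class in a one-versus-rest fashion.

```python
def classification_metrics(confusion):
    """confusion[i][j]: samples of true class i predicted as class j (assumed)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)                         # P
    row_sums = [sum(row) for row in confusion]                         # T_{i+}
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # T_{+i}
    diagonal = sum(confusion[i][i] for i in range(n))                  # sum of T_{ii}

    per_class = []
    for i in range(n):
        tp = confusion[i][i]
        fp = col_sums[i] - tp          # predicted as class i but actually another
        fn = row_sums[i] - tp          # actually class i but predicted as another
        precision = tp / (tp + fp) if tp + fp else 0.0                 # Eq. (12)
        recall = tp / (tp + fn) if tp + fn else 0.0                    # Eq. (13)
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)                          # Eq. (14)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})

    accuracy = diagonal / total        # multi-class form of Eq. (15)
    chance = sum(r * c for r, c in zip(row_sums, col_sums))
    kappa = (total * diagonal - chance) / (total ** 2 - chance)        # Eq. (16)
    return per_class, accuracy, kappa


# Hypothetical confusion matrix for three classes with 10 samples each.
confusion = [[8, 1, 1],
             [2, 7, 1],
             [0, 2, 8]]
per_class, accuracy, kappa = classification_metrics(confusion)
```

For this matrix, 23 of 30 samples lie on the diagonal, and the Kappa coefficient is lower than the accuracy because it discounts the agreement expected by chance.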

3. Results

3.1. Optimal Erosion Factors Determination

If the TOL is less than 0.1 and the VIF is greater than 10, a factor is strongly collinear with the other factors and needs to be rejected [47,63]. The selection of SEEFs and DWCNFs is shown in Table 3. For five factors (surface cutting depth, edge betweenness, node density, closeness centrality, and standard deviation of elevation), the VIF was greater than 10 and the TOL was less than 0.1. These factors were removed from the factor dataset.

3.2. Optimal Machine Learning Method Determination

The input dataset was the typical samples in Table 1. The test dataset contains 276 samples (30% of the total), and the training dataset contains 644 samples (70% of the total). To compare the performance of the different machine learning methods, the same training and test datasets were used for every model. The optimal model was found by grid search with four-fold cross-validation. The confusion matrices of the identification results of the various methods are shown in Figure 6. The higher the diagonal values of the confusion matrix, the better the performance of the method.

Figure 6. Confusion matrix of identification results under different models. (Note: A, B and C represent the water, wind and freeze-thaw soil erosion types; 1, 2 and 3 represent low, medium and high soil erosion risk. For example, A1 represents low-risk water erosion). (a) RF. (b) LGBM. (c) ANN. (d) XGBoost.

RF achieved an accuracy of 0.873 and a Kappa coefficient of 0.845; LGBM, an accuracy of 0.813 and a Kappa coefficient of 0.848; ANN, an accuracy of 0.773 and a Kappa coefficient of 0.723; and XGBoost, an accuracy of 0.575 and a Kappa coefficient of 0.477. In summary, RF was the optimal machine learning method for soil erosion type and risk identification. Figure 6 also shows that all methods made few misclassifications of soil erosion type; the majority of misidentifications were errors in soil erosion risk.

3.3. Comparison of Identification Performance

3.3.1. Comparison of Various Evaluation Metrics on Typical Samples

To determine whether the DWCNFs contributed to improving the accuracy of watershed erosion type and risk identification, the optimal machine learning model (RF) was applied to calculate the evaluation metrics of various factors. Factors can be divided into three categories: SEEF only, DWCNF only, and the combination of the two above factors.

The input dataset was the typical samples (Table 1), and the precision, recall, and F1 score were calculated for performance comparison. Precision reflects the false alarm rate of the model (Figure 7a): the combination of the two factor sets achieved better identification than SEEF only and DWCNF only for all classes except C2 and C3. Recall is shown in Figure 7b: the combination performed better than SEEFs only and DWCNFs only for all classes except B2. The F1 score is the harmonic mean of precision and recall and reflects the comprehensive performance of a factor set. As shown in Figure 7c, the combination outperformed SEEF only and DWCNF only in the majority of classes. The combination had the best mean F1 score (0.793), SEEF only was weaker (0.663), and DWCNF only was the weakest (0.385).

Figure 7. Comparison of identification performance of different factors in typical sites.
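The per-class metrics compared in Figure 7 can be computed from a confusion matrix as in the minimal sketch below. It assumes that rows hold the actual classes and columns the predicted classes, and the 3 × 3 matrix is hypothetical, not a result from this paper.

```python
def per_class_metrics(cm):
    """Return a list of (precision, recall, f1) tuples, one per class."""
    n = len(cm)
    metrics = []
    for i in range(n):
        tp = cm[i][i]
        predicted_i = sum(cm[r][i] for r in range(n))   # column sum: TP + FP
        actual_i = sum(cm[i])                            # row sum: TP + FN
        precision = tp / predicted_i
        recall = tp / actual_i
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
        metrics.append((precision, recall, f1))
    return metrics


# Hypothetical confusion matrix, for illustration only.
example_cm = [
    [8, 2, 0],
    [2, 6, 2],
    [0, 0, 10],
]
metrics = per_class_metrics(example_cm)
mean_f1 = sum(f1 for _, _, f1 in metrics) / len(metrics)
for precision, recall, f1 in metrics:
    print(round(precision, 3), round(recall, 3), round(f1, 3))
print(round(mean_f1, 3))  # 0.792
```

The sketch does not guard against a class that is never predicted or never present, which would make a denominator zero.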

In addition, compared with SEEF only and DWCNF only, the combination of the two factor sets also performed better in overall accuracy and Kappa coefficient: its accuracy was 10.5% higher than that of SEEF and 33.10% higher than that of DWCNF. This suggests that a quantitative description of the spatial structure and topological relationship of the watershed from the perspective of a complex network contributes to obtaining more accurate soil erosion information.

3.3.2. Comparison of Various Evaluation Metrics on Atypical Samples

To further determine whether the DWCNFs contribute to improving the accuracy of watershed erosion type and risk identification, in addition to the experiments on the soil erosion samples in Table 1, this paper randomly selected 130 sample sites on the Loess Plateau to verify the identification performance of the DWCNFs. The spatial distribution of these sites is shown in Figure 1. The RF model was employed for identification, with the SEEFs, the DWCNFs, and the combination of the two as input datasets. The results are illustrated in Figure 8.

Figure 8. Comparison of identification results of different factors in atypical sites. (Note: Type misidentified means the type was wrong but the risk was correct; Risk misidentified means the risk was wrong but the type was correct; All misidentified means both type and risk were wrong; Correctly identified means both type and risk were correct). (a) The result of the combination of two factors dataset. (b) The result of SEEFs. (c) The result of DWCNFs.

In Figure 8a, the correctly identified results of the combination of two factors accounted for 70.76% of the sites; the remainder comprised 6 type misidentified sites, 28 risk misidentified sites, and 4 all misidentified sites. Its accuracy was 0.707 and the Kappa coefficient was 0.625. In Figure 8b, the correctly identified results of the SEEFs accounted for 61.54%; the remainder comprised 4 type misidentified sites, 36 risk misidentified sites, and 10 all misidentified sites. Its accuracy was 0.615 and the Kappa coefficient was 0.504. In Figure 8c, the correctly identified results of the DWCNFs accounted for 26.92%; the remainder comprised 22 type misidentified sites, 25 risk misidentified sites, and 48 all misidentified sites. Its accuracy was 0.269 and the Kappa coefficient was 0.126.
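The shares of correctly identified sites follow directly from the misidentification counts reported above and the 130 atypical sites. The short sketch below reproduces that arithmetic; the variable names are illustrative.

```python
TOTAL_SITES = 130  # randomly selected atypical sample sites

# (type misidentified, risk misidentified, all misidentified) per factor set
misidentified = {
    "combination": (6, 28, 4),
    "SEEFs": (4, 36, 10),
    "DWCNFs": (22, 25, 48),
}

correct_share = {}
for name, counts in misidentified.items():
    correct = TOTAL_SITES - sum(counts)          # sites with type and risk both right
    correct_share[name] = correct / TOTAL_SITES
    print(name, correct, round(correct_share[name], 4))
# combination 92 0.7077
# SEEFs 80 0.6154
# DWCNFs 35 0.2692
```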

This suggests that, in the atypical test sites, the combination of two factors still performed better than SEEFs only and DWCNFs only, which also confirms the applicability of the DWCN proposed in this paper in various regions.

3.4. Importance Assessment of Erosion Factors

Just as different models have different classification performances, different erosion factors do not contribute equally to soil erosion. To quantitatively measure the importance of each erosion factor for the accuracy of soil erosion type and risk identification, the Boruta algorithm [45,47], built on the RF model, was adopted.

The results are shown in Figure 9. Slope (14.83), NDVI (12.39), structural entropy (9.855), and betweenness centrality (9.012) were the factors of greatest importance. They were followed by soil type (8.21), degree centrality (7.65), land use type (7.19), and sand percent (6.56), which were also above the average importance of the erosion factors (5.56). However, annual mean rainfall (1.81), surface roughness (1.78), aspect (1.64), plan curvature (1.49), edge density (1.29), assortativity coefficient (1.09), and profile curvature (1.06) had little effect on soil erosion; the importance of each was less than 2%.

Figure 9. Importance ranking of erosion factors.

4. Discussion

4.1. Contribution and Availability of DWCN

Although existing studies have quantified the risk or sensitivity of soil erosion over a small scope on a grid basis, it is crucial to obtain soil erosion information over a large scope, because the spatial heterogeneity of soil erosion is difficult to capture in a small scope [63].

To obtain soil erosion information over a large scope, the first step is to determine the basic study unit. Current research units can be grouped into two types: grid/pixel-based and map-unit-based. If the former is selected as the large-scope study unit, the key problem to be solved is the sensitivity of model accuracy to the grid size. For this reason, this paper selected the watershed as the basic map unit, as it has a differentiable structure and a clear boundary.

In addition, the traditional SEEFs emphasize the basic characteristics of the terrain or surface and lose sight of the spatial structure and topological relationship of the watershed. Identifying soil erosion risks and types with the existing SEEFs alone contributes little to improving accuracy, and the research is limited by the extremely high demand for data [11,41].

To solve this problem, this paper introduces the concept of a complex network. Taking the dynamic watershed as a whole network system, the topographic elements, gully lines, and runoff direction of each watershed were extracted from DEM data alone. These were used to construct a DWCN and calculate its factors, which broadens the range of soil erosion factors available.

In both typical and atypical sample sites (see Section 3.3.1 and Section 3.3.2), the method proposed in this paper effectively improved the identification accuracy of erosion types and risks. Two reasons may explain this. In terms of spatial relationship, the DWCN emphasizes the attribute characteristics and spatial topological relationship of the various gully feature points. Differences between feature points in their spatial relationships represent different spatial structures and stages of watershed development. In terms of network structure, the internal structure and interrelationship of the watershed system are quantitatively analyzed in the form of a network, and the structural characteristics of gully feature points are re-examined.

In addition, the machine learning-based model used in this paper contributes to a more stable identification result. For example, for the typical samples, the identification accuracy obtained with the combination of two factors and the optimal machine learning model is up to 80.11%. Similar performance is observed for the atypical samples, where the accuracy remains at 70% or above, indicating that identifying soil erosion type and risk using machine learning with the combination of DWCNFs and SEEFs is effective. Moreover, when the combination of two factors is compared with SEEFs only, the machine learning model can bridge the information gap between the erosion factors in some cases, so the performance of SEEFs only may not be significantly worse than that of the combination, as shown in Figure 7a for types A1 and C1.

4.2. Importance of DWCNF

In Section 3.4, structural entropy, degree centrality, and betweenness centrality ranked near the top of the importance ranking (Figure 9). This suggests that the DWCNFs, which quantitatively describe the spatial structure of the watershed, perform well in identifying soil erosion types and risks.

Network centrality was described by degree centrality and betweenness centrality. The importance of betweenness centrality (9.012) is greater than that of degree centrality (7.65). A possible explanation is that degree centrality cannot distinguish nodes with the same degree, whereas betweenness centrality can, because it is based on shortest paths. As Figure 10 shows, node A is connected with nodes X, Y, and C, so its degree is 3. Node C is connected with nodes A, B, and D, so its degree is the same as that of node A. Measured by degree centrality, A and C are equally important. However, runoff flowing from any gully source node (X, Y, J, or K) to the gully outlet node (D) must pass through node C, so the load (i.e., flow accumulation) carried by C is higher than that carried by A. Betweenness centrality can distinguish this case while degree centrality cannot, which makes it more important than degree centrality.

Figure 10. The relationship between nodes.
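The contrast between nodes A and C can be checked with a small sketch. It assumes the network of Figure 10 is the directed tree X → A, Y → A, J → B, K → B, A → C, B → C, C → D; the links of node B to J and K are an assumption, since the text only states the neighbours of A and C. Betweenness is counted here as the raw number of source-target pairs whose unique downstream path passes through a node, without the normalization a centrality measure would normally apply.

```python
# Each node drains to exactly one downstream node; the outlet D drains to none.
downstream = {
    "X": "A", "Y": "A", "J": "B", "K": "B",
    "A": "C", "B": "C", "C": "D", "D": None,
}


def degree(node):
    """Total degree: number of inflowing links plus the outflowing link, if any."""
    in_degree = sum(1 for nxt in downstream.values() if nxt == node)
    out_degree = 1 if downstream[node] is not None else 0
    return in_degree + out_degree


def betweenness(node):
    """Count (start, target) pairs whose downstream path passes through node."""
    count = 0
    for start in downstream:
        path = []                      # nodes reached from start, in flow order
        current = downstream[start]
        while current is not None:
            path.append(current)
            current = downstream[current]
        if node in path:
            # Every node after `node` on the path is a target reached via node.
            count += len(path) - path.index(node) - 1
    return count


print(degree("A"), degree("C"))            # 3 3
print(betweenness("A"), betweenness("C"))  # 4 6
```

Both nodes have degree 3, but C lies on more flow paths than A, which matches the argument that betweenness separates nodes that degree cannot.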

For the edge density and assortativity coefficient, which have extremely low importance, Gaussian fitting was employed to fit their frequency distributions. In Figure 11a, the fitting coefficient (R^{2}) of edge density reaches 0.9896, indicating that the Gaussian fit explains 98.96% of the frequency distribution. Moreover, 91.12% of the edge density values are concentrated within [1.75, 3.75], indicating that edge density varies little and is therefore not sensitive to soil erosion. This is understandable because the area of each watershed was kept consistent during extraction, and the edges of the DWCN, i.e., the gully lines, were extracted with the optimal threshold determined by the mean change point method. With these two steps, the numerical variation of edge density is comparatively stable.

Figure 11. Frequency distribution of erosion factors. (a) Edge density. (b) Assortativity coefficient.

For the assortativity coefficient, as shown in Figure 11b, the fitting coefficient (R^{2}) reaches 0.8996, indicating that the Gaussian curve also fits the assortativity coefficient well. The numerical variation of the assortativity coefficient was more obvious than that of the edge density, but its importance was lower. The reason lies in how the assortativity coefficient is measured: if the nodes in the network tend to be connected with similar nodes, the network is assortative and the assortativity coefficient is greater than 0; otherwise, it is disassortative. In this paper, every DWCN has the same kinds of connections between nodes: two disassortative kinds (from gully source node to gully confluence node, and from gully confluence node to gully outlet node) and one assortative kind (from gully confluence node to gully confluence node). As a result, the assortativity coefficient of the DWCN is less than 0 in 95.72% of watersheds, which not only demonstrates the disassortativity of the DWCN but also explains the low importance of the assortativity coefficient: it was unable to distinguish differences in node connections between DWCNs.
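To make the sign of the coefficient concrete, the sketch below computes a degree assortativity coefficient for a small gully network. It assumes the coefficient is the Pearson correlation between the total degrees of the two end nodes of each edge, which may differ in detail from the definition used for the DWCNF in this paper, and the seven-edge network (four source nodes, three confluence nodes, one outlet) is hypothetical.

```python
from math import sqrt

# Hypothetical directed gully network: sources X, Y, J, K; outlet D.
edges = [("X", "A"), ("Y", "A"), ("J", "B"), ("K", "B"),
         ("A", "C"), ("B", "C"), ("C", "D")]

# Total degree (in + out) of every node.
degree = {}
for upstream, downstream in edges:
    degree[upstream] = degree.get(upstream, 0) + 1
    degree[downstream] = degree.get(downstream, 0) + 1


def assortativity(edge_list, deg):
    """Pearson correlation between the degrees at the two ends of each edge."""
    xs = [deg[u] for u, _ in edge_list]
    ys = [deg[v] for _, v in edge_list]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = assortativity(edges, degree)
print(round(r, 4))  # -0.4714
```

The negative value arises because most edges join a low-degree source or outlet node to a higher-degree confluence node, which is the disassortative pattern described above.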

4.3. Limitations

Comparing the atypical sample sites (in Section 3.3.2) with the typical sample sites (in Section 3.3.1), it can be found that the identification accuracy of the atypical sample (0.707) is lower than that of the typical sample (0.845). This is partly due to the random selection of atypical sample sites. Although the area of each watershed was kept as small and as uniform as possible, each watershed inevitably covers other erosion types and risks.

In other words, a watershed may contain various risks of wind erosion, water erosion, and other mixed landforms. For this reason, this paper used the type and risk with the highest proportion of grids in each watershed as the sample value of the atypical sample area, which results in the accuracy gap between the atypical and typical sample areas. Further improvements are possible. For example, in a watershed where multiple erosion types are mixed, a wider range of erosion types, such as mixed wind and water erosion, can be selected as the sample value.

5. Conclusions

To obtain accurate and sufficient soil erosion information over a large scope, this paper selected the watershed, a differentiable map unit, as the basic study unit. A directed weighted complex network was proposed to simulate and quantify the spatial structure and topological relationship of the watershed landform, and its corresponding factors, the DWCNFs, were calculated. Using a machine learning model, the soil erosion types and risks of typical and atypical sample areas in the Loess Plateau were identified from the DWCNFs and the existing SEEFs. The main conclusions are as follows:

In the typical sample area, the identification performance of the combination of two factors was better than that of the dataset using only the DWCNF or only the SEEF. Compared with SEEF only and DWCNF only, the overall accuracy of the combination of two factors was improved by 10.5% and 33.1%, respectively, indicating that the quantitative description of watershed spatial structure and topological relationship from the perspective of the complex network contributed to obtaining more accurate soil erosion information.
In the randomly selected atypical sample areas, the combination of two factors still shows better identification accuracy than only the SEEF and only the DWCNF, which reflects the regional applicability of the DWCN.
The RF model performed better than other models and was suitable for soil erosion type and risk identification based on the DWCN.
In the importance assessment, structural entropy, betweenness centrality, and degree centrality were part of the factors with high importance in the DWCNFs, which can reliably and effectively identify the types and risks of soil erosion, thus providing an extensive and sufficient selection of factors for soil erosion.

In summary, from the perspective of watershed spatial structure and composition, the method proposed in this paper is relatively novel. Unlike existing studies that depend on plentiful data sources, the method makes two contributions. First, the DWCNFs are obtained and constructed from DEM data only, which provides a more extensive factor selection for areas with insufficient data. Second, DWCNFs quantitatively measure the spatial structure and topological relationship of the watershed, and combining them with the traditional SEEFs contributes to obtaining more accurate soil erosion information (i.e., type and risk). The application of complex network theory also provides novel insights for soil erosion research.

Author Contributions

Conceptualization, Q.Z. and P.T.; methodology, Q.Z. and P.T.; software, Q.Z.; validation, P.T.; formal analysis, P.T.; investigation, Q.Z.; resources, Q.Z. and M.Q.; data curation, Q.Z. and P.T.; writing—original draft preparation, Q.Z. and P.T.; writing—review and editing, Q.Z. and P.T.; visualization, M.Q.; supervision, Q.Z.; project administration, Q.Z. and P.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 41771423).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the editors and the anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Montgomery, D.R. Soil erosion and agricultural sustainability. Proc. Natl. Acad. Sci. USA 2007, 104, 13268–13272.
Zhao, G.; Mu, X.; Wen, Z.; Wang, F.; Gao, P. Soil Erosion, Conservation, and Eco-Environment Changes in the Loess Plateau of China. Land Degrad. Dev. 2013, 24, 499–510.
Montanarella, L.; Pennock, D.J.; McKenzie, N.; Badraoui, M.; Chude, V.; Baptista, I.; Mamo, T.; Yemefack, M.; Singh Aulakh, M.; Yagi, K.; et al. World’s soils are under threat. Soil 2016, 2, 79–82.
Singh, O.; Singh, J. Soil Erosion Susceptibility Assessment of the Lower Himachal Himalayan Watershed. J. Geol. Soc. India 2018, 92, 157–165.
Haregeweyn, N.; Poesen, J.; Deckers, J.; Nyssen, J.; Haile, M.; Govers, G.; Verstraeten, G.; Moeyersons, J. Sediment-bound nutrient export from micro-dam catchments in Northern Ethiopia. Land Degrad. Dev. 2008, 19, 136–152.
Okoba, B.O.; Sterk, G. Catchment-level evaluation of farmers’ estimates of soil erosion and crop yield in the Central Highlands of Kenya. Land Degrad. Dev. 2010, 21, 388–400.
Haregeweyn, N.; Poesen, J.; Nyssen, J.; De Wit, J.; Haile, M.; Govers, G.; Deckers, S. Reservoirs in Tigray (Northern Ethiopia): Characteristics and sediment deposition problems. Land Degrad. Dev. 2006, 17, 211–230.
Lal, R. Soil conservation and ecosystem services. Int. Soil Water Conserv. Res. 2014, 2, 36–47.
Pimentel, D.; Harvey, C.; Resosudarmo, P.; Sinclair, K.; Kurz, D.; McNair, M.; Crist, S.; Shpritz, L.; Fitton, L.; Saffouri, R.; et al. Environmental and economic costs of soil erosion and conservation benefits. Science 1995, 267, 1117–1123.
Elnashar, A.; Zeng, H.; Wu, B.; Fenta, A.A.; Nabil, M.; Duerler, R. Soil erosion assessment in the Blue Nile Basin driven by a novel RUSLE-GEE framework. Sci. Total Environ. 2021, 793, 148466.
Borrelli, P.; Robinson, D.A.; Fleischer, L.R.; Lugato, E.; Ballabio, C.; Alewell, C.; Meusburger, K.; Modugno, S.; Schutt, B.; Ferro, V.; et al. An assessment of the global impact of 21st century land use change on soil erosion. Nat. Commun. 2017, 8, 2013.
Lal, R. Erosion-crop productivity relationships for soils of Africa. Soil Sci. Soc. Am. J. 1995, 59, 661–667.
Liang, Y.; Li, D.C.; Lu, X.X.; Yang, X.; Pan, X.Z.; Mu, H.; Shi, D.M.; Zhang, B. Soil Erosion Changes over the Past Five Decades in the Red Soil Region of Southern China. J. Mt. Sci. 2010, 7, 92–99.
Meten, M.; Bhandary, N.P.; Yatabe, R. GIS-based frequency ratio and logistic regression modelling for landslide susceptibility mapping of Debre Sina area in central Ethiopia. J. Mt. Sci. 2015, 12, 1355–1372.
Zhou, Q.; Chen, N.; Lin, S. FASTNN: A Deep Learning Approach for Traffic Flow Prediction Considering Spatiotemporal Features. Sensors 2022, 22, 6921.
Rehman, Z.u.; Khalid, U.; Ijaz, N.; Mujtaba, H.; Haider, A.; Farooq, K.; Ijaz, Z. Machine learning-based intelligent modeling of hydraulic conductivity of sandy soils considering a wide range of grain sizes. Eng. Geol. 2022, 311, 106899.
Zhou, Q.; Chen, N.; Lin, S. A Poverty Measurement Method Incorporating Spatial Correlation: A Case Study in Yangtze River Economic Belt, China. Isprs Int. J. Geo-Inf. 2022, 11, 50.
Conoscenti, C.; Angileri, S.; Cappadonia, C.; Rotigliano, E.; Agnesi, V.; Märker, M. Gully erosion susceptibility assessment by means of GIS-based logistic regression: A case of Sicily (Italy). Geomorphology 2014, 204, 399–411.
Pourghasemi, H.R.; Kerle, N. Random forests and evidential belief function-based landslide susceptibility assessment in Western Mazandaran Province, Iran. Environ. Earth Sci. 2016, 75, 185.
Ebabu, K.; Tsunekawa, A.; Haregeweyn, N.; Adgo, E.; Meshesha, D.T.; Aklog, D.; Masunaga, T.; Tsubo, M.; Sultan, D.; Fenta, A.A.; et al. Analyzing the variability of sediment yield: A case study from paired watersheds in the Upper Blue Nile basin, Ethiopia. Geomorphology 2018, 303, 446–455.
Fenta, A.A.; Yasuda, H.; Shimizu, K.; Haregeweyn, N.; Negussie, A. Dynamics of Soil Erosion as Influenced by Watershed Management Practices: A Case Study of the Agula Watershed in the Semi-Arid Highlands of Northern Ethiopia. Environ. Manage. 2016, 58, 889–905.
Ganasri, B.P.; Ramesh, H. Assessment of soil erosion by RUSLE model using remote sensing and GIS—A case study of Nethravathi Basin. Geosci. Front. 2016, 7, 953–961.
Haregeweyn, N.; Tsunekawa, A.; Poesen, J.; Tsubo, M.; Meshesha, D.T.; Fenta, A.A.; Nyssen, J.; Adgo, E. Comprehensive assessment of soil erosion risk for better land use planning in river basins: Case study of the Upper Blue Nile River. Sci. Total Environ. 2017, 574, 95–108.
Nyssen, J.; Clymans, W.; Poesen, J.; Vandecasteele, I.; De Baets, S.; Haregeweyn, N.; Naudts, J.; Hadera, A.; Moeyersons, J.; Haile, M.; et al. How soil conservation affects the catchment sediment budget—A comprehensive study in the north Ethiopian highlands. Earth Surf. Process. Landf. 2009, 34, 1216–1233.
Yibeltal, M.; Tsunekawa, A.; Haregeweyn, N.; Adgo, E.; Meshesha, D.T.; Masunaga, T.; Tsubo, M.; Billi, P.; Ebabu, K.; Fenta, A.A.; et al. Morphological characteristics and topographic thresholds of gullies in different agro-ecological environments. Geomorphology 2019, 341, 15–27.
Nosrati, K.; Collins, A.L. A soil quality index for evaluation of degradation under land use and soil erosion categories in a small mountainous catchment, Iran. J. Mt. Sci. 2019, 16, 2577–2590.
Borrelli, P.; Ballabio, C.; Panagos, P.; Montanarella, L. Wind erosion susceptibility of European soils. Geoderma 2014, 232–234, 471–478.
He, J.-J.; Cai, Q.-G.; Cao, W.-Q. Wind tunnel study of multiple factors affecting wind erosion from cropland in agro-pastoral area of Inner Mongolia, China. J. Mt. Sci. 2013, 10, 68–74.
Borrelli, P.; Panagos, P.; Ballabio, C.; Lugato, E.; Weynants, M.; Montanarella, L. Towards a Pan-European Assessment of Land Susceptibility to Wind Erosion. Land Degrad. Dev. 2016, 27, 1093–1105.
Quinton, J.N.; Govers, G.; Van Oost, K.; Bardgett, R.D. The impact of agricultural soil erosion on biogeochemical cycling. Nat. Geosci. 2010, 3, 311–314.
Panagos, P.; Borrelli, P.; Poesen, J.; Ballabio, C.; Lugato, E.; Meusburger, K.; Montanarella, L.; Alewell, C. The new assessment of soil loss by water erosion in Europe. Environ. Sci. Policy 2015, 54, 438–447.
Panagos, P.; Meusburger, K.; Ballabio, C.; Borrelli, P.; Alewell, C. Soil erodibility in Europe: A high-resolution dataset based on LUCAS. Sci. Total Environ. 2014, 479–480, 189–200.
Tamene, L.; Le, Q.B. Estimating soil erosion in sub-Saharan Africa based on landscape similarity mapping and using the revised universal soil loss equation (RUSLE). Nutr. Cycl. Agroecosys. 2015, 102, 17–31.
Pourghasemi, H.R.; Yousefi, S.; Kornejady, A.; Cerda, A. Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling. Sci. Total Environ. 2017, 609, 764–775.
Fenta, A.A.; Tsunekawa, A.; Haregeweyn, N.; Poesen, J.; Tsubo, M.; Borrelli, P.; Panagos, P.; Vanmaercke, M.; Broeckx, J.; Yasuda, H.; et al. Land susceptibility to water and wind erosion risks in the East Africa region. Sci. Total Environ. 2020, 703, 135016.
Jetten, V.; Govers, G.; Hessel, R. Erosion models: Quality of spatial predictions. Hydrol. Process. 2003, 17, 887–900.
Beven, K.; Freer, J. A dynamic TOPMODEL. Hydrol. Process. 2001, 15, 1993–2011.
Singh, R.; Subramanian, K.; Refsgaard, J.C. Hydrological modelling of a small watershed using MIKE SHE for irrigation planning. Agric. Water Manag. 1999, 41, 149–166.
Vázquez, R.F.; Feyen, L.; Feyen, J.; Refsgaard, J.C. Effect of grid size on effective parameters and model performance of the MIKE-SHE code. Hydrol. Process. 2002, 16, 355–372.
Garosi, Y.; Sheklabadi, M.; Pourghasemi, H.R.; Besalatpour, A.A.; Conoscenti, C.; Van Oost, K. Comparison of differences in resolution and sources of controlling factors for gully erosion susceptibility mapping. Geoderma 2018, 330, 65–78.
Takken, I.; Beuselinck, L.; Nachtergaele, J.; Govers, G.; Poesen, J.; Degraer, G. Spatial evaluation of a physically-based distributed erosion model (LISEM). Catena 1999, 37, 431–447.
Arabameri, A.; Pradhan, B.; Rezaei, K. Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS. J. Environ. Manag. 2019, 232, 928–942.
Caratti, J.F.; Nesser, J.A.; Maynard, C.L. Watershed classification using canonical correspondence analysis and clustering techniques: A cautionary note. J. Am. Water Resour. Assoc. 2004, 40, 1257–1268.
de Vente, J.; Poesen, J. Predicting soil erosion and sediment yield at the basin scale: Scale issues and semi-quantitative models. Earth-Sci. Rev. 2005, 71, 95–125.
Cao, M.; Tang, G.A.; Zhang, F.; Yang, J. A cellular automata model for simulating the evolution of positive–negative terrains in a small loess watershed. Int. J. Geogr. Inf. Sci. 2013, 27, 1349–1363.
Zhao, W.F.; Xiong, L.Y.; Ding, H.; Tang, G.A. Automatic recognition of loess landforms using Random Forest method. J. Mt. Sci. 2017, 14, 885–897.
Achten, W.M.J.D.; Mugogo, S.; Kafiriti, E.; Poesen, J.; Deckers, J.; Muys, B. Gully erosion in South Eastern Tanzania: Spatial distribution and topographic thresholds. Z. Geomorphol. 2008, 52, 225–235.
Amiri, M.; Pourghasemi, H.R.; Ghanbarian, G.A.; Afzali, S.F. Assessment of the importance of gully erosion effective factors using Boruta algorithm and its spatial modeling and mapping using three machine learning algorithms. Geoderma 2019, 340, 55–69.
Conforti, M.; Aucelli, P.P.C.; Robustelli, G.; Scarciglia, F. Geomorphology and GIS analysis for mapping gully erosion susceptibility in the Turbolo stream catchment (Northern Calabria, Italy). Nat. Hazards 2010, 56, 881–898.
Lei, X.; Chen, W.; Avand, M.; Janizadeh, S.; Kariminejad, N.; Shahabi, H.; Costache, R.; Shahabi, H.; Shirzadi, A.; Mosavi, A. GIS-Based Machine Learning Algorithms for Gully Erosion Susceptibility Mapping in a Semi-Arid Region of Iran. Remote Sens. 2020, 12, 2478.
Luffman, I.E.; Nandi, A.; Spiegel, T. Gully morphology, hillslope erosion, and precipitation characteristics in the Appalachian Valley and Ridge province, southeastern USA. Catena 2015, 133, 221–232.
Shen, H.-O.; Wen, L.-L.; He, Y.-F.; Hu, W.; Li, H.-L.; Che, X.-C.; Li, X. Rainfall and inflow effects on soil erosion for hillslopes dominated by sheet erosion or rill erosion in the Chinese Mollisol region. J. Mt. Sci. 2018, 15, 2182–2191.
Khalid, U.; Rehman, Z.U.; Mujtaba, H.; Farooq, K. 3D response surface modeling based in-situ assessment of physico-mechanical characteristics of alluvial soils using dynamic cone penetrometer. Transp. Geotech. 2022, 36, 100781.
Li, Z.; Zhang, Y.; Zhu, Q.; Yang, S.; Li, H.; Ma, H. A gully erosion assessment model for the Chinese Loess Plateau based on changes in gully length and area. Catena 2017, 148, 195–203.
Mates, W.C.; Frazzon, E.M.; Hartmann, J.; Mayerle, S.F. A graph model for the integrated scheduling of intermodal transport operations in global supply chains. In Dynamics in Logistics; Springer: Berlin/Heidelberg, Germany, 2013; pp. 301–311.
Scholz-Reiter, B.; Hartmann, J.; Makuschewitz, T.; Frazzon, E.M. A generic approach for the graph-based integrated production and intermodal transport scheduling with capacity restrictions. In Proceedings of the 46th CIRP Conference on Manufacturing Systems (CIRP CMS), Setubal, Portugal, 29–30 May 2013; pp. 109–114.
Abe, S.; Suzuki, N. Complex-network description of seismicity. Nonlinear Process. Geophys. 2006, 13, 145–150.
Lin, J.; Ban, Y. Complex Network Topology of Transportation Systems. Transp. Rev. 2013, 33, 658–685.
Lin, S.W.; Chen, N.; He, Z.W. Automatic Landform Recognition from the Perspective of Watershed Spatial Structure Based on Digital Elevation Models. Remote Sens. 2021, 13, 3926.
Poulter, B.; Goodall, J.L.; Halpin, P.N. Applications of network analysis for adaptive management of artificial drainage systems in landscapes vulnerable to sea level rise. J. Hydrol. 2008, 357, 207–217.
Dang, Y.; Ren, W.; Tao, B.; Chen, G.; Lu, C.; Yang, J.; Pan, S.; Wang, G.; Li, S.; Tian, H. Climate and land use controls on soil organic carbon in the loess plateau region of China. PLoS ONE 2014, 9, e95548.
Dewitte, O.; Daoudi, M.; Bosco, C.; Van Den Eeckhaut, M. Predicting the susceptibility to gully initiation in data-poor regions. Geomorphology 2015, 228, 101–115.
Arabameri, A.; Pradhan, B.; Rezaei, K.; Yamani, M.; Pourghasemi, H.R.; Lombardo, L. Spatial modelling of gully erosion using evidential belief function, logistic regression, and a new ensemble of evidential belief function-logistic regression algorithm. Land Degrad. Dev. 2018, 29, 4035–4049.

Figure 1. Spatial location and sample distribution of the Loess Plateau. (Note: sites with brightly colored marks are typical sample sites, for which the soil erosion information was obtained from the SEIM in advance; sites with black dot marks are atypical sample sites selected at random, for which the information was also obtained from the SEIM.)

Figure 2. Flowchart employed in this paper.

Figure 3. Mean change point difference of grid threshold.

Figure 4. Gully lines under different thresholds. (Note: the oversize threshold is 1000, the undersize threshold is 50, and the optimal threshold is 250.) (a) Oversize threshold. (b) Optimal threshold. (c) Undersize threshold.
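
The effect of the threshold in Figure 4 can be illustrated with a minimal sketch. It assumes that gully cells are the grid cells whose flow accumulation value is at least the threshold; that rule, the toy grid, and the names `gully_mask` and `count_gully_cells` are illustrative assumptions, not the paper's implementation. Only the three threshold values (50, 250, 1000) come from the caption.

```python
from typing import List

def gully_mask(flow_accumulation: List[List[int]], threshold: int) -> List[List[bool]]:
    """Mark cells whose flow accumulation reaches the threshold (assumed rule)."""
    return [[cell >= threshold for cell in row] for row in flow_accumulation]

def count_gully_cells(flow_accumulation: List[List[int]], threshold: int) -> int:
    """Count the cells that would be kept as gully cells at this threshold."""
    return sum(sum(row) for row in gully_mask(flow_accumulation, threshold))

# Hypothetical 4 x 4 flow accumulation grid, for illustration only.
grid = [
    [0, 10, 60, 5],
    [20, 300, 80, 40],
    [5, 1200, 260, 55],
    [0, 2000, 30, 10],
]

# Thresholds named in the Figure 4 caption: undersize, optimal, oversize.
for threshold in (50, 250, 1000):
    print(threshold, count_gully_cells(grid, threshold))
```

In this toy grid, a lower threshold keeps more cells (a denser gully network) and a higher threshold keeps only the main channels, which mirrors the contrast between panels (a) and (c).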

Figure 5. Flow chart of DWCN construction.

Figure 6. Confusion matrices of identification results under different models. (Note: A, B and C represent the water, wind and freeze-thaw soil erosion types; 1, 2 and 3 represent low, medium and high soil erosion risk. For example, A1 represents low-risk water erosion.) (a) RF. (b) LGBM. (c) ANN. (d) XGBoost.

Figure 7. Comparison of identification performance of different factors in typical sites.

Figure 8. Comparison of identification results of different factors in atypical sites. (Note: "Type misidentified" denotes a wrong type but a correct risk level; "Risk misidentified" denotes the reverse, a correct type but a wrong risk level; "All misidentified" denotes that both type and risk are wrong; "Correctly identified" denotes that both type and risk are correct.) (a) Result of the combined dataset of the two factor groups. (b) Result of SEEFs. (c) Result of DWCNFs.
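
The four outcome categories in the Figure 8 note can be expressed as a small sketch. It assumes each sample carries a combined label in the style of Figure 6 (a type letter A, B or C followed by a risk digit 1, 2 or 3); the function names `parse_label` and `outcome` are hypothetical and chosen for illustration only.

```python
from typing import Tuple

def parse_label(label: str) -> Tuple[str, str]:
    """Split a combined label such as 'A1' into (erosion type, risk level)."""
    return label[0], label[1:]

def outcome(true_label: str, predicted_label: str) -> str:
    """Assign one of the four Figure 8 categories to a single prediction."""
    true_type, true_risk = parse_label(true_label)
    pred_type, pred_risk = parse_label(predicted_label)
    type_ok = true_type == pred_type
    risk_ok = true_risk == pred_risk
    if type_ok and risk_ok:
        return "Correctly identified"
    if risk_ok:
        # Risk level matches, erosion type does not.
        return "Type misidentified"
    if type_ok:
        # Erosion type matches, risk level does not.
        return "Risk misidentified"
    return "All misidentified"

# Example: true label is low-risk water erosion (A1).
for predicted in ("A1", "B1", "A2", "C3"):
    print(predicted, outcome("A1", predicted))
```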

Figure 9. Importance ranking of erosion factors.

Figure 10. The relationship between nodes.

Figure 11. Frequency distribution of erosion factors. (a) Edge density. (b) Assortativity coefficient.

Table 1. Typical soil erosion samples.

Erosion Type	Erosion Risk Level	Sample Size
Water erosion	Low	140
Water erosion	Medium	140
Water erosion	High	140
Wind erosion	Low	140
Wind erosion	Medium	140
Wind erosion	High	140
Freeze-thaw erosion	Low	80
Freeze-thaw erosion	Medium	80
Freeze-thaw erosion	High	80

Table 2. Data description and source.

Data	Description	Source
DEM	From the Shuttle Radar Topography Mission	Geospatial Data Cloud (http://www.gscloud.cn/ (accessed on 18 June 2018))
Land use type	Including six land use types (e.g., arable land, grassland, and woodland)	Geospatial Information Monitoring Cloud Platform (http://www.dsac.cn/ (accessed on 1 July 2021))
NDVI	Annual average normalized difference vegetation index	National Earth System Science Data Center (http://www.geodata.cn/ (accessed on 1 July 2021))
Clay content	Percentage of clay in the soil	Resource and Environment Science and Data Center (https://www.resdc.cn/ (accessed on 1 July 2021))
Silt content	Percentage of silt in the soil	Resource and Environment Science and Data Center (https://www.resdc.cn/ (accessed on 1 July 2021))
Sand content	Percentage of sand in the soil	Resource and Environment Science and Data Center (https://www.resdc.cn/ (accessed on 1 July 2021))
Soil type	Including 12 soil types (e.g., leaching soil, semi-leaching soil, and arid soil)	National Earth System Science Data Center (http://www.geodata.cn/ (accessed on 1 July 2021))
Annual mean rainfall	From daily observation data of meteorological stations	National Meteorological Information Center (http://data.cma.cn/ (accessed on 12 November 2020))

Table 3. Multicollinearity diagnostics of the factors and the resulting decisions.

Factor	Type	Tolerance (TOL)	Variance inflation factor (VIF)	Decision
Edge density	DWCNF	0.350	2.855	Confirmed
Structural entropy	DWCNF	0.250	4.006	Confirmed
Degree centrality	DWCNF	0.252	3.961	Confirmed
Betweenness centrality	DWCNF	0.432	2.314	Confirmed
Assortativity coefficient	DWCNF	0.780	1.282	Confirmed
Average neighbor degree	DWCNF	0.343	2.916	Confirmed
Aspect	SEEF	0.467	2.141	Confirmed
Slope	SEEF	0.835	1.198	Confirmed
Surface roughness	SEEF	0.173	5.770	Confirmed
Plan curvature	SEEF	0.973	1.028	Confirmed
Profile curvature	SEEF	0.718	1.392	Confirmed
Annual mean rainfall	SEEF	0.161	6.212	Confirmed
Land use type	SEEF	0.485	2.061	Confirmed
NDVI	SEEF	0.361	2.773	Confirmed
Clay content	SEEF	0.210	4.762	Confirmed
Silt content	SEEF	0.198	5.038	Confirmed
Sand content	SEEF	0.234	4.531	Confirmed
Soil type	SEEF	0.669	1.494	Confirmed
Surface cut depth	SEEF	0.020	49.331	Rejected
Edge betweenness	DWCNF	0.057	17.978	Rejected
Node density	DWCNF	0.007	13.961	Rejected
Closeness centrality	DWCNF	0.093	10.781	Rejected
The standard deviation of elevation	SEEF	0.020	49.311	Rejected
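
The decisions in Table 3 can be reproduced with a short sketch. By definition, tolerance is the reciprocal of the variance inflation factor (TOL = 1/VIF). The cutoff of VIF > 10 used below is the commonly applied rule of thumb and is an assumption here; it agrees with every decision in Table 3, but the threshold itself is not stated in this section. The names `VIF_THRESHOLD`, `tolerance` and `decision` are illustrative.

```python
# Assumed cutoff: the common rule of thumb, consistent with Table 3.
VIF_THRESHOLD = 10.0

def tolerance(vif: float) -> float:
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif

def decision(vif: float, threshold: float = VIF_THRESHOLD) -> str:
    """Reject a factor when its VIF exceeds the (assumed) threshold."""
    return "Rejected" if vif > threshold else "Confirmed"

# VIF values copied from a few rows of Table 3.
vif_by_factor = {
    "Edge density": 2.855,
    "Annual mean rainfall": 6.212,
    "Closeness centrality": 10.781,
    "Surface cut depth": 49.331,
}

for factor, vif in vif_by_factor.items():
    print(f"{factor}: TOL={tolerance(vif):.3f}, VIF={vif:.3f}, {decision(vif)}")
```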

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
