Figure 1.
Location of the study area. The blue boundary delineates Heilongjiang Province; the red boundary delineates the study area.
Figure 2.
Typical gully erosion features observed during the Xunke field campaign. Panels (a–c) show untreated, naturally evolving gullies: (a) linear gully distribution in a transitional hillslope–cropland zone; (b) vegetation conditions surrounding an active gully; (c) a gully formed after a rainstorm event. Panel (d) shows a rehabilitated gully on cultivated land where engineering check-dam infill and surface stabilization have been applied.
Figure 3.
Spatial distributions of the 16 environmental factors used for gully erosion susceptibility modeling. (a–d) Topographic factors: elevation, slope, aspect, and relief; (e–g) Topographic indices: TWI, TPI, and curvature; (h,i) Climatic factors: mean annual temperature and annual precipitation; (j–l) Vegetation, land cover, and soil: NDVI, LULC, and soil type; (m–p) Anthropogenic and proximity factors: Human Footprint Index, and distances to buildings, roads, and streams.
Figure 4.
Methodological flowchart of the gully erosion susceptibility assessment framework.
Figure 5.
Spatial distribution of the centroids of the 4020 mapped gully polygons across the six administrative districts of the Heihe region. Each red dot represents the centroid of one gully polygon. Gully clusters are concentrated in the central and southwestern agricultural plains.
Figure 6.
Comparison of performance metrics (Accuracy, Kappa, AUC, Precision, Recall, and F1-Score) across RF, XGBoost, and GBM.
Figure 7.
ROC curves for all four models including the LR baseline. The dashed diagonal line represents the random classifier baseline (AUC = 0.5). All tree-based models achieve AUC > 0.93, while LR reaches 0.85.
Figure 8.
Normalized variable importance ranking across the three models (RF, XGBoost, and GBM).
Figure 9.
SHAP analysis for the XGBoost model: global feature importance ranking and SHAP summary plot showing the direction and magnitude of each feature’s contribution to susceptibility predictions.
Figure 10.
Gully erosion susceptibility map produced by RF, classified into five levels using Jenks natural breaks.
Figure 11.
Gully erosion susceptibility map produced by XGBoost, classified into five levels using Jenks natural breaks.
Figure 12.
Gully erosion susceptibility map produced by GBM, classified into five levels using Jenks natural breaks.
Figure 13.
Model uncertainty map showing the spatial distribution of pixel-level standard deviation of susceptibility predictions across the three models.
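A minimal sketch of how a per-pixel uncertainty value of this kind could be computed: for each pixel, take the three model predictions and return their standard deviation. The sample values are hypothetical, and the choice of the population standard deviation (rather than the sample form) is an assumption, since the caption does not say which variant was used.

```python
from statistics import pstdev


def pixel_uncertainty(rf, xgb, gbm):
    """Return the per-pixel standard deviation across three prediction lists."""
    return [pstdev(values) for values in zip(rf, xgb, gbm)]


# Hypothetical susceptibility values for three pixels (not taken from the study).
rf_pred = [0.10, 0.50, 0.90]
xgb_pred = [0.10, 0.40, 0.70]
gbm_pred = [0.10, 0.60, 0.80]

uncertainty = pixel_uncertainty(rf_pred, xgb_pred, gbm_pred)
```

Pixels where the three models agree exactly get an uncertainty of zero; the value grows as the predictions diverge.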
Figure 14.
District-level analysis: (a) spatial distribution of priority treatment zones; (b) proportion of high-susceptibility areas (Jenks classes 4–5) by district across the three models.
Figure 15.
Comparison of district-level gully erosion metrics: (a) gully erosion ratio by district; (b) multi-model susceptibility comparison across administrative districts.
Figure 16.
High-susceptibility area (km²) under baseline and two NDVI enhancement scenarios (+10% and +20%) for the three ensemble models.
Figure 17.
Spatial distribution of susceptibility change (ΔSusceptibility) under the NDVI +10% (a) and +20% (b) scenarios (ensemble mean). Green: reduced susceptibility; red: increased susceptibility relative to baseline.
Table 1.
Satellite data used in this study.
| Satellite | Sensor | Spatial Resolution | Acquisition Period | Bands Used |
|---|---|---|---|---|
| Sentinel-2 | MSI | 10/20/60 m | May–October 2023 | B2, B3, B4, B8, B11 |
| Gaofen-1 | PMS/WFV | 2/8/16 m | 2023 | Pan, R, G, B, NIR |
| Gaofen-2 | PMS | 0.8/4 m | 2023 | Pan, R, G, B, NIR |
Table 2.
Environmental factors used for gully erosion susceptibility modeling.
| Category | Factor | Data Source | Resolution |
|---|---|---|---|
| Topography | DEM, Slope, Aspect, Relief, TPI, TWI, Curvature | ALOS DEM | 12.5 m |
| Climate | Mean Annual Temperature (MAT) | ERA5-Land | ∼10 km |
| Climate | Annual Precipitation | TerraClimate | ∼4 km |
| Vegetation & land cover | NDVI | Landsat surface reflectance | 30 m |
| Vegetation & land cover | LULC | Project-compiled from Sentinel-2 | 25 m |
| Soil | Soil type | HWSD2 | 1 km |
| Anthropogenic & proximity | Human Footprint Index (HFI) | Mu et al. (global annual HFI, Figshare) | 1 km |
| Anthropogenic & proximity | Distance to buildings, roads, streams | National geographic databases | – |
Table 3.
Random-search ranges and best hyperparameter configuration per model (50 candidates per model, 5-fold CV-AUC objective on the 70% training set; final test AUC reported on the held-out 30% test set with the best configuration re-fitted on the full training set).
| Model | Hyperparameter | Search Range | Best Value |
|---|---|---|---|
| RF | num.trees | | 1000 |
| RF | mtry | | 4 |
| RF | min.node.size | | 3 |
| XGBoost | eta (learning rate) | | 0.076 |
| XGBoost | max_depth | | 7 |
| XGBoost | subsample | | 0.76 |
| XGBoost | colsample_bytree | | 0.91 |
| XGBoost | min_child_weight | | 1 |
| XGBoost | gamma (γ) | | 0.12 |
| XGBoost | nrounds (early stop, patience = 30) | up to 500 | 429 |
| GBM | n.trees | | 1000 |
| GBM | interaction.depth | | 5 |
| GBM | shrinkage | | 0.092 |
| GBM | bag.fraction | | 0.79 |

| Best-config performance summary | CV-AUC (mean ± SD) | Test AUC |
|---|---|---|
| RF (best config) | | 0.95 |
| XGBoost (best config) | | 0.95 |
| GBM (best config) | | 0.94 |
Table 4.
Variance inflation factor (VIF) values for the 16 environmental factors.
| Variable | VIF | Variable | VIF |
|---|---|---|---|
| TWI | 8.406 | D_Building | 1.443 |
| Slope | 8.039 | Curvature | 1.320 |
| DEM | 3.942 | Soil Types | 1.274 |
| MAT | 2.188 | D_Stream | 1.272 |
| Relief | 2.023 | NDVI | 1.171 |
| HFI | 1.673 | D_Road | 1.124 |
| TPI | 1.601 | LULC | 1.096 |
| Aspect | 1.045 | Precipitation | 1.009 |
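To help read the values above, recall the standard definition VIF = 1 / (1 − R²), where R² comes from regressing one factor on all the others. The sketch below simply inverts that relation to show how much of each factor's variance is explained by the remaining factors; the function name is illustrative and is not taken from the source.

```python
def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2) to recover the R^2 of a factor on the others."""
    if vif < 1:
        raise ValueError("VIF cannot be below 1")
    return 1.0 - 1.0 / vif


# Largest and smallest VIF values listed in Table 4.
r2_twi = implied_r_squared(8.406)      # TWI
r2_precip = implied_r_squared(1.009)   # Precipitation
```

Under this relation, the TWI value corresponds to roughly 88% of its variance being shared with the other factors, while precipitation is nearly independent of them.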
Table 5.
Gully count and density by administrative unit in the Heihe region.
| Administrative Unit | Area (km²) | Gully Count | Density (/km²) | Proportion (%) |
|---|---|---|---|---|
| Nenjiang City | 15,217 | 1044 | 0.069 | 25.9 |
| Xunke County | 10,685 | 1025 | 0.096 | 25.4 |
| Wudalianchi City | 15,086 | 781 | 0.052 | 19.4 |
| Beian City | 7190 | 594 | 0.083 | 14.7 |
| Sunwu County | 4775 | 500 | 0.105 | 12.4 |
| Aihui District | 13,911 | 76 | 0.005 | 1.9 |
| Total | 66,865 | 4020 | 0.060 | 100.0 |
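The density column can be reproduced from the area and count columns. The following sketch, using the figures from Table 5, divides the gully count by the unit area and rounds to three decimals; the variable and function names are illustrative only.

```python
# (area in km², gully count) per administrative unit, copied from Table 5.
units = {
    "Nenjiang City": (15217, 1044),
    "Xunke County": (10685, 1025),
    "Wudalianchi City": (15086, 781),
    "Beian City": (7190, 594),
    "Sunwu County": (4775, 500),
    "Aihui District": (13911, 76),
}


def gully_density(area_km2, count):
    """Gullies per square kilometre."""
    return count / area_km2


densities = {
    name: round(gully_density(area, count), 3)
    for name, (area, count) in units.items()
}
total_count = sum(count for _, count in units.values())
```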
Table 6.
Per-LULC-class gully frequency ratio (FR) on the analysis grid (CLUD-style class labels). FR > 1: class is overrepresented among gully cells relative to its areal share; FR < 1: underrepresented.
| Rank | LULC Class (Code) | Area Cells | Area (%) | Gully Cells | Gully (%) | FR |
|---|---|---|---|---|---|---|
| 1 | Sparse Woodland (5) | 40,418,581 | 37.71 | 105,842 | 93.91 | 2.490 |
| 2 | Moderate Grassland (8) | 50,394 | 0.05 | 61 | 0.05 | 1.151 |
| 3 | Water Body (11) | 10,094,352 | 9.42 | 5170 | 4.59 | 0.487 |
| 4 | Paddy Field (1) | 1,177,779 | 1.10 | 87 | 0.08 | 0.070 |
| 5 | Dryland Cropland (2) | 54,153,565 | 50.53 | 1513 | 1.34 | 0.027 |
| 6 | Dense Grassland (7) | 1,173,430 | 1.09 | 29 | 0.03 | 0.023 |
| 7 | Shrubland (4) | 109,044 | 0.10 | 0 | 0.00 | 0.000 |
| 8 | Sparse Grassland (9) | 1610 | 0.00 | 0 | 0.00 | 0.000 |
| Total | | 107,178,755 | 100.00 | 112,702 | 100.00 | — |
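As a worked check of the FR column, the sketch below computes the frequency ratio as the class share of gully cells divided by the class share of all cells, using the cell counts from Table 6. The function name and argument order are illustrative assumptions.

```python
def frequency_ratio(gully_cells, total_gully_cells, area_cells, total_area_cells):
    """Share of gully cells in a class divided by that class's share of all cells."""
    gully_share = gully_cells / total_gully_cells
    area_share = area_cells / total_area_cells
    return gully_share / area_share


# Totals from the last row of Table 6.
TOTAL_AREA_CELLS = 107_178_755
TOTAL_GULLY_CELLS = 112_702

# Sparse Woodland (5) and Dryland Cropland (2), counts copied from Table 6.
fr_sparse_woodland = frequency_ratio(
    105_842, TOTAL_GULLY_CELLS, 40_418_581, TOTAL_AREA_CELLS
)
fr_dryland = frequency_ratio(
    1_513, TOTAL_GULLY_CELLS, 54_153_565, TOTAL_AREA_CELLS
)
```

The same function applies unchanged to slope classes or susceptibility classes, since only the cell counts differ.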
Table 7.
Slope-gradient class gully frequency ratio (FR). Slope bins follow geomorphic interpretation: <2° near-flat, 2–5° gentle, 5–10° moderate, 10–20° moderately steep, >20° steep.
| Rank | Slope Class | Area Cells | Area (%) | Gully Cells | Gully (%) | FR |
|---|---|---|---|---|---|---|
| 1 | 0–2° | 37,012,470 | 34.53 | 58,411 | 51.83 | 1.501 |
| 2 | 2–5° | 45,260,313 | 42.23 | 48,858 | 43.35 | 1.027 |
| 3 | 5–10° | 19,157,358 | 17.87 | 5175 | 4.59 | 0.257 |
| 4 | 10–20° | 5,273,552 | 4.92 | 253 | 0.22 | 0.046 |
| 5 | >20° | 475,062 | 0.44 | 5 | 0.00 | 0.010 |
| Total | | 107,178,755 | 100.00 | 112,702 | 100.00 | — |
Table 8.
Performance comparison of the four susceptibility models on the independent test set.
| Model | AUC | Accuracy | Kappa | Precision | Sensitivity | Specificity | F1 (Gully) |
|---|---|---|---|---|---|---|---|
| LR | 0.85 | 0.76 | 0.45 | 0.64 | 0.60 | 0.84 | 0.62 |
| RF | 0.95 | 0.88 | 0.72 | 0.79 | 0.83 | 0.90 | 0.81 |
| XGBoost | 0.95 | 0.89 | 0.74 | 0.81 | 0.84 | 0.91 | 0.82 |
| GBM | 0.94 | 0.86 | 0.68 | 0.78 | 0.79 | 0.89 | 0.79 |
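For readers who want to see how the threshold-based columns relate to one another, here is a minimal sketch that derives accuracy, Cohen's kappa, precision, sensitivity, specificity, and F1 from a binary confusion matrix, with gully as the positive class. The counts are hypothetical and are not the study's confusion matrices; AUC is omitted because it needs the predicted scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common binary metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what chance alone would produce.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
```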
Table 9.
Leave-one-district-out spatial cross-validation results (mean ± SD across 6 folds).
| Model | AUC | Accuracy | Kappa | F1 (Gully) |
|---|---|---|---|---|
| LR | 0.80 ± 0.07 | 0.74 ± 0.10 | 0.31 ± 0.16 | 0.50 ± 0.21 |
| RF | 0.84 ± 0.05 | 0.76 ± 0.12 | 0.29 ± 0.22 | 0.43 ± 0.26 |
| XGBoost | 0.84 ± 0.05 | 0.75 ± 0.13 | 0.29 ± 0.21 | 0.44 ± 0.25 |
| GBM | 0.84 ± 0.06 | 0.76 ± 0.12 | 0.34 ± 0.23 | 0.50 ± 0.28 |
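The fold structure behind a leave-one-district-out scheme can be sketched as follows: each district in turn is held out as the test set while the remaining districts form the training set. This is an illustrative splitter only, with a hypothetical list of sample labels; it does not reproduce the study's sampling or model fitting.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) for each distinct group."""
    by_group = defaultdict(list)
    for index, group in enumerate(groups):
        by_group[group].append(index)
    for held_out, test_idx in by_group.items():
        train_idx = [
            i for group, idx in by_group.items() if group != held_out for i in idx
        ]
        yield held_out, train_idx, test_idx


# Hypothetical district label for each of eight samples.
districts = [
    "Aihui", "Xunke", "Sunwu", "Beian",
    "Wudalianchi", "Nenjiang", "Xunke", "Nenjiang",
]
folds = list(leave_one_group_out(districts))
```

With six distinct districts this yields six folds, and no sample from the held-out district ever appears in its own training set, which is what makes the estimate a test of spatial transfer.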
Table 10.
Jenks natural break thresholds for the three susceptibility models.
| Class Boundary | RF | XGBoost | GBM |
|---|---|---|---|
| Very Low/Low | 0.085 | 0.090 | 0.093 |
| Low/Moderate | 0.242 | 0.272 | 0.260 |
| Moderate/High | 0.419 | 0.488 | 0.451 |
| High/Very High | 0.607 | 0.708 | 0.658 |
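Given break values such as these, assigning a predicted probability to one of the five classes is a sorted-threshold lookup. The sketch below applies the RF breaks from Table 10; it does not derive Jenks breaks itself, and sending a value that falls exactly on a boundary to the higher class is an assumption, since the source does not state how ties are handled.

```python
from bisect import bisect_right

CLASS_LABELS = ["Very Low", "Low", "Moderate", "High", "Very High"]

# RF class boundaries copied from Table 10.
RF_BREAKS = [0.085, 0.242, 0.419, 0.607]


def classify(value, breaks=RF_BREAKS):
    """Map a susceptibility value in [0, 1] to one of five class labels."""
    # bisect_right counts how many breaks are <= value, which is the class index.
    return CLASS_LABELS[bisect_right(breaks, value)]


labels = [classify(v) for v in (0.05, 0.30, 0.50, 0.70)]
```

Passing the XGBoost or GBM column as `breaks` classifies the other two models' outputs in the same way.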
Table 11.
Susceptibility class area statistics for the three models.
| Susceptibility | RF (km²) | RF (%) | XGBoost (km²) | XGBoost (%) | GBM (km²) | GBM (%) |
|---|---|---|---|---|---|---|
| Very Low | 37,772 | 58.30 | 46,668 | 72.03 | 41,948 | 64.75 |
| Low | 9664 | 14.92 | 6555 | 10.12 | 7821 | 12.07 |
| Moderate | 6923 | 10.69 | 4766 | 7.36 | 5827 | 8.99 |
| High | 6408 | 9.89 | 3791 | 5.85 | 5306 | 8.19 |
| Very High | 4020 | 6.21 | 3007 | 4.64 | 3886 | 6.00 |
Table 12.
Frequency ratio (FR) validation: ratio of observed gully proportion to area proportion within each susceptibility class. FR > 1 indicates gully overrepresentation.
| Susceptibility Class | FR (RF) | FR (XGBoost) | FR (GBM) |
|---|---|---|---|
| Very Low | 0.01 | 0.02 | 0.02 |
| Low | 0.13 | 0.37 | 0.34 |
| Moderate | 0.60 | 1.15 | 0.98 |
| High | 1.97 | 3.23 | 2.84 |
| Very High | 11.57 | 14.55 | 10.48 |
Table 13.
Gully erosion and susceptibility statistics by administrative district.
| District | Area (km²) | Gullies | Density (/km²) | RF High (%) | XGB High (%) | GBM High (%) |
|---|---|---|---|---|---|---|
| Aihui | 13,911 | 76 | 0.005 | 1.1 | 0.6 | 1.0 |
| Xunke | 10,685 | 1025 | 0.096 | 13.2 | 9.9 | 11.9 |
| Sunwu | 4775 | 500 | 0.105 | 19.5 | 14.7 | 17.3 |
| Beian | 7190 | 594 | 0.083 | 33.4 | 21.3 | 29.0 |
| Wudalianchi | 15,086 | 781 | 0.052 | 17.1 | 11.4 | 16.6 |
| Nenjiang | 15,217 | 1044 | 0.069 | 21.3 | 12.3 | 17.2 |
Table 14.
Partial Spearman correlation and four-group variance partitioning for the gully indicator (joint training/test sample, ). Top panel: partial vs. marginal Spearman correlation of HFI with the binary gully indicator. Bottom panel: adjusted fractions from a four-group variance partitioning (groups: anth = HFI, LULC, D_Building, D_Road; topo = DEM, Slope, Aspect, Curvature, Relief, TWI, TPI; clim = MAT, Total PPT; sv = Soil Types, NDVI, D_Stream). Only unique fractions and the four-way shared fraction are shown; pairwise and three-way shared fractions (summing to 0.132) are omitted for brevity.
Top panel: Spearman correlation of HFI with the gully indicator.
| Metric | ρ | p-value | Controlled for |
|---|---|---|---|
| Partial Spearman | 0.205 | | DEM, Slope, MAT, Total PPT |
| Marginal Spearman | 0.467 | ≈0 | – |

Bottom panel: four-group variance partitioning (adjusted R²).
| Fraction | Adj. R² | Interpretation |
|---|---|---|
| [a] anthropogenic unique | 0.030 | HFI/LULC/D_Build/D_Road only |
| [d] soil–vegetation unique | 0.017 | Soil/NDVI/D_Stream only |
| [b] topographic unique | 0.015 | DEM–TPI block only |
| [c] climatic unique | 0.012 | MAT/Total PPT only |
| [o] shared across all four | 0.080 | co-variation among all four groups |
| Total explained (4 groups) | 0.286 | joint adjusted R² |
| Residual | 0.714 | unexplained |