Article

Interpretation and Spatiotemporal Analysis of Terraces in the Yellow River Basin Based on Machine Learning

College of Forestry and Prataculture, Ningxia University, Yinchuan 750021, China
*
Author to whom correspondence should be addressed.
Sustainability 2023, 15(21), 15607; https://doi.org/10.3390/su152115607
Submission received: 13 September 2023 / Revised: 24 October 2023 / Accepted: 1 November 2023 / Published: 3 November 2023

Abstract

The Yellow River Basin (YRB) is a crucial ecological zone and an environmentally vulnerable region in China. Understanding the temporal and spatial trends of terraced-field areas (TRAs) in the YRB, and the factors underlying them, is essential for improving land use, conserving water resources, promoting biodiversity, and preserving cultural heritage. In this study, we employed machine learning on the Google Earth Engine (GEE) platform to map the spatial distribution of TRAs from 1990 to 2020 using Landsat 5 (1990–2010) and Landsat 8 (2015–2020) remote sensing data. GeoDa (version 1.20.0) was used for spatial autocorrelation analysis, which revealed distinct spatial clustering patterns. A linear mixed model (LMM) and a random forest model were constructed to identify the factors driving TRA changes. The results show that TRAs were primarily concentrated in the upper and middle reaches of the YRB, in provinces such as Shaanxi, Shanxi, Qinghai, and Gansu, with areas exceeding 40,000 km², whereas other provinces had TRAs of less than 30,000 km² in total. The TRAs exhibited a relatively stable trend, with Gansu, Qinghai, and Shaanxi showing an overall upward trajectory, while Shanxi and Inner Mongolia showed an overall decline. Compared with other provinces, the variations in TRAs in Ningxia, Shandong, Sichuan, and Henan appeared to be more stable. The LMM revealed that farmland, shrubs, and grassland had significant positive effects on the TRAs, explaining 41.6% of the variance. The random forest model also indicated positive effects for these factors, with R² values of 0.984 and 0.864 for the training and testing sets, respectively, thus outperforming the LMM. The findings of this study can contribute to the restoration of the YRB’s ecosystem and support sustainable development.
The insights gained will be valuable for policymaking and decision support in soil and water conservation, agricultural planning, and environmental protection in the region.

1. Introduction

Terraced fields are an important soil and water conservation measure and play a key role in agricultural production, water resource management, and ecological balance. Particularly in arid areas, they modify the terrain and improve soil conditions, which makes more efficient use of rainfall and protects water and soil [1,2], optimizes land use [3,4], increases crop yields, and promotes sustainability [5]. Terraces also provide diverse habitats for organisms [6] and form part of the cultural and historical heritage [7,8]. Studying the spatiotemporal changes in terraced-field areas (TRAs) and the factors driving these changes helps us to better understand and manage water resources, reduce soil erosion, and protect water quality, thereby promoting the sustainable development of terraces [9].
In this regard, researchers such as Austin et al. [10] have utilized airborne light detection and ranging (LiDAR) technology to acquire high-resolution digital elevation models (DEMs) for TRAs. Through an in-depth analysis of terrain indices, including slope, aspect, and curvature, along with other land features, they inferred and interpreted the spatial distribution and morphological characteristics of terraced fields. Huang et al. [11,12] extracted and analyzed long-term trends in vegetation and surface water body area from Landsat data. Hu et al. [13] utilized remote sensing data and satellite images to extract land use in the Amu Darya River basin. Furthermore, several scholars have used time-series remote sensing data extensively to investigate the temporal and spatial trends in TRAs as well as their driving factors. Tian et al. [1] identified the distribution of terraced fields using Google Earth imagery and estimated the effect of vegetation restoration and terraced fields on soil erosion using the universal soil loss equation (USLE) model. By analyzing remote sensing imagery from 2000 to 2018, their research revealed the significant role of terraced fields in controlling soil erosion and emphasized the impact of vegetation changes on soil erosion. Yu et al. [14] studied terraced fields for mountainous rice cultivation and found that forests played a crucial role in shaping terraced rice fields.
Although the existing methods for terraced-field interpretation have reached a relatively mature stage, they often come with inherent limitations. Many of these methods require extensive image data downloads, involve complex processing procedures, demand high computational resources, and are time-consuming. Addressing these challenges through the utilization of machine learning on the Google Earth Engine (GEE) platform can facilitate efficient terraced-field interpretation while maintaining an acceptable level of accuracy. Furthermore, significant research gaps exist in the study of the temporal–spatial patterns and influencing factors of TRAs in the Yellow River Basin (YRB). Consequently, there is an urgent need to delve deeper into investigating the spatial distribution, temporal changes, and factors driving change in terraced fields in this region.
To address these limitations and bridge the existing research gaps, this study conducted a comprehensive exploration of TRAs within the YRB, covering their spatial distribution, temporal trends, and the factors driving their changes. We used the GEE remote sensing cloud platform together with remote-sensing land cover and land use classification data [15,16,17] to develop detailed interpretation methods, which integrated digital elevation models (DEMs) [18] for the classification of terraced fields. The innovation of this study lies in the fusion of remote sensing data and terrain indices on the GEE platform, coupled with modifications to models and parameters that enhance the precision and efficiency of TRA interpretation. Our findings contribute to a deeper understanding of changes in terraced-field patterns, support the formulation of scientifically robust strategies for soil and water conservation, encourage sustainable agricultural development, advance environmental protection, and provide constructive recommendations for sustainable development, ecological protection, and restoration in the YRB.

2. Study Area

The Yellow River, the second longest river in China, originates from the Bayan Har Mountains and flows into the Bohai Sea. It traverses the provinces of Qinghai, Sichuan, Gansu, Ningxia, Inner Mongolia, Shaanxi, Shanxi, Henan, and Shandong (Figure 1a,b). The YRB spans between 95°53′ E and 119°12′ E longitude and between 32°9′ N and 41°50′ N latitude, covering an area of 7.5 × 10⁵ km² [19]. This region encompasses four distinct geographical units from west to east: the Qinghai–Tibet Plateau, the Inner Mongolian Plateau, the Loess Plateau, and the Huang–Huai–Hai Plain [20]. The majority of the YRB experiences a semi-arid climate characterized by limited natural water resources and an annual average precipitation of less than 450 mm. The terrain is higher in the west and lower in the east, with elevations ranging from −3 m to 5939 m above sea level (Figure 1a). The YRB has a warm temperate continental monsoon climate. The distribution of vegetation follows a trend similar to that of precipitation, transitioning from sparse shrub grassland to grassland, broadleaf forest, and crops from west to east. The area is known for its diverse soil types, including meadow soil, chestnut calcareous soil, yellow loam, and brown soil. The annual average temperature ranges from −3.5 °C to 15 °C [21,22]. Land use in the YRB includes grassland, farmland, woodland, barren land, and sparsely vegetated areas (Figure 1c). However, the YRB has become one of China’s most ecologically fragile regions due to excessive water resource development and escalating environmental issues [23].

3. Materials and Methods

3.1. Data

3.1.1. Vegetation Index

The normalized difference vegetation index (NDVI) data employed in this research were computed using remote sensing data from Landsat 5 (operated from 1984 to 2013) for the years 1990–2010 and Landsat 8 (operating since 2013) for the years 2015–2020, in both cases obtained via the GEE platform (https://earthengine.google.com/, accessed on 5 June 2022).
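As a minimal sketch of the index itself, NDVI is the normalized difference between near-infrared (NIR) and red reflectance. The function name and the band lookup below are illustrative and not taken from the study; the band numbers reflect the standard sensor layouts (band 4 and band 3 on Landsat 5 TM, band 5 and band 4 on Landsat 8 OLI), and the exact band identifiers depend on the data collection used.

```python
import math

# Standard NIR / red band numbers per sensor (illustrative lookup, not from the paper).
NDVI_BANDS = {
    "Landsat 5 TM": {"nir": 4, "red": 3},
    "Landsat 8 OLI": {"nir": 5, "red": 4},
}


def ndvi(nir, red):
    """Return (NIR - red) / (NIR + red) for one pixel's reflectance values."""
    denominator = nir + red
    if denominator == 0:
        # The index is undefined when both reflectances are zero.
        return math.nan
    return (nir - red) / denominator


# Hypothetical reflectances: dense vegetation reflects strongly in the NIR.
vegetated_pixel = ndvi(nir=0.5, red=0.1)
bare_pixel = ndvi(nir=0.2, red=0.2)
```

Values close to 1 indicate dense green vegetation, while values near 0 are typical of bare soil.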

3.1.2. Datasets of Driving Force Factors

The temperature data for this study were obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) Fifth Generation Reanalysis (ERA5) monthly datasets (https://www.ecmwf.int/, accessed on 5 August 2022). ERA5 is a global atmospheric reanalysis that provides hourly estimates of atmospheric variables, including temperature, from 1979 to the present, at an approximate spatial resolution of 31 km.
Precipitation data were sourced from the Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) daily dataset from the University of California, Santa Barbara, Climate Hazards Group (UCSB-CHG) (https://www.chc.ucsb.edu/data/chirps/, accessed on 5 August 2022). CHIRPS is a high-resolution, quasi-global precipitation dataset that merges satellite and station observations and has been delivering daily estimates of precipitation since 1981 at an approximate spatial resolution of 0.05 degrees (around 5 km).
Nighttime light data were obtained from the National Qinghai–Tibet Plateau Data Center (http://data.tpdc.ac.cn, accessed on 25 June 2022). The data, with a spatial resolution of approximately 100 m–1 km, were produced using a nighttime light convolutional long short-term memory (NTLSTM) network and were applied to generate the world’s first artificial nighttime light dataset (PANDA) for China, spanning the years 1984–2020.
The global population dataset WorldPop (https://www.worldpop.org, accessed on 27 June 2022), with a spatial resolution of 100 m, served as the source of national population data. This dataset, available on the GEE platform, is provided by the WorldPop project, an open-source initiative offering global population distribution data.

3.1.3. Land Types

The land use data utilized in this study came from the China Land Cover Dataset (CLCD) (https://doi.org/10.5281/zenodo.5816591, accessed on 5 August 2022). The CLCD provides land cover information for China from 1985 to 1990 and then annually up to 2020. It has a spatial resolution of 30 m and is based on 335,709 Landsat scenes processed on GEE.

3.1.4. Labeled Samples of Terraced Fields

Terraced-field coordinates in the YRB were collected every five years from 1990 to 2020 using Google Earth Pro.

3.2. Terraced-Field Interpretation

This study utilized the image review feature in Google Earth Pro to display high-resolution images from 1990 to 2020. Terraced and non-terraced fields were labeled in these images, and the labeled coordinates were imported into GEE. Subsequently, these labeled coordinates were used as a feature set in GEE. The sample collection was divided into two categories: terraced fields were assigned a value of 1, and non-terraced-field areas, including water bodies, buildings, wetlands, grasslands, forests, barrens, shrubs, and more, were assigned a value of 0. There were 500 sampling points for each of the years, with 250 terraced fields and 250 non-terraced-field areas. The training set and test set were divided in a ratio of 60% to 40%.
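The sampling design described above can be sketched as follows. The text gives the class sizes (250 and 250) and the 60%/40% ratio but does not say whether the split was stratified by class or how it was randomized, so the stratification, the seed, and all names here are assumptions made for illustration.

```python
import random


def stratified_split(samples, train_fraction=0.6, seed=0):
    """Split labeled samples into train/test sets, class by class."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample["label"], []).append(sample)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical sample set for one year: 250 terraced (1) and 250 non-terraced (0) points.
samples = [{"id": i, "label": 1 if i < 250 else 0} for i in range(500)]
train_set, test_set = stratified_split(samples)
```

Under these assumptions each year yields 300 training points and 200 test points, with both classes equally represented in each set.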
The model was trained on the GEE platform using the annotated data and Landsat 5 and Landsat 8 images, within the vector boundary of the YRB; image interpretation was performed at five-year intervals from 1990 to 2020. The interpreted images were exported from GEE. QGIS was then used to compute zonal statistics based on the public vector boundaries of China’s provincial and municipal administrative regions. The terraced-field area of each province and city was calculated, and the results were organized into tables for further analysis.

3.3. Spatial Autocorrelation Analysis

Spatial autocorrelation is the analysis of the spatial distribution characteristics of spatial units based on the matching of positional similarity and attribute similarity [24,25]. If the values at nearby locations are similar, positive spatial autocorrelation occurs; if they are dissimilar, negative spatial autocorrelation occurs. The methods used in this study included the global Moran’s index, local Moran’s index, and Moran’s scatter plot [26].

3.3.1. Global Spatial Autocorrelation

The global spatial autocorrelation test examines the presence or absence of spatial correlation in the attribute values of adjacent or nearby spatial units [27]. The expression for the global Moran’s I index is as follows:
$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} \sum_{j \neq i}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{S^2 \sum_{i=1}^{n} \sum_{j \neq i}^{n} w_{ij}}$$
where $x_i$ is the attribute value of feature $i$, $\bar{x}$ is its mean value from 1990 to 2020, $w_{ij}$ is the spatial weight between features $i$ and $j$, $n$ is the total number of features, and $S^2 = \frac{1}{n} \sum_{i} (x_i - \bar{x})^2$.
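The global index can be computed directly from its definition. The sketch below is a plain-Python illustration on a hypothetical chain of four spatial units with binary adjacency weights; it is not the GeoDa implementation used in the study, and the function and variable names are chosen here for clarity.

```python
def global_morans_i(x, w):
    """Global Moran's I for values x and an n-by-n spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [value - mean for value in x]      # deviations from the mean
    s2 = sum(d * d for d in z) / n         # S^2 as defined in the text
    cross_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:                     # self-pairs are excluded
                cross_products += w[i][j] * z[i] * z[j]
                weight_sum += w[i][j]
    return cross_products / (s2 * weight_sum)


# Hypothetical layout: four units in a row, each adjacent to its direct neighbors.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = global_morans_i([1, 2, 3, 4], chain_weights)
checkerboard = global_morans_i([1, 4, 1, 4], chain_weights)
```

A smooth gradient gives a positive index (similar values are neighbors), whereas an alternating pattern gives a negative one.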

3.3.2. Local Spatial Autocorrelation

Because there are differences in spatial autocorrelation levels between different spatial units and their neighborhoods within the study area, global evaluation cannot accurately indicate the specific spatial location of aggregation or anomaly occurrence [28]. To overcome this deficiency, a local spatial autocorrelation analysis must be performed. The main methods are the local indicators of spatial association (LISAs) [29] and Moran scatter plots [30].
Local indicators of spatial association (LISAs) are used to evaluate the degree of similarity or difference between the attribute values of the observation unit and its surrounding units. LISAs include the local Moran’s index and local Geary’s index, and the expression for the local Moran’s index is as follows:
$$I_i = \frac{n (x_i - \bar{x}) \sum_{j} w_{ij} (x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2} = \frac{n z_i \sum_{j} w_{ij} z_j}{z^{T} z} = z_i \sum_{j} w_{ij} z_j$$
where $z_i$ and $z_j$ are standardized observation values, so that $z^{T} z = n$.
The Moran scatter plot reflects the local spatial autocorrelation of spatial location attributes [28] and is used to identify concentrated aggregation or anomalies within a local region. It is presented as coordinates in four quadrants: high–high (HH) aggregation in the first quadrant, low–high (LH) anomaly in the second quadrant, low–low (LL) aggregation in the third quadrant, and high–low (HL) anomaly in the fourth quadrant. These four types characterize the local spatial association between each regional unit and its neighbors.
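A minimal sketch of the local index and the scatter-plot classification is given below. It assumes row-standardized weights and standardizes the values with the population standard deviation; significance testing, which LISA cluster maps normally rely on (for example through permutation tests), is deliberately left out. The names and the example data are hypothetical.

```python
def local_morans(x, w):
    """Return (I_i, cluster type) per unit; w is assumed row-standardized."""
    n = len(x)
    mean = sum(x) / n
    sd = (sum((value - mean) ** 2 for value in x) / n) ** 0.5
    z = [(value - mean) / sd for value in x]   # standardized values
    results = []
    for i in range(n):
        # Spatial lag: weighted average of the neighbors' standardized values.
        lag = sum(w[i][j] * z[j] for j in range(n) if j != i)
        if z[i] >= 0:
            cluster = "HH" if lag >= 0 else "HL"
        else:
            cluster = "LH" if lag >= 0 else "LL"
        results.append((z[i] * lag, cluster))
    return results


# Hypothetical chain of four units with row-standardized adjacency weights.
row_weights = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
clustered = [kind for _, kind in local_morans([10, 9, 2, 1], row_weights)]
with_outlier = [kind for _, kind in local_morans([10, 1, 1, 1], row_weights)]
```

Here HH and LL mark units whose value and neighborhood agree, while HL and LH mark units that differ from their surroundings.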

3.4. Durbin–Watson Test

In regression analysis, the independence of variables within the model is a challenge that requires careful attention. To assess this independence, the classical Durbin–Watson test is a widely utilized method. In this study, we conducted Durbin–Watson tests using SPSS 26 software to investigate the independence of residuals within our model. Specifically, we used TRAs as the dependent variable and incorporated influential factors such as FA, GA, SA, and WA as independent variables within the framework of regression analysis.
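For reference, the Durbin–Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The study computed it in SPSS; the snippet below is only an illustrative re-implementation with made-up residuals.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for an ordered sequence of regression residuals."""
    squared_steps = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    squared_residuals = sum(e * e for e in residuals)
    return squared_steps / squared_residuals


# Hypothetical residual series.
persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])      # residuals never change sign
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])   # residuals flip every step
```

Values near 0 point to positive autocorrelation, values near 4 to negative autocorrelation, and values near 2 to approximately independent residuals.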

3.5. Analysis Methods for Driving Force Factors

3.5.1. Spearman Correlation Analysis

Spearman rank correlation coefficient is a method for studying the correlation of variables. In this study, the Spearman rank correlation coefficient was used to examine the relationship between TRAs, FA, SA, GA, WA, SIA, BA, CP, IA, POP, PRE, TEMP, NDVI, and NL in the YRB to determine the reasons for differences in TRA distribution. The calculation method was as follows:
$$\rho = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \sum_{i} (y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ represent the ranks of the two variables for each data point, and $\bar{x}$ and $\bar{y}$ denote the means of those ranks.
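The coefficient amounts to applying the correlation formula to ranks rather than raw values. The sketch below makes that explicit, using average ranks for ties; it is an illustration with invented inputs, not the software routine used by the authors.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the correlation formula applied to ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return covariance / (var_x * var_y) ** 0.5


# A monotonic but non-linear relationship still gives a perfect rank correlation.
monotonic = spearman_rho([1, 2, 3, 4], [1, 4, 9, 16])
reversed_order = spearman_rho([1, 2, 3], [3, 2, 1])
```

Because only the ordering matters, the measure is insensitive to monotonic transformations of either variable.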

3.5.2. Linear Mixed Model (LMM) Analysis

In this study, we employed an LMM to analyze the factors driving changes in TRAs. Initially, we systematically selected driving force variables one by one from a set of potential factors. This involved conducting Durbin–Watson tests for independence, Spearman correlation analysis, and multicollinearity evaluation to ensure the independence and correlation among variables. Subsequently, we utilized an LMM to establish a fitting model for TRAs. This approach was used to enhance the robustness of our analysis and enable accurate inference of the relationships between variables in the context of TRA prediction. The basic form of the LMM is as follows:
$$y_{ij} = \beta_0 + \beta_1 x_{ij} + b_{0i} + e_{ij}$$
where $y_{ij}$ is the $j$th observation in the $i$th group, $x_{ij}$ is one or more predictor variables for the observation, $\beta_0$ and $\beta_1$ are the fixed-effect coefficients, $b_{0i}$ is the random-effect coefficient for the $i$th group, and $e_{ij}$ is the random error term for the observation. In this model, $\beta_0$ and $\beta_1$ represent the average intercept and slope of the entire dataset, whereas $b_{0i}$ represents the deviation of each group from the average intercept. The model assumes that the random effect $b_{0i}$ follows a normal distribution and that the random error $e_{ij}$ also follows a normal distribution. The objective of this model is to estimate the terraced-field area.
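The distributional assumptions can be written out explicitly, which also shows why a random intercept accommodates repeated observations of the same unit. The variance symbols $\sigma_b^2$ and $\sigma_e^2$ are notation introduced here for illustration and do not appear in the original text; independence between $b_{0i}$ and $e_{ij}$ is the usual assumption for a model of this form.

```latex
b_{0i} \sim \mathcal{N}(0, \sigma_b^2), \qquad e_{ij} \sim \mathcal{N}(0, \sigma_e^2)

\operatorname{Var}(y_{ij} \mid x_{ij}) = \sigma_b^2 + \sigma_e^2

\operatorname{Cov}(y_{ij}, y_{ik} \mid x_{ij}, x_{ik}) = \sigma_b^2 \quad (j \neq k)
```

Two observations from the same group therefore share the covariance $\sigma_b^2$, whereas observations from different groups are uncorrelated, which is how the model represents within-group dependence.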

3.5.3. Random Forest Model Analysis

In this part of the study, mirroring the process of constructing the LMM, we systematically selected driving variables one by one from a set of potential factors. This procedure encompassed Durbin–Watson tests for independence, Spearman correlation analysis, and multicollinearity evaluation to ensure that the variables incorporated into the model were suitable. Subsequently, we employed the random forest model to establish a fitted model for TRAs. Specifically, we first shuffled the 406 collected records and set the training ratio to 0.7 for random forest regression prediction. The number of cross-validation folds was set to 3, with each tree having a maximum depth of 10 and a maximum of 50 leaf nodes. A total of 100 decision trees were built using bootstrap sampling. Additionally, we refitted the model using the three most important variables, resulting in the final fitted model. The overall workflow of the research is summarized in a flow chart (Figure 2).
Figure 2. Flow chart of the research. The definitions of the abbreviated variables can be found in Table 1.

4. Results

4.1. Evaluation of the Interpretation Accuracy of TRAs

As shown in Table 2, the model demonstrated high accuracy, with the following average metrics: an overall accuracy of 0.92, a Kappa coefficient of 0.84, a precision of 0.98, a recall of 0.86, and an F1 score of 0.92. From 1990 to 2020, the accuracy indicators for individual years were close to the average values, with no outliers. Accuracy was highest in 2005 and relatively low in 1990, and it differed little among the other years.
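To make the relationship between these metrics concrete, the sketch below derives all five from a binary confusion matrix. The counts are hypothetical, chosen for a test set of 100 terraced and 100 non-terraced points; they are not the values in Table 2, although they give figures close to the reported averages.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, Kappa, precision, recall, and F1 from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (terraced = positive class); not taken from the paper.
metrics = classification_metrics(tp=86, fp=2, fn=14, tn=98)
```

A high precision combined with a lower recall, as in this example, means that few non-terraced points are mislabeled as terraces while some terraces are missed.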

4.2. Trend Analysis of TRA Changes Every 5 Years in a 30-Year Period

4.2.1. Analysis of Spatiotemporal Changes

Figure 3a,c shows that over the past 30 years, Gansu and Qinghai had the largest absolute TRAs, whereas Sichuan and Shandong had relatively small absolute TRAs. The trend lines reveal that TRAs in Gansu, Qinghai, and Shaanxi showed overall upward trends, whereas those in Shanxi and Inner Mongolia showed overall downward trends. Compared with other provinces, the TRA variations in Ningxia, Shandong, Sichuan, and Henan appeared to be more stable. Within the YRB, Shaanxi, Shanxi, Gansu, and Qinghai had TRAs exceeding 40,000 km², while other regions had TRAs of less than 30,000 km². Figure 3b,c shows that Gansu province accounted for approximately 25% of the total TRAs within the YRB, whereas Sichuan and Shandong had the smallest TRAs, comprising only around 1% of the total area. The total TRAs thus differed among the provinces within the YRB: Shaanxi, Shanxi, Qinghai, and Gansu (belonging to the upstream region) had larger TRAs, whereas other provinces tended to have smaller TRAs. It is worth noting that the relatively small TRAs in certain provinces, such as Sichuan and Shandong, were due to their smaller land areas within the YRB.
Table 3 shows that the coefficient of variation of TRAs differed among the provinces within the YRB. Shaanxi and Gansu provinces had relatively low coefficients of variation of 10.65% and 10.69%, respectively, indicating stable TRAs. Shandong and Shanxi provinces had coefficients of 16.09% and 17.63%, respectively, showing relatively steady fluctuations in their total TRAs. Ningxia, Qinghai, Inner Mongolia, and Henan had coefficients ranging from 19.99% to 30.17%, indicating a moderate degree of TRA fluctuation. Notably, Sichuan province stood out with a high coefficient of 41.87%, suggesting significant fluctuations in its total terraced-field area over the past thirty years.
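The coefficient of variation is the standard deviation of a province's TRA series divided by its mean, expressed as a percentage. The text does not state whether the sample or the population standard deviation was used, so the sample form below is an assumption, and the input series is invented for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Hypothetical TRA series (km^2) for one province at the seven survey years.
hypothetical_series = [100.0, 110.0, 95.0, 105.0, 90.0, 108.0, 92.0]
cv_percent = coefficient_of_variation(hypothetical_series)
```

Because the measure is scaled by the mean, it allows the variability of provinces with very different total areas to be compared directly.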

4.2.2. Spatial Autocorrelation Analysis

Figure 4a shows that the global Moran’s I index for the TRAs in the YRB gradually decreased from 0.386 in 1990 to 0.355 in 2005, indicating a gradual reduction in regional disparities. From 2005 to 2020, however, the Moran’s I index rose to 0.403, suggesting a gradual increase in regional disparities in total TRAs during this period. In Figure 4b, most scatter points lay close to the y = x line, although a small number were irregularly distributed on both sides of it. This indicated a high predictive accuracy for the model, with a slight remaining bias.
Local indicator of spatial association (LISA) cluster maps helped visualize the clustering of regions with similar values on a map, highlighting spatial patterns and revealing potential relationships between neighboring areas. Based on the analysis results from Figure 4c, it was found that the terraced areas in the YRB had exhibited significant spatial clustering patterns over the past 30 years rather than being randomly distributed. These clustering patterns remained relatively stable over time and space, with minimal changes observed from 1990 to 2020. At the level of city-level administrative regions, the central part of the TRAs and certain western regions exhibited a higher degree of clustering, whereas the eastern regions of Henan and Shandong, as well as the western region of Sichuan, showed a lower degree of clustering. This indicated that the HH clustering zone was primarily located in the middle and upper reaches of the Yellow River, including Qinghai, Gansu, and the intersection area of Sichuan. In the middle reaches, it included the Hetao region but excluded the northern arid and sandy areas. The downstream regions (such as Henan and Shandong) exhibited a lower level of clustering, constituting the LL clustering zone. Additionally, in the western part of Sichuan, there were adjacent areas characterized by alternating low and high clustering of terraced fields that belonged to the LH terraced-field clustering zone. Finally, no HL clustering zone was observed for the past 30 years.

4.3. Driving Force Analysis

4.3.1. Durbin–Watson Test

The purpose of the Durbin–Watson test was to examine whether there was independence among model residuals. The Durbin–Watson statistic ranged from 0 to 4, with the following interpretations: a value close to 2 indicated relatively high independence among residuals, whereas a deviation from 2 suggested a lack of independence among residuals. This study rigorously applied the Durbin–Watson test to determine the independence of model residuals, a crucial step in ensuring the robustness of the regression analysis results. Due to the nature of data collection in this study, which involved surveys conducted at the same location but at different times, the Durbin–Watson test yielded a low score. As shown in Table 4, the Durbin–Watson score was 0.814, indicating a lack of independence among data points. Consequently, traditional linear regression was not suitable for analysis. To address this issue, this study simultaneously employed an LMM and a random forest model for accurate prediction. These models offered enhanced robustness and flexibility by better fitting the data and considering potential confounding variables that might have affected the outcomes. By leveraging these advanced modeling techniques, this study aimed to comprehensively understand the relationship between predictive variables and focal outcomes.
For the Durbin–Watson test on independent variables, the independent variables were (constant), NDVI, CP, NL, WA, FA, TEMP, SA, GA, PRE, IA, POP, BA, and SIA. The dependent variable was the total TRAs.

4.3.2. Spearman Correlation Analysis

As seen in Figure 5, we observed significant correlations between TRAs and other variables at a significance level of 0.01. Specifically, the correlation coefficients between TRAs and CP, FA, SA, GA, WA, SIA, BA, and POP were all positive, indicating a positive correlation. The strongest correlation coefficient was between the total TRAs and SA, indicating a very strong positive correlation between these two variables. Based on the above analysis, some conclusions could be drawn; for example, there is a strong correlation between the total TRAs and SA, GA, and FA. In this section, Spearman correlation analysis was conducted, and we considered that when the absolute value of the correlation coefficient exceeded 0.7, it indicated a strong relationship between variables, which could suggest the presence of multicollinearity issues and provide an initial reference for variable selection in the subsequent analysis.

4.3.3. Multicollinearity Diagnosis

Based on the multicollinearity diagnosis conducted using SPSS 26, we determined that the variables retained after removing the non-compliant driving force factors (SIA, BA, and POP) met the criteria. In Table 5, the “Collinearity Statistics” column displays the tolerance and variance inflation factor (VIF) values, which indicate the absence of significant multicollinearity among the variables. All VIF values of the independent variables were below 5, and the tolerance values were all above 0.1. Consequently, we concluded that there were no noteworthy collinearity issues in the driving force analysis of this study.
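The two diagnostics are linked through the coefficient of determination obtained when one predictor is regressed on all the others. Writing that quantity as $R_j^2$ for predictor $j$ (notation introduced here, not used in the original tables), the standard definitions give:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mathrm{Tolerance}_j = \frac{1}{\mathrm{VIF}_j} = 1 - R_j^2

\mathrm{VIF}_j < 5 \iff R_j^2 < 0.8, \qquad \mathrm{Tolerance}_j > 0.1 \iff R_j^2 < 0.9
```

Of the two thresholds, the VIF criterion is therefore the stricter one: any variable with a VIF below 5 automatically has a tolerance above 0.2.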

4.4. Linear Mixed Model (LMM) Analysis

Following the multicollinearity evaluation using SPSS 26, the driving force factors that passed the test were taken as independent variables, whereas TRAs served as the dependent variable and the years were utilized as the repeated factors for the LMM. Upon analyzing the results in Table 6, CP, SA, and GA were selected, with all three having a significance level of less than 0.05. This selection process streamlined the equation and enhanced its practicality. Subsequently, these chosen variables were refitted, leading to the outcomes presented in Table 7. Based on the analysis of the data in Table 7, the variables CP, SA, and GA exerted a significant positive influence on the dependent variable within the model, demonstrated by their positive coefficients. Furthermore, the R-squared value of the model stood at 0.416, indicating that these variables accounted for 41.6% of the variability in the dependent variable. Consequently, it can be concluded that CP, SA, and GA hold a substantial positive impact on the dependent variable when the total TRAs are used as the outcome variable. It is worth noting that the R-squared value of the model was not particularly high, prompting this study to consider employing a random forest model, as elaborated in the upcoming section.

4.5. Random Forest Model

4.5.1. Optimizing the Model through Variable Importance Assessment

To enhance the precision of the study, both an LMM and a random forest model were employed for analysis. Given the relatively low R-squared value of the LMM, the two models were compared to strengthen the reliability of the results. Initially, the variables were screened for multicollinearity, followed by diagnostic assessments. Subsequently, the importance of each variable was evaluated using the random forest regression model, with the results depicted in Figure 6a. Based on these outcomes, the three most important variables, namely CP, SA, and GA, were selected. The random forest model was then refitted with these variables to explore their impact, with the corresponding results presented in Figure 6b. Specifically, SA accounted for 43.0% of the total importance, GA for 44.0%, and CP for 13.0%.

4.5.2. Evaluation of Accuracy Metrics for the Random Forest Model

This study employed standardized data in the random forest regression model. As shown in Figure 7, both the training and testing sets exhibited relatively low values for mean squared error (MSE), root mean squared error (RMSE), and mean absolute error (MAE), suggesting small predictive errors for the model. The mean absolute percentage error (MAPE) was also relatively low, indicating modest prediction errors in relation to the actual values. The R² (coefficient of determination) values for the training and testing sets were 0.984 and 0.864, respectively. Compared with the R² value of 0.416 obtained by the LMM, the random forest model demonstrated superior performance.
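The error measures named above follow directly from the paired observed and predicted values. The sketch below uses their standard definitions with invented numbers; it does not reproduce the values in Figure 7.

```python
def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, MAPE (in percent), and R^2 for paired observations."""
    n = len(y_true)
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined if any observed value is exactly zero.
    mape = sum(abs(e / true) for e, true in zip(errors, y_true)) / n * 100
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((true - mean_true) ** 2 for true in y_true)
    return {
        "mse": mse,
        "rmse": mse ** 0.5,
        "mae": mae,
        "mape": mape,
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical observed and predicted values.
scores = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Note that MAPE divides by the observed values, so it can become unstable when those values are close to zero, which is worth keeping in mind for standardized data.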

5. Discussion

5.1. Advantages of Utilizing GEE for Terrace Interpretation and Analyzing Spatiotemporal Variability of TRAs

In this study, we utilized the GEE platform to address the limitations of traditional remote sensing monitoring of terraces. Its integration with Google servers simplifies online processing and eliminates time-consuming data preparation [31]. GEE provides direct access to global Landsat reflectance data, with regularly updated archives and simplified preprocessing. Its powerful servers can efficiently process large-scale remote sensing data [32] and are supported by user-friendly built-in tools [33]. GEE thus overcomes spatiotemporal challenges while maintaining high accuracy, precision, and recall. We built a total of seven models covering the period from 1990 to 2020. These models were generally similar in accuracy, although there were slight differences. Among them, the 2015 model performed the best, while the 1990 model had relatively low accuracy. This difference can be attributed to the fact that the models were trained on manually collected samples and used Landsat 5 satellite data from 1990 to 2010, whereas Landsat 8 satellite data were used in 2015 and 2020.
Cao et al. [34] used GEE and multi-source data to create a high-resolution terraced-field map for all of China, which informed our research. Yang et al. [35] combined random forest classifiers with phenological data to enhance accuracy and depict historical land use changes. Our study drew on GEE's capabilities by merging DEM and Landsat data to calculate indices such as NDVI and KNDWI, and applied a random forest classifier to identify TRAs in the YRB. The model displayed high predictive accuracy, which is crucial for informed decisions and strategies: it performed well on precision, recall, and F1 score [36] (Table 2), supporting its robustness [37]. Effective feature selection and training, together with rigorous preprocessing and validation, contributed to this reliability.
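The accuracy measures discussed above can all be derived from a 2×2 confusion matrix such as those listed in Table 2. The sketch below is illustrative rather than the authors' implementation. The function name `binary_metrics` is hypothetical, and the matrix layout is an assumption, since the paper does not state it: the first class is treated as the positive class, so that for `[[a, b], [c, d]]` precision is a/(a+b) and recall is a/(a+c). Under this assumption the 1990 matrix reproduces the 1990 row of Table 2 to two decimals.

```python
def binary_metrics(matrix):
    """Compute accuracy metrics from a 2x2 confusion matrix [[a, b], [c, d]].

    Assumed layout (not stated in the paper): the first class is the positive
    class, with a = true positives, b = false positives, c = false negatives,
    and d = true negatives.
    """
    (a, b), (c, d) = matrix
    total = a + b + c + d
    overall_accuracy = (a + d) / total
    precision = a / (a + b)
    recall = a / (a + c)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: agreement beyond what the row and column totals
    # would produce by chance.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)
    return {
        "overall_accuracy": overall_accuracy,
        "kappa": kappa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Confusion matrix reported for 1990 in Table 2.
result_1990 = binary_metrics([[90, 4], [19, 98]])
print({name: round(value, 2) for name, value in result_1990.items()})
```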

5.2. Spatiotemporal Variation Patterns of TRAs in the YRB

The spatiotemporal shifts depicted in Figure 8 reveal diverse trends in total TRA size across regions over the preceding decades. Certain areas experienced TRA growth within specific periods followed by decline, whereas other regions showed different patterns. Terraced fields, as a historically significant form of agriculture, appear to have been influenced by various aspects of modern agricultural development. According to Pepe et al. [38], Yang et al. [39], and Claessens et al. [40], fluctuations in TRAs seem to be associated with diverse factors, including local natural conditions, land use patterns, agricultural techniques, and economic development levels. For instance, certain locales may have implemented strategies for terraced-field conservation and restoration during particular timeframes, resulting in TRA expansion. Furthermore, Figure 3 shows the continuity of spatial and temporal terraced-field patterns within the YRB over the last few decades. This continuity underscores the traditional character and relative stability of terraced cultivation in the region. The coefficient of variation of TRAs (Table 3) was used to compare how strongly TRAs varied among the provinces in the YRB. Dong et al. [41] found that regions with higher stability may hold relative advantages in agricultural management and policy implementation, whereas areas with greater volatility might require more adaptive coping strategies. Consequently, Shaanxi and Gansu, which had the lowest coefficients of variation (10.65% and 10.69%, respectively), may perform better in agricultural management and policy execution.
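As a minimal sketch of the stability measure reported in Table 3, the coefficient of variation can be computed as the standard deviation divided by the mean, expressed as a percentage. The paper does not state whether the population or sample standard deviation was used, so the population form below is an assumption. The function name and the two series are hypothetical and are not data from the study.

```python
import statistics


def coefficient_of_variation(values):
    """Return the coefficient of variation (%) of a series: 100 * SD / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    # Assumption: population standard deviation (the paper does not specify).
    return 100.0 * statistics.pstdev(values) / mean


# Hypothetical TRA series (km^2) for seven epochs, 1990-2020; not study data.
tra_by_province = {
    "Province A": [100, 104, 98, 101, 99, 103, 102],
    "Province B": [40, 65, 30, 55, 25, 60, 35],
}

# Rank provinces from most stable (lowest CV) to most variable (highest CV).
ranking = sorted(
    tra_by_province,
    key=lambda name: coefficient_of_variation(tra_by_province[name]),
)
for name in ranking:
    print(name, round(coefficient_of_variation(tra_by_province[name]), 2))
```

A lower value indicates a series that stays closer to its own mean, which is how the provinces in Table 3 are ordered.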
Through spatial autocorrelation analysis, we found a certain degree of spatial correlation in the distribution of terraced fields in the YRB, indicating that the distribution of TRAs was not entirely random. Particularly in the midstream areas of the YRB, a high–high (HH) clustering trend exists, where adjacent areas have relatively larger TRA values. However, in the downstream regions of the YRB, the spatial correlation of the TRAs is lower and there are significant differences in TRA values among neighboring areas. According to the research findings of Wang et al. [42] on land use in the YRB, this phenomenon is likely attributable to various factors including geography, climate, and land use. Based on observations from the Moran scatter plots and cluster maps, our model demonstrated a high level of predictive accuracy, with most scatter points closely aligned with the y=x line, indicating that the predicted results aligned well with actual observations and further confirming the model’s strong accuracy.
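The global Moran's I statistic that underlies this analysis can be sketched in a few lines. The study itself used GeoDa, so the code below is only an illustration of the standard formula, I = (n / W) · Σᵢⱼ wᵢⱼ(xᵢ − x̄)(xⱼ − x̄) / Σᵢ(xᵢ − x̄)², where W is the sum of all weights. The function name, the four-unit chain of neighbours, and the values are hypothetical, and binary contiguity weights are an assumption.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching the values")

    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    denominator = sum(d * d for d in deviations)
    if total_weight == 0 or denominator == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")

    # Weighted sum of cross-products between each pair of units.
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * (cross_products / denominator)


# Hypothetical example: four units in a row, each adjacent to its neighbours.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], chain_weights))    # similar neighbours: positive I
print(morans_i([1, -1, 1, -1], chain_weights))  # alternating neighbours: negative I
```

Positive values correspond to the clustering of similar TRA values described above (for example, HH clusters), while values near zero indicate a distribution close to random.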

5.3. Driving Force Factors of TRA Changes in the YRB

The comparison between the random forest model and the LMM highlights the advantages of the random forest approach. In terms of predictive performance, the R² value of the LMM was relatively modest at 0.416, whereas the random forest model achieved markedly higher R² values of 0.984 on the training set and 0.864 on the testing set. This improvement on both datasets underscores the effectiveness of the random forest methodology in capturing complex relationships within the data [43]. It suggests that the random forest model's ability to handle non-linearities and interactions leads to a more accurate representation of the underlying patterns than the LMM provides. Consequently, the random forest model emerged as a robust choice for modeling this dataset.
Our model analysis indicated that the primary factors influencing TRA size were shrub area (SA), grassland area (GA), and cropland (CP). These findings provide insight into the driving forces behind TRA changes. Yang et al. [35] reported that over the past thirty years, the proportions of rice cultivation in the Honghe Hani terraced fields were 10.651%, 8.810%, and 5.711%, and that these areas were converted into forests, shrublands, or grasslands, which is consistent with the conclusions of our study. Furthermore, Zhou et al. [44] found that land consolidation was the main contributor to the expansion of cultivated terraced land. Together, these findings support our conclusion that farmland, shrubs, and grassland have a positive effect on the extent of TRAs.
Firstly, the impact of SA on TRAs may be related to root characteristics. The root systems of shrubs contribute to stabilizing soil, preventing soil erosion, and facilitating water penetration and retention, thereby providing a dependable water resource for TRAs. Additionally, SA offers shading, reducing the evaporative water loss from TRAs, promoting water retention capacity, and enhancing crop growth [45]. Secondly, GA plays a crucial role within the terraced ecosystem. The coverage of GA vegetation effectively prevents soil erosion, maintaining the integrity of the terraced structures. GA also aids in the accumulation of organic matter, improving soil quality, enhancing water retention capacity, and providing abundant nutrients for TRAs. Lastly, the influence of CP on TRAs could be attributed to the fact that agricultural activities, including cultivation and planting, directly increase the extent of TRAs. The development and effective utilization of CP expands the coverage of a TRA, thereby enlarging its extent.

6. Conclusions

In summary, the integration of the GEE platform and the random forest model enabled a rapid and efficient interpretation of terraced fields. A comprehensive analysis of the spatiotemporal variations in TRAs within the YRB revealed that, overall, TRAs exhibited a relatively stable trend. Provinces such as Gansu, Qinghai, and Shaanxi showed an increasing trend for TRAs, whereas Shanxi and Inner Mongolia showed an overall decreasing trend. Spatial autocorrelation analysis indicated that between 1990 and 2005, the Moran's index for TRAs within the YRB gradually decreased, suggesting a reduction in regional disparities; this trend then reversed, and the values increased from 2005 to 2020. The distribution of terraced fields in the YRB exhibited a certain degree of spatial correlation, implying that the distribution of TRAs was not entirely random. Additionally, no high–low (HL) clusters were observed in the past 30 years. When an LMM and the random forest model were fitted to the TRA data, the random forest model outperformed the LMM in all accuracy indicators. Its R² value was 0.984 on the training set and 0.864 on the testing set, compared with 0.416 for the LMM, which suggests that the random forest model explains the data more effectively. These findings can inform policy formulation and decision making, offering a scientific foundation for the pursuit of regional ecological balance and sustainable development objectives.

Author Contributions

Conceptualization, Z.L. and J.T.; methodology, Z.L. and Q.Y.; validation, Z.L., J.T. and Q.Y.; formal analysis, Z.L. and X.F.; investigation, Z.L., Y.R. and G.W.; resources, Z.L., J.T. and Y.W.; data curation, Z.L. and Q.Y.; writing—original draft preparation, Z.L.; writing—review and editing, Z.L. and J.T.; visualization, Z.L.; supervision, J.T.; project administration, J.T.; funding acquisition, J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Project No. 31960330 and Project No. 31560232).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on reasonable request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tian, P.; Tian, X.; Geng, R.; Zhao, G.; Yang, L.; Mu, X.; Gao, P.; Sun, W.; Liu, Y. Response of soil erosion to vegetation restoration and terracing on the Loess Plateau. Catena 2023, 227, 107103.
  2. Fang, H. Effect of soil conservation measures and slope on runoff, soil, TN, and TP losses from cultivated lands in northern China. Ecol. Indic. 2021, 126, 107677.
  3. Arigaw, A.K.; Bedewi, S.A.; Yohannes, M.D. Sediment yield responses to land use land cover change and developing best management practices in the upper Gidabo dam watershed. Sustain. Water Resour. Manag. 2023, 9, 68.
  4. Jayanta, L.; Anup, D.; Kumar, G.P.; Krishnappa, R.; Rattan, L.; Gandhiji, I.R.; Prasad, N.C.; Utpal, D. Double no-till and rice straw retention in terraced sloping lands improves water content, soil health and productivity of lentil in Himalayan foothills. Soil Tillage Res. 2022, 221, 105381.
  5. Shi, X.; Song, X.; Yang, J.; Zhao, Y.; Yuan, Z.; Zhao, G.; Abbott, L.K.; Zhang, F.; Li, F.-M. Yield benefits from joint application of manure and inorganic fertilizer in a long-term field pea, wheat and potato crop rotation. Field Crops Res. 2023, 294, 108873.
  6. Wu, D.; Wei, W.; Li, Z.; Zhang, Q. Coupling Effects of Terracing and Vegetation on Soil Ecosystem Multifunctionality in the Loess Plateau, China. Sustainability 2023, 15, 1682.
  7. Ma, M.; Lei, E.; Wang, T.; Meng, H.; Zhang, W.; Lu, B. Genetic Diversity and Association Mapping of Grain-Size Traits in Rice Landraces from the Honghe Hani Rice Terraces System in Yunnan Province. Plants 2023, 12, 1678.
  8. Tarolli, P.; Preti, F.; Romano, N. Terraced landscapes: From an old best practice to a potential hazard for soil degradation due to land abandonment. Anthropocene 2014, 6, 10–25.
  9. Wang, W.; Straffelini, E.; Tarolli, P. Steep-slope viticulture: The effectiveness of micro-water storage in improving the resilience to weather extremes. Agric. Water Manag. 2023, 286, 108398.
  10. Hopkins, A.J.; Snyder, N.P. Performance evaluation of three DEM-based fluvial terrace mapping methods. Earth Surf. Process. Landf. 2016, 41, 1144–1152.
  11. Huang, W.; Duan, W.; Nover, D.; Sahu, N.; Chen, Y. An integrated assessment of surface water dynamics in the Irtysh River Basin during 1990–2019 and exploratory factor analyses. J. Hydrol. 2021, 593, 125905.
  12. Kumar, R.; Nath, A.J.; Nath, A.; Sahu, N.; Pandey, R. Landsat-based multi-decadal spatio-temporal assessment of the vegetation greening and browning trend in the Eastern Indian Himalayan Region. Remote Sens. Appl. Soc. Environ. 2022, 25, 100695.
  13. Hu, Y.; Duan, W.; Chen, Y.; Zou, S.; Kayumba, P.M.; Sahu, N. An integrated assessment of runoff dynamics in the Amu Darya River Basin: Confronting climate change and multiple human activities, 1960–2017. J. Hydrol. 2021, 603, 126905.
  14. Yu, M.; Li, Y.; Luo, G.; Yu, L.; Chen, M. Agroecosystem composition and landscape ecological risk evolution of rice terraces in the southern mountains, China. Ecol. Indic. 2022, 145, 109625.
  15. Yang, J.; Huang, X. The 30 m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 2021, 13, 3907–3925.
  16. Winzeler, H.E.; Owens, P.R.; Kharel, T.; Ashworth, A.; Libohova, Z. Identification and Delineation of Broad-Base Agricultural Terraces in Flat Landscapes in Northeastern Oklahoma, USA. Land 2023, 12, 486.
  17. Li, F.-B.; Lu, G.-D.; Zhou, X.-Y.; Ni, H.-X.; Xu, C.-C.; Yue, C.; Yang, X.-M.; Feng, J.-F.; Fang, F.-P. Elevation and Land Use Types Have Significant Impacts on Spatial Variability of Soil Organic Matter Content in Hani Terraced Field of Yuanyang County, China. Rice Sci. 2015, 22, 27–34.
  18. Rocha, J.; Duarte, A.; Fabres, S.; Quintela, A.; Serpa, D. Influence of DEM Resolution on the Hydrological Responses of a Terraced Catchment: An Exploratory Modelling Approach. Remote Sens. 2023, 15, 169.
  19. Zhang, Y.; Wang, Y.; Sun, S.; Chen, X. Quantifying Interregional Flows of Ecosystem Services to Enhance Water Security in the Yellow River Basin, China. J. Water Resour. Plan. Manag. 2023, 149, 04023018.
  20. Wang, L.; Yao, W.; Xiao, P.; Hou, X. The Spatiotemporal Characteristics of Flow-Sediment Relationships in a Hilly Watershed of the Chinese Loess Plateau. Int. J. Environ. Res. Public Health 2022, 19, 9089.
  21. Sun, P. The Effect of a Small Initial Distortion of the Basic Flow on the Subcritical Transition in Plane Poiseuille Flow. Q. Appl. Math. 2001, 59, 667–699.
  22. Wang, X.; Shi, S.; Zhao, X.; Hu, Z.; Hou, M.; Xu, L. Detecting Spatially Non-Stationary between Vegetation and Related Factors in the Yellow River Basin from 1986 to 2021 Using Multiscale Geographically Weighted Regression Based on Landsat. Remote Sens. 2022, 14, 6276.
  23. Rong, T.; Zhang, P.; Li, G.; Wang, Q.; Zheng, H.; Chang, Y.; Zhang, Y. Spatial correlation evolution and prediction scenario of land use carbon emissions in the Yellow River Basin. Ecol. Indic. 2023, 154, 110701.
  24. Fan, W.; Song, X.; Liu, M.; Shan, B.; Ma, M.; Liu, Y. Spatio-temporal evolution of resources and environmental carrying capacity and its influencing factors: A case study of Shandong Peninsula urban agglomeration. Environ. Res. 2023, 234, 116469.
  25. Wang, J.; Wang, J.; Zhang, J. Spatial distribution characteristics of natural ecological resilience in China. J. Environ. Manag. 2023, 342, 118133.
  26. Qi, H.; Shen, X.; Long, F.; Liu, M.; Gao, X. Spatial-temporal characteristics and influencing factors of county-level carbon emissions in Zhejiang Province, China. Environ. Sci. Pollut. Res. Int. 2022, 30, 10136–10148.
  27. Xiong, L.; Wang, F.; Cheng, B.; Yu, C. Identifying factors influencing the forestry production efficiency in Northwest China. Resour. Conserv. Recycl. 2018, 130, 12–19.
  28. Yang, Q.; Pu, L.; Jiang, C.; Gong, G.; Tan, H.; Wang, X.; He, G. Unveiling the spatial-temporal variation of urban land use efficiency of Yangtze River Economic Belt in China under carbon emission constraints. Front. Environ. Sci. 2023, 10, 1096087.
  29. Wu, Y.H. Analysis on Spatial Difference of the Rural Resident’s per Capita Net Income in Qinhuangdao City Based on ESDA. Adv. Mater. Res. 2014, 955, 3893–3898.
  30. Zhu, M.; Tang, H.; Elahi, E.; Khalid, Z.; Wang, K.; Nisar, N. Spatial-Temporal Changes and Influencing Factors of Ecological Protection Levels in the Middle and Lower Reaches of the Yellow River. Sustainability 2022, 14, 14888.
  31. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
  32. Cui, J.; Guo, Y.; Xu, Q.; Li, D.; Chen, W.; Shi, L.; Ji, G.; Li, L. Extraction of Information on the Flooding Extent of Agricultural Land in Henan Province Based on Multi-Source Remote Sensing Images and Google Earth Engine. Agronomy 2023, 13, 355.
  33. Paulina, Z.; Ewa, D.; Kamil, M. Accuracy of the evaluation of forest areas based on Landsat data using free software. Folia For. Pol. 2023, 65, 76–85.
  34. Cao, B.; Yu, L.; Naipa, V.; Ciais, P.; Li, W.; Zhao, Y.; Wei, W.; Chen, D.; Liu, Z.; Gong, P. A 30 m terrace mapping in China using Landsat 8 imagery and digital elevation model based on the Google Earth Engine. Earth Syst. Sci. Data 2021, 13, 2437–2456.
  35. Yang, J.; Xu, J.; Zhou, Y.; Zhai, D.; Chen, H.; Li, Q.; Zhao, G. Paddy Rice Phenological Mapping throughout 30-Years Satellite Images in the Honghe Hani Rice Terraces. Remote Sens. 2023, 15, 2398.
  36. Raza, M.; Hussain, F.K.; Hussain, O.K.; Zhao, M.; ur Rehman, Z. A comparative analysis of machine learning models for quality pillar assessment of SaaS services by multi-class text classification of users’ reviews. Future Gener. Comput. Syst. 2019, 101, 341–371.
  37. Kynkäänniemi, T.; Karras, T.; Laine, S.; Lehtinen, J.; Aila, T. Improved Precision and Recall Metric for Assessing Generative Models. Adv. Neural Inf. Process. Syst. 2019, 32.
  38. Pepe, G.; Mandarino, A.; Raso, E.; Scarpellini, P.; Brandolini, P.; Cevasco, A. Investigation on farmland abandonment of terraced slopes using multitemporal data sources comparison and its implication on hydro-geomorphological processes. Water 2019, 11, 1552.
  39. Yang, K.; Lu, C. Evaluation of land-use change effects on runoff and soil erosion of a hilly basin—The Yanhe River in the Chinese Loess Plateau. Land Degrad. Dev. 2018, 29, 1211–1221.
  40. Claessens, L.; Stoorvogel, J.; Antle, J. Exploring the impacts of field interactions on an integrated assessment of terraced crop systems in the Peruvian Andes. J. Land Use Sci. 2010, 5, 259–275.
  41. Dong, S.; Xin, L.; Li, S.; Xie, H.; Zhao, Y.; Wang, X.; Li, X.; Song, H.; Lu, Y. Extent and spatial distribution of terrace abandonment in China. J. Geogr. Sci. 2023, 33, 1361–1376.
  42. Wang, S.-Y.; Liu, J.-S.; Ma, T.-B. Dynamics and changes in spatial patterns of land use in Yellow River Basin, China. Land Use Policy 2010, 27, 313–323.
  43. Mantena, S.; Mahammood, V.; Rao, K.N. Prediction of soil salinity in the Upputeru river estuary catchment, India, using machine learning techniques. Environ. Monit. Assess. 2023, 195, 1006.
  44. Zhou, J.; Li, C.; Chu, X.; Luo, C. Is Cultivated Land Increased by Land Consolidation Sustainably Used in Mountainous Areas? Land 2022, 11, 2236.
  45. Yu, H.; Song, G.; Li, T.; Liu, Y. Spatial Pattern Characteristics and Influencing Factors of Green Use Efficiency of Urban Construction Land in Jilin Province. Complexity 2020, 2020, 5637530.
Figure 1. Overview of the research area: (a) spatial distribution of digital elevation model (DEM), (b) geographical distribution of the YRB in China, (c) spatial distribution of land use, (d) spatial distribution of provinces within the YRB (with area percentages in parentheses).
Figure 3. Spatiotemporal analysis of provincial TRAs and proportion of the total in the YRB (1990–2020): (a) TRA time-series plot, (b) variation in the percentage of TRAs relative to total area across the provinces in the YRB, (c) distribution of TRAs in the YRB in the last 30 years.
Figure 4. Spatial autocorrelation analysis of TRAs in the YRB: (a) bar chart of global Moran’s I index from 1990 to 2020, (b) Moran scatter map of municipal TRAs in the YRB, (c) local indicator of spatial association (LISA) cluster map.
Figure 5. Spearman correlation heat map of influencing factors and the total TRAs in the YRB. The crosses in the graph indicate that there is no significance between the variables at the 0.01 level. The definitions for the abbreviated variables can be found in Table 1.
Figure 6. Feature importance in random forest regression: (a) importance of variables in the model built with the variables retained after multicollinearity evaluation, (b) importance of the top three variables when the model is built using only those three variables. The definitions of the abbreviated variables can be found in Table 1.
Figure 7. Prediction results and accuracy metrics for the random forest model on the training and testing sets. The dashed line separates the training set from the testing set.
Figure 8. Fluctuations in TRA size at the provincial level in the YRB from 1990 to 2020.
Table 1. Driving force factors and data sources.

| No. | Value (Abbreviation) | Unit | Data Source |
|---|---|---|---|
| 1 | Terrace-field areas (TRAs) | km² | This study |
| 2 | Normalized difference vegetation index (NDVI) | / | USGS Landsat 5 Level 2, Collection 2, Tier 1 (1990–2010), https://earthengine.google.com/, accessed on 5 June 2022; USGS Landsat 8 Level 2, Collection 2, Tier 1 (2015–2020), https://earthengine.google.com/, accessed on 5 June 2022 |
| 3 | Forest (FA) | km² | CLCD from Jie Yang; Xin Huang, https://doi.org/10.5281/zenodo.5816591/, accessed on 5 August 2022 |
| 4 | Shrub (SA) | km² | As for row 3 |
| 5 | Grassland (GA) | km² | As for row 3 |
| 6 | Water (WA) | km² | As for row 3 |
| 7 | Snow/ice (SIA) | km² | As for row 3 |
| 8 | Barren (BA) | km² | As for row 3 |
| 9 | Cropland (CP) | km² | As for row 3 |
| 10 | Impervious (IA) | km² | As for row 3 |
| 11 | Precipitation (PRE) | mm | Climate Hazards Group InfraRed Precipitation with Station Data, https://www.chc.ucsb.edu/data/chirpson/, accessed on 5 August 2022 |
| 12 | Temperature (TEMP) | °C | ERA5 Monthly Aggregates, https://www.ecmwf.int/, accessed on 5 August 2022 |
| 13 | Population (POP) | / | National Science & Technology Infrastructure of China, https://www.worldpop.org/, accessed on 27 June 2022 |
| 14 | Night light (NL) | / | Third Pole Environment Data Center, http://data.tpdc.ac.cn/, accessed on 25 June 2022 |
Table 2. Interpretation accuracy assessment for TRAs.

| Year | Confusion Matrix | Overall Accuracy | Kappa | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1990 | [[90, 4], [19, 98]] | 0.89 | 0.78 | 0.96 | 0.83 | 0.89 |
| 1995 | [[94, 1], [12, 106]] | 0.93 | 0.87 | 0.99 | 0.89 | 0.94 |
| 2000 | [[108, 4], [18, 92]] | 0.90 | 0.80 | 0.96 | 0.86 | 0.91 |
| 2005 | [[67, 1], [7, 87]] | 0.95 | 0.89 | 0.99 | 0.91 | 0.95 |
| 2010 | [[70, 0], [11, 133]] | 0.94 | 0.88 | 1.00 | 0.86 | 0.93 |
| 2015 | [[67, 1], [15, 103]] | 0.91 | 0.82 | 0.99 | 0.82 | 0.90 |
| 2020 | [[64, 1], [9, 86]] | 0.90 | 0.87 | 0.98 | 0.88 | 0.92 |
| Average value | | 0.92 | 0.84 | 0.98 | 0.86 | 0.92 |
Table 3. Variability coefficients of the TRAs in YRB provinces from 1990 to 2020.

| No. | Province | Coefficient of Variation |
|---|---|---|
| 1 | Shaanxi | 10.65% |
| 2 | Gansu | 10.69% |
| 3 | Shandong | 16.09% |
| 4 | Shanxi | 17.63% |
| 5 | Ningxia | 19.99% |
| 6 | Qinghai | 22.37% |
| 7 | Inner Mongolia | 29.44% |
| 8 | Henan | 30.17% |
| 9 | Sichuan | 41.87% |
Table 4. Machine learning accuracy metrics.

| R | R² | Adjusted R² | Standard Error of the Estimate | Durbin–Watson Test |
|---|---|---|---|---|
| 0.939 | 0.881 | 0.877 | 2078.984 | 0.814 |
Table 5. Multicollinearity evaluation results. The definitions of the abbreviated variables can be found in Table 1.

| Model | Unstandardized Coefficient B | Unstandardized Coefficient SE | Standardized Coefficient Beta | t | p | Collinearity: Tolerance | Collinearity: VIF |
|---|---|---|---|---|---|---|---|
| (Constant) | −698.683 | 435.9 | | −1.603 | 0.110 | | |
| CP | 5.664 | 0.728 | 0.243 | 7.783 | 0.000 | 0.314 | 3.185 |
| FA | −0.383 | 0.698 | −0.015 | −0.549 | 0.584 | 0.401 | 2.493 |
| SA | 18.227 | 1.394 | 0.415 | 13.079 | 0.000 | 0.305 | 3.281 |
| GA | 4.467 | 0.169 | 0.834 | 26.432 | 0.000 | 0.308 | 3.248 |
| WA | −30.728 | 2.105 | −0.382 | −14.595 | 0.000 | 0.448 | 2.231 |
| IA | −7.970 | 7.819 | −0.036 | −1.019 | 0.309 | 0.251 | 3.979 |
| NL | 0.933 | 0.513 | 0.061 | 1.82 | 0.069 | 0.274 | 3.647 |
| PRE | 0.016 | 1.122 | 0.000 | 0.014 | 0.989 | 0.284 | 3.516 |
| TEMP | 8.166 | 29.966 | 0.008 | 0.273 | 0.785 | 0.373 | 2.680 |
| NDVI | −578.482 | 1329.254 | −0.015 | −0.435 | 0.664 | 0.259 | 3.862 |
Table 6. Effective variable selection and LMM fitting results. The definitions of the abbreviated variables can be found in Table 1.

| Coefficient | Estimation | Standard Error | Degrees of Freedom | t | Significance | Confidence Interval: Lower Limit | Confidence Interval: Upper Limit |
|---|---|---|---|---|---|---|---|
| Intercept | −549.804 | 831.325 | 92.159 | −0.661 | 0.510 | −2200.850 | 1101.242 |
| CP | 6.020 | 1.346 | 118.460 | 4.472 | 0.000 | 3.354 | 8.685 |
| SA | 7.613 | 2.061 | 227.015 | 3.694 | 0.000 | 3.552 | 11.675 |
| GA | 1.040 | 0.369 | 90.456 | 2.816 | 0.006 | 0.306 | 1.774 |
| FA | 1.898 | 1.996 | 58.803 | 0.951 | 0.346 | −2.096 | 5.894 |
| IA | 7.572 | 8.877 | 205.494 | 0.853 | 0.395 | −9.930 | 25.076 |
| NL | 0.512 | 0.715 | 194.384 | 0.716 | 0.475 | −0.898 | 1.923 |
| NDVI | −726.050 | 1420.014 | 285.714 | −0.511 | 0.610 | −3521.070 | 2068.965 |
| PRE | −0.369 | 1.043 | 308.927 | −0.354 | 0.724 | −2.422 | 1.683 |
| TEMP | −13.734 | 61.226 | 134.584 | −0.224 | 0.823 | −134.824 | 107.355 |
| WA | −0.139 | 4.233 | 97.060 | −0.033 | 0.974 | −8.541 | 8.262 |
Table 7. Top three significant variables from the initial model and results of the LMM fitting. The definitions of the abbreviated variables can be found in Table 1.

| Coefficient | Estimation | Standard Error | Degrees of Freedom | t | Significance | Confidence Interval: Lower Limit | Confidence Interval: Upper Limit |
|---|---|---|---|---|---|---|---|
| Intercept | −759.448 | 543.901 | 51.552 | −1.396 | 0.169 | −1851.090 | 332.197 |
| CP | 5.796 | 1.191 | 118.419 | 4.869 | 0.000 | 3.439 | 8.154 |
| SA | 8.641 | 1.717 | 197.685 | 5.032 | 0.000 | 5.254 | 12.027 |
| GA | 1.216 | 0.311 | 61.801 | 3.913 | 0.000 | 0.595 | 1.837 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, Z.; Tian, J.; Ya, Q.; Feng, X.; Wang, Y.; Ren, Y.; Wu, G. Interpretation and Spatiotemporal Analysis of Terraces in the Yellow River Basin Based on Machine Learning. Sustainability 2023, 15, 15607. https://doi.org/10.3390/su152115607

AMA Style

Li Z, Tian J, Ya Q, Feng X, Wang Y, Ren Y, Wu G. Interpretation and Spatiotemporal Analysis of Terraces in the Yellow River Basin Based on Machine Learning. Sustainability. 2023; 15(21):15607. https://doi.org/10.3390/su152115607

Chicago/Turabian Style

Li, Zishuo, Jia Tian, Qian Ya, Xuejuan Feng, Yingxuan Wang, Yi Ren, and Guowei Wu. 2023. "Interpretation and Spatiotemporal Analysis of Terraces in the Yellow River Basin Based on Machine Learning" Sustainability 15, no. 21: 15607. https://doi.org/10.3390/su152115607
