A Random Forest-Based Approach to Map Soil Erosion Risk Distribution in Hickory Plantations in Western Zhejiang Province , China

Increasing agroforestry areas with improper management has produced serious environmental problems, such as soil erosion. It is necessary to rapidly predict the spatial distribution of such erosion risks in a large area, but there is a lack of approaches that are suitable for mountainous regions. The objective of this research was to develop an approach that can effectively employ remotely-sensed and ancillary data, to map soil erosion risks in an agroforestry ecosystem in a mountainous region. This research employed field survey data, soil-type maps, digital elevation model data, weather station data, and Landsat imagery, for extraction of potential variables. It used the random forest approach to identify eight key variables—slope, slope of slope, normalized difference greenness index at leaf-on season, soil organic matter, fractional vegetation at leaf-on season, fractional soil at leaf-off season, precipitation in June, and percent of soil clay—for mapping soil erosion risk distribution in hickory plantations in Western Zhejiang Province, China. The results showed that an overall accuracy of 89.8% was obtained for three levels of soil erosion risk. Approximately one-fourth of hickory plantations were at high-risk, requiring the owners or decision makers to take proper measures to reduce the soil erosion problem. This research provides a new approach to predict soil erosion risk, based on the primary variables that can be extracted directly from remotely-sensed data and ancillary data. This proposed approach will be valuable for other agroforestry and plantations, such as Torreya grandis, eucalyptus, and the rubber tree, that are playing important roles in improving economic conditions for the local farmers but face soil erosion problems.


Introduction
Agroforestry area has continuously expanded globally, due to the requirement of human well-being and improved standard of living.However, intensively and improperly managed agroforestry often resulted in serious soil erosion and environmental problems [1][2][3][4], and further affected sustainability of agroforestry ecosystems [5].In general, agroforestry has some common characteristics-single tree species and simple stand structure, lack of shrubs and grass under Remote Sens. 2018, 10, 1899 2 of 20 tree canopy, and serious soil erosion problems [3,4].Of the many agroforestry types, hickory (Carya cathayensis) was grown, originally, in mountainous regions with steep slopes and relatively poor soil conditions, but was expanded rapidly in the past three decades in subtropical regions of China, because of its high economic value for local people [6,7].For the sake of intensive management and easy collection of hickory nuts, farmers often remove shrubs and grass, at least twice per year, in spring, before the addition of nutrients, and in early September, before the nut harvest (personal communications with farmers in Zhejiang Province during fieldwork).These hickory plantations have serious soil erosion problems because there is almost no understory, most of the year, as shown in Figure 1.The soil erosion is especially serious in the rainy season, during May and June, and in the leaf-off season from November to March/April, in the Zhejiang Province, requiring a suitable approach to quickly evaluate the soil erosion problem [5].
sustainability of agroforestry ecosystems [5].In general, agroforestry has some common characteristics-single tree species and simple stand structure, lack of shrubs and grass under tree canopy, and serious soil erosion problems [3,4].Of the many agroforestry types, hickory (Carya cathayensis) was grown, originally, in mountainous regions with steep slopes and relatively poor soil conditions, but was expanded rapidly in the past three decades in subtropical regions of China, because of its high economic value for local people [6,7].For the sake of intensive management and easy collection of hickory nuts, farmers often remove shrubs and grass, at least twice per year, in spring, before the addition of nutrients, and in early September, before the nut harvest (personal communications with farmers in Zhejiang Province during fieldwork).These hickory plantations have serious soil erosion problems because there is almost no understory, most of the year, as shown in Figure 1.The soil erosion is especially serious in the rainy season, during May and June, and in the leaf-off season from November to March/April, in the Zhejiang Province, requiring a suitable approach to quickly evaluate the soil erosion problem [5].Soil erosion has long been recognized as a serious environmental problem and has gained increasing attention [8][9][10][11][12][13][14].Many models [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32], as summarized in Table S1, have been used for estimating the amount of soil erosion or evaluating the soil erosion risks.Ground covers, soil conditions, topography, climate, and human activities are major factors used in these models.Of the many models in Table S1, the Revised Universal Soil Loss Equation (RUSLE) may be the most extensively used model for soil erosion estimation in different ecosystems [8,[33][34][35][36][37].The RUSLE was initially developed for agricultural ecosystems with gentle slopes [8].When RUSLE is used in a complex landscape, such as hilly or mountainous regions, calibration of these parameters, such as rainfall erosivity factor (R), soil erodibility factor (K), and cover management factor (C-factor) is often needed, thus, different equations for these kinds of calibrations have been proposed [10, [38][39][40].Therefore, direct application of RUSLE in forest or agroforestry ecosystems with steep slopes, is not recommended [5].In reality, sometimes we do not need the quantitative estimation of soil erosion amounts, we just need the erosion risks and their spatial patterns [5].Therefore, new approaches are needed to quickly map soil erosion risk distribution, especially for agroforestry, in a large area, which is valuable for owners to take proper measures to reduce soil erosion problems.
As the requirement to produce spatial distribution of soil erosion increases, remote-sensing and GIS technologies will play important roles in soil erosion estimation models.In previous studies, the C-factor in RUSLE was often calculated from remote-sensing data, such as vegetation indices [41], fractional vegetation and soil images [5,34], or the P-factor, using a land-use map classified from a remote-sensing image [42].GIS, having the capability to provide powerful tools in storage, management, and processing of multiple data sources, plays an increasingly important role in estimation and mapping of soil erosion.For example, precipitation data are required in estimation of soil erosion, and GIS is often used to convert the point (weather station) data to surface images, through interpolation or regression approaches [10,38].Soil attributes are also important factors Soil erosion has long been recognized as a serious environmental problem and has gained increasing attention [8][9][10][11][12][13][14].Many models [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32], as summarized in Table S1, have been used for estimating the amount of soil erosion or evaluating the soil erosion risks.Ground covers, soil conditions, topography, climate, and human activities are major factors used in these models.Of the many models in Table S1, the Revised Universal Soil Loss Equation (RUSLE) may be the most extensively used model for soil erosion estimation in different ecosystems [8,[33][34][35][36][37].The RUSLE was initially developed for agricultural ecosystems with gentle slopes [8].When RUSLE is used in a complex landscape, such as hilly or mountainous regions, calibration of these parameters, such as rainfall erosivity factor (R), soil erodibility factor (K), and cover management factor (C-factor) is often needed, thus, different equations for these kinds of calibrations have been proposed [10,[38][39][40].Therefore, direct application of RUSLE in forest or agroforestry ecosystems with steep slopes, is not recommended [5].In reality, sometimes we do not need the quantitative estimation of soil erosion amounts, we just need the erosion risks and their spatial patterns [5].Therefore, new approaches are needed to quickly map soil erosion risk distribution, especially for agroforestry, in a large area, which is valuable for owners to take proper measures to reduce soil erosion problems.
As the requirement to produce spatial distribution of soil erosion increases, remote-sensing and GIS technologies will play important roles in soil erosion estimation models.In previous studies, the C-factor in RUSLE was often calculated from remote-sensing data, such as vegetation indices [41], fractional vegetation and soil images [5,34], or the P-factor, using a land-use map classified from a remote-sensing image [42].GIS, having the capability to provide powerful tools in storage, management, and processing of multiple data sources, plays an increasingly important role in estimation and mapping of soil erosion.For example, precipitation data are required in estimation of soil erosion, and GIS is often used to convert the point (weather station) data to surface images, through interpolation or regression approaches [10,38].Soil attributes are also important factors required in soil erosion modeling, and GIS converts the point (soil sample) data and soil-type vector format map, to raster format data [38,42].
Since soil erosion is related to many factors, such as precipitation, topography, soil, vegetation cover, and human activities, a large number of variables can be extracted or calculated from these data sources.Thus, one critical step is to identify the key factors that are closely related to soil erosion risks.The traditional approach for selection of key variables is the stepwise regression [5], but the regression-based approach requires the assumption that the independent variables have linear correlations with the dependent variable, and the selected independent variables have no or weak correlations to each other [5].However, this assumption may not be true for soil erosion studies.The random forest (RF) approach has the capability to provide a variable importance ranking [43] and has been extensively used for identifying key variables for land-cover classification and biomass estimation [44,45], but has not yet been used for soil erosion mapping.RF is a nonparametric algorithm based on the Classification and Regression Tree (CART) decision tree [43,46,47].In the RF algorithm, three parameters need to be optimized-ntree, i.e., the number of regression trees (default is five hundred); mtry, i.e., the number of input variables per node (default value is the one-third of the total variables); and the node size (default is one) [48].In general, the determination of ntree and mtry was based on the minimum of root mean squared error [49,50].
The overall goal of this research is to develop an approach suitable for mapping soil erosion risks in an agroforestry ecosystem in a mountainous region.Specifically, the objectives are to, (1) identify the key factors influencing soil erosion risk in hickory plantations; (2) develop a new approach to predict soil erosion risk in a mountainous region; and (3) produce spatial distribution maps of the predicted soil erosion risks for the local government or farmers, to take proper measures to reduce soil erosion problems.The new contribution of this research is to develop a random forest-based approach to predict soil erosion risk distribution in a mountainous region, by the effective use of remotely-sensed and ancillary data.In subtropical regions, many kinds of agroforestry or plantations, such as Torreya grandis, eucalyptus, and the rubber tree play important roles in improving economic conditions for the local farmers but face soil erosion problems.Therefore, the proposed approach for mapping soil erosion risk in hickory plantations will be valuable for other agroforestry and plantations.

Study Area
The study area was located in the Lin'An District, Western Zhejiang Province (Figure 2), which is called the "hometown of hickory" in China [51].This region is undulating with an elevation range of 10 m to 1787 m (average elevation of 475 m) and an average slope of 23 • (see Figure 2c).The terrain has high-elevation mountainous regions in the west and low elevation with flat regions in the east.The average regional elevations are over 1000 m in the northwest and southwest, and less than 50 m in the eastern plain.This study area was scattered with plains and hills.The mountainous areas with elevations of over 500 m accounted for 33.2%, hills with elevation between 100 and 500 m accounted for 57.4%, and plains with elevation of less than 100 m accounted for 10.4% of the study areas.According to China's soil classification system, there are 10 soil types: Red, yellow, purple, limestone, Skel, red clay, mountain meadow, Fluvo-aquic, coastal saline, and Paddy [52].Due to the differences of soil classification systems among different countries, Table 1 provides the corresponding names which are used in the United States [53] and world reference base [54].In our study area, the major soil types are red, yellow, and calcareous (Figure 2d).The climate in this region is subtropical monsoon with four distinct seasons, adequate light, and abundant rainfall.According to our data, collected from sixty-seven weather stations (Figure 2c), the average annual temperature is approximately 15 °C , and annual precipitation is 1500-2000 mm. Figure 3 illustrates the average monthly precipitation values, based on the weather station data between 2002 and 2017, showing the highest monthly precipitation in June and the lowest monthly precipitation in December.The climate in this region is subtropical monsoon with four distinct seasons, adequate light, and abundant rainfall.According to our data, collected from sixty-seven weather stations (Figure 2c), the average annual temperature is approximately 15 • C, and annual precipitation is 1500-2000 mm.Typical forest type is the subtropical evergreen broadleaf forest.Coniferous forest, deciduous forest, and shrubs are also common [55].In this region, the hickory plantation (Figure 2b) plays an important role in the local farmers' economy [6,56].The hickory plantation area increased from 113 km 2 in 1983 to 320 km 2 in 2014, with the production of nuts increasing from 2,300 tons to 25,000 tons, during the same period [7].Due to the intensive and improper management of the hickory plantations, in this region, soil erosion has become a serious problem [5] that has resulted in land degradation and poor water quality.
In order to evaluate the soil erosion problem, Figure 4 illustrates the framework of predicting soil erosion risks using remote-sensing and GIS.The major steps include, (1) collection of different source data, such as soil, weather, digital elevation model (DEM), and remotely-sensed data; (2) extraction of potential variables from different datasets; (3) identification of key variables using the RF approach and developing a soil erosion risk-mapping approach; and (4) examination of the soil erosion risks for the young and mature hickory plantations.Typical forest type is the subtropical evergreen broadleaf forest.Coniferous forest, deciduous forest, and shrubs are also common [55].In this region, the hickory plantation (Figure 2b) plays an important role in the local farmers' economy [6,56].The hickory plantation area increased from 113 km 2 in 1983 to 320 km 2 in 2014, with the production of nuts increasing from 2,300 tons to 25,000 tons, during the same period [7].Due to the intensive and improper management of the hickory plantations, in this region, soil erosion has become a serious problem [5] that has resulted in land degradation and poor water quality.
In order to evaluate the soil erosion problem, Figure 4 illustrates the framework of predicting soil erosion risks using remote-sensing and GIS.The major steps include, (1) collection of different source data, such as soil, weather, digital elevation model (DEM), and remotely-sensed data; (2) extraction of potential variables from different datasets; (3) identification of key variables using the RF approach and developing a soil erosion risk-mapping approach; and (4) examination of the soil erosion risks for the young and mature hickory plantations.Typical forest type is the subtropical evergreen broadleaf forest.Coniferous forest, deciduous forest, and shrubs are also common [55].In this region, the hickory plantation (Figure 2b) plays an important role in the local farmers' economy [6,56].The hickory plantation area increased from 113 km 2 in 1983 to 320 km 2 in 2014, with the production of nuts increasing from 2,300 tons to 25,000 tons, during the same period [7].Due to the intensive and improper management of the hickory plantations, in this region, soil erosion has become a serious problem [5] that has resulted in land degradation and poor water quality.
In order to evaluate the soil erosion problem, Figure 4 illustrates the framework of predicting soil erosion risks using remote-sensing and GIS.The major steps include, (1) collection of different source data, such as soil, weather, digital elevation model (DEM), and remotely-sensed data; (2) extraction of potential variables from different datasets; (3) identification of key variables using the RF approach and developing a soil erosion risk-mapping approach; and (4) examination of the soil erosion risks for the young and mature hickory plantations.

Data Collection and Processing
Datasets used in this research (Table 2) included, (1) field survey data for determining the soil erosion risk levels, (2) Landsat images in the leaf-on and leaf-off seasons, for mapping hickory plantation distribution and for extraction of potential variables for mapping the soil erosion risk distribution, (3) weather station data, (4) soil data, and (5) DEM data.In general, the Universal Transverse Mercator (UTM) coordinate system is used in this type of research, but considering that this study crosses two UTM zones, the Albers Equal Area coordinate system with WGS 1984 datum was selected for this research.

Soil data
Soil data were from the second soil survey in Zhejiang Province; The attributes for each soil polygon included soil type, soil depth, soil structure (sand, clay, silt), organic matter, and others.
DEM data DEM data from ALOS with cell size of 12.5 m were used.

Field Data Collection and Determination of Soil Erosion Risk Levels
The field survey was conducted, based on Google Earth imagery and spatial distribution of hickory plantations that were produced using the Landsat images in our previous studies [51].A global positioning system device (Global Navigation Satellite System, GNSS) was used to obtain longitude/latitude coordinates of the central point, of each sample site.During fieldwork, we recorded the surface soil erosion conditions, soil accumulation conditions in the bottom of the catchment area, slope, ground cover, etc.The slope in degree, for each visited site, was roughly recorded through visual measurement, and the exact value was then determined from the slope image calculated from the DEM data.Based on the Standards for Classification and Gradation of Soil Erosion (SL190-2007) by China's Ministry of Water Resources [57] and our research objectives, three levels of soil erosion risks were defined in Table 3 and the photos corresponding to each level were provided in Figure 5.The soil erosion samples covered thirty-six low-level, fifty-four medium-level, and thirty-two high-level risk areas (see the sites in Figure 2b).Sixty percent (60%) of the samples were randomly selected for use as training samples, and 40% were used for validation.

Remote-Sensing Data Processing and Mapping of Hickory Plantations
Two Landsat 8 OLI images were atmospherically-calibrated with FLAASH (Fast Line-of-sight Atmospheric Analysis of Hypercubes) [58,59].The C-correction approach was used for topographic correction [60], using the DEM data, with 12.5-m spatial resolution.After atmospheric and topographic correction, five spectral bands were used in this research.The blue spectral band was not used because of its serious atmospheric impacts.
Ground cover is an important factor that is directly related to soil erosion risks [61].Vegetation indices are often used to reflect the vegetation cover.Here we used three vegetation indices-the normalized difference vegetation index (NDVI = (NIR − Red)/(NIR + Red)), the normalized difference greenness index (NDGI = (Green − Red)/(Green + Red)), and the simple ratio (SR = NIR/Red) [62].In addition, we developed three fractional images-green vegetation, soil, and shade-from the Landsat multispectral imagery, using linear spectral mixture analysis [63], from which these three endmembers were extracted, using the image-based endmember selection approach [63].The fractional vegetation and fractional soil images have physical meanings, representing the proportions of vegetation and soil in a unit-the cell size of the Landsat image in our study.As hickory is a deciduous tree species, the image differencing of the spectral bands, between the leaf-on and leaf-off

Remote-Sensing Data Processing and Mapping of Hickory Plantations
Two Landsat 8 OLI images were atmospherically-calibrated with FLAASH (Fast Line-of-sight Atmospheric Analysis of Hypercubes) [58,59].The C-correction approach was used for topographic correction [60], using the DEM data, with 12.5-m spatial resolution.After atmospheric and topographic correction, five spectral bands were used in this research.The blue spectral band was not used because of its serious atmospheric impacts.
Ground cover is an important factor that is directly related to soil erosion risks [61].Vegetation indices are often used to reflect the vegetation cover.Here we used three vegetation indices-the normalized difference vegetation index (NDVI = (NIR − Red)/(NIR + Red)), the normalized difference greenness index (NDGI = (Green − Red)/(Green + Red)), and the simple ratio (SR = NIR/Red) [62].In addition, we developed three fractional images-green vegetation, soil, and shade-from the Landsat multispectral imagery, using linear spectral mixture analysis [63], from which these three endmembers were extracted, using the image-based endmember selection approach [63].The fractional vegetation and fractional soil images have physical meanings, representing the proportions of vegetation and soil in a unit-the cell size of the Landsat image in our study.As hickory is a deciduous tree species, the image differencing of the spectral bands, between the leaf-on and leaf-off seasons, can highlight new information about vegetation and soil features and was used in this research.
In order to examine the spatial patterns of soil erosion risks, it is necessary to accurately map agroforest types, especially hickory plantations.In this research, two Landsat OLI images, leaf-on and leaf-off, with 30-m spatial resolution, were used.Different variables, including spectral bands, vegetation indices (e.g., NDVI, NDGI, and SR), textural images using mean, standard deviation, skewness, and entropy [64], with window sizes of 3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11, based on the first component (PC1) from the principal component analysis (PCA) of the multispectral images, and differences between the leaf-on and leaf-off images, were used for the land-cover classification using the RF classification approach.The use of PC1 is that it concentrates the highest information loads from the Landsat multispectral bands, as shown in Table 4 that the PC1 contained 80.8% in the leaf-on data and 82.8% in the leaf-off data.Figure 6 illustrates the color composites from the leaf-on and leaf-off seasons and the corresponding PC1 images, indicating that the hickory plantations have different spectral features, caused by seasons.
Remote Sens. 2018, 10, x FOR PEER REVIEW 8 of 20 seasons, can highlight new information about vegetation and soil features and was used in this research.
In order to examine the spatial patterns of soil erosion risks, it is necessary to accurately map agroforest types, especially hickory plantations.In this research, two Landsat OLI images, leaf-on and leaf-off, with 30-m spatial resolution, were used.Different variables, including spectral bands, vegetation indices (e.g., NDVI, NDGI, and SR), textural images using mean, standard deviation, skewness, and entropy [64], with window sizes of 3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11, based on the first component (PC1) from the principal component analysis (PCA) of the multispectral images, and differences between the leaf-on and leaf-off images, were used for the land-cover classification using the RF classification approach.The use of PC1 is that it concentrates the highest information loads from the Landsat multispectral bands, as shown in Table 4 that the PC1 contained 80.8% in the leafon data and 82.8% in the leaf-off data.Figure 6 illustrates the color composites from the leaf-on and leaf-off seasons and the corresponding PC1 images, indicating that the hickory plantations have different spectral features, caused by seasons.A classification system including young hickory, mature hickory, other broadleaf forests, coniferous forests, bamboo forests, shrub, agricultural lands, water, impervious surfaces, and bare lands, was selected.Training samples were selected from the field survey and Google Earth images.The classification result was then grouped into three classes-young hickory, mature hickory, and other land covers (Figure 2b).Since this research only used the classification image for analyzing soil erosion distribution in hickory plantations, a detailed description of the classification procedure is not provided here, but a similar procedure can be found in Cheng et al. [65].
Accuracy assessment is often required for understanding the quality of the classification image.In general, error matrix was developed through the use of validation samples.Overall classification accuracy was calculated from the error matrix to evaluate the overall performance, while the user's and producer's accuracies for specific land-cover types were calculated [66].About forty validation samples for each land-cover class were collected independently from the field survey and the Google Earth imagery.An overall classification accuracy of 89.5% and a kappa coefficient of 0.88 were obtained.Meanwhile, the user's and the producer's accuracies were 95.2% and 88.9%, respectively, for young hickory, and 88.7% and 90.2%, respectively, for mature hickory.

Collection of Precipitation Data and Development of Its Spatial Distribution
The monthly precipitation data between 2002 and 2017 were collected from sixty-seven weather stations in the study area.In order to produce a good-quality precipitation surface image, five interpolation approaches-Kriging, inverse distance weight (IDW), spline, natural neighbor, and Trend [67]-were explored, using 60% of the weather stations (i.e., 40), and their predicted spatial distributions of precipitation were validated using the remaining twenty-seven stations.The validation of these results indicated that the IDW had the smallest relative root mean squared error of 9.56%, compared to the other four approaches (i.e., 10.45% for Kriging, 14.50% for spline, 14.54% for natural neighbor, and 11.59% for Trend).In order to produce the best interpolation result, all weather stations were finally used to produce the monthly precipitation map using the IDW method.This map was resampled to a 30 m by 30 m cell size, to match the Landsat image with a 30 m spatial resolution.As precipitation is directly related to runoff, and is an important factor in soil erosion, the intensity and seasonal variation of precipitation, can affect it.Thus, more variables-annual precipitation, precipitation at leaf-on (April-October) and leaf-off (November-March) seasons, leaf-developing stage (April), flowering stage (May), and fruiting stage (June-October)-were calculated from the average monthly precipitation map.

Collection and Processing of Soil Data
Soil data were collected from the second soil survey in the Zhejiang Province [68].Field survey and analysis of soil samples were two critical steps.During the field survey, 1:50,000-scale topographic maps were used to determine the collection of soil samples.Soil profiles were used to determine soil types, soil depth, and the physical and chemical properties.A soil hydrometer was used to measure the soil structure composition, such as the percentages of sand, silt, and clay.The Chemical Oxygen Demand was used to measure soil organic matter (SOM).The analysis of physical and chemical properties provided the fundamentals for the final determination of soil type classification and mapping [52].The major attributes gleaned from the soil data included type, depth, structure (e.g., the percentages of sand, silt, and clay), and organic matter.These soil attributes, with a vector format (1:50,000), were resampled to a cell size of 30 m by 30 m, according to the maximum area allocation approach [69].

Extraction of Topographic Factors
Topographic factors were calculated from the advanced land observing satellite (ALOS) DEM data, with a cell size of 12.5 m by 12.5 m.The common variables included slope, aspect, and elevation.Other advanced variables were also used; for example, slope of slope (SOS) and the second derivative of surface elevation related to the horizontal surface, representing the terrain curve features [70].According to the slope calculation theory, SOS re-calculates the slope of each point, based on the slope image from the DEM data, or the SOS = arctan (slope difference/horizontal distance).Second, surface roughness, the ratio of surface area at a specific region and its projected area, represented the fragmentation and degree of erosion [70].Based on the slope image, surface roughness = 1/cos (slope).Third, flow accumulation and catchment area, based on DEM and precipitation data, using the ArcGIS spatial analyst and arc hydro extension, was also used [41,71].Using precipitation data can improve the quality of the extracted flow accumulation and catchment area because the initial water value for each pixel was determined from the precipitation map.All the topographic factors, calculated from the DEM data, with a cell size of 12.5 m by 12.5 m, were resampled using the bilinear resampling technique to the cell size of a 30 m by 30 m, the same size as the Landsat imagery.

Determination of Key Variables for Mapping Soil Erosion Risk Distribution and Evaluation of Mapping Results
Many potential factors can influence soil erosion, but not all are needed for soil erosion estimation.The potential variables extracted from different sources (e.g., DEM, soil, weather stations, and remote-sensing data) are summarized in Table 5.The key is to identify the proper variables.The RF approach can provide the order of independent variable importance, related to a dependent variable, and was used here.Based on the seventy-three training samples, the RF provided the order of importance of independent variables.The backward feature elimination was further used to remove the variables that were less important but had a high correlation with another variable, so the minimum number of key variables could be identified.After the optimized parameters were determined, the RF approach, based on these key variables was used to predict soil erosion risk distribution for the entire study area.This prediction map was evaluated using field survey data.Forty-nine validation samples, covering three levels of erosion risks, were used for evaluating the prediction map.As the prediction map is a categorical image (three risk levels), the traditional accuracy assessment approach, based on the error matrix, could be used [66].From this error matrix, overall accuracy was calculated to indicate the overall mapping performance, while the producer's and the user's accuracies were calculated to show the mapping performance of the individual classes, that is, each risk level in this research.

Analysis of Selected Key Variables for Mapping Soil Erosion Risk Distribution
According to the RF results, eight key variables-slope, NDGI at leaf-on season, SOS, SOM, fractional vegetation at leaf-on season, fractional soil at leaf-off season, precipitation in June (the average rainfall between 2002 and 2017), and percent of soil clay-were selected; their spatial distributions are illustrated in Figure 7. Slope and SOS were extracted from the DEM (representing the topographic condition); NDGI, fractional vegetation, and fractional soil were extracted from the Landsat multispectral imagery (representing the conditions of the vegetation cover); SOM and percent of soil clay were extracted from the soil data (representing soil features); and precipitation in June was gleaned from the interpolation of the weather station data.These key variables confirmed the complexity of soil erosion estimation that required consideration of different factors from topography, soil, vegetation cover, and precipitation [13,72].Of the eight key variables, three were directly extracted from the multispectral imagery, implying the important role of remotely-sensed data in soil erosion risk mapping, which have not been effectively used in traditional soil erosion studies.
The roles of these key factors varied, as shown in Table 6.For example, as the risk levels increased from low to high, the values of slope, SOS, and the fractional soil in the leaf-off season, increased, implying that these factors resulted in soil erosion; in contrast, the values of NDGI at the leaf-on season, the SOM, the fractional vegetation at the leaf-on season, and percent of clay decreased, implying their roles in reducing soil erosion.In a relatively small region, precipitations can be similar but are directly related to soil erosion.The average precipitation in June was selected as a key variable, probably because the highest rainfall of the year occurs during that month, resulting in high erosion risk.The importance ranking in Table 6 indicates that the topographic factors (i.e., slope (1st) and SOS (3rd)) play the most important roles, resulting in serious soil erosion problems.While the green vegetation cover (NDGI in the leaf-on season (2nd) and the fractional vegetation in the leaf-on season (5th)), and soil exposure (fractional soil in leaf-off season (6th)), indicate the necessity of maintaining a highly dense vegetation cover and minimizing the soil exposure, to reduce soil erosion risk.The values of each factor, with erosion risk levels (Table 6), may provide some important guidelines for the hickory plantation owners to better manage them.For example, the risk levels increase as slope and SOS values increase, implying that the mountainous regions, with slopes greater than 25 • , should be managed with caution.The decrease of NDGI and vegetation in the leaf-on season and increase of the fractional soil in leaf-off season, associated with higher risk levels, indicate the importance of keeping dense vegetation cover to reduce the soil erosion problems.While the decreases in SOM and clay percentages, from low to high risk levels, imply the role of the soil structure in reducing soil erosion risks.Table 6 implies that planting agroforest plantations in gentle slope regions, while keeping dense vegetation cover and more SOM and clay components in soil, can reduce soil erosion risk.

Analysis of Soil Erosion Risk Mapping Results
The accuracy assessment result (Table 7) indicates that an overall accuracy of 89.8% was obtained, with producer's accuracy of 84.6-95.2% and user's accuracy of 83.3-100%, for three risk levels.The high accuracy indicates that the RF-based approach, using the selected key variables, can successfully predict soil erosion risk distribution in a large area.The spatial patterns of three erosion risk levels in Figure 8 shows different levels of erosion risk dispersed in different regions of the hickory plantations.The high erosion risk level is more prevalent in the southwest part of the study area.
Remote Sens. 2018, 10, x FOR PEER REVIEW 13 of 20 The importance ranking in Table 6 indicates that the topographic factors (i.e., slope (1st) and SOS (3rd)) play the most important roles, resulting in serious soil erosion problems.While the green vegetation cover (NDGI in the leaf-on season (2nd) and the fractional vegetation in the leaf-on season (5th)), and soil exposure (fractional soil in leaf-off season (6th)), indicate the necessity of maintaining a highly dense vegetation cover and minimizing the soil exposure, to reduce soil erosion risk.The values of each factor, with erosion risk levels (Table 6), may provide some important guidelines for the hickory plantation owners to better manage them.For example, the risk levels increase as slope and SOS values increase, implying that the mountainous regions, with slopes greater than 25°, should be managed with caution.The decrease of NDGI and vegetation in the leaf-on season and increase of the fractional soil in leaf-off season, associated with higher risk levels, indicate the importance of keeping dense vegetation cover to reduce the soil erosion problems.While the decreases in SOM and clay percentages, from low to high risk levels, imply the role of the soil structure in reducing soil erosion risks.Table 6 implies that planting agroforest plantations in gentle slope regions, while keeping dense vegetation cover and more SOM and clay components in soil, can reduce soil erosion risk.

Analysis of Soil Erosion Risk Mapping Results
The accuracy assessment result (Table 7) indicates that an overall accuracy of 89.8% was obtained, with producer's accuracy of 84.6-95.2% and user's accuracy of 83.3-100%, for three risk levels.The high accuracy indicates that the RF-based approach, using the selected key variables, can successfully predict soil erosion risk distribution in a large area.The spatial patterns of three erosion risk levels in Figure 8 shows different levels of erosion risk dispersed in different regions of the hickory plantations.The high erosion risk level is more prevalent in the southwest part of the study area.The statistical results in Table 8 indicate that, of the total hickory plantation areas of 325.5 km 2 , about one-fourth of the area is at a high-risk level, requiring their owners to take suitable measures to reduce the soil erosion problem.Although both mature and young hickory plantations have common erosion problems, a larger proportion of young hickory plantations are at high risk.If no remedies are applied in the near future, the high-risk plantations may not be sustainable.

The Roles of Remote-Sensing Data in Soil Erosion Risk Mapping
Remote sensing can accurately reflect land surface information, at the time-point when a satellite or plane passes over.However, remote-sensing data itself cannot directly provide meaningful information for soil erosion, which may be why remote-sensing data has not been used extensively for soil erosion estimation or mapping.One important role of remote-sensing data may be the calculation of the C-factor, because the data can provide vegetation cover information that is closely-related to the C-factor.The C-factor is usually calculated using individual soil loss ratio values and the factors of rainfall and runoff erosivity [8], or a combination of an individual C-factor, from empirical models, and a classified remote-sensing image [73].Some previous research has used the vegetation index [41] or fractional vegetation and soil images [5,34] to calculate the C-factor.However, no accuracy assessment was conducted because no ground truth or reference data were available for evaluation of this calculation result.This research also confirmed the important role of using remote-sensing-derived variables for soil erosion risk mapping.Three variables-NDGI at leaf-on season, fractional vegetation at leaf-on season, and fractional soil at leaf-off season-were identified in this research as key variables for soil erosion risk mapping.These variables can be extracted directly from the Landsat multispectral images, implying that they play an important role in accurately mapping the spatial distribution of soil erosion risks.Even as more types of remote-sensing data with very different spatial resolutions (e.g., from 0.5 m to 1 km) become available, the effective process of extracting the best variables for the soil erosion risk mapping is still poorly understood.This research indicates that for the hickory plantations-a deciduous tree species-use of the leaf-on and leaf-off seasonal images is needed.However, more research is needed to identify proper variables from remotely-sensed data for other tree species, such as evergreen broadleaf or needle-leaf forest types.

The Roles of GIS in Soil Erosion Risk Mapping
GIS has long been regarded as a powerful tool for storage, management, organization, and processing of different kinds of data sources [74].It is especially valuable for soil erosion studies because it requires a combination of different data sources, such as DEM, soil, climate, and remote-sensing-derived products.These data sources often have various formats, qualities, and spatial resolutions that require GIS to handle them.Previous research has extensively used GIS technologies to solve environmental problems [74]; for instance, SOM was mapped using a geographically weighted regression approach, based on the SOM sample plots, the DEM, and remote-sensing data [3].This research also confirmed the value and necessity of using GIS in soil erosion risk mapping.For example, this research used GIS technologies to calculate slope, SOS, surface roughness, and flow accumulation from DEM data, and converted the monthly precipitation from weather stations to surface image, through the IDW interpolation approach.
Compared to remote-sensing data with raster format and high spatial resolution, most GIS data, such as soil data in vector format and precipitation data in point format, from weather stations, have relatively coarse spatial resolutions and poor data quality, and requires conversion of these data into raster format data with the same cell size as the selected remote-sensing data.Some advanced techniques are needed to improve the data quality [3].For example, direct interpolation of the SOM data, based on sample plots, yields a poor SOM distribution result, while the use of a geographically weighted regression approach to combine DEM and remote-sensing data, considerably improved the spatial patterns of the predicted SOM distribution [3].A similar situation occurred in this research with the precipitation image.Currently, we use the IDW to interpolate the monthly precipitation weather station data into a surface image, but the bullseye effect was obvious, as shown in Figure 7g.In the mountainous regions, topography may considerably influence the rainfall amounts and spatial patterns in a large area.It will be necessary to combine the topographic information in the future for predicting the precipitation spatial distribution.
Data quality and spatial resolution are important factors influencing soil erosion mapping performance.In particular, the quality of DEM data was critical for calculation of slope and SOS that were used in this research.This is because the current DEM data from Shuttle Radar Topography Mission (SRTM), Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), or ALOS are more similar to the DSM (digital surface model) than to the DEM, thus, it may affect the accuracy of calculation of the slope and the SOS.In reality, DEM is a difference of the DSM and the object height image.In a mountainous region, the tree height value is much smaller, compared to the elevation value, thus DSM is often used as DEM, by ignoring the tree height.If lidar data could have been obtained in the given study, the lidar-derived DEM data should be used.In addition to the DEM data, another important data source is soil, for which SOM and clay percentages were used, in this research.Obtaining good-quality soil data is especially a challenge.The soil data were often assumed constant, but, in reality, the soil structure and nutrient composition could be changed due to human activities (e.g., addition of fertilizers) and soil erosion.More research is needed to update the soil data in a timely manner, using the GIS-based modeling approach, based on soil sample plots, and remote-sensing and ancillary data [3].

The Approach to Identify Key Variables for Mapping Soil Erosion Risk Distribution
Different soil erosion estimation models (see Table S1) have their own requirements for data sources, thus estimation results could be hugely different.Cautions should be taken when a model is used in mountainous regions because of the very complex landscape, compared to agricultural lands.In many soil erosion applications, the exact soil erosion amount may not be required, and people may just need to know the spatial distribution of different erosion risk levels, so they can take proper measures to reduce the soil erosion problems in a large area.This research used an RF-based approach to identify key variables for mapping soil erosion risk distribution.The selected eight variables are from DEM, soil, precipitation, and remote-sensing data (vegetation cover information), implying that similar data sources are required for soil erosion studies.However, comparing the models in Table S1, the advantages of this RF-based approach for mapping soil erosion risk distribution are, (1) the use of primary variables that can be directly extracted from different data sources without calibration; (2) the important role of the leaf-on and the leaf-off Landsat multispectral images in producing the vegetation-and soil-related variables, with high spatial resolution images, compared to the GIS data (e.g., soil, precipitation); and (3) the potential applications to other agroforest systems, such as eucalyptus, Torreya grandis, and plantations in subtropical regions.However, cautions should be taken that this research is only for hickory plantations, a deciduous tree species; the selected key variables suitable for the hickory plantations may be not the best variables for other plantations.Therefore, more research is needed to identify the best key variables for other forest types, under different topographic conditions.

Conclusions
This research selected a typical agroforestry region in a subtropical forest ecosystem to examine the remote-sensing and GIS-based approach to map soil erosion risk distribution.This research used the RF approach to identify eight primary variables-slope, SOS, NDGI at the leaf-on season, fractional vegetation at the leaf-on season, fractional soil at the leaf-off season, SOM, clay, and average precipitation in June-that could be directly extracted from the DEM, the Landsat multispectral imagery, soil, and the weather station data.The results showed that 25.7%, 41.8%, and 32.5% of the hickory plantations in this study area were at high, medium, and low erosion risk levels, respectively, indicating serious soil erosion problems and requiring urgent measures to maintain their sustainability.This research indicates the important role of remote-sensing data in mapping soil erosion risk distribution and provides a new approach to rapidly predict soil erosion risk distribution, using remote-sensing and GIS technologies.The major advantage of this RF-based soil erosion risk-mapping approach is that the selected variables can be directly extracted from remotely-sensed data, DEM, and soil data, without calibration of the empirical models.The proposed approach may be applied to other agroforestry or plantations, but cautions should be taken in other environments.More research is needed to identify the key variables for other forest types, especially the evergreen forest types, under complex forest ecosystems.

Figure 1 .
Figure 1.Two photos showing the hickory plantations in leaf-on and leaf-off seasons.

Figure 1 .
Figure 1.Two photos showing the hickory plantations in leaf-on and leaf-off seasons.

Figure 2 .
Figure 2. Study area: (a) Location within the Zhejiang Province, China; (b) spatial distribution of hickory plantations with overlay of the field survey sites showing different erosion risks; (c) elevation image with overlay of weather stations; and (d) spatial distribution of the major soil types.

Figure 2 .
Figure 2. Study area: (a) Location within the Zhejiang Province, China; (b) spatial distribution of hickory plantations with overlay of the field survey sites showing different erosion risks; (c) elevation image with overlay of weather stations; and (d) spatial distribution of the major soil types.

Figure 3
illustrates the average monthly precipitation values, based on the weather station data between 2002 and 2017, showing the highest monthly precipitation in June and the lowest monthly precipitation in December.

Figure 3 .
Figure 3. Average monthly precipitation based on the weather station data between 2002 and 2017.

Figure 4 .
Figure 4.The framework for predicting soil erosion risk distribution in a subtropical agroforestry region, using the random forest approach.

Figure 3 .
Figure 3. Average monthly precipitation based on the weather station data between 2002 and 2017.

20 Figure 3 .
Figure 3. Average monthly precipitation based on the weather station data between 2002 and 2017.

Figure 4 .
Figure 4.The framework for predicting soil erosion risk distribution in a subtropical agroforestry region, using the random forest approach.

Figure 4 .
Figure 4.The framework for predicting soil erosion risk distribution in a subtropical agroforestry region, using the random forest approach.

Figure 6 .
Figure 6.Comparison of the color composite, using near-infrared (NIR), red, and green and the first component from the Landsat multispectral bands, using the principal component analysis, based on the leaf-on (a,b) and leaf-off seasons (c,d).

Figure 6 .
Figure 6.Comparison of the color composite, using near-infrared (NIR), red, and green and the first component from the Landsat multispectral bands, using the principal component analysis, based on the leaf-on (a,b) and leaf-off seasons (c,d).

Figure 7 .
Figure 7. Spatial distribution of the selected eight key variables, using the random forest approach.(a) Slope image in degrees calculated from the DEM data; (b) normalized difference greenness index calculated from the Landsat multispectral imagery at the leaf-on season; (c) slope of slope image in degrees calculated from DEM data; (d) soil organic matter (%) from the soil data; (e) fractional vegetation image extracted from the Landsat multispectral image at the leaf-on season; (f) fractional soil image extracted from the Landsat multispectral image at the leaf-off season; (g) precipitation in June (average for 2002-2017); and (h) soil clay content (%)).

Figure 7 .
Figure 7. Spatial distribution of the selected eight key variables, using the random forest approach.(a) Slope image in degrees calculated from the DEM data; (b) normalized difference greenness index calculated from the Landsat multispectral imagery at the leaf-on season; (c) slope of slope image in degrees calculated from DEM data; (d) soil organic matter (%) from the soil data; (e) fractional vegetation image extracted from the Landsat multispectral image at the leaf-on season; (f) fractional soil image extracted from the Landsat multispectral image at the leaf-off season; (g) precipitation in June (average for 2002-2017); and (h) soil clay content (%)).

Figure 8 .
Figure 8. Spatial distribution of different soil erosion risk levels in young and mature hickory plantations.

Figure 8 .
Figure 8. Spatial distribution of different soil erosion risk levels in young and mature hickory plantations.

Table 1 .
Reference conversions among soil group of the genetic soil classification system of China, soil order of the U.S.

Table 1 .
Reference conversions among soil group of the genetic soil classification system of China, soil order of the U.S. taxonomy, and the soil group of the world reference base for soil resources.

Table 2 .
Datasets used in research.

Table 3 .
The definitions of soil erosion risk levels.
some sand/soil accumulation found at bottom of catchment areaHighA and B horizons were almost eroded with many small rocks; parent material (C horizon) appeared; large sand/soil accumulation found at bottom of catchment area

Table 4 .
The eigenvector and eigenvalue results using the principal component analysis of the Landsat multispectral images at the leaf-on and leaf-off seasons.

Table 5 .
The potential variables used in mapping the soil erosion risk distribution.

Table 6 .
The average values of key variables based on different soil erosion risk levels.

Table 7 .
Accuracy assessment results with the RF model, for evaluation of soil erosion risks.

Table 8 .
Statistical results of the hickory plantations, at different soil erosion risk levels.