Prediction of Regional Forest Soil Nutrients Based on Gaofen-1 Remote Sensing Data

: The study on the spatial distribution of forest soil nutrients is important not only as a reference for understanding the factors affecting soil variability, but also for the rational use of soil resources and the establishment of a virtuous cycle of forest ecosystems. The rapid development of remote sensing satellites provides an excellent opportunity to improve the accuracy of forest soil prediction models. This study aimed to explore the utility of the Gaofen-1 (GF-1) satellite in the forest soil mapping model in Luoding City, Yunfu City, Guangdong Province, Southeast China. We used 1000 m resolution coarse-resolution soil map to represent the overall regional soil nutrient status, 12.5 m resolution terrain-hydrology variables to reﬂect the detailed spatial distribution of soil nutrients, and 8 m resolution remote sensing variables to reﬂect the surface vegetation status to build terrain-hydrology artiﬁcial neural network (ANN) models and full variable ANNs, respectively. The prediction objects were alkali-hydro-nitrogen (AN), available phosphorus (AP), available potassium (AK), and organic matter (OM) at ﬁve soil depths (0–20, 20–40, 40–60, 60–80, and 80–100 cm). The results showed that the full-variable ANN accuracy at ﬁve soil depths was better than the terrain-hydrology ANNs, indicating that remote sensing variables reﬂecting vegetation status can improve the prediction of forest soil nutrients. The remote sensing variables had different effectiveness for different soil nutrients and different depths. In upper soil layers (0–20 and 20–40 cm), remote sensing variables were more useful for AN, AP, and OM, and were between 10%–14% ( R 2 ), and less effective for AK at only 8% and 6% ( R 2 ). In deep soil layers (40–60, 60–80, and 80–100 cm), the improvement of all soil nutrient models was not signiﬁcant, between 3 and 6% ( R 2 ). RMSE and ROA ± 5% also decreased with the depth of soil. Remote sensing ANNs (coarse resolution soil maps + remote sensing variables) further demonstrated that the predictive power of remote sensing data decreases with soil depth. Compared to terrain-hydrological variables, remote sensing variables perform better at 0–20 cm, but the predictive power decreased rapidly with depth. In conclusion, the results of the study showed that the integration of remote sensing with coarse-resolution soil maps and terrain-hydrology variables could strongly improve upper forest soil (0–40 cm) nutrients prediction and NDVI, green band, and forest types were the best remote sensing predictors. In addition, the study area is rich in AN and OM, while AP and AK are scarce. Therefore, to improve forest health, attention should be paid to monitoring and managing AN, AP, AK, and OM levels.


Introduction
The spatial distribution of forest soil nutrients is directly related to the growth and health of forests and has an important influence on forest ecosystem restoration and sustainable management [1]. Forest soil nutrients are essential components of forest tree on previous research, the remote sensing images of the Landsat series satellite are the most widely used. However, as the first high-resolution earth observation satellite in China, the Gaofen-1 (GF-1) satellite has an important strategic significance for developing remote sensing technology. The remote sensing images obtained by GF-1 have a shorter revisit period (4d) and higher spatial resolution (8 m/16 m), and it is easier to show detailed surface features and fragmentation features [28]. At present, GF-1 images are mainly used for forest detection, ground object identification and disaster monitoring. However, their application in DSM has not been explored or developed to its full potential.
An approach combining the ANN model with GF-1 remote sensing variables to predict nutrients in a forest site has not been reported. Thus, this study seeks to combine the GF-1 satellite to improve the accuracy of traditional terrain-hydrology soil models. The purpose is to find the most suitable prediction model to map five forest soil depths of AN, AP, AK, and OM. The specific research objectives were to: (1) select the combination of optimal variables for all ANN models for each depth and evaluate model performance and prediction performance outside of the area used for model development; (2) use the selected model to produce depth-specific soil element maps in the study area; and (3) evaluate the effects of the used remote sensing variables and terrain-hydrology variables on depth-specific soil elements.

Study Area
The model building area (22 • 25 -22 • 57 N, 111 • 03 -111 • 52 E) was Luoding City, which is located in the southwestern Yunfu City, Guangdong Province, China, with 1426.9 km 2 of its total 2327.5 km 2 area forested. The extra validation area was Xinxing County (22 • 22 -22 • 5 N, 111 • 57 -112 • 31 E), which is located in the southeastern Yunfu City, with 1008.4 km 2 of its total 1521.7 km 2 area forested, see Figure 1. The two study areas are in the subtropical monsoon zone, with high temperature and ample precipitation in the same period (May-October), which is conducive to plant growth [21]. The two areas are mountainous regions with elevations ranging from 12 to 1320 m. In both study areas, tropical monsoon forest types are dominated by natural secondary evergreen broad leaved forests, coniferous forests, and mixed forests. The main coniferous species are Pinus massoniana, Eucalyptu ssp. and Cunninghamia lanceolate, and the main broad leaved tree species include Chenopodium album, Liquidambar formosana, and Cinnamomum camphora (Guangdong Forestry Survey and Planning Institute, Guangzhou, China, 2014). As economic forests occupy larger and larger areas, human management activities on forests are intensifying, affecting the structure and fertility of forest soils. As a result of local climate, terrain, and vegetation type, the two study areas comprised Typic Kanhapludults (Lateritic Red Earths) and Typic Hapludults (Red Earths). Udults is the common soil type in tropics and subtropics of China, and it is rich in iron and aluminum as a result of desilication. In nature, the content of N, P, K and OM are low because of the strong physical and chemical weathering of soil minerals, the rapid decomposition of OM and the massive leaching of nutrients in the field [20]. Based on this, high-resolution three-dimensional models are effective to analyze and predict the spatial distribution of forest soil nutrients.

Data Sources
In this study, the coarse-resolution soil nutrient maps at 1000 m resolution represented the overall state of forest soil nutrients, the 12.5 m resolution digital elevation model (DEM)-derived terrain-hydrology variables represented the detailed forest soil nutrients change caused by terrain-hydrology conditions, and the 8 m resolution GF-1 satellitederived band reflectance and vegetation indices represented the detailed forest soil nutrient variation caused by vegetation (see Table 1). All variable maps from various sources were interpolated into 10 m resolution raster format using the conventional inverse distance weighted approach. The pixel values of the raster layers (environmental variables) that corresponded to the coordinates of sampling points were extracted to build the model.

Data Sources
In this study, the coarse-resolution soil nutrient maps at 1000 m resolution rep sented the overall state of forest soil nutrients, the 12.5 m resolution digital elevati model (DEM)-derived terrain-hydrology variables represented the detailed forest soil n trients change caused by terrain-hydrology conditions, and the 8 m resolution GF-1 sat lite-derived band reflectance and vegetation indices represented the detailed forest s nutrient variation caused by vegetation (see Table 1). All variable maps from vario sources were interpolated into 10 m resolution raster format using the conventional verse distance weighted approach. The pixel values of the raster layers (environmen variables) that corresponded to the coordinates of sampling points were extracted to bu the model.

Coarse-Resolution Soil Map and Soil Sampling
A coarse-resolution soil nutrient map roughly reflects the spatial distribution of soil nutrients in large-scale areas. In this study, a 0-20 cm coarse-resolution soil nutrient map (http://www.soilinfo.cn/map/index.aspx (accessed on 27 October 2018)) was used, and the basic necessary input variables, including coarse resolution alkali hydro nitrogen (CAN), coarse resolution available phosphorus (CAP), coarse resolution available potassium map (CAK), and coarse resolution organic matter map (COM), were obtained from the Institute of Soil Research, Chinese Academy of Sciences with a scale of 1:1,000,000. Soil vector maps (1:1,000,000 scale) were converted into 1000 m resolution raster maps in ArcGIS 10.2, and then resampled to 10-m resolution. In this study, a coarse resolution soil map represents the soil nutrient average value in the field. The soil nutrient content provided by the coarse-resolution soil nutrient map is the basis for model construction, and both terrain-hydrology and remote sensing variables are used to respond to more details.
The soil profiles (n = 385) for building (260) and validating (125) were collected by the 2015 Forest Soil Survey Project of Guangdong Academy of Forestry Sciences, taking soil type, terrain-hydrology conditions, and vegetation characteristics into consideration. Two sample schemes, including thematic distribution and random distribution, were used to set the sample points. Based on these conditions, the soil samples were small in number but were representative soil points for the area. The distribution of the sample points is shown in Figure 1. The latitude and longitude of all soil points were located by a hand-held GPS (global positioning system) receiver, and the positioning accuracy was 5 m. A 1 m deep soil pit was dug at each sample site. If the profile had no record before 1 m depth, the soil profile was excavated to the parent material horizon. Each soil profile was divided into five depth intervals: 0-20 cm (D1), 20-40 cm (D2), 40-60 cm (D3), 60-80 cm (D4), and 80-100 cm (D5). Sampling stratification was based on the coarse-resolution soil nutrient maps to obtain enough samples within every class of soil elements. Each soil sample was air-dried, ground, passed through 2 mm sieves, and stored in glass bottles for further analysis. The NaOH alkali solution expansion method and NaHCO3 extraction molybdenum blue colorimetric method were used to determine AN concentrations and AP concentrations, respectively. A NH4OAC extraction-flame photometric method was used to measure AK content and a dichromic acid oxidation-external heating method to estimate OM [17,29].

Terrain-Hydrology Variables
Moore, I.D. et al. [30] summarized the significance and physical meaning of various terrain attributes to landscape processes. Building on their work, many authors have used terrain attributes derived from DEM as explanatory variables in predictive soil models [31]. Terrain affects water movement, which involves the transportation and deposition of sediment. In the model building area, nine terrain hydrology variables, including slope, aspect, topographical position index (TPI), potential solar radiation (PSR), depth to water (DTW), sediment delivery ratio (SDR), flow length (FL), flow direction (FD), and soil terrain factor (STF), were derived from DEM images. Detailed information is shown in Table 1. The DEM was obtained from Cartosat-1 (IRS P5) with a 12.5 m resolution of Guangdong Academy of Forestry Sciences and was resampled to 10 m raster using ArcGIS 10.2 software. The spatial analyst extension tools and developed forest hydrology tools of ArcGIS were used to generate terrain and hydrology variables [32]. Terrain-hydrology variables affect the accumulation of forest soil nutrients in the region.

Remote Sensing Variables
In nature, vegetation litter is the main source of forest soil nutrients. Vegetation density and vegetation type determine the amount of litter, and the root condition of vegetation affects its ability to absorb soil nutrients. Remote sensing images provide direct land vegetation information. Therefore, the close relationship between vegetation variables and soil nutrients allows remote sensing variables that reflect the condition of vegetation to be applied to forest DSM work [33]. The clearest 8 m resolution GF-1 multi-spectral remote sensing images covering the model building area were downloaded from the geospatial data cloud website (http://www.gscloud.cn (accessed on 9 May 2016)). Firstly, remote sensing images were pre-processed by ENVI 5.3 software, including radiometric calibration, atmospheric correction, geometric correction, mosaic, cropping, and resampling to the 10 m resolution. In this study, the remote sensing variables included four bands of GF-1, including the blue band (B), green band (G), red band (R), and near infrared band (NIR). These bands were selected because they represented the growth and biomass of vegetation. We also used the band math function of ENVI 5.3 to extract four plant indices: NDVI (normalized difference vegetation index), DVI (difference vegetation index), RVI (ratio vegetation index), and RDVI (renormalized difference vegetation index). Some information on remote sensing variables is shown in Table 1. Vegetation indices are effective indicators for detecting vegetation growth status, vegetation cover, and eliminating some radiation errors. In remote sensing science, NDVI is widely used in forest growth monitoring and yield prediction [34] and is an uncertain graphical sign that can be employed to outline the greenness, relative density, and healthiness of vegetation [35]. DVI is very sensitive to the change in soil background, and it increases rapidly with an increase in vegetation [36]. RVI can better reflect the difference in vegetation coverage and growth status, which is especially suitable for vegetation monitoring with vigorous growth and high coverage [37]. RDVI is commonly used to investigate plants at the growth stage and the greenness present in the vegetation [38].
The hierarchical classification method was used to classify forest types from the processed 8 m resolution images in the eCognition 9.0 software to improve the accuracy of forest classification. The mixed forest, coniferous forest, and broad-leaved forest were automatically classified according to the spectral characteristics, geometric characteristics, texture characteristics, and vegetation indexes of the segmented objects. Samples of 900 forest types were taken from the forest class vector map of Luoding City provided by the Guangdong Academy of Forestry Sciences and randomly divided into two parts, in which 600 samples were used for classification training, and 300 samples were used for accuracy verification. The "Feature Space Optimization" module of eCognition was used to select the features with the best separation. The first step was to distinguish between the vegetation area and the non-vegetation area. Next, we determined the forest land and the cultivated land in the vegetation areas. Finally, we divided the forest land into conifers, broad-leaved forests, and mixed forests. The "Error Matrix based on Sample" in eCognition was used for classification accuracy evaluation. The overall accuracy was 0.81 and KIA (Kappa coefficient) was 0.77. The overall accuracy and KIA reflect the classification accuracy of the whole map. Broad leaved forest, coniferous forest, and mixed forest were the main forest types in the area, but bamboo and other kinds of vegetation were too small for automatic classification. Therefore, these plants were distinguished according to the forest class vector map [39]. Finally, resampling the forest type image to the 10 × 10 m resolution corresponded to other variable raster maps. The forest types and other eight remote sensing variables obtained from GF-1 are shown in Figure 2.
KIA (Kappa coefficient) was 0.77. The overall accuracy and KIA reflect the classification accuracy of the whole map. Broad leaved forest, coniferous forest, and mixed forest were the main forest types in the area, but bamboo and other kinds of vegetation were too small for automatic classification. Therefore, these plants were distinguished according to the forest class vector map [39]. Finally, resampling the forest type image to the 10 × 10 m resolution corresponded to other variable raster maps. The forest types and other eight remote sensing variables obtained from GF-1 are shown in Figure 2.

Artificial Neural Network Model
We applied the ANN model, a machine learning algorithm developed by Zhao et al. [17,40,41], to predict the spatial distribution at 1 m depth of AN, AP, AK, and OM by MATLAB 2016 software. The most common ANN type is a multi-layer perceptron with three layers: input layer, output layer, and hidden layer. The input layer contained independent variables used to make a model prediction (the coarse-resolution nutrient maps, the DEM-generated terrain-hydrology variables, and GF-1-generated remote sensing variables). The output layer contained the prediction dependent variable (the measured data of AN, AP, AK, and OM). The input layer and the output layer were connected by a hidden layer whose number of neurons determined the complexity of the ANN model: too many hidden neurons will cause over-fitting, and too few hidden neurons will lead to poor fit. All links between the input and hidden layers form the input weight matrix, and all links between the hidden and output layers form the output weight matrix [42]. The ANN model was trained using a back-propagation technique that adjusted the

Artificial Neural Network Model
We applied the ANN model, a machine learning algorithm developed by Zhao et al. [17,40,41], to predict the spatial distribution at 1 m depth of AN, AP, AK, and OM by MATLAB 2016 software. The most common ANN type is a multi-layer perceptron with three layers: input layer, output layer, and hidden layer. The input layer contained in-dependent variables used to make a model prediction (the coarse-resolution nutrient maps, the DEM-generated terrain-hydrology variables, and GF-1-generated remote sensing variables). The output layer contained the prediction dependent variable (the measured data of AN, AP, AK, and OM). The input layer and the output layer were connected by a hidden layer whose number of neurons determined the complexity of the ANN model: too many hidden neurons will cause over-fitting, and too few hidden neurons will lead to poor fit. All links between the input and hidden layers form the input weight matrix, and all links between the hidden and output layers form the output weight matrix [42]. The ANN model was trained using a back-propagation technique that adjusted the weight and bias values along a negative gradient descent, using the Levenberg Marquardt algorithm [43] to minimize the mean squared error (MSE) between the network output (predicted value) and the target value (measured value). When MSE is less than 0.01, the training stops. A 10-fold cross-validation was used to evaluate and train the ANNs; the entire model building area data set (260 soil profiles) was divided into 10 equal subsets. The data from 9 subsets were used as calibration data to build the model, and the remaining subsets were used as data for validation. This process was repeated 10 times until all subsets were used as the validation set. Meanwhile, an early stopping method was used to avoid "over-fitting", which uses a training set (80% of the calibration data) to calculate the gradient, update the network weights, and estimate the bias. The testing dataset (20% of the calibration data) was used to monitor the training process to prevent overfitting. If the training MSE decreases but the test MSE increases, the training of the ANN model was stopped assuming that the most appropriate model coefficients had been obtained. According to previous studies [40,41], the number of hidden layer neurons changed from 5 to 40. Finally, 35 neurons in the hidden layer were the best structure in this study.

Screening and Assessing ANN Models
In this study, 260 soil samples of each soil depth were used to build, train, and evaluate the ANN model in the model building area (Luoding City). Xinxing County, which had 125 sample profiles, served as an independent verification area outside the ANN model building area to test the generalization ability of the selected ANN model. Three indexes were adopted to compare the performance of the ANN model accuracy, including root mean square error (RMSE), coefficient of determination (R 2 ), and relative overall accuracy (ROA ± 5%). According to the three model evaluation indexes, the best model appears when the model precision is not obviously improved after other variables are added. Therefore, the best model should have a relatively higher R 2 and ROA ± 5% and lower RMSE. The specific formulas were as follows: where Y i is the predicted value; X i is the measured value; n is number of sample points; Y i is the average value of the model's predicted values; T is the accuracy threshold (e.g., 5 for 5% in this study) determined based on target to fit the model.

Exploratory Data Analysis
In total, the measured soil nutrient data from 260 soil profiles in Luoding City and 125 soil profiles in Xinxing County were used in this study. The statistical results of AN, AP, AK, and OM are shown in Figure 3. The boxplot was drawn using GraphPad Prism 8.0 software. AN content varied from 7.44 to 310.07 mg kg −1 with the most significant variation at D4 from 10.33 to 310.07 mg kg −1 . AP content was from 0.01 to 6.52 mg kg −1 , AK content was from 4.80 to 140.62 mg kg −1 , and OM content was from 0 to 63.20 g kg −1 ; these three nutrient elements had the most significant variation at D1. The mean of the four soil nutrients showed a general decreasing trend in variation from D1-D5. AK and OM means at D3, D4, and D5 were about two times lower than D1, but AN and AP means at each depth did not significantly change. Using ArcGIS software to analyze the spatial autocorrelation of the samples, the results show that the Moran index (I) approaches 0, showing weak autocorrelation.

Optimal Variables Combination of Each Soil Depths
The accuracy results of the Model A are shown in Table 2. For each depth, based on coarse-resolution data, model prediction performance was significantly improved by increasing terrain-hydrology variables. Still, this improvement was not sustainable when the additional variable exceeded a certain degree. Taking AN as an example, the accuracy of the Model A in D1 was significantly improved by gradually adding the terrain-hydrology variable from one to six, decreasing the RMSE values from 1536.42 to 863.56, 823.64, 580.53, 528.38 and 495.68 mg kg −1 , increasing the R 2 values from 0.23 to 0.55, 0.58, 0.71, 0.74 and 0.76%, and increasing the ROA ± 5% values from 55 to 59, 66, 70, 71, and 74%, respectively. However, when further adding the seventh, eighth, and ninth terrain variables, RMSE values increased from 593.21 to 638.11 and 748.31 mg kg −1 , R 2 values decreased from 72 to 69 and 67%, and ROA ± 5% values decreased from 72 to 69 and 68%, respectively. This was not a surprise because the additional variables were based on their order of importance to soil nutrients. The added variables were less important to the soil nutrients. Therefore, the optimal variables combined for the D1 layer of AN were SDR, Slope, Aspect, DTW, TPI, and FD. On the other hand, the additional input variables may have introduced more uncertainty because of their own precision and accuracy, thus limiting the model's prediction performance. Based on this selection scheme, the optimal Model A, Model B, and Model C of all soil layers were generated by this screening method. The slope and aspect appear in all selected optimal Model A in Table 2, indicating that slope and aspect had a huge impact on the distribution of soil nutrients. Tables 3 and 4 show the prediction accuracies for the Model B and Model C fore four soil nutrient contents, respectively. Model B had better predictive performance than the Model A. In Table 3, the NDVI, G, and forest have the highest occurrence frequency, indicating that they had the greatest influence on soil nutrients among the nine remote sensing variables.

Optimal Variables Combination of Each Soil Depths
The accuracy results of the Model A are shown in Table 2. For each depth, based on coarse-resolution data, model prediction performance was significantly improved by increasing terrain-hydrology variables. Still, this improvement was not sustainable when the additional variable exceeded a certain degree. Taking AN as an example, the accuracy of the Model A in D1 was significantly improved by gradually adding the terrain-hydrology variable from one to six, decreasing the RMSE values from 1536.42 to 863.56, 823.64, 580.53, 528.38 and 495.68 mg kg −1 , increasing the R 2 values from 0.23 to 0.55, 0.58, 0.71, 0.74 and 0.76%, and increasing the ROA ± 5% values from 55 to 59, 66, 70, 71, and 74%, respectively. However, when further adding the seventh, eighth, and ninth terrain variables, RMSE values increased from 593.21 to 638.11 and 748.31 mg kg −1 , R 2 values decreased from 72 to 69 and 67%, and ROA ± 5% values decreased from 72 to 69 and 68%, respectively. This was not a surprise because the additional variables were based on their order of importance to soil nutrients. The added variables were less important to the soil nutrients. Therefore, the optimal variables combined for the D1 layer of AN were SDR, Slope, Aspect, DTW, TPI, and FD. On the other hand, the additional input variables may have introduced more uncertainty because of their own precision and accuracy, thus limiting the model's prediction performance. Based on this selection scheme, the optimal Model A, Model B, and Model C of all soil layers were generated by this screening method. The slope and aspect appear in all selected optimal Model A in Table 2, indicating that slope and aspect had a huge impact on the distribution of soil nutrients. Tables 3 and 4 show the prediction accuracies for the Model B and Model C fore four soil nutrient contents, respectively. Model B had better predictive performance than the Model A. In Table 3, the NDVI, G, and forest have the highest occurrence frequency, indicating that they had the greatest influence on soil nutrients among the nine remote sensing variables. OM: organic matter. The optimal variables were selected among all combinations with the same number of DEM-generated terrain variables based on the values of ROA ± 5% of model. For example, the number of combinations with four variables is when the optimal variables were selected among all combinations with the same number of DEM-generated terrain-hydrology variables based on ROA ± 5% of model values. For example, the number of combinations with four variables is C 4 9 = 26. The RMSE unit of AN, AP and AK were mg kg −1 , OM was g kg −1 . TPI, terrain position index; PSR, potential solar radiation; STF, soil terrain factor; SDR, sediment delivery ratio; DTW, depth to water; FL, flow length; FD, flow direction; R 2 , the coefficient of determination; RMSE, root-mean-squared error; ROA, relative overall accuracy. The RMSE unit of AN, AP and AK were mg kg −1 , OM was g kg −1 . TPI, terrain position index; PSR, potential solar radiation; STF, soil terrain factor; SDR, sediment delivery ratio; DTW, depth to water; FL, flow length; FD, flow direction; R 2 , the coefficient of determination; RMSE, root-mean-squared error; ROA, relative overall accuracy. * Number of optimal terrain hydrology variables.   This study mainly compared the performance of the Model A and Model B that are shown in Tables 2 and 3. The full-variable ANNs (Model B) presented the best prediction performance, exhibited the lowest RMSE, the highest R 2 and ROA ± 5%. This was expected because when the terrain-hydrology ANNs (Model A) accepts more helpful information, the prediction accuracy will naturally improve. Moreover, compared with the Model A, the Model B showed that the improved scales of surface soil depths (D1 and D2) were more significant than deep soil depths (D3, D4, and D5). In the Model B, the R 2 of AN at D1 and D2 layers increased 13 and 12%, while the D3, D4, and D5 layers only increased 6, 5, and 5%, respectively. The R 2 of AP for D1 and D2 increased 13 and 12%, but D3, D4, and D5 only increased 5, 5, and 6%, respectively. The R 2 of AK for D1 and D2 increased 8 and 6%, and D3, D4, and D5 increased 4, 5, and 5%, respectively. The R 2 of OM for D1 and D2 increased 14 and 10%, and D3, D4, and D5 only increased 5, 4, and 3%, respectively. For the other two accuracy evaluation indexes, RMSE and ROA ± 5% also showed a similar trend. This revealed that remote sensing variables were limited in improving the accuracies of AN, AP, AK, and OM. In addition, remote sensing variables had different performances for different soil nutrients and different depths. The more effective for AN, AP, and OM at 0-40 cm were between 10-14% (R 2 ), the less helpful for AK at 0-40 cm were only 8 and 6% (R 2 ), and the improvement of all deep soil layers (40-100 cm) was not significant. The results of remote sensing ANNs (Model C) further support this conclusion that the model prediction accuracies decrease rapidly with depth and have poor predictive power for AK.

Performance of ANN Model Outside of the Model-Building Area
In an attempt to truly test the capability of generalization, the optimal models for D1-D5 built with the 260 soil profiles from Luoding (model building area) were used to estimate AN, AP, AK, and OM content in Xinxing (the extra validating area). The accuracies of the extra validating area (125 soil profiles) are shown in Table 5. The five soil layers of each soil nutrient index's prediction ability decreased in the extra validation area. Compared with the building accuracy, the RMSE of extra validation accuracy rose by 65.77-105.08 mg kg −1 for AN, 0.08-0.15 mg kg −1 for AP, 35.50-57.50 mg kg −1 for AK, and 1.02-4.35 g kg −1 for OM; the R 2 decreased by 25-34% for AN, 27-39% for AP, 32-44% for AK, and 23-42% for OM; and the ROA ± 5% declined by 26-40% for AN, 20-39% for AP, 24-44% for AK, and 24-38% for OM. The prediction ability declined to a certain extent, indicating that the early stopping technology played a limited role in preventing the over-fitting of the model training data set. However, the three evaluation indexes still showed a good prediction capability, close to or better than that of others [13,44,45]. The complexity of the network architecture was adequate because the extra validation showed that the model's ability developed to generalize predictions. The optimal ANN model screened from the model building area can be applied in similar areas.

Spatial Prediction of Soil Nutrients
Our model results showed that the Model B performs best, so it was selected to predict the spatial distribution of topsoil (D1) AN, AP, AK, and OM contents with a resolution of 10 × 10 m grid (Figure 4). Table 6 also summarizes the prediction mean and standard deviation for AN, AP, AK, and OM contents at different soil depths. Mean values of soil nutrients decreased with the depth of soil. The average values for AN content varied from 133.91 to 159.93 mg·kg −1 with soil depths, AP content ranged from 0.53 to 0.96 mg·kg −1 , AK content changed from 37.78 to 49.94 mg·kg −1 , and OM content ranged from 10.05 to 23.17 g·kg −1 . The average content of each soil layer showed little difference, indicating that the soil nutrient surface aggregation was not significant in the model building area. Moreover, as a whole, the standard deviation of the predicted value was lower than that of the measured value, indicating that the predicted result was more stable. For a real analysis of the stability of predictions, we calculate the standard errors (SE) for the 10 maps produced by the 10 best ANN models obtained from the 10-fold cross-validation. The low prediction SE map of D1 for AN in Figure 4e further demonstrates the good performance of the constructed ANN. Figure 4f-i, show a partial enlargement of Figure 4a-d, showing detailed information and covering the boundary of the coarse-resolution maps. The boundaries of coarse-resolution maps were faintly visible in the generated maps, indicating that the coarse-resolution maps had a small impact on the generated map.

Assessment of Prediction Models
Soil nutrients affect the healthy growth of trees and the stability of ecosystems and are an important indicator for evaluating soil quality. The integration of remote sensing variables in forest soil nutrient prediction is the trend in this field. In this study, when nine GF-1 derived remote sensing variables were added to the terrain-hydrology ANNs, there was a good improvement in the model prediction accuracy, which was consistent with the reports of Wang et al. [46] and Zhou et al. [26]. They believed that the total variable model was more successful than the terrain-hydrology model in predicting soil properties.
We conclude that these variables had good performance in mapping soil nutrient content, the nonlinear modality of responses, and the complex interaction between input variables. Compared with similar studies, our model has superior predictive performance. Wang et al. [18] found that the RF model, combined with all variables (including topography, climate, and remote sensing images), had the best prediction performance (R 2 = 0.71) for soil nitrogen. Odebiri et al. [27] used Landsat 8 to predict SOC and the ANN model accuracy was 0.77 (R 2 ) in commercial forests. Most of the studies used Landsat series satellite images with 30 m resolution to produce soil maps [31,42,47,48]. We used 8 m resolution remote sensing images to produce higher resolution forest soil nutrient maps. Highly accurate spatial distribution models of forest soil nutrients facilitate forestry management decisions.
By comparing Tables 2 and 3, we found that the scale of uplift was more significant in the topsoil (0-40 cm) and lower in the deep soil (40-100 cm). This was consistent with the results of Kempen et al. [49], Minasny et al. [12], and Liu et al. [50], who all reported that the performance of three-dimensional mapping methods decreased with depth. The results of remote sensing ANNs (model C) further demonstrated that the prediction accuracies of remote sensing variables decreased rapidly with soil depth. This may be due to the reduced mapping ability of auxiliary environmental variables. Most of the ancillary data used could not effectively capture the soil nutrient variation in these layers (40-100 cm). Their uncertainty increased with depth, increasing the model's prediction uncertainty in deep soil depths. In the deep soil layers (D3, D4, and D5), the remote sensing variables made little difference in the prediction of AK, AN, AP, and OM. In the surface soil layers (D1 and D2), there was a poor prediction performance for AK, and the model accuracy only improved by 8 and 6% (R 2 ), respectively. It may be that the selected remote sensing variables were less sensitive to AK. However, the prediction accuracies of terrain-hydrology (Model A) had no significant difference at different soil depths, which is consistent with the findings of Zhao et al. and Ding et al. [17,21]. The predictive power of DEM-derived terrain-hydrology data is similar for different soil layers.

Effect of Remote Sensing Data on Predicting Soil Nutrients
Given the close relationship between soil and vegetation conditions, many studies have used remote sensing variables that reflect vegetation conditions as auxiliary variables to predict soil nutrients Vegetation is the main source of soil nutrients, as it controls the amounts of AN, AP, AK, and OM entering the soil [51]. Its effects on soil biophysical processes and, in turn, the distribution of plant communities, are affected by soil nutrients [22,26]. Different vegetation statuses have different reflection capabilities in the visible light range. They have been recorded by the sensor forming different spectral curves, showing different DN values in remote sensing images. Analyzing and interpreting of vegetation conditions recorded on remote sensing images can reflect soil nutrients in a certain sense. Kim and Grunwald [52] demonstrated that remote sensing variables closely related to vegetation could represent the spatial variability and quantity of SOC. The results from model C showed that the remote sensing variables performances close to or better than the terrain-hydrology variables in the topsoil layer (D1). Judging from the optimal Model B in Table 3, the most critical remote sensing band for predicting soil nutrients was the G band in Table 3. Yang et al. [48] believed that the G band was an influential variable to characterize vegetation density and biomass, so it was an important predictor of soil nutrients [53]. To measure absorption characteristics, some scholars focus on vegetation indices that are more beneficial than individual bands because they minimize the effects of interfering external factors such as sun, viewpoint, and lighting conditions. The use of vegetation indices can help us to understand some plant conditions, including vegetation land cover, biomass, crop production, and plant health [47]. In Table 3, NDVI showed the most robust predictive ability in predicting AN and AK. Based on AN (D1) in the Model A, by adding only NDVI the RMSE was reduced 48.01 mg kg −1 , R 2 and ROA ± 5% increased 3 and 10%, respectively. The finding indicated that NDVI could significantly improve the accuracy of AN in topsoil (D1). That was consistent with previous studies. Dematte et al. [54] believed that NDVI plays a vital role in describing soil nitrogen index spatial patterns. In this study, forest types were the most important predictor for OM. The primary source of soil nutrients is vegetation litter. The denser the vegetation, the more litter there is [55]. However, the characteristics of plant litter are also important reasons. The decomposition rate of coniferous species leaf litter is slow, which leads to low nutrient content, but the leaf litter of broad-leaved trees decomposes easily, so the nutrient content is higher [56]. Furthermore, some deciduous broad-leaved forests have seasonal defoliation, leading to increased nutrient content in the soil. In Guangdong Province, Liu et al. [57] found a significant relationship between forest species and OM content, which is consistent with this study that forest type is the most important predictor of OM. Different forest types have different litter properties and climatic conditions, which in turn lead to spatial differences in soil nutrients. Based on our current study results, G, NDVI, and forest types were the best remote sensing predictors in soil mapping. These indicators were most sensitive to alteration in vegetation cover, vegetation pigment contents, and leaf water. They have been successfully used to evaluate vegetation cover status, predict soil nutrients and soil salinity, etc. [58,59]. Therefore, spectral reflection and vegetation indices can be introduced as indirect indicators to understand the status of soil nutrients.

Effect of Terrain-Hydrology Data on Predicting Soil Nutrients
Pouladi et al. [60] believed that when the sampling points were dense enough, there was no need to introduce other environmental variables into the model. Our research showed that terrain variables played a key role in predicting soil nutrients, and terrain variables alone could explain soil nutrients 66-79% (R 2 ) spatial distribution, as seen in Table 3. Moreover, from the perspective of the improved scale of the Model B, the remote sensing variables had a better prediction effect at the topsoil (D1 and D2) than deep soil depths (D3, D4, and D5) and model C also showed that remote sensing variables performed poorly in deep soil. Therefore, terrain variables are necessary to predict the spatial distribution of deep soil nutrients, especially in forest areas with large undulating terrain. According to Mosleh et al. [61], terrain attributes are very useful auxiliary variables to predict soil properties in the areas where other soil-forming factors are almost homogeneous. As one of the five soil-forming factors, topography can affect water temperature conditions and the distribution of soil-forming materials. According to the optimal combination of variables screened by the Model A, the representative topographic and hydrological variable for predicting AN content was SDR, which directly explained 55% (ROA ± 5%) of the total variation. SDR reflected sediment transport efficiency in the watershed, where the sediment is severely washed and the AN content is low. PSR, the total amount of solar radiation, was the most important terrain variable for AP accumulation. It mainly affected the growth of vegetation and then affected the source and decomposition rate of soil nutrients. The slope was a strong determinant of AK and OM content. The larger the slope, the weaker the human influence, which provided a favorable environment for AK and OM aggregation. These terrain-hydrology variables determined the distribution of light, heat, and water, affecting the distribution of land vegetation types and the migration and transformation of soil nutrients [62]. Studies have also proved the importance of terrain variables. Li et al. [63] explored the role of Sentinel series satellites in predicting soil nutrients and believed that terrain-hydrology variables were irreplaceable.

Spatial Distribution of Soil Properties
Regarding the vertical distribution of AN, AP, AK, and OM, with the increase of soil depth their content decreased gradually, which is consistent with the research results of Deng et al. [64] on the subtropical Guangxi mountainous red soil region. The main reason is that the vegetation litter, soil animals and microorganisms are mainly in the surface layer of the soil, and the surface soil ventilation, moisture, and heat conditions are better than the deep soil, which increases the accumulation of AN, AP, AK, and SOM in the surface layer. Regarding the horizontal pattern, the spatial distribution of the four soil nutrients were similar to some remote sensing raster maps (e.g., NDVI map, Green map, and forest type map). The higher soil nutrient content corresponded to higher NDVI and green values, mainly in the broad-leaved forest area and mixed forest. In a way, high-resolution remote sensing data reflect detailed vegetation conditions, which in turn influence soil nutrient accumulation [65]. The accumulation of soil nutrients is mainly affected by higher plant litter, soil animals, and microorganisms. The litter of broad-leaved forest decomposes more easily than that of the coniferous forest. In addition, Pinus massoniana, Cunninghamia lanceolata and Eucalyptus spp. are the main coniferous forest species in Luoding City, which are more affected by human forestry activities and are not conducive to the accumulation of soil nutrients. From the prediction results, the overall forest soil in Luoding is very rich in AN, deficient in AK and OM, and extremely deficient in AP. This may be the result of forestry managers focusing on applying nitrogen fertilizer and not applying enough organic fertilizer. In addition, the study area is a typical mountainous area with steep slopes and much rainfall. Soil nutrients are easily lost due to rainfall scouring and transport, which is not conducive to accumulation [20]. Phosphate and potassium existing in soil were affected by the soil organic matter, so we suggest increasing the quantities of phosphate and potassium and applying organic fertilizer. However, because of the diverse spatial distribution of soil nutrients, the fertilization patterns for different types of forests should be adapted to local conditions. Moreover, variations in soil nutrients should be caught in time to improve the efficiency of fertilization. Soil nutrients are the main factor affecting forestry productivity, so we recommend that the rate of fertilization should be observed in subsequent forest management.

Uncertainty and Insufficiency in Current Research
In this study, despite the success with remote sensing variables, some aspects are worthy of special attention. First, there might be sampling and experimental errors in data collection and laboratory analysis. Second, affected by the terrain and clouds, high altitude areas often produced shadows in the image segmentation process that led to large reflectivity errors in satellite image data [66]. Moreover, the variation in land surface characteristics over time and the acquisition time of remote sensing variables also affected the accuracy. Thirdly, we used only nine variables, including four wavebands, four vegetation indices and forest types, and may have omitted better remote sensing variables for modelling work. In the future we should include more vegetation variables that can reflect vegetation growth conditions and soil background. Relevant research has shown that the texture features of images can effectively improve the image classification effect and the model's accuracy [67,68]. Therefore, more remote sensing variables, such as texture features of multi-spectral bands and panchromatic bands, and more variables such as terrain and climate, should be introduced to test and analyze their applicability in subsequent studies. Furthermore, future testing and analysis models can be undertaken with new remote sensing data sources, classification methods, and modelling methods, etc., to improve the estimation system model, providing strong support for subsequent soil nutrient estimation.

Conclusions
Study results showed that full variables ANNs (Model B) are best overall at predicting soil nutrients in this region. The GF-1 remote sensing satellite can be applied to soil mapping. However, spatial prediction performed better for the topsoil layers (0-40 cm) than deep layers (40-100 cm), which showed that the subsoil prediction needs to be improved using other more effective variables. The remote sensing factors had different performances for different soil nutrients at 0-40 cm-more useful for AN, AP, and OM, and less helpful for AK. Herein, NDVI, G, and forest type were the most useful auxiliary remote sensing data to map soil nutrients. Overall, fine-resolution GF-1 remote sensing images are useful for many soil and environmental scientists and land managers in China. Therefore, it is worth exploring the use of more GF-1 remote sensing satellite data for soil mapping to promote the further development of forestry and agriculture.