Prediction of Soil Nutrients Based on Topographic Factors and Remote Sensing Index in a Coal Mining Area, China

: (1) Background: Coal mining operations caused severe land subsidence and altered the distributions of soil nutrients that influenced by multiple environmental factors at different scales. However, the prediction performances for soil nutrients based on their scale ‐ specific relationships with influencing factors remains undefined in the coal mining area. The objective of this study was to establish prediction models of soil nutrients based on their scale ‐ specific relationships with influencing factors in a coal mining area. (2) Methods: Soil samples were collected based on a 1 × 1 km regular grid, and contents of soil organic matter, soil available nitrogen, soil available phosphorus, and soil available potassium were measured. The scale components of soil nutrients and the influencing factors collected from remote sensing and topographic factors were decomposed by two ‐ dimensional empirical mode decomposition (2D ‐ EMD), and the predictions for soil nutrients were established using the methods of multiple linear stepwise regression or partial least squares regression based on original samples (MLSR Ori or PLSR Ori ), partial least squares regression based on bi ‐ dimensional intrinsic mode function (PLSR BIMF ), and the combined method of 2D ‐ EMD, PLSR, and MLSR (2D ‐ EMD PM ). (3) Results: The correlation types and correlation coefficients between soil nutrients and influencing factors were scale ‐ dependent. The variances of soil nutrients at smaller scale were stochastic and non ‐ significantly correlated with influencing factors, while their variances at the larger scales were stable. The prediction performances in the coal mining area were better than those in the non ‐ coal mining area, and 2D ‐ EMD PM had the most stable performance. (4) Conclusions: The scale ‐ dependent predictions can be used for soil nutrients in the coal mining areas.


Introduction
Soil nutrients, including soil organic matter (SOM), soil available nitrogen (SAN), soil available phosphorus (SAP), and soil available potassium (SAK) are essential for plant growth, and thus, their spatial distribution is important for agricultural management. Traditionally, soil nutrient measurement is based on laboratory analysis of field soil samples, and the discrete sampling points may not provide continuous distributions of soil nutrients.
A number of methods, including geostatistical models [1], inverse distance weighted [2], trend surface analysis [3], and the like, resolve the spatial distribution of soil properties based on geospatial autocorrelation [4,5]. However, these methods are limited by sampling density, and the methods either fail to consider the environmental effects on soil nutrients or consider the environmental effects only at the sampling scale, causing their heterogeneous distribution at different scales with different intensities [6,7]. The qualitative and quantitative information of vegetation growth or land surface temperatures can be obtained from the electromagnetic radiation of remote sensing (RS), which indirectly demonstrates the distribution of soil nutrients [8][9][10][11]. Schillaci et. al. indicated that the application of an RS index could enhance the predicting performance of soil properties [12]. Meanwhile, topographic factors affect the transposition and redistribution of soil nutrient contents [13], and their effects on soil nutrients vary at different scales [14,15]. A number of prediction methods for soil properties based on their relationships with topographic factors are proposed [16], including the ensemble learning model [17], extended models [18], regression model [19], and so on. Nevertheless, these methods do not consider their multiscale relationships between soil nutrients and the covariates. Additionally, scale-specific correlations between soil properties and environmental factors are explored using the geostatistical approach [20], multivariate empirical mode decomposition [21,22], wavelet transform [14], and so on. However, these methods either failed to realize the prediction of soil properties or did not establish the validation model.
For the coal mining area, effects of human activities on the spatial distribution of soil nutrients should not be ignored, and their effects on the relationships between soil nutrients and environmental factors should also be considered [23]. Thus, the traditional methods are not suitable for describing the spatial distribution characteristics of soil nutrients in the coal mining areas. Therefore, the objective of this study was to establish the validation model of soil nutrients based on their multiscale relationships with RS index and topographic factors, and to assess the prediction accuracy of soil nutrients based on different methods in the coal mining and non-coal mining areas.

Study Area and Soil Sampling
The study area is the Changhe watershed located in the southeast of the Chinese Loess Plateau. It is bounded by latitude 35°30′14″-35°38′04″ N and longitude 112°27′39″-112°46′13″ E with a total area of about 116 km 2 ( Figure 1). The dominant soil type in the study area is calcaric cambisols [24], and the major crops in the arable land are maize, potato, and wheat. The Changhe river runs through the area from northeast to southwest and divides the study area into the eastern and western parts. In the western part, there are many coal mines containing hard coal. Most lands here were seriously damaged by coal mining activities, and some of them have been reclaimed. The range of elevation is from 718 to 1163 m in the coal mining area, while it is from 711 to 1040 m in the non-coal mining area, and the average elevations are 851 and 857 m for coal mining and non-coal mining areas, respectively.
A total of 120 sampling points was systematically established using the 1 × 1 km regular grid (Figure 1b), and soil samples were collected during 15-31 July 2015. At each point, three soil samples within a 5 m radius were collected using a soil auger and mixed together for one sample at a top layer of 0-20 cm according to the ploughing depth in the cultivated land of the study area. The samples were air-dried, gently crushed, and passed through a 2 mm sieve for soil nutrient determination. The SOM was measured by the dichromate oxidation method [25], SAN was determined by the diffusion method [26], SAP was obtained by the Olsen extraction method using alkaline sodium bicarbonate as the extractant in a 20:1 ratio [27], and SAK was extracted by a flame photometer [28]. Contents of sand (0.050-2.000 mm), silt (0.002-0.050 mm), and clay (<0.002 mm) were determined by the pipette method [29].

Collection of Environmental Factors
The products of moderate resolution imaging spectroradiometer (MODIS), including gross primary production (GPP, MOD17A2), net primary production (NPP, MOD17A3), land surface temperature for daytime and night (DLST and NLST, respectively, MOD11A2), evapotranspiration (ET, MOD16), and normalized difference vegetation index (NDVI, MOD13Q1) with 1 km resolution (the same with soil sampling scale of 1 km) in 2015 was downloaded (https://ladsweb.modaps.eosdis.nasa.gov/). The DLST, NLST, ET, and NDVI were used for extracting RS covariates in ENVI 5.3, including the intra-annual variance of the land surface temperature for daytime (DLSTV), the intra-annual variance of the land surface temperature for night (NLSTV), the annual mean evapotranspiration (ETM), the intra-annual standard deviation of evapotranspiration (ETD), the intra-annual variance of evapotranspiration (ETV), the annual mean NDVI (NDVIM), the inter-annual standard deviation of NDVI (NDVID), and the inter-annual variance of NDVI (NDVIV). The index of GPP, NPP, NDVIM, NDVID, and NDVIV indicated vegetation information and was selected because vegetation was the key source of the SOM. The index of DLSTV, NLSTV, ETM, ETD, and ETV indicated soil temperature and moisture and was selected because they were associated with the decomposition rates of SOM. A digital elevation model (DEM) with 30 m spatial resolution (http://gdem.ersdac.jspacesystems.or.jp/) was downloaded. The DEM was used for extracting topographic factors, including elevation, aspect, slope, convergence index (CI), channel network base level (CNBL), flow accumulation (FA), LS factor (LS), topographic wetness index (TWI), valley depth (VD), and vertical distance to channel network (VDCN) in the software of SAGA ver. 6.4 (Department of Physical Geography, University of Göttingen, Germany) [30] and was resampled into the same resolution with an RS of 1 km in ArcGIS ver. 10.5 (ESRI Inc., Redlands, USA). The factors indicated topographic conditions, which were related to microclimate and soil erosion and thus affected the accumulation, decomposition, and flow of soil nutrients [12].

Two-Dimensional Empirical Mode Decomposition (2D-EMD)
Two-dimensional empirical mode decomposition has been used to separate the overall variation of any spatial or temporal dataset into different scale components called bi-dimensional intrinsic mode function (BIMFs) [31]. Unlike EMD finding the overall extrema, 2D-EMD determines the local extrema (maxima and minima) of the spatial dataset. The minima and maxima points of each location have to be interpolated, and the mean values of the interpolated minima and maxima points have to be calculated. The 2D-EMD for two-dimensional spatial dataset is defined as Equation (1).
where z(x,y) was the original two dimensional spatial dataset, , is the BIMF, and , is the residue. The main process of 2D-EMD can be seen in Figure 2. More detailed description of 2D-EMD can be found in other studies [32,33].

Data Analysis
Soil nutrients and the selected environmental factors were decomposed into three BIMFs and their corresponding residue in the "spemd" package in R ver. 3.6 (The University of Auckland, Oakland, New Zealand) [34] and guaranteed that each scale component contained at least 5% variance of the original data. However, the environmental factors of DLSTV, NLSTV, NDVIM, NDVID, and slope could be decomposed into only one BIMF and its residue, while the environmental factors of ETD and elevation were only decomposed into two BIMFs and its residue.
The percentage variance contributions of each BIMF and residue were obtained through the ratio of each BIMF variance to the variance of original data as Equation (2).
The Pearson's correlation coefficients between soil nutrients and environmental factors at the sampling scale or the scale components (BIMFs and residue) were calculated. The analysis of variance (ANOVA) was used to analyze soil nutrients between coal mining and non-coal mining areas. When the soil nutrients (SOM and SAN) had significant difference, different prediction models were established for the coal mining area and non-coal mining area, respectively. However, one prediction model was established for the entire area when soil nutrients had no significant difference. Regardless of one model or two different models for the coal mining and noncoal mining areas, 75% of sampling points were selected randomly as calibration samples, and 25% of sampling points were selected as validation samples. There were four different methods of prediction, including prediction from environmental factors at the sampling scale using multiple linear stepwise regression (MLSROri), from environmental factors at the sampling scale using partial least squares regression (PLSROri), from environmental factors at the scale component using PLSR (PLSRBIMF), and the combined method of 2D-EMD, PLSR, and MLSR (2D-EMDPM). For the method 2D-EMDPM, specifically, soil nutrients and the environmental factors were decomposed into BIMFs and residue using 2D-EMD; soil nutrients at each BIMF (or residue) were predicted from the environmental factors at the corresponding BIMF (or residue) using PLSR; soil nutrients at the sampling scale were estimated from the predicted soil nutrients at the BIMFs and residue using MLSR. The prediction performances were evaluated based on three parameters of coefficient of determination (R2), root mean square error (RMSE), and the ratio of performance to deviation (RPD). The R2, RMSE, and RPD are defined in Equations (3), (4), and (5), respectively. The optimal number of latent variables (LV) for PLSR models were determined at the minimum value of root mean squared error of prediction (RMSEP) in the calibration model.

1
(3) where n is the number of samples; is the mean of the measured values; and and are the measured and predicted values of soil nutrients in the ith sample, respectively.

Soil Properties at the Sampling Scale
The basic statistics of soil nutrients and soil texture are shown in Table 1. The mean soil nutrients in the coal mining area were 17.42 g/kg, 32.44 mg/kg, 7.95 mg/kg, and 152.50 mg/kg for SOM, SAN, SAP, and SAK, respectively, which were lower than those in the non-coal mining area (19.88 g/kg, 39.40 mg/kg, 9.21 mg/kg, and 163.14 mg/kg for SOM, SAN, SAP, and SAK, respectively). The coefficients of variance (CVs) in the coal mining area were 44.73%, 37.12%, and 26.73% for SOM, SAN, and SAK, respectively, greater than those (28.09%, 30.26%, and 20.74% for SOM, SAN, and SAK, respectively) in the non-coal mining area. The greater CVs in the coal mining area might be attributed to mining practices in the area. The ANOVA analysis indicated that the spatial distribution of SOM and SAN between coal mining area and non-coal mining area were different. This data indicated that the predicting models for SOM and SAN should be established in the coal mining and non-coal mining areas separately, while the predicting models for SAP and SAK could be constructed in the coal mining and non-coal mining areas together. Obviously, the mean sand in coal mining area (0.23) was higher than that in the non-coal mining area (0.19); the mean silt in the coal mining area (0.48) was lower than that in the non-coal mining area (0.52); the mean clay in the coal mining and non-coal mining areas were similar. The spatial differentiation between coal mining area and non-coal mining area were significant for sand and silt, while it was non-significant for clay. The relationships between soil nutrients and soil texture under the coal mining area, non-coal mining area, and the entire watershed are shown in Table 2. Despite the non-significant correlation between SOM and soil texture, SOM was positively related to silt in the coal mining area and positively correlated with clay in the non-coal mining area. Although silt had a non-significant relationship with SAN in the non-coal mining area, silt had a positive effect on SAN in the coal mining area and the entire area. The relationships between clay and SAP in the non-coal mining area and the entire area were significant. Notably, sand had significantly negative effects on SAK in the coal mining area, non-coal mining area, and the entire study area.

Scale-Specific Relationships of Soil Nutrients with Environmental Factors
The scale components for each soil nutrient were decomposed into three BIMFs and residue by 2D-EMD. The soil nutrients at the sampling scale and at these scale components are shown in Figure  3. The spatial distributions of SOM and SAN at the sampling scale were identical, while the spatial pattern of SAP and SAK at the residue were similar. Moreover, the distinction between coal mining and non-coal mining areas for SAN were obvious. The percentages of variance explained by each scale component for soil nutrients are shown in Table 3. Obviously, the percentage of variance explained by BIMF1 were greater than half of the total variance, indicating the significantly stochastic effects on soil nutrients, such as coal mining operations. Notably, the percentages of variance explained by BIMF1 for SOM and SAN in the coal mining area were slightly lower than those in the non-coal mining areas, which might be attributed to the less cultivated activities because of the severe subsidence resulting from coal-mining operations. Thus, the effects of cultivated activities on soil nutrients were remarkable in the study area, especially for SAP. Conversely, the structured variance for SOM and SAN, which stand for the sum of BIMF2, BIMF3, and residue, were also greater in the coal mining area than those in the noncoal mining area.  Figure 4. In the coal mining area, GPP at BIMF2 and BIMF3 was negatively correlated with SOM at some scales, while GPP at residue was positively correlated with SOM at some scales. NPP at residue negatively correlated with SOM at some scales. The land surface temperatures for daytime and night (DLSTV and NLSTV) had less correlation with SOM at the residue. The factors of evapotranspiration (ETD and ETV) at residue had significantly positive relationship with SOM. The relationships between factors of NDVI and SOM indicated that NDVIV had greater effects on SOM, and the greatest coefficient was 0.81 at residue. Elevation at all scales had a negative relationship with SOM at residue. Aspect at BIMF3 had a positive effect on SOM at almost all scales, while its residue had a negative effect on SOM at the sampling scale, BIMF3, and residue. The effects of slope on SOM distribution at different scales were not greater. CI at the sampling scale, BIMF2, and BIMF3 had positive effects on SOM at the sampling scale, BIMF2, and residue, while CI at the residue had negative effects on SOM at the sampling scale and at some scale components. The significant correlation between FA and SOM were identified at their residues. LS at residue negatively correlated with SOM, while the effects of TWI on SOM at most scales were not significant. VD at residue had a positive relationship with SOM at some scales, while VDCN at residue had a negatively correlation with SOM at some scales.

The correlation coefficients between the scale component of SOM and the corresponding scale component of topographic factors or RS index are shown in
In the non-coal mining area, the correlation between influencing factors and SOM differed greatly at different scales, and they were slightly less than those in the coal mining area. Generally, the greatest correlation coefficients were identified at the residues of SOM and the influencing factors. The correlation coefficients between the scale component of SAN and the scale component of influencing factors are shown in Figure 5. The correlation coefficients of the influencing factors with SAN were generally greater than those with SOM. The relationships between GPP (NPP, DLSTV, ETD, ETV) at the residue and SAN at almost all scales were significant. The correlation between topographic factors at the residue and SAN at almost all scales were negatively significant except VD. Additionally, the correlations between influencing factors and SAN in the coal mining area were generally greater than those in the non-coal mining area.

Soil Nutrient Prediction
The evaluating parameters for soil nutrient prediction are shown in Table 4. There were two methods (MLSROri and PLSROri) based on the soil nutrients and influencing factors at the sampling scales, and two methods (PLSRBIMF and 2D-EMDPM) based on the scale components of soil nutrients and influencing factors. Although the calibration accuracy of 2D-EMDPM was not the best among the four methods, its validation accuracy was the best, indicating that 2D-EMDPM was more stable than the other three methods. MLSROri: multiple linear stepwise regression based on original samples; PLSROri: partial least squares regression based on original samples; PLSRBIMF: partial least squares regression based on bi-dimensional intrinsic mode function; 2D-EMDPM: the combined method of 2D-EMD, PLSR, and MLSR; SOM: soil organic matter; SAN: soil available nitrogen; SAP: soil available phosphorus; SAK: soil available potassium; * significant at p < 0.05; ** significant at p < 0.01.
The parameters for the developed 2D-EMDPM models for SOM, SAN, SAP, and SAK are shown in Table 5, and the numbers of latent variables (LVs) are listed. Generally, the LVs gradually increased with the increasing scales. Obviously, the predicting performance was the worst at the scale component of BIMF1, and they were becoming better at the larger scales. Meanwhile, the predicting accuracy for SOM and SAN was greater in the coal mining area than those in the non-coal mining area.  The spatial distributions of soil nutrient predicted errors based on 2D-EMDPM are shown in Figure  6. Considering the distinction between the coal mining area and non-coal mining area of SOM and SAN, different models were established. The greater predicted errors were distributed in a dispersed pattern and mainly located in the validated samples.
The measured SOM were generally lower than the predicted in the coal mining area, while the measured SOM were generally higher than the predicted in the non-coal mining area. This might be attributed to the mining operations resulting in the SOM leakage in the coal mining area, and the fertilizer application resulted in the higher content of SOM in the non-coal mining area. The predicted SAN were higher than the measured, which were mainly along the Changhe River in the non-coal mining area. The highly predicted SAP and SAK were also found in the non-coal mining areas, which might be attributed to the farming practices. However, the highly predicted SAP was also found in the coal mining area, which might be due to the great transformation easily affected by stochastic factors [35].

Discussion
Coal mining operations reduced the mean content of soil nutrients, which were consistent with previous studies [36,37], showing that land subsidence caused by coal mining could induce losses in surface soil nutrients. This might be due to the subsidence in the coal mining area resulting in the vertical leakage of soil nutrients from the top layer to the deeper layer [38]. Meanwhile, mining activities increased the variances of soil nutrients except SAP. In the Changhe Watershed, spatial distribution of SOM and SAN in the coal mining area and non-coal mining area were different, which were affected by the coal mining activities. However, the distributions of SAP and SAK were not altered by the coal mining activities, which might be attributed to the horizontal running off of soil nutrients decreasing the different patterns between the coal mining area and non-coal mining area.
The correlation coefficient between soil nutrients and the influencing factors varied greatly, and their correlation types were also different at the sampling scale and each scale component, similar to previous studies [39][40][41][42]. The correlations between soil nutrients and the influencing factors were generally non-significant at the scale component of BIMF1, which might be attributed to the stochastic effects (i.e., the effect of farming practices) resulting in the random distribution of SOM at the small scale of BIMF1. The spatial distributions of BIMF1 varied greatly (Figure 3), and the worst predicting performance at BIMF1 (Table 5) demonstrated the stochastic variation at the small scale of BIMF1. In addition, a previous study [31] also proved the stochastic variation of soil nutrients at the small scale of BIMF1. Consequently, predicting models that excluded BIMF1 could improve their stability of predicting performance. On the other hand, the total percentage of variance explained by BIMF2, BIMF3, and residue was even less than the percentage of variance explained by BIMF1, which revealed the lesser effect of the structured variance and the greater effect of stochastic variance affected by human activities. Therefore, the predicting performance based on the structured variance of soil nutrients would not be better even if excluded the stochastic variance (BIMF1) in the study area.
SOM predictions using PLSR based on the scale components were better than the other methods in the calibration model except SOM in the coal mining area, indicating that predicting methods based on the scale components could improve the predicting performance, in line with previous reports [14,21]. However, these studies only established the calibration model and did not validate the predicting accuracies. The 2D-EMDPM was chosen as the ideal method based on the validation accuracy, which indicated the stability of 2D-EMDPM. However, the prediction performance in the coal mining area was better than those in the non-coal mining area. This might be attributed to the lesser effect of cultivated activity on SOM and SAN in the coal mining area because of the severe coal mining subsidence. Meanwhile, the large variance in the coal mining area could also result in the better performance of soil nutrients prediction. Therefore, the effects of cultivated activities were stronger than the effects of mining operations on the spatial distribution of soil nutrients. The prediction models for SAP and SAK were not satisfied in the entire area, which might be due to the farming practices in the cultivated land, such as fertilizer applications.
Because the prediction accuracy was not significant at the stochastic variance of BIMF1, and the prediction accuracy was significant at the larger scales, the predictions at the sampling scale should exclude the stochastic variance of BIMF1. These data further indicated that the predicting method of 2D-EMDPM excluding BIMF1 was stable in the coal mining area and cultivated fields affected by human activities. Meanwhile, topographic factors and RS index were easily acquired from the internet, and soil samplings and measurement were not necessary. If 2D-EMDPM was stable in certain coal mining areas, this would facilitate the prediction of soil nutrients in the coal mining area. However, 2D-EMDPM was only applied for the validated dataset, which was randomly selected only once, and was not applied for all the possible datasets randomly selected. Meanwhile, 2D-EMDPM application in other coal mining areas was not included in this study, and further study is needed in the future for better predictions of soil nutrients.

Conclusions
(1) The mean soil nutrients were less, and their CVs were greater in the coal mining area than those in the non-coal mining area. Meanwhile, the correlation coefficients between soil nutrients and influencing factors were generally higher in the coal mining area than those in the non-coal mining area.
(2) The small scale of BIMF1 represented stochastic variance, and their relationships with soil nutrients were almost non-significant, which were influenced by human activities. The predictions for BIMF1 were also non-significant because of the unstable distributions.
(3) Although the predicting method of MLSRORI was sometimes best in the calibration models, its performance was particularly unstable in the validation model and should not be used in the areas affected significantly by coal mining operations or farming practices. The proposed method was 2D-EMDPM, which decomposed the soil nutrients and their covariates into four scale components, then predicted soil nutrients at each scale component from the corresponding scale of the covariates, and finally predicted soil nutrients at the sampling scale from the predicted soil nutrients at each scale except BIMF1. The method was stable in the calibration and validation models and should be validated in other coal mining areas in the world.