Hyperspectral Estimation Model of Forest Soil Organic Matter in Northwest Yunnan Province, China

Soil organic matter (SOM) is an important index for evaluating soil fertility and soil quality, and it plays an important role in the terrestrial carbon cycle. Hyperspectral remote sensing is an important method for estimating SOM content efficiently and accurately. This study sought the best hyperspectral estimation model for SOM content in Shangri-La forest soil. The spectral reflectance of soils with particle sizes of 2 mm, 1 mm, 0.50 mm, and 0.25 mm was measured indoors. After smoothing and de-noising, the original spectral reflectance (REF) was mathematically transformed into the reciprocal reflectance (RR), logarithmic reflectance (LR), first-derivative reflectance (FR), reciprocal first-derivative reflectance (RFR), and logarithmic first-derivative reflectance (LFR) in order to analyze the correlation between spectral reflectance and SOM content and to extract the characteristic bands. Finally, simple linear regression (SLR), stepwise multiple linear regression (SMLR), and partial least squares regression (PLSR) models for SOM content estimation were established. The results showed that: (1) Spectral reflectance increased as soil particle size decreased, and the smaller the particle size, the more obvious the increase. (2) The sensitive bands of SOM were mainly in the 580–690 nm range (correlation coefficient |R| > 0.6, p-value (p) < 0.01), and the spectral information of SOM could be significantly enhanced by first-order differential transformation. (3) Of the three models, PLSR had better estimation ability than SMLR and SLR. The PLSR model based on the 0.25 mm particle size and the LFR index was the most precise (coefficient of determination of validation (Rv²) = 0.91, root mean square error of validation (RMSEv) = 13.41, ratio of percent deviation (RPD) = 3.33).
The results provide a basis for monitoring SOM content rapidly in the forests of Northwest Yunnan, and provide a reference for forest SOM estimation in other areas.


Introduction
Soil organic matter (SOM) is not only an important basis for measuring soil fertility and quality [1], but also an important part of the terrestrial ecosystem carbon pool [2]. Traditional SOM monitoring, which runs from field sampling to indoor chemical analysis, is highly precise, but it is time-consuming, laborious, slow, and costly, and it cannot meet the demand for efficient, fast, and immediate detection [3]. How to obtain SOM content information quickly and efficiently and monitor the soil environment has been an urgent problem in precision management. With the development of technology, non-destructive, fast, accurate, and large-scale remote sensing monitoring makes up for the shortcomings of traditional monitoring methods [4,5], and hyperspectral technology is also widely used in soil quality monitoring [6].
Forests 2019, 10, 217
Previous studies have shown that soil spectral reflectance is significantly correlated with soil properties such as soil moisture, iron oxides, clay minerals, and organic matter, and that absorption in the 400–1000 nm range is mainly caused by iron oxides and organic matter [7]. Reflectance is significantly negatively correlated with organic matter [8–10]; it is highly correlated with SOM in the visible bands, especially the red bands [11,12], and the most sensitive bands are mainly in the 550–710 nm range [13–17]. Studies on the spectral estimation of SOM content in different regions have shown that mathematically transforming the original spectrum can highlight spectral information, increase the number of characteristic bands, and enhance the correlation between reflectance and organic matter [18–22]. Researchers have also established a variety of SOM content estimation models, including simple linear regression (SLR) [23], multivariate linear regression (MLR) [24,25], stepwise multiple linear regression (SMLR) [9,11,26], partial least squares regression (PLSR) [27–30], principal component regression (PCR) [31–33], boosted regression tree (BRT) [34], support vector machine (SVM) [6], and artificial neural network (ANN) [35]. The best estimation model differs among soil types, but the results of SMLR and PLSR are generally better and more stable [36].
Furthermore, soil particle size is one of the factors that affect soil spectral reflectance and the accuracy of SOM content estimation [37,38]. Ma et al. [39] found a significant negative correlation between soil particle size and soil spectral reflectance, which would affect modeling accuracy [10]. Bao et al. [40] reported that the accuracy of a soil nitrogen content estimation model would be affected when the soil particle size was less than 0.25 mm or more than 5 mm. Si et al. [41] found that the accuracy of the SOM estimation model did not improve noticeably when the soil particle size was less than 0.25 mm, because the soil particles were so fine that they changed the soil physical properties and masked the spectral characteristic information of SOM. Li et al. [42], taking the Huangshui River basin as an example, used soil spectral information to estimate SOM content and concluded that the SVM model based on the 1 mm particle size had the best estimation accuracy (coefficient of determination (R²) = 0.96, ratio of percent deviation (RPD) = 3.3).
Research on estimating SOM content with hyperspectral technology has made abundant progress, but soil types, formation processes, and soil physical and chemical properties differ among regions. Existing studies have no unified standard for selecting soil particle size and models, and model estimation accuracy varies greatly. It is therefore difficult to apply an estimation model established in one region to other regions, and the results are not comparable. Most studies on the spectral estimation of SOM focus on areas with low organic matter content (<10%). There are few reports on the spectral estimation of SOM content in forests, especially the forests of Northwest Yunnan, China, where the topsoil is rich in organic matter and the SOM content can exceed 20%. It is also very difficult to monitor the topsoil by remote sensing under natural conditions, especially in Northwest Yunnan, where vegetation coverage is high: the litter layer introduces noise, and trees block the sunlight needed for field hyperspectral measurements, making the spectral information of the topsoil inaccurate. Therefore, this study took the forest soil of Shangri-La, a typical area in Northwest Yunnan, as the research object, carried out laboratory experiments, and established hyperspectral estimation models of SOM content for forest soils of different particle sizes, in order to find a method for rapidly estimating SOM in the laboratory for large numbers of field-collected samples. The results would allow changes in SOM over larger areas to be monitored and analyzed frequently.

Study Area
Shangri-La city is the capital of the Diqing Tibetan Autonomous Prefecture, Yunnan Province, China. It is located in the northwest of Yunnan Province, in the eastern part of Diqing, between latitudes 26°52′ N–28°52′ N and longitudes 99°20′ E–100°19′ E (Figure 1). The study area, which has an average elevation of 3280 m and is more than 90% mountainous, is an important area for biodiversity conservation and water conservation in the alpine valleys of northwest Yunnan. The climate is alternately controlled by the southwest monsoon and the south branch of the westerly jet. The vertical zoning of the climate is obvious, and the dry season (June to October) and wet season (November to the following May) are distinct. The annual average precipitation is about 620 mm, and the annual average temperature is about 6 °C. According to research by the Soil and Fertility Station of Yunnan Province (SFSY) and the Office of Soil Survey of Yunnan Province (OSSY), the soil parent materials in Shangri-La include plateau lake sediments, river sediments, flood deposits residual, etc., and the soil types include alpine frost desert soil, alpine shrubby meadow soil, brown coniferous forest soil, dark brown soil, subalpine meadow soil, brown soil, red soil, etc. [43]. The city's forest coverage rate is more than 70%, and the total area of four constructive species, Quercus aquifolioides (Quercus aquifolioides Rehd. et Wils.), Abies georgei (Abies georgei Orr), Alpine pine (Pinus densata Mast.), and Yunnan pine (Pinus yunnanensis Franch.), accounts for over 80% of the city's arbor forests [44,45]. Therefore, the soils under these four forest types were selected for research.

Soil Sample Collection and Experiment
Forest soil samples were collected in the study area from 17 to 26 July and from 24 September to 1 October 2017 (Figure 1). Sampling was based on the soil genetic horizons, and soil profiles were dug at relatively undisturbed sites that were little affected by human activities. Soil samples of each genetic horizon were collected from bottom to top. Samples were prepared after indoor air-drying and removal of roots and stones. SOM content was estimated by multiplying the soil organic carbon (SOC) content by 1.724; SOC was measured by potassium dichromate oxidation with external heating, and the remaining dichromate was determined by volumetric titration [46].
Considering the influence of particle size on the accuracy of the SOM content estimation model [40,41] and the difficulty of preparing soil samples, the air-dried soil samples were prepared into four particle size groups, <2 mm, <1 mm, <0.50 mm, and <0.25 mm, referred to hereafter as 2 mm, 1 mm, 0.50 mm, and 0.25 mm, respectively. Spectral reflectance was measured with an SVC HR-1024i ground object spectrometer (Spectra Vista Co., Poughkeepsie, NY, USA) with a spectral range of 350–2500 nm and a field-of-view angle of 25 degrees. Spectral measurements were carried out in a dark room. The light source, the standard one matched with the spectrometer, was set at a zenith angle of 45 degrees and a vertical height of 65 cm. The spectrometer probe was 10 cm from the center of the surface of the soil sample, which filled a black container 10 cm in diameter and 1 cm in height. The sample surface was scraped flat, the working table was covered with black paper, and the white reference board was measured before each spectral measurement. Each soil sample was measured 10 times, with the sample rotated 72 degrees after every two measurements, giving a total of 10 spectral reflectance curves per soil sample. Finally, 64 valid samples with both SOM content and spectral data were retained (Table 1).

Data Pre-processing
During spectral data acquisition, the whiteboard error, instrument error, test environment, sample impurities, etc. unavoidably introduce considerable noise into the spectral data. Therefore, whiteboard correction, smoothing and de-noising, and merging were performed to improve the signal-to-noise ratio. These steps were carried out in the SVC HR-1024i software (Spectra Vista Co., Poughkeepsie, NY, USA). To enhance the spectral information and highlight the SOM-sensitive information, the original spectral reflectance (REF) was transformed into the reciprocal reflectance (RR), logarithmic reflectance (LR), first-derivative reflectance (FR), reciprocal first-derivative reflectance (RFR), and logarithmic first-derivative reflectance (LFR) using Microsoft Office Excel 2010 (Microsoft Corp., Redmond, WA, USA).
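The five transformations of REF can be illustrated with a short Python sketch. It is not the authors' Excel workflow. It assumes a base-10 logarithm, a central-difference approximation of the first derivative, and that RFR and LFR are the derivatives of RR and LR, respectively; the function names and the example reflectance values are hypothetical.

```python
import math


def first_derivative(wavelengths, values):
    """Central-difference first derivative; one-sided at both ends."""
    n = len(values)
    if n != len(wavelengths) or n < 2:
        raise ValueError("need two or more matching wavelength/value pairs")
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        deriv.append((values[hi] - values[lo]) / (wavelengths[hi] - wavelengths[lo]))
    return deriv


def transform_spectrum(wavelengths, ref):
    """Return the six indexes (REF, RR, LR, FR, RFR, LFR) for one spectrum."""
    if any(r <= 0 for r in ref):
        raise ValueError("reflectance must be positive for RR and LR")
    rr = [1.0 / r for r in ref]
    lr = [math.log10(r) for r in ref]  # assumption: base-10 logarithm
    return {
        "REF": list(ref),
        "RR": rr,
        "LR": lr,
        "FR": first_derivative(wavelengths, ref),
        # assumption: RFR and LFR are derivatives of the transformed spectra
        "RFR": first_derivative(wavelengths, rr),
        "LFR": first_derivative(wavelengths, lr),
    }


# Hypothetical reflectance values (fractions) at four wavelengths in nm.
wl = [600.0, 610.0, 620.0, 630.0]
ref = [0.10, 0.12, 0.15, 0.19]
for name, values in transform_spectrum(wl, ref).items():
    print(name, [round(v, 5) for v in values])
```

Central differences are used here only because they keep the output the same length as the input; a different differencing scheme or a smoothing derivative filter would change the numerical values of FR, RFR, and LFR.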

Modeling and Validation
Hyperspectral data contain a large amount of spectral reflectance information. Estimation models are needed to extract the spectral information related to SOM content from these data. Considering the operability, estimation ability, and stability of hyperspectral estimation models, three commonly used models, SLR, SMLR, and PLSR, were established in Matlab R2017a (MathWorks Inc., Natick, MA, USA).
The SLR model uses only the band with the highest correlation between SOM content and reflectance to construct a linear model for SOM content estimation. The model is simple, convenient, and easy to interpret physically, but its estimation accuracy suffers when multiple variables are involved [47]. The SMLR model uses multiple bands with high correlation between SOM content and reflectance to construct a linear model [47]. It was developed from the multiple regression model to address collinearity among the independent variables [48]. SMLR usually sets the significance level at 0.05 and adds or removes variables one at a time, forward or backward, to finally obtain the optimal model [36]. The PLSR model combines principal component analysis, multivariate regression, and correlation analysis; it takes the dependent variable into account while revealing the principal components that drive the change in SOM content [49,50]. When the number of samples is small and the data have strong collinearity and noise, the model can still perform well and retain all the variable information [36,51].
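To make the PLSR idea concrete, the sketch below implements a textbook single-response PLS regression (PLS1) with the NIPALS algorithm in plain Python. It is an illustration under stated assumptions, not the Matlab implementation used in the study: the function names, the toy data, and the choice of two components are all hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1 (one response variable) with the NIPALS algorithm."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in band space with maximal covariance with y.
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # no covariance left to explain
        w = [v / norm for v in w]
        t = [_dot(E[i], w) for i in range(n)]  # scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]  # X loadings
        q = _dot(f, t) / tt                                               # y loading
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def pls1_predict(model, x):
    """Predict the response for one sample by replaying the deflation steps."""
    e = [x[j] - model["x_mean"][j] for j in range(len(x))]
    y_hat = model["y_mean"]
    for w, p, q in model["components"]:
        t = _dot(e, w)
        y_hat += q * t
        e = [e[j] - t * p[j] for j in range(len(e))]
    return y_hat


# Toy data (hypothetical): two "bands" and a response that is linear in them.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [2 * a + 3 * b + 1 for a, b in X]
model = pls1_fit(X, y, n_components=2)
print(pls1_predict(model, [5.0, 5.0]))
```

In practice the number of components would be chosen by cross-validation on the calibration set rather than fixed in advance.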
The whole dataset (n = 64) was split randomly into 45 samples (about 70%) for calibration and 19 samples (about 30%) for validation. Pearson correlation analysis between SOM content and reflectance was performed on the calibration set in IBM SPSS Statistics 22.0 (IBM Corp., Armonk, NY, USA). Representative bands that were significant (p < 0.01) and had a high correlation coefficient were selected as the characteristic bands of SOM, and the SLR, SMLR, and PLSR models were then established. Model quality was evaluated with R² (Equation (1)), root mean square error (RMSE; Equation (2)), and RPD (Equation (3)) [36,52]. The larger the R² and the smaller the RMSE, the better the model fit [35,42]. RPD, the ratio of the standard deviation of the measured values to the RMSE of validation, was used to test the estimation ability of the model: if RPD > 2, the estimation ability is very good; if RPD < 1.0, it is very poor and the model cannot be used to estimate organic matter content [52]. In Equations (1)–(3), y is the measured value of SOM content, ȳ is the average value of SOM content, ŷ is the predicted value of SOM content, n is the total number of samples (i = 1, 2, 3, ..., n), and SD is the standard deviation.
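The random split and the three evaluation indexes can be sketched in Python as follows. The exact forms of Equations (1)–(3) are not reproduced in this text, so the sketch assumes commonly used definitions: R² = 1 − SS_res/SS_tot, RMSE as the root of the mean squared residual, and RPD as the sample standard deviation of the measured values divided by the RMSE. The function names, the random seed, and the example values are hypothetical.

```python
import math
import random


def split_calibration_validation(n_samples, n_validation, seed=0):
    """Randomly split sample indices into calibration and validation sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return sorted(idx[n_validation:]), sorted(idx[:n_validation])


def r_squared(measured, predicted):
    """Assumed form: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / n)


def rpd(measured, predicted):
    """Standard deviation of the measured values divided by the RMSE."""
    n = len(measured)
    mean_y = sum(measured) / n
    # assumption: sample standard deviation (n - 1 in the denominator)
    sd = math.sqrt(sum((y - mean_y) ** 2 for y in measured) / (n - 1))
    return sd / rmse(measured, predicted)


# 64 samples split into 45 for calibration and 19 for validation.
cal_idx, val_idx = split_calibration_validation(64, 19)
print(len(cal_idx), len(val_idx))

# Hypothetical measured and predicted SOM values, for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(r_squared(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))
```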

Spectral Characteristics of Soils with Different Particle Sizes
The average spectral reflectance of each of the four particle sizes was calculated, and the spectral reflectance curves were obtained (Figure 2). Generally, the spectral reflectance increased rapidly in the 400–800 nm range, while it was relatively stable at 800–2400 nm. There were absorption valleys around 1400 nm, 1900 nm, and 2200 nm, which are generally attributed to moisture [53]. The effect of particle size on soil spectral reflectance was obvious: the smaller the soil particle size, the higher the reflectance and the more obvious the spectral changes. The main reason was that as particle size decreased, the voids between soil particles shrank, which made the soil surface smoother and enhanced the spectral reflection. The spectral reflectance of the 0.25 mm soil samples began to change significantly at around 580 nm, whereas that of the 2 mm, 1 mm, and 0.50 mm samples changed significantly at around 800 nm.

Feature Band Extraction and Analysis
The correlation coefficient curves (Figure 3, taking the original spectral reflectance as an example) and the characteristic bands (Table 2) were obtained by analyzing the correlation between spectral reflectance and SOM content. REF was strongly negatively correlated with organic matter content (correlation coefficient |R| > 0.6, p-value (p) < 0.01) in the 580–690 nm range, which means this range is sensitive to organic matter. The 0.25 mm particle size had the strongest correlation with SOM, at 628 nm (|R| = 0.67, p < 0.01). After the mathematical transformations, the correlation coefficient improved, and the first-order differential transformations increased the number of characteristic bands. However, the characteristic bands differed among the indexes and were mainly concentrated in the ranges 564–630 nm, 755–861 nm, 1020–1049 nm, 1365–1428 nm, 1520–1554 nm, 1600–1620 nm, 1738 nm, 1798 nm, 1924–1970 nm, 2160–2191 nm, 2239–2256 nm, and 2312–2320 nm (Table 2).
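The band screening described above can be sketched in Python as a band-by-band Pearson correlation followed by a threshold on the absolute coefficient. This is a simplified illustration, not the SPSS analysis used in the study: it applies only the |R| > 0.6 threshold and omits the significance test (p < 0.01). The function names and the toy spectra are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_bands(wavelengths, spectra, som, r_threshold=0.6):
    """Return (wavelength, r) for bands whose |r| exceeds the threshold."""
    selected = []
    for j, wl in enumerate(wavelengths):
        band = [spectrum[j] for spectrum in spectra]
        r = pearson_r(band, som)
        if abs(r) > r_threshold:
            selected.append((wl, r))
    return selected


# Toy data (hypothetical): five samples, two bands.
wavelengths = [600, 1000]
spectra = [
    [0.30, 0.40],
    [0.26, 0.42],
    [0.24, 0.39],
    [0.19, 0.41],
    [0.15, 0.40],
]
som = [10.0, 20.0, 30.0, 40.0, 50.0]
for wl, r in select_bands(wavelengths, spectra, som):
    print(wl, round(r, 3))
```

To mirror the paper's procedure, the correlation should be computed on the calibration set only, so that the validation samples play no part in choosing the bands.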

Simple Linear Regression (SLR)
The SLR models were established by taking, for each of the six spectral transformation indexes and four particle sizes, the characteristic band with the highest correlation coefficient as the independent variable and the SOM content as the dependent variable (Table 3). The modeling results showed that the coefficients of determination of calibration (Rc²) for the REF, RR, and LR indexes were all less than 0.6, and the root mean square error of calibration (RMSEc) was greater than 41. After the first-order differential transformation, the model fit improved: Rc² increased by up to 0.39, and RMSEc decreased by up to 21.
Notes to Table 3: Rv²: coefficient of determination of validation; RMSEv: root mean square error of validation; RPD: ratio of percent deviation.
Among the four particle sizes, the 2 mm, 1 mm, and 0.50 mm particle sizes fitted better with the RFR transformation index, and the 0.25 mm particle size fitted better with the LFR transformation index; Rv² of all of them was greater than 0.7. Of the four best models, all showed good estimation ability (RPD > 2) except the 2 mm-RFR model, whose estimation ability was low (RPD < 2). The best was the 0.

$$Y_{SOM} = 1 \times 10^{6}\, R_{807} + 21.494 \qquad (5)$$

where $Y_{SOM}$ is the predicted value of SOM and $R_j$ is the spectral reflectance of soil at wavelength j (nm).
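A single-band SLR model of this kind can be fitted by ordinary least squares. The Python sketch below shows the fit and then applies the coefficients reported in Equation (5) to one input value. The function names, the toy calibration data, and the input value are hypothetical and are not taken from the study's dataset.

```python
def fit_slr(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the predictor band is constant")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return slope, intercept


def predict_slr(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Toy calibration data (hypothetical): one band value per sample.
band_values = [1.0, 2.0, 3.0, 4.0]
som_values = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_slr(band_values, som_values)
print(slope, intercept)

# Coefficients of Equation (5) applied to a hypothetical band value at 807 nm.
print(predict_slr(1e6, 21.494, 5e-5))
```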

Stepwise Multiple Linear Regression (SMLR)
Since the REF, RR, and LR indexes each yielded only one characteristic band, only the FR, RFR, and LFR transformation indexes were analyzed in the SMLR and PLSR models. The SMLR models were established with the characteristic bands extracted from these three spectral transformation indexes for the four particle sizes as the independent variables and SOM content as the dependent variable (Table 4). Rc² varied from 0.78 to 0.90 with an average of 0.85, and RMSEc ranged from 20.68 to 30.47 with an average of 24.83. The best calibration result was obtained by the model with the 0.25 mm particle size and the RFR transformation index. The validation results showed that Rv² ranged from 0.29 to 0.91 with an average of 0.70, RMSEv ranged from 13.60 to 42.38 with an average of 26.35, and RPD varied from 1.18 to 3.28 with an average of 2.04. Overall, the SMLR model had a good fit and strong SOM content estimation ability and was superior to the SLR model: the average Rv² increased by 0.19, RMSEv decreased by 4, and RPD improved by 0.49.
For the four particle sizes, the 2 mm and 1 mm models performed better with the RFR transformation index, while the 0.50 mm and 0.25 mm models fitted better with LFR. Among these four best models, the best was the 0.25 mm-LFR (Rv² = 0.91, RMSEv = 13.60, RPD = 3.28), followed by the 0.50 mm-LFR and 2 mm-RFR models; the worst was the 1 mm-RFR (Rv² = 0.79, RMSEv = 23.21, RPD = 2.16). Figure 5 compares the values estimated by SMLR with the measured values of the validation samples, and the model expression is given as Equation (6).
In Equation (6), $Y_{SOM}$ is the predicted value of SOM and $R_j$ is the spectral reflectance of soil at wavelength j (nm).

Partial Least Squares Regression (PLSR)
The PLSR models were constructed from reflectance (including its mathematical transformations) and SOM content. The calibration results (Table 5) showed that the PLSR models fitted well: Rc² was above 0.79 for all models (the maximum was for the 2 mm-RFR model, Rc² = 0.90), with a mean of 0.85; RMSEc varied from 21.00 to 29.93, with an average of 25.23.
The validation results showed that Rv² ranged from 0.65 to 0.91, with an average of 0.77; RMSEv ranged from 13.41 to 26.27, with an average of 20.96; and RPD ranged from 1.70 to 3.33, with an average of 2.20. Overall, the PLSR model was better than the SLR and SMLR models in modeling accuracy: compared with the SMLR model, the average Rv² was 0.07 higher, the average RMSEv was 5.39 lower, and the average RPD was 0.16 higher.
Comparing the best models of the four particle sizes, the results were similar to those of SMLR: the 2 mm and 1 mm models performed better with the RFR transformation index, while the 0.50 mm and 0.25 mm models fitted better with LFR. The optimum model was the 0.
where $Y_{SOM}$ is the predicted value of SOM and $R_j$ is the spectral reflectance of soil at wavelength j (nm).
Summarizing the three models, the estimation ability of the SLR model was the worst: most SLR models gave R² below 0.6 and a high RMSE. The fitting effect and estimation ability of the PLSR and SMLR models were close to each other and better than those of the SLR model, but the accuracy of the PLSR model was higher than that of the SMLR model. Comparing the optimal models of the three methods, the Rv² and RPD of the 0.25 mm-LFR-PLSR model were 0.08 and 0.93 higher, respectively, than those of the optimal SLR model, and its RMSEv was 5.20 lower; in addition, its RPD was 0.05 higher and its RMSEv 0.19 lower than those of the optimal SMLR model. Therefore, the hyperspectral estimation model based on 0.25 mm-LFR-PLSR was the best estimation model for forest SOM content in northwest Yunnan.

Discussion
The study measured the spectral data of 2 mm, 1 mm, 0.50 mm, and 0.25 mm forest soil samples and analyzed the relationship between spectral reflectance (and its mathematical transformations) and SOM content to obtain the characteristic bands. The results showed that the spectral reflectance of the soil increased as particle size decreased. At wavelengths greater than 580 nm, the spectral reflectance changed significantly, and the smaller the particle size, the greater the increase in reflectance. This agrees with other studies, which found that spectral reflectance varies significantly with particle size at wavelengths greater than 600 nm [42] and that spectral reflectance increases exponentially with decreasing particle size [39]. The response bands of SOM in the original spectral reflectance curve were mainly at 580–690 nm (|R| > 0.6, p < 0.01), which is broadly consistent with the 620–660 nm proposed by Xu [16], 550–700 nm proposed by Galvão et al. [14,15], 570–630 nm proposed by Peng et al. [13], and 560–710 nm proposed by Fang et al. [17]. Although there is no uniform standard for pre-processing spectral reflectance [19], the first-order differential transformation clearly enhances the characteristic spectral information of SOM, reduces the effect of noise, improves the correlation between SOM and reflectance, and increases the number of characteristic bands. In addition, the spectral information of soil is complex and is affected by the parent material, water, soil nutrients, and texture [22,54,55]. The focus of research on spectral estimation of SOM content is therefore to select the best bands for the particular study area and soil type.
SOM spectral reflectance is affected by soil formation factors and soil properties, and it is difficult to find a general SOM content estimation model for different soil types [33,56]. Hou et al. [23] established a hyperspectral estimation model of SOM content in the desert, and their results showed that the PLSR model fitted better than the SMLR and SLR models. Si et al. [41] built SOM estimation models based on PLSR and considered the model with the 0.25 mm particle size to be the best, with R² and RMSE of 0.816 and 4.26, respectively. Zhou et al. [20] and Hu et al. [21] considered the logarithmic first-order differential model of spectral reflectance to be the best for estimating SOM content in soil plowing layers. The 0.25 mm-LFR-PLSR model obtained in this study is broadly consistent with the results of the above studies, and its particle size also matches that used for the chemical analysis of SOM. At the same time, the estimates of SOM content were more accurate for soil with small particle sizes (0.50 mm, 0.25 mm), and the optimum particle size was 0.25 mm. This differs from the research by Li et al. [42], based on particle sizes of 2 mm, 1 mm, 0.25 mm, and 0.15 mm, which concluded that the 1 mm particle size was best. There are two reasons for this: (1) the modeling accuracy of small samples tends to increase with the decrease of particle size [21]; (2) there may be impurities such as fine grains and root debris, which affect the spectral information and modeling accuracy of SOM. In terms of modeling results, the R² and RPD of the 0.25 mm-LFR-PLSR model are better than those of the PLSR models in the studies by Fidêncio et al. [27,28], Dunn et al. [29], Kooistra et al. [30], and Hou et al. [23], but the RMSE is slightly worse than in those studies [23,27–30]. The reason is that the SOM content of the soil samples varies over a large range (3.28–257.34 g/kg) and the coefficient of variation is high (0.95), which affects the accuracy of the SOM estimation model. On the whole, however, the model has good estimation ability.
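The interplay between RMSE, RPD, and the spread of SOM content can be made concrete with a short sketch. The definitions below are the commonly used ones and are assumptions, not formulas quoted from this paper: R² as one minus the ratio of residual to total sum of squares, RPD as the sample standard deviation of the measured values divided by the RMSE, and the coefficient of variation as standard deviation over mean. The numbers are invented for illustration.

```python
import numpy as np

def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))

def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rpd(measured, predicted):
    """Sample standard deviation of measured values over RMSE (assumed definition)."""
    return float(np.std(measured, ddof=1) / rmse(measured, predicted))

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return float(np.std(values, ddof=1) / np.mean(values))

# Invented SOM values in g/kg, for illustration only.
measured = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
predicted = np.array([12.0, 18.0, 33.0, 39.0, 48.0])

metrics = {
    "R2": r_squared(measured, predicted),
    "RMSE": rmse(measured, predicted),
    "RPD": rpd(measured, predicted),
    "CV": coefficient_of_variation(measured),
}
```

Under these definitions, a data set with a wide SOM range has a large standard deviation, so a model can reach a high RPD and R² even when its absolute RMSE is larger than that of a model fitted to a narrower range.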

Conclusions
Overall, the spectral reflectance increases as soil particle size decreases, and the spectral reflectance of the different particle sizes changes markedly near the wavelength of 580 nm. The smaller the particle size, the larger the increase in reflectance. In the original spectral reflectance curve, the sensitive bands of SOM were mainly in the range of 580–690 nm (R > 0.6, p < 0.01), where reflectance is negatively correlated with forest SOM, and the spectral information of SOM can be significantly enhanced by first-order differential transformation. The PLSR method outperformed SLR and SMLR according to Rv², RMSEv, and RPD. The predictive ability decreased in the following order: PLSR > SMLR > SLR. The PLSR method based on the 0.25 mm-LFR is therefore the best for estimating forest soil organic matter by hyperspectral technology in northwest Yunnan.
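For readers unfamiliar with PLSR, the following is a minimal sketch of a single-response partial least squares regression (the textbook PLS1 algorithm with deflation), evaluated on a held-out validation set. It is not the software or configuration used in this study: the data are synthetic, the number of components is chosen arbitrarily, and the function name `pls1_fit` is hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 and return (coefficients, intercept) for y ~ X @ coef + intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                  # direction of maximum covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                     # latent scores
        tt = t @ t
        p = Xk.T @ t / tt              # X loadings
        c = yk @ t / tt                # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - c * t                # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W = np.array(weights).T
    P = np.array(loadings).T
    q = np.array(y_loadings)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic spectra driven by three latent factors (illustration only).
rng = np.random.default_rng(1)
scores = rng.normal(size=(60, 3))
band_loadings = rng.normal(size=(3, 120))
X = scores @ band_loadings + rng.normal(0.0, 0.01, size=(60, 120))
y = scores @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.01, size=60)

# Calibration / validation split (the 40/20 split is an arbitrary choice).
X_cal, y_cal, X_val, y_val = X[:40], y[:40], X[40:], y[40:]
coef, intercept = pls1_fit(X_cal, y_cal, n_components=3)
y_pred = X_val @ coef + intercept

# Validation statistics corresponding to Rv² and RMSEv.
r2_v = 1.0 - np.sum((y_val - y_pred) ** 2) / np.sum((y_val - y_val.mean()) ** 2)
rmse_v = float(np.sqrt(np.mean((y_val - y_pred) ** 2)))
```

Because PLSR compresses many collinear bands into a few latent components, it can use the whole spectrum, whereas SLR relies on a single band and SMLR on a handful of selected bands.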
The results provide a basis for the rapid monitoring of SOM content and for ecological environment management in the forests of northwest Yunnan, and also provide a reference for hyperspectral estimation of SOM content in other areas. However, with the increasing demand for precise soil management, further studies need to collect and analyze more data on SOM content in other forest areas to improve the applicability of the estimation methods. We hope that future studies can use satellite-borne or airborne hyperspectral remote sensing to estimate SOM in such areas with satisfactory results.

Figure 1. Position map of Shangri-La and distribution of sampling points in different stands.

Figure 2. Spectral reflectance curves of soil: (a) 60 spectral reflectance curves of the total samples; (b) mean spectral reflectance curves of four particle sizes.

Figure 3. Correlation between original spectral reflectance (REF) and organic matter content of the four particle sizes.

Figure 4. Comparison of estimated and measured values of soil organic matter (SOM) content by simple linear regression (SLR). R² indicates the coefficient of determination of calibration, RMSE the root mean square error, and RPD the ratio of percent deviation.

Figure 5. Comparison of estimated and measured values of SOM content by stepwise multiple linear regression (SMLR).

Figure 6. Comparison of estimated and measured values of SOM content by partial least squares regression (PLSR).

Table 1. Statistical characteristics of soil organic matter (SOM) content.

Table 2. Characteristic bands and correlation coefficients of SOM with different particle sizes.

Table 3. Modeling results of simple linear regression (SLR) for SOM content.
¹ Rc²: coefficient of determination of calibration; ² RMSEc: root mean square error of calibration; ³ Rv²

Table 4. Modeling results of stepwise multiple linear regression (SMLR) for SOM content.

Table 5. Modeling results of partial least squares regression (PLSR) for SOM content.