Fractional Modeling for Quantitative Inversion of Soil-Available Phosphorus Content

Field spectra have rarely been studied with fractional-order differentials. Traditional integer-order differentials compute only the 1st-order or 2nd-order derivative of a spectral signal and ignore the spectral transformation details between the 0th and 1st orders and between the 1st and 2nd orders, which lowers prediction accuracy. In this paper, a spectral quantitative analysis model of soil-available phosphorus content based on a fractional-order differential is proposed. Firstly, a fractional-order differential was applied to the original spectral data from the 0th order to the 2nd order at 0.2-order intervals, yielding 11 fractional-order spectral datasets. Afterwards, seven bands with an absolute correlation coefficient greater than 0.5 were selected as sensitive bands. Finally, a stepwise multiple linear regression algorithm was used to establish a spectral estimation model of soil-available phosphorus content at each order, and the prediction performance of the models at the different orders was compared and analyzed. Simulation results show that the best order for the soil-available phosphorus content regression model is 0.6; the coefficient of determination (R²), root mean square error (RMSE), and ratio of performance to deviation (RPD) of the best model are 0.7888, 3.348878, and 2.001142, respectively. Since the RPD value is greater than 2, the optimal fractional model established in this study has good quantitative predictive ability for soil-available phosphorus content.


Introduction
Soil-available phosphorus refers to inorganic phosphorus or small-molecule organic phosphorus that can be directly absorbed and utilized by plants [1][2][3]. Its content refers to the amount of phosphorus that can be absorbed by seasonal crops, and it is an important indicator of the phosphorus supply capacity of soil. Soil-phosphorus deficiency affects crop growth, while a long-term phosphorus surplus increases the risk of soil phosphorus flowing into water bodies and creating ecological problems. The content of available phosphorus in soil varies with soil type, climate, fertilization level, irrigation, cultivation practices, and so on. Therefore, real-time detection of soil-available phosphorus content can provide a scientific reference for the rational application of phosphate fertilizer and for improving phosphate-fertilizer utilization. The traditional method for detecting soil-available phosphorus content relies on laboratory chemical analysis, which is cumbersome, costly, and time-consuming [4][5][6]. Visible near-infrared spectroscopy is rapid, non-destructive, and low-cost [7][8][9]; it can analyze a large number of soil samples in a short time and enables real-time online measurement of soil parameters.
The traditional integer-order differential method is widely used to process soil spectral signals [10,11], but it describes the physical model only approximately and largely ignores the real behavior of the system. Fractional calculus generalizes the order of traditional integer calculus to fractional values. Compared with integer-order calculus, its greatest advantage lies in its memory and inheritance properties, which make fractional-order differential equations more accurate and effective for describing certain physical phenomena. Fractional-order systems have been reported to be more in line with natural laws and engineering physics phenomena, to reflect the performance of dynamic systems better, and to describe the physical characteristics of a system more clearly. The composition of soil is very complex; it is mainly a mixture of minerals, organic matter, living organisms, water, and air. Fractional-order differential algorithms can be used to locate the inflection points of a soil spectral-reflectance curve, to perform baseline correction that removes background noise and atmospheric influence, to distinguish overlapping spectra, and to improve the detection signal-to-noise ratio. They can also reduce the impact of soil type, sample size, and other factors, and bring out the spectral absorption characteristics.
In recent years, scholars have introduced fractional-order differentials into the field of spectral analysis, mainly focusing on the spectra of corn, wheat, and diesel in public collections; a few literature sources relate to soil spectra collected under ideal indoor conditions. For example, Kharintsev et al. [12] used a fractional derivative algorithm to separate overlapping spectral features and extracted spectral characteristics, such as half-width and amplitude, showing that fractional differentials are feasible in spectral analysis applications. Zheng et al. [13] used the Savitzky-Golay (SG) fractional derivative to preprocess near-infrared spectroscopy datasets for corn, wheat, and diesel; simulations showed that preprocessing with a fractional derivative performed better than with an integer-order derivative. Zhang et al. [14] preprocessed the indoor spectrum of saline soil in Xinjiang with a fractional derivative algorithm; simulations showed that the fractional differential performed significantly better than the integer-order differential. Wang et al. [15] studied the hyperspectral detection of chromium content with fractional differential algorithms and found that the 1.8-order differential model was optimal.
These soil studies, however, applied the fractional-order differential method to spectra measured under an indoor controllable light source with ideal conditions. An indoor soil spectrum does not take into account the influence of complex factors in the field, and a prediction model established from indoor spectra is difficult to extend directly to the field. At present, there are relatively few studies on soil-available phosphorus, and the study of field spectra based on fractional-order differentials has rarely been reported. Therefore, this paper took the desert soil in Xinjiang as the research object and collected field spectral signals. We studied the application of the Grünwald-Letnikov fractional differential to field spectrum preprocessing and feature extraction, and exploited the useful information in the spectrum to obtain effective sensitive bands for the prediction model of available phosphorus content. In addition, we discussed how the spectral data change with the fractional order and sought the optimal scheme for fractional spectral modeling. This research enriches the methods of soil hyperspectral-data preprocessing, improves the accuracy of hyperspectral prediction models of available phosphorus content, and provides scientific support and an application reference for local precision agriculture.

Research Area
The study area is located between the northern foot of the Tianshan Mountains and the southern margin of the Junggar Basin (87°44′-88°46′ E, 43°29′-45°45′ N). It belongs to the territory of Fukang in Xinjiang and has a pH value of 7.76-8.98. The Fukang terrain is low in the north and high in the south, with mountains in the south, plains in the middle, and deserts in the north. Fukang has a moderately temperate desert climate with plenty of light. The average temperature in this region is 6.7 °C, the highest temperature is 39.7 °C, and the lowest temperature is −26.2 °C. Precipitation in this area is scarce, but surface evaporation is very large.

Sampling Point Layout and Measurement of Available Phosphorus Content
In mid-May 2017, we arranged five sampling lines from south to north in the study area, with a spacing of 600-800 m between lines. Five sampling points, 300-500 m apart, were selected on each sampling line to represent the soil background in this area. A total of 25 sampling points were obtained, and each was located with the Global Positioning System (GPS). The distribution of the sampling points is shown in Figure 1.

Soil samples from a depth of 0-10 cm were collected at each sampling point, bagged and numbered, and brought back to the laboratory. All of the soil samples were air-dried naturally, cleared of impurities, and passed through a 1 mm sieve. They were then sent to the Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, to determine soil-available phosphorus content.

Field Spectral Data Acquisition
The ASD FieldSpec® 3 Hi-Res spectrometer (Malvern Panalytical Ltd, Malvern, UK) was used to acquire field spectra on 9-23 May 2017. Its spectral range is 350-2500 nm; the sampling interval is 1.3 nm at 350-1000 nm and 2 nm at 1000-2500 nm, and the re-sampling interval is 1 nm. To reduce the data error caused by weather conditions, the spectra were measured between 11:00 and 15:00 local time in sunny, cloudless, and windless weather. Before each spectral measurement, the spectrometer was calibrated with a white board to remove dark current effects. The spectral probe was placed 15 cm vertically above the surface of each sampling point. The ground surface was flat, with no cracks and no weeds around. Spectral acquisition was performed in the same manner for each sampling point, by selecting five positions close to the soil background value within a range of 1 m. Ten spectral curves were measured at each position, for a total of 50 per sampling point, and their mean was taken as the measured spectrum of the sampling point.
Before the subsequent data analysis, the spectrum was first smoothed with the SG smoothing method. The spectral reflectances at 350-400 nm in the ultraviolet bands and at 2400-2500 nm in the short-wavelength infrared bands were then removed, because the signal-to-noise ratio of these bands was relatively low. Finally, the bands located in the moisture absorption zone (1355-1410 nm and 1820-1942 nm) were removed, because they strongly affect the accuracy of the spectral inversion of available phosphorus content.
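The band removal described above can be sketched in a few lines of Python. This is only an illustration: the names `EXCLUDED_RANGES`, `is_kept`, and `filter_spectrum` are hypothetical, the range bounds are assumed to be inclusive, and the reflectance values are placeholders rather than measured data. SG smoothing is not shown.

```python
# Wavelength ranges (nm) removed in the text: the low signal-to-noise edges
# and the moisture absorption zones. Inclusive bounds are an assumption.
EXCLUDED_RANGES = [(350, 400), (2400, 2500), (1355, 1410), (1820, 1942)]


def is_kept(wavelength_nm):
    """Return True if the band lies outside every excluded range."""
    return not any(lo <= wavelength_nm <= hi for lo, hi in EXCLUDED_RANGES)


def filter_spectrum(wavelengths, reflectance):
    """Drop the excluded bands from one spectrum; returns (wavelengths, reflectance)."""
    kept = [(w, r) for w, r in zip(wavelengths, reflectance) if is_kept(w)]
    return [w for w, _ in kept], [r for _, r in kept]


wavelengths = list(range(350, 2501))     # 1 nm re-sampled grid, 350-2500 nm
reflectance = [0.3] * len(wavelengths)   # placeholder values, not measured data
kept_w, kept_r = filter_spectrum(wavelengths, reflectance)
print(len(wavelengths), "bands before,", len(kept_w), "bands after")
```

With inclusive bounds on a 1 nm grid, 331 of the 2151 bands are removed and 1820 remain.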

Fractional Derivative
At present, the main definitions of the fractional differential are those of Riemann-Liouville, Grünwald-Letnikov, and Caputo [16][17][18], and the Grünwald-Letnikov (G-L) definition is the most commonly used. The G-L differential is defined by:

$$ d^{v}s(t) = \lim_{h \to 0} \frac{1}{h^{v}} \sum_{m=0}^{[(t-a)/h]} (-1)^{m} \binom{v}{m}\, s(t - mh) \qquad (1) $$

where the coefficient is:

$$ \binom{v}{m} = \frac{\Gamma(v+1)}{m!\,\Gamma(v-m+1)} \qquad (2) $$

According to the definition in Equation (1), suppose that the duration of the signal s(t) is [a, t]. Because the field spectra were collected with the ASD FieldSpec® 3 Hi-Res spectrometer at a re-sampling interval of 1 nm, the duration [a, t] is divided into equal intervals of h = 1, and n can be defined as follows:

$$ n = \left[\frac{t-a}{h}\right] = t - a \qquad (3) $$

We can further deduce that the v-order fractional differential of the signal s(t) is:

$$ \frac{d^{v}s(t)}{dt^{v}} \approx s(t) + (-v)\,s(t-1) + \frac{(-v)(-v+1)}{2}\,s(t-2) + \cdots + (-1)^{n}\binom{v}{n}\,s(t-n) \qquad (4) $$

From Equation (4), the difference coefficients of the fractional differential are:

$$ 1,\quad -v,\quad \frac{(-v)(-v+1)}{2},\quad \ldots,\quad (-1)^{n}\binom{v}{n} \qquad (5) $$

where v is the order. The 0th-order differential of s(t) is s(t) itself, that is, no differential processing is performed. For v = 1 and v = 2, Equation (4) reduces to the 1st-order and 2nd-order difference formulas, s(t) − s(t−1) and s(t) − 2s(t−1) + s(t−2), when the differential window scale is h = 1.
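As a minimal sketch of the G-L difference with step h = 1, the weights (−1)^k·C(v, k) can be generated with the recurrence w_0 = 1, w_k = w_{k−1}·(k − 1 − v)/k and applied to a list of reflectance values. The function names and the optional `window` parameter are hypothetical, and the original computation was done in Matlab, so this Python version is an illustration only.

```python
def gl_coefficients(v, n):
    """Weights w_0..w_n of the G-L difference, w_k = (-1)^k * C(v, k).

    Uses the recurrence w_k = w_{k-1} * (k - 1 - v) / k, which avoids
    evaluating the Gamma function at its poles for integer v.
    """
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (k - 1 - v) / k)
    return w


def gl_differential(signal, v, window=None):
    """v-order G-L difference of a 1 nm sampled spectrum (h = 1).

    `window` (hypothetical parameter) limits how many past samples are used;
    None uses every sample back to the start of the spectrum.
    """
    n_max = len(signal) - 1 if window is None else window
    w = gl_coefficients(v, n_max)
    out = []
    for t in range(len(signal)):
        n = min(t, n_max)  # fewer past samples are available near the start
        out.append(sum(w[k] * signal[t - k] for k in range(n + 1)))
    return out


# The 11 orders used in the text: 0, 0.2, ..., 2.0
orders = [round(0.2 * i, 1) for i in range(11)]
toy_spectrum = [1.0, 4.0, 9.0, 16.0]  # toy values, not measured reflectance
for v in (0, 1, 2):
    print(v, gl_differential(toy_spectrum, v))
```

For v = 0 the signal is returned unchanged, and for v = 1 and v = 2 the weights collapse to (1, −1) and (1, −2, 1), the ordinary first and second differences.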

Stepwise Multiple Linear Regression
Stepwise multiple linear regression (SMLR) is an optimization process based on the general multivariate regression analysis method [19,20]. A multiple linear regression model describes the linear relationship between several independent variables and a dependent variable:

$$ Y = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \cdots + \beta_{m}X_{m} + \varepsilon $$

where Y is the dependent variable, X_1, X_2, ..., X_m are m known independent variables, β_0, β_1, ..., β_m are the regression coefficients, ε is the error term, and the number of samples is n. Only the independent variables X_i (i = 1, 2, ..., k) that have a significant effect on Y are kept in the regression equation. For each fractional order, we need to find which factors X_i contribute to Y. The selection and rejection of independent variables in stepwise multiple regression analysis is determined by the F statistic. The SMLR method is given as follows.
Step 1: Choose one variable x_{k_1} from the m variables to establish a linear regression equation:

$$ \hat{y} = \beta_{0} + \beta_{k_1} x_{k_1} $$

Step 2: From the remaining m − 1 variables, select the variable x_{k_2} that has the most significant effect on y and establish a binary regression equation:

$$ \hat{y} = \beta_{0} + \beta_{k_1} x_{k_1} + \beta_{k_2} x_{k_2} $$

Check whether it is significant or not. If it is not, return to Step 1. If it is, continue to find the next variable.
Step 3: The regression equation is obtained.
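The selection loop can be illustrated with a simplified forward-only variant: at each step the candidate with the largest partial F statistic is added, and the loop stops when no candidate exceeds an entry threshold. This is a sketch, not the procedure used in the study. The threshold `f_enter = 4.0`, all function names, the band labels, and the toy data are assumptions, and the removal of previously entered variables is omitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(columns, y):
    """Least-squares fit of y on an intercept plus `columns`; returns (coefficients, RSS)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return beta, rss


def forward_stepwise(X, y, f_enter=4.0):
    """X maps a band label to its column of values; f_enter is an assumed threshold."""
    selected = []
    _, rss = fit([], y)  # intercept-only model
    while len(selected) < len(X):
        best = None
        for name in X:
            if name in selected:
                continue
            cols = [X[s] for s in selected] + [X[name]]
            df = len(y) - len(cols) - 1  # residual degrees of freedom
            if df <= 0:
                continue
            _, rss_new = fit(cols, y)
            f = (rss - rss_new) / (rss_new / df) if rss_new > 0 else float("inf")
            if best is None or f > best[1]:
                best = (name, f, rss_new)
        if best is None or best[1] < f_enter:
            break
        selected.append(best[0])
        rss = best[2]
    beta, _ = fit([X[s] for s in selected], y)
    return selected, beta


# Toy data (not from the study): y depends on the first column only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
noise = [0.1, -0.1, -0.05, 0.05, 0.1, -0.1, -0.05, 0.05]
y = [2.0 + 3.0 * a + e for a, e in zip(x1, noise)]
X = {"b2283": x1, "b1179": x2}
selected, beta = forward_stepwise(X, y)
print(selected, beta)
```

On the toy data only the first column enters the model, with a slope close to 3.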

Model Accuracy Verification Method
To evaluate the accuracy of the SMLR quantitative estimation model comprehensively, we selected three accuracy evaluation indicators [21]: root mean square error (RMSE), coefficient of determination (R²), and ratio of performance to deviation (RPD). R² was divided into the coefficient of determination for the calibration set (R²_c) and the coefficient of determination for the verification set (R²_p). RMSE, R², and RPD are defined as follows:

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(M_{i} - P_{i}\right)^{2}} $$

$$ R^{2} = \frac{\mathrm{SSR}}{\mathrm{SST}} $$

$$ \mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}} $$

where n is the number of samples, M_i is the measured value of the i-th sample, P_i is the predicted value of the i-th sample, SSR is the regression sum of squares, SST is the total sum of squares, SD is the standard deviation of the validation samples, and RMSE is the root mean square error of the validation set. When 0.66 ≤ R² ≤ 0.80, the model fit is good [22]. When 0.81 ≤ R² ≤ 0.90, the model fit is very good. When R² ≥ 0.90, the model fit is excellent. The closer the RMSE is to 0, the higher the prediction accuracy and the stronger the predictive ability of the model.
When the RPD is greater than 2.5, the model has strong predictive ability [23]. When the RPD is between 2.0 and 2.5, the model has good quantitative prediction ability. When the RPD is between 1.8 and 2.0, the model has quantitative predictive ability. When the RPD is between 1.4 and 1.8, the model has general quantitative prediction ability. When the RPD is between 1.0 and 1.4, the model can only distinguish between high and low values. When the RPD is less than 1.0, the model has no predictive power.
For the calibration set, the larger R²_c is and the smaller RMSE is, the higher the modeling accuracy and the more stable the model. For the verification set, the larger R²_p and RPD are and the smaller RMSE is, the higher the prediction accuracy of the model.
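The three indicators and the RPD classes can be written out as a short Python sketch. The function names are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption since the text does not state which form was used, and the handling of values that fall exactly on a class boundary is also assumed.

```python
import math
import statistics


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """SSR / SST, with both sums taken about the mean of the measured values."""
    mean_m = sum(measured) / len(measured)
    ssr = sum((p - mean_m) ** 2 for p in predicted)
    sst = sum((m - mean_m) ** 2 for m in measured)
    return ssr / sst


def rpd(measured, predicted):
    """Standard deviation of the measured values divided by RMSE (sample SD assumed)."""
    return statistics.stdev(measured) / rmse(measured, predicted)


def rpd_class(value):
    """Map an RPD value to the classes listed in the text (boundary handling assumed)."""
    if value > 2.5:
        return "strong"
    if value >= 2.0:
        return "good"
    if value >= 1.8:
        return "quantitative"
    if value >= 1.4:
        return "general"
    if value >= 1.0:
        return "high/low only"
    return "none"


# Toy values, not data from the study.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.0, 3.0, 4.0, 4.5]
print(rmse(measured, predicted), r_squared(measured, predicted), rpd(measured, predicted))
print(rpd_class(2.001142))  # RPD of the 0.6-order model reported in the abstract
```

The RPD of 2.001142 reported for the 0.6-order model falls in the 2.0-2.5 class, consistent with the conclusion that the model has good quantitative prediction ability.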

Correlation Coefficient
The fractional differentials of the original spectral reflectance were computed in Matlab R2015a (MathWorks, Natick, MA, USA), and the correlation coefficient between available phosphorus content and the transformed spectral reflectance was calculated at 0.2-order intervals, for a total of 11 fractional differentials. The critical correlation coefficient at the 0.05 significance level in this area was P0.05 = 0.396. Figure 2 shows the results of fractional processing of the original spectrum. The abscissa represents the wavelength and the ordinate represents the differential value of the spectral reflectance after the fractional differential calculation. The results show that, starting from the 0th order, the spectral reflectance has bands that pass the 0.05 significance test, and the number of such bands is large. Moreover, as the fractional order changes, more information in the original spectral data is mined and its subtle changes become more obvious. Taking the differential values at 700 nm and 800 nm in Figure 2a as an example, we found that the subtle changes in the original spectrum were amplified, and Figure 2b-d shows some of the details more clearly. As the fractional order increased, sharper peaks appeared in the spectral curve.
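A small Python sketch of the correlation step follows. It assumes the Pearson correlation coefficient and a two-sided t-test, which the text does not state explicitly; the function names and the toy correlation values are hypothetical. For n = 25 samples, the threshold r = 0.396 converts to t ≈ 2.07 with 23 degrees of freedom, which matches the usual two-sided 0.05 critical value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value of a correlation coefficient r from n samples (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def passing_bands(r_by_band, r_crit=0.396):
    """Bands whose absolute correlation reaches the critical value."""
    return [band for band, r in r_by_band.items() if abs(r) >= r_crit]


print(round(t_statistic(0.396, 25), 3))

# Toy correlation values, not results from the study.
toy = {"band_a": 0.45, "band_b": -0.60, "band_c": 0.10}
print(passing_bands(toy))
```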

Modeling Process for Quantitative Analysis Model
The steps of the quantitative analysis model established in this paper are as follows:
Step 1: Calculate the fractional differential of the spectral reflectance at each of the 11 orders between the 0th order and the 2nd order using Equation (4).
Step 2: Calculate the correlation coefficient between spectral reflectance and available phosphorus content and perform a 0.05 significance test.
Step 3: For each of the 11 fractional-order transformations, find the maximum absolute correlation coefficient and its corresponding wavelength.
Step 4: Select the bands whose maximum absolute correlation coefficient at each fractional order is greater than 0.5 as the sensitive bands.
Step 6: All fractional spectral data and the corresponding sensitive bands are obtained by traversing the orders from the 0th to the 2nd at intervals of 0.2.
The flow chart of the quantitative analysis model for soil-available phosphorus content is shown in Figure 3.
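The steps above can be sketched end to end in Python. This is an illustrative outline only: `gl_diff`, `pearson_r`, and `sensitive_bands` are hypothetical names, the G-L difference uses every available past sample with h = 1, the significance test is omitted, and the spectra and phosphorus values are toy numbers rather than measured data.

```python
import math


def gl_diff(signal, v):
    """v-order G-L difference with h = 1, weights from w_k = w_{k-1} * (k - 1 - v) / k."""
    w = [1.0]
    for k in range(1, len(signal)):
        w.append(w[-1] * (k - 1 - v) / k)
    return [sum(w[k] * signal[t - k] for k in range(t + 1)) for t in range(len(signal))]


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return sxy / denom if denom > 0 else 0.0


def sensitive_bands(spectra, wavelengths, content, threshold=0.5):
    """Return {order: (wavelength, r)} for orders whose strongest band has |r| > threshold."""
    result = {}
    for i in range(11):                    # orders 0, 0.2, ..., 2.0
        v = round(0.2 * i, 1)
        transformed = [gl_diff(s, v) for s in spectra]
        best = max(
            ((wavelengths[j], pearson_r([row[j] for row in transformed], content))
             for j in range(len(wavelengths))),
            key=lambda item: abs(item[1]),
        )
        if abs(best[1]) > threshold:
            result[v] = best
    return result


# Toy data: four samples, five bands; the band at 402 nm tracks the content exactly.
wavelengths = [400, 401, 402, 403, 404]
spectra = [
    [0.5, 0.1, 1.0, 0.3, 0.2],
    [0.4, 0.3, 2.0, 0.1, 0.4],
    [0.5, 0.2, 3.0, 0.4, 0.1],
    [0.4, 0.4, 4.0, 0.2, 0.3],
]
content = [1.0, 2.0, 3.0, 4.0]
bands = sensitive_bands(spectra, wavelengths, content)
for order, (wavelength, r) in sorted(bands.items()):
    print(order, wavelength, round(r, 3))
```

The sensitive bands returned for each order would then serve as the candidate independent variables of the regression step.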

Model Optimal Wavelength Selection
We calculated the maximum absolute correlation coefficient and its corresponding band at each of the 11 fractional orders; the results are shown in Table 1. The largest absolute correlation coefficient was 0.81085, at 2283 nm and order 0.6. According to Table 1, the bands whose maximum absolute correlation coefficient is greater than 0.5 were selected as sensitive bands, giving seven bands in the study area: 1179, 2047, 2165, 2283, 2364, 2365, and 2393 nm.

Establishment of the Stepwise Multiple Linear Regression Model
Fifteen samples randomly selected from the 25 soil samples were used to establish the model, and the other 10 samples were used as the validation set. Soil-available phosphorus content was taken as the dependent variable, and the spectral reflectance at the sensitive bands (1179, 2047, 2165, 2283, 2364, 2365, and 2393 nm) was used as the independent variables. The SMLR model was established to estimate available phosphorus content (Table 2). R² is 0.424 at the 0th order, 0.591 at the integer 1st order, and 0.644 at the integer 2nd order. After the fractional differential transformation at the 0.4, 0.6, 0.8, 1.2, 1.4, 1.6, and 1.8 orders, R² improves to some degree. Among them, the 1.6-order improves the most and reaches 0.865. The RMSE is 5.13826 at the 0th order, 4.3324598 at the 1st order, and 4.040979 at the 2nd order. After fractional differential processing, the RMSE decreases, and its lowest value is 2.491789.

Predictive Model Accuracy Comparison
The RMSE, R², and RPD were used as reference indexes for model evaluation (Table 3). The regression equations for the 0.4-, 0.6-, 1.6-, and 1.8-order have relatively high R² and RPD values and relatively low RMSE values, indicating that these four differential orders predict better than the others. The RPD value is greater than 2 at the 0.6-order, which indicates that this model has good quantitative prediction ability. The RPD values are between 1.4 and 2 at the 0.4-, 1.6-, and 1.8-order, which indicates that the predictive ability of the corresponding models is general, although the available phosphorus content can still be quantitatively estimated.

Selection of the Best Prediction Model
To obtain the best predictive model of available phosphorus content, the 10 validation samples were used to compare the measured and predicted values (Figure 4). For the 0.6-order, the verification-set data points are all distributed along the 1:1 line, and the prediction correlation is better than for the other orders. Overall, for the SMLR model at the 0.6-order, the R² is relatively high, the RMSE is the smallest, and the RPD is the largest. Therefore, this model is the best prediction model for the available phosphorus content.

Conclusions
In this paper, the spectra of desert soils were collected in a field environment, and the spectral data were preprocessed using the Grünwald-Letnikov fractional differential. Seven sensitive bands were identified by correlation analysis. The bands used to estimate soil-available phosphorus content are 1179, 2047, 2165, 2283, 2364, 2365, and 2393 nm. For the different fractional orders, we obtained different regression equations in which different sensitive bands were the independent variables. The fractional derivative played an important role in finding the main independent variables, and the SMLR model gave the regression relationships. An optimal fractional prediction model for soil-available phosphorus content was finally provided. The method proved to be efficient in this study. In the future, we will consider neural networks, machine learning, and similar algorithms to improve the prediction accuracy, and we will explore other applications in the field of spectral analysis.

Figure 3. Flow chart of the quantitative analysis model.

Table 1. Maximum absolute correlation coefficient at each of the 11 fractional orders and its corresponding band.

Table 2. Stepwise multiple linear regression (SMLR) model for soil-available phosphorus content. RMSE: root mean square error.

Table 3. Accuracy evaluation for the SMLR prediction model.