Prediction of Soil Oxalate Phosphorus using Visible and Near-Infrared Spectroscopy in Natural and Cultivated System Soils of Madagascar

Abstract: Phosphorus is among the main limiting nutrients for plant growth and productivity in both agricultural and natural ecosystems in the tropics, which are characterized by weathered soils. Measuring soil bioavailable P is necessary to predict the potential growth of plant biomass in these ecosystems. Visible and near-infrared reflectance spectroscopy (Vis-NIRS) is widely used to predict soil chemical and biological parameters as an alternative to time-consuming conventional laboratory analyses. However, quantitative spectroscopic prediction of soil P remains a challenge owing to the difficulty of directly detecting orthophosphate. This study tested the performance of Vis-NIRS with partial least squares (PLS) regression to predict oxalate-extractable P (Pox) content, representing P available to plants, in natural (forest and non-forest, including fallows and degraded land) and cultivated (upland and flooded rice fields) soils in Madagascar. Model predictive accuracy was assessed based on the coefficient of determination (R²), the root mean squared error of cross-validation (RMSECV), and the residual predictive deviation (RPD). The results demonstrated successful Pox prediction in natural (n = 74, R² = 0.90, RMSECV = 2.39, RPD = 3.22) and cultivated systems (n = 142, R² = 0.90, RMSECV = 48.57, RPD = 3.15), and moderate usefulness at the regional scale incorporating both system types (R² = 0.70, RMSECV = 71.87, RPD = 1.81). These results were confirmed with a modified bootstrap procedure (N = 10,000 times) using selected wavebands in iterative stepwise elimination-partial least squares (ISE-PLS) models. The wavebands relevant to soil organic matter content and Fe content were identified as important components for the prediction of soil Pox. The predictive accuracy for the cultivated system was related to the variability of some samples with high Pox values. However, the use of “pseudo-independent” validation can overestimate the prediction accuracy when applied at the site scale, suggesting the use of larger and geographically more dispersed sample sets to build a robust model. Our study offers new opportunities for P quantification in a wide range of ecosystems in the tropics.

They are dominated by ferralitic soils (FAO soil classification), which are generally acidic with low available phosphorus [29,30]. The Vakinankaratra region is also among the rice-growing areas of Madagascar. The eastern sites are characterized by perhumid and humid climates, with a mean annual rainfall of 2500 mm and a mean annual temperature of 18-24 °C [31,32]. This region is characterized by red and yellow Ferralsols [33].
In the Vakinankaratra area, soil sampling at 15 cm depth was conducted in 142 farmer field plots under irrigated and upland rice systems during 2018 and 2019 (Table 1). In eastern Madagascar, soil samples were collected similarly during 2014 and 2015 from 74 forest and non-forest plots, the latter including fallow and degraded land systems [34]. The descriptive statistics of soil parameters for each studied site are reported in Table 2.
SOC, soil organic carbon; Feox, oxalate-extractable Fe; Pox, oxalate-extractable P.

Laboratory Analyses
Soils were air-dried, ground, and sieved through 2 mm and 0.2 mm mesh prior to chemical analysis. All soil samples were analyzed for texture and for phosphorus and organic carbon contents. Oxalate-extractable P and Fe were determined following Schwertmann [10]. Soil organic carbon was determined by wet combustion using dichromate oxidation [35]. Soil texture was determined by the pipette method: soil samples were pretreated with heat and H2O2 (35%) to remove organic matter, dispersed with NaOH, and separated into clay, silt, and sand fractions.

Spectral Data Acquisition Using Vis-NIRS
Spectral data were recorded in a dark room at the Laboratoire des Radioisotopes, Antananarivo University, using a portable Vis-NIR spectroradiometer covering the 350-2500 nm range (ASD FieldSpec 4 Hi-Res, ASD Inc., Longmont, CO, USA). The spectral resolution was 3 nm between 350 nm and 1000 nm and 6 nm between 1000 nm and 2500 nm. The output data were generated at 1 nm intervals using the cubic spline interpolation function in the ASD software (RS3 for Windows; ASD). Before each measurement, the spectrometer was calibrated using a white reference spectrum [17]. Soil samples were first spread and leveled in optical-glass Petri dishes 85 mm in diameter. Five measurements were taken at different positions for each soil sample. For each measurement, the instrument made 25 internal scans to optimize the signal-to-noise ratio. The resulting spectra were averaged into one spectrum per sample. Further details can be found in Kawamura et al. [36].

Spectral Analyses and Modeling Approaches
Prior to the modeling of Pox using PLS regression, the spectral data were pre-processed. The spectra were reduced to 400-2400 nm by removing the regions of 350-399 nm and 2401-2500 nm in order to eliminate the influence of noise [36-38]. The reflectance spectra (R) were transformed into apparent absorbance (A = log(1/R)). To reduce noise and enhance the signals, first derivative reflectance (FDR) spectra were computed using a Savitzky-Golay smoothing filter [39] with a third-order polynomial. The generated Vis-NIR spectra were mean-centered. Scatter correction using a standard normal variate (SNV) transform was applied to all spectra to reduce the particle size effect.
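The pre-processing chain described above can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' MATLAB or R code. Several details are assumptions because the text does not state them: a base-10 logarithm for the absorbance, a 5-point Savitzky-Golay window (only the polynomial order is given), the derivative being taken on the absorbance spectrum, and the order in which the steps are chained. The spectrum is synthetic, and mean-centering across samples is left to the regression step.

```python
import math
import statistics


def trim(wavelengths, spectrum, lo=400, hi=2400):
    """Keep only the 400-2400 nm region, dropping the noisy ends."""
    kept = [(w, v) for w, v in zip(wavelengths, spectrum) if lo <= w <= hi]
    return [w for w, _ in kept], [v for _, v in kept]


def absorbance(reflectance):
    """Apparent absorbance A = log10(1/R) (base 10 is an assumption)."""
    return [math.log10(1.0 / r) for r in reflectance]


# Savitzky-Golay first-derivative weights for a cubic polynomial and an
# assumed 5-point window (the window width is not given in the text).
SG_D1 = (1 / 12, -8 / 12, 0.0, 8 / 12, -1 / 12)


def sg_first_derivative(values, step=1.0):
    """First derivative; the two points at each edge are dropped."""
    half = len(SG_D1) // 2
    return [
        sum(c * values[i + k - half] for k, c in enumerate(SG_D1)) / step
        for i in range(half, len(values) - half)
    ]


def snv(values):
    """Standard normal variate: center and scale a single spectrum."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Synthetic reflectance at 1 nm steps over 350-2500 nm (not soil data).
wavelengths = list(range(350, 2501))
reflectance = [0.2 + 0.1 * math.sin(w / 300.0) for w in wavelengths]

wl, refl = trim(wavelengths, reflectance)
processed = snv(sg_first_derivative(absorbance(refl)))
print(len(wl), len(processed))
```

Trimming a 1 nm spectrum to 400-2400 nm leaves 2001 bands, which matches the band count used for the waveband percentage later in the paper.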
The modeling approach consisted of testing whether the reflectance spectra could be used to predict the chemical data and identifying which spectral regions contributed to the prediction [40]. PLS regression extracts a small number of latent factors from the reflectance spectra (the independent variables) and then uses these factors in a regression with the chemical data as the dependent variable. A PLS regression model was built to describe the relationship between the soil spectra and the measured soil Pox. Leave-one-out cross-validation was used to select the best number of latent variables and to avoid overfitting of the PLS regression model [36,38,41]. The optimum number of latent variables was chosen by minimizing both the root mean squared error (RMSE) and the number of factors or latent vectors.
Two PLS regression approaches were used to estimate soil parameters: full-spectrum PLS (FS-PLS) and iterative stepwise elimination PLS (ISE-PLS) [36]. FS-PLS is a standard PLS model using the FDR data sets. ISE-PLS is a PLS model that uses a waveband elimination algorithm to remove noisy variables and to retain those able to improve predictive performance.
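The modeling steps above (PLS regression, leave-one-out selection of the number of latent variables, and waveband elimination) can be sketched as follows. This is a minimal sketch in standard-library Python, not the authors' implementation. The elimination criterion is an assumption: the text only says that noisy wavebands are removed, so here the waveband with the smallest absolute regression coefficient scaled by its standard deviation is dropped at each step. All function names are hypothetical and the data are synthetic.

```python
import math
import random
import statistics


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_lv):
    """Single-response PLS (NIPALS); returns (x_mean, y_mean, coefficients)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centered spectra
    f = [v - y_mean for v in y]                               # centered response
    R, P, coef = [], [], [0.0] * p
    for _ in range(min(n_lv, p)):
        w = [dot(col, f) for col in zip(*E)]          # weights from X'y
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:                              # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                # scores
        tt = dot(t, t)
        load = [dot(col, t) / tt for col in zip(*E)]  # X loadings
        q = dot(f, t) / tt                            # y loading
        r = w[:]                                      # weights w.r.t. undeflated X
        for r_k, p_k in zip(R, P):
            c = dot(p_k, w)
            r = [rv - c * rk for rv, rk in zip(r, r_k)]
        E = [[e - ti * l for e, l in zip(row, load)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        R.append(r)
        P.append(load)
        coef = [b + q * rj for b, rj in zip(coef, r)]
    return x_mean, y_mean, coef


def rmsecv(X, y, n_lv):
    """Leave-one-out root mean squared error of cross-validation."""
    sq_err = 0.0
    for i in range(len(X)):
        x_mean, y_mean, coef = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
        pred = y_mean + dot([v - m for v, m in zip(X[i], x_mean)], coef)
        sq_err += (y[i] - pred) ** 2
    return math.sqrt(sq_err / len(X))


def select_n_lv(X, y, max_lv):
    """Lowest RMSECV wins; ties go to the smaller number of latent variables."""
    scores = {k: rmsecv(X, y, k) for k in range(1, max_lv + 1)}
    return min(scores, key=lambda k: (scores[k], k)), scores


def ise_pls(X, y, n_lv):
    """Backward waveband elimination; keep the subset with the lowest RMSECV."""
    active = list(range(len(X[0])))
    best_cols, best_score = active[:], rmsecv(X, y, n_lv)
    while len(active) > 1:
        Xa = [[row[j] for j in active] for row in X]
        _, _, coef = fit_pls1(Xa, y, n_lv)
        # Assumed importance: |coefficient| times the waveband's standard deviation.
        importance = [abs(b) * statistics.pstdev(col) for b, col in zip(coef, zip(*Xa))]
        del active[importance.index(min(importance))]
        score = rmsecv([[row[j] for j in active] for row in X], y, n_lv)
        if score <= best_score:
            best_cols, best_score = active[:], score
    return best_cols, best_score


# Synthetic example: 30 samples, 6 "wavebands", only the first two informative.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(30)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.1) for row in X]

best_lv, cv_scores = select_n_lv(X, y, max_lv=4)
full_score = cv_scores[best_lv]
kept, kept_score = ise_pls(X, y, best_lv)
print(best_lv, round(full_score, 3), kept, round(kept_score, 3))
```

Because the full-spectrum model is the starting point of the elimination loop, the selected subset can never have a higher RMSECV than the full-spectrum model in this sketch. For real spectra with about 2000 bands, a library PLS implementation would be far faster than this pure-Python version.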
The prediction accuracies were evaluated using the coefficient of determination (R²), the root mean squared error of cross-validation (RMSECV), and the residual predictive deviation (RPD). The RPD is the ratio of the standard deviation (SD) of the measured data to the standard error of prediction [42]. The model with the largest R² and RPD and the smallest RMSE was considered the best model to predict soil Pox. It is generally accepted that an RPD value greater than 3 indicates an excellent predictive model for agricultural applications, and values between 2 and 3 indicate good predictive ability; values between 1.5 and 2 indicate an acceptable model requiring some improvement, and those below 1.5 indicate a poor predictive model [36,40].
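The three accuracy measures and the RPD interpretation bands above can be written out as a short sketch. Two choices are assumptions rather than statements from the text: R² is computed as one minus the ratio of residual to total sum of squares, and the RPD uses the sample SD of the measured values divided by the RMSE of the predictions. The toy numbers are illustrative only.

```python
import math
import statistics


def r_squared(measured, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean = statistics.fmean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean squared error; RMSECV when predictions come from cross-validation."""
    return math.sqrt(statistics.fmean([(m - p) ** 2 for m, p in zip(measured, predicted)]))


def rpd(measured, predicted):
    """Residual predictive deviation: SD of measured values over prediction error."""
    return statistics.stdev(measured) / rmse(measured, predicted)


def rpd_class(value):
    """Interpretation bands quoted in the text."""
    if value > 3:
        return "excellent"
    if value >= 2:
        return "good"
    if value >= 1.5:
        return "acceptable"
    return "poor"


# Toy values, not soil data.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
print(r_squared(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))

# RPD values reported in this study for the natural, cultivated, and combined models.
for value in (3.22, 3.15, 1.81):
    print(value, rpd_class(value))
```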
To assess the predictive ability and reliability of the PLS models, a modified bootstrap procedure was performed [25]: the data were randomly divided into training (70%) and test (30%) data sets, with replacement, N = 10,000 times. In each run, a PLS model was developed using the training data set. Here, FS-PLS and ISE-PLS models were developed using selected wavebands and were then used to predict Pox in the test data set. The robustness of the prediction models was evaluated by the mean (±SD) values of R² and the root mean squared error of prediction (RMSEP) from the 10,000 runs on the test data sets.
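The repeated 70/30 evaluation can be sketched as below. This is one reading of the procedure, stated as an assumption: each run draws a fresh random split, so samples are reused across runs but not within a run. The model is pluggable through the hypothetical `fit` and `predict` callables, and a one-variable least-squares line stands in for PLS to keep the example short. The data are synthetic and the run count is reduced from 10,000.

```python
import math
import random
import statistics


def repeated_split_validation(X, y, fit, predict, n_runs=200, train_frac=0.7, seed=0):
    """Mean and SD of test-set R2 and RMSEP over repeated random splits."""
    rng = random.Random(seed)
    n = len(y)
    n_train = round(train_frac * n)
    r2_values, rmsep_values = [], []
    for _ in range(n_runs):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in test]
        pred = [predict(model, X[i]) for i in test]
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        mean_obs = statistics.fmean(obs)
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        r2_values.append(1.0 - ss_res / ss_tot)
        rmsep_values.append(math.sqrt(ss_res / len(obs)))
    return {
        "r2_mean": statistics.fmean(r2_values),
        "r2_sd": statistics.stdev(r2_values),
        "rmsep_mean": statistics.fmean(rmsep_values),
        "rmsep_sd": statistics.stdev(rmsep_values),
    }


def fit_line(xs, ys):
    """Stand-in model (not PLS): ordinary least-squares line on one predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Synthetic data: 40 samples with a noisy linear response.
rng = random.Random(42)
xs = [0.5 * i for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

summary = repeated_split_validation(xs, ys, fit_line, predict_line)
print(summary)
```

Reporting the spread (SD) alongside the mean, as the study does, shows how sensitive the test-set accuracy is to which samples happen to fall in the 30% test portion.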
All data handling and statistical analysis were performed using MATLAB software (Version 9.3; The MathWorks, Sherborn, MA, USA) and R software version 3.1.3 [43] (R Core Team 2015).

Soil Characteristics by Chemical Analysis
The descriptive statistics for soil Pox as measured by chemical analysis, for all data and by system, are summarized in Table 3. The coefficient of variation (CV) for Pox when all data were combined indicated large Pox variability (148.57%) with a heterogeneous distribution. The Pox content averaged 87.66 mg·kg⁻¹ across all data, ranging from 21.89 to 856.84 mg·kg⁻¹. As illustrated in Figure 2, the Pox level varied markedly within the cultivated rice systems, much more so than in the natural systems. Indeed, the highest Pox value recorded in the natural systems was 57.93 mg·kg⁻¹ with a CV of 22.23%, in contrast to 856.84 mg·kg⁻¹ with a CV of 133.56% in the cultivated system. The third quartile, below which 75% of the data fall, was 38.73 mg·kg⁻¹ for the natural systems and 106.62 mg·kg⁻¹ for the cultivated systems. The substantial dispersion of P levels in the cultivated system probably results from the different levels of fertilizer application to farmers' plots. Based on the study by Dardenne et al. [44], such wide variation (CV > 50%) is recommended to achieve good NIRS calibration accuracy, indicating that our soil data were suitable for developing the spectroscopy model.
The difference in soil characteristics, including soil texture and the level of Pox in each system, can explain the high accuracy of prediction for each specific model. The correlation matrix between Pox, SOC, and their related soil parameters is shown in Table 4. For the combined data, no significant correlation was observed between Pox and SOC. Among the significant relationships observed, the soil parameters that could affect Pox were SOC, sand, clay, and Fe contents. In the natural system, Pox was positively correlated with SOC, clay, and Feox, whereas both Pox and Feox were negatively related to sand content, suggesting a direct effect of soil organic matter and texture on Pox contents. In the cultivated system, Pox was more affected by Feox than by SOC. Principal component (Figure 3) and texture triangle (Figure 4) analyses showed the contrasting properties of cultivated and natural soils. Natural system soils with a coarse texture were marked by low Pox and Feox contents compared to the cultivated soils. Cultivated soils with a clayey loam texture had high Pox and lower SOC compared to natural soils, probably due to the soil management techniques applied.
Values in bold are significant at P < 0.05.
The mineral properties of soil are strongly related to their NIR-spectra absorption patterns [45]. Mouazen et al. [46] confirmed that soil texture affected the reflectance of the soil surface during NIR spectral measurement. Light scattering increased with increasing sand content due to a large amount of quartz in the sand fraction, which increases the intensity of spectral reflectance [47]. The spectral absorption related to some soil components (O-H and metal O-H, O-H in water) increased with increasing clay content [48].
Soil preparation, specifically tillage, could break up soil particles and aggregates and thereby accelerate the mineralization of soil organic matter, resulting in lower SOC compared to that of natural systems [49,50]. The high level of Pox in the cultivated systems is attributed to fertilizer input and high mineralization rates, which release soil nutrients (including phosphorus).

Model Prediction Accuracy for Oxalate-Extractable P Under Different Land-Use Systems
Predictions of Pox content were made using standard FS-PLS and ISE-PLS regressions for all combined systems and for each system individually. The PLS regression model predictions of Pox levels are shown in Table 5 and Figure 5. ISE-PLS regression always improved Pox prediction compared to FS-PLS regardless of the land-use system.
n, number of samples; NLV, number of latent variables; FS-PLS, full-spectrum partial least squares regression; ISE-PLS, iterative stepwise elimination-partial least squares regression.
ISE-PLS regression performed well, which we attribute to the importance of waveband selection for Pox prediction. The percentage of wavebands (NW%) used in the model is the ratio of the number of selected wavebands (NW) to the total number of wavebands in the full spectrum (NW% = NW / 2001 bands × 100). The NW% values were 20.6% and 7.5% for the cultivated and natural systems, respectively. In other words, fewer than 21% of the available wavelengths contributed to the prediction of Pox for the cultivated system, with over 79% neither contributing to nor disturbing the predictions [51]. Selecting wavebands related to soil Pox and eliminating unusable wavebands improved the predictive ability of ISE-PLS for Pox compared to FS-PLS. This finding is in agreement with previous studies, in which fewer than 20% of wavelengths contained information relevant to the prediction of soil properties [25,36]. ISE-PLS produced excellent predictions of Pox in natural and cultivated systems, with RPD values greater than three and an R² of 0.90 (Table 5). Although the performance of model prediction is better for the cultivated system than for the natural system, this accuracy seems to be associated with the wide distribution of Pox values, which was characterized here by some samples with high Pox values. A high variation in the data set could affect the accuracy of NIRS calibration and predictive performance [52]. The performance of the ISE-PLS models was better for individual land-use systems than for the combined data (R² = 0.70, RMSE = 71.9, RPD = 1.81). Stevens et al. [48] highlighted the importance of building local, more accurate models that are specific to a given geographical entity or soil type, suggesting that this feature is a strength, rather than a weakness, of this model.
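The waveband percentage defined above is simple arithmetic, sketched here for reference. The total of 2001 bands follows from the 400-2400 nm range at 1 nm steps. The band counts printed below are not reported in the text: they are back-calculated from the rounded percentages and should be read as approximate.

```python
# Total number of 1 nm bands between 400 and 2400 nm, inclusive.
TOTAL_BANDS = 2400 - 400 + 1


def waveband_percentage(n_selected, total=TOTAL_BANDS):
    """NW% = NW / total bands x 100."""
    return 100.0 * n_selected / total


def implied_band_count(percentage, total=TOTAL_BANDS):
    """Approximate NW implied by a reported (rounded) NW%."""
    return round(percentage / 100.0 * total)


# Reported NW%: 20.6 (cultivated) and 7.5 (natural).
for label, pct in (("cultivated", 20.6), ("natural", 7.5)):
    n_bands = implied_band_count(pct)
    print(label, n_bands, round(waveband_percentage(n_bands), 1))
```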
The results of the modified bootstrap procedure are reported in Table 6 and Figure 6. Table 6 gives the mean values of R² and RMSEP for the FS-PLS and ISE-PLS models for each system in the test data set (30%). Figure 6 illustrates the distribution of R² values in the test data set for each system. The validation results showed that the ISE-PLS models predicted soil oxalate-extractable P more accurately than FS-PLS in terms of R² and RMSEP for all systems. ISE-PLS explained 70% to 88% of the variation in Pox, whereas FS-PLS explained 14% to 50%. The best mean R² and the lowest RMSEP values were obtained for the natural system. The predictive ability and reliability of the ISE-PLS models were confirmed by this modified bootstrap procedure.
Figure 7 shows the selected wavebands used for the PLS regression modeling and prediction of Pox, resulting from the preprocessing of the spectra using first derivative data. All samples showed similar spectral absorption features, characteristic of mineral and organic spectra as reported by several authors [15,53]. The most influential wavelengths for the Pox prediction model were recorded in the visible range (around 500 nm) and in the NIR range (at 1400 nm and from 2000 nm). The spectral absorption peaks in the Vis-NIRS region are related to iron oxides, clay minerals, and some functional groups of soil organic matter (SOM) [37]. In our study, the selected wavebands in the visible region common to both natural and cultivated systems (409, 430, 431, 443, 444, 591, and 592 nm) were associated with Fe-containing minerals (hematite, goethite) and dark-colored organic matter [54,55]. Residual minerals like hematite and goethite have an effect on the organic matter sorption of soil nutrients such as phosphorus [56]. (Table 7) [15,53,57].
The spectral bands at 1906-1907 nm, 2200-2235 nm, and 2306-2400 nm, related to minerals and water [15,58], and that at 2270 nm, corresponding to gibbsite (an Al hydroxide mineral) [56,59], contributed to Pox prediction. The detection of mineral and organic compounds in soils allows soil spectroscopy to predict Pox because of the potential relation between phosphorus and carbon content [22].

Properties of the Prediction-Relevant Wavebands
The number of selected wavelengths for Pox prediction was higher for cultivated systems than for natural systems (Figure 7). The selected visible wavelengths specific to cultivated areas were 527-590 nm, associated with hematite and organic matter, and 763-870 nm, related to amine N-H, aromatic C-H, Fe³⁺, and ferric oxide [58-60]. In the NIR region, the bands related to amine N-H at 1000 nm; aromatic C-H at 1100 nm; alkyl C-H at 1170 nm; O-H in water, CH2, lignin, and cellulose at 1464-1483 nm [61]; and Al-OH and kaolin at 2160-2164 nm [62] contributed to Pox prediction. In contrast, the selected wavelengths specific to natural systems were 738-740 nm and 753 nm (amine N-H) and 1291 nm, related to lignin, starch, and protein [59-61]. These system-specific bands demonstrate the variation in SOM and absorbents contributing to Pox prediction, which may explain the low accuracy of prediction when all data were combined.
Table 7. Selected visible and near-infrared (NIR) wavelengths related to soil components and functional groups as reported in the literature, and common and specific selected wavelengths observed in our study.

Factors Influencing the Prediction Model Accuracy for Oxalate-Extractable P
According to our results, the main soil components that contributed to the prediction of Pox were organic matter and iron oxides, in both natural and cultivated systems. This is consistent with the study of Sørensen and Dalsgaard, which suggested that indirect relationships between soil P and organic components would be useful in soil P prediction using spectroscopic methods [63], and with that of Ludwig et al., in which Olsen-extractable soil P, for which a useful calibration was obtained, was positively correlated with SOC [22]. The present study showed that Pox is significantly correlated with SOC in natural and cultivated systems, with coefficients of correlation (r) of 0.61 (P < 0.001) and 0.30 (P < 0.001), respectively, but not when all data are combined (r = 0.10, P = 0.15). Abdi et al. confirmed that successful prediction of soil total P is related to its significant correlation with soil carbon [42]. Soil P can be predicted by NIRS through covariation with other soil properties, but this relation may vary between data sets [16], possibly explaining the lack of correlation between soil carbon and Pox for the combined data. The high correlation between Pox and SOC in natural systems may have resulted from the accumulation of P in the surface layer through litter input, while in the cultivated system P is lost with the harvested crops.
Phosphorus in soil is mainly fixed in the solid phase, with Fe and Al in acidic soils and with Ca in alkaline soils. These elements are the main adsorbing agents for phosphate [64]. Khalid et al. [65] found that higher P availability in flooded soil was related to ammonium oxalate-extractable Fe. In our study, Pox and oxalate Fe (Feox) were significantly and positively correlated for the cultivated, natural, and combined systems, with correlation coefficients of 0.51 (P < 0.001), 0.45 (P < 0.001), and 0.55 (P < 0.001), respectively. Together with the selection of wavebands associated with iron oxides in the Vis-NIRS regions for Pox prediction, this result is in agreement with previous studies confirming the primary role of Fe in P sorption [7,25]. This highlights the importance of Fe in developing Pox prediction models.
The high correlation between Pox, SOC, and Feox, observed mainly in the natural system, can be associated with properties of this system such as fallow without fertilization, which would explain the high accuracy of the model. The high performance of model prediction in the cultivated system could be related to some samples with high Pox content (n = 15): a low prediction accuracy was obtained when these high-Pox samples were excluded (data not shown), suggesting that, in the cultivated system, varying fertilization and other management practices may disturb the correlation of Pox with organic matter and iron oxides. The correlations of Pox with SOC and Feox were very weak for the selected samples (without the high-Pox samples): r = 0.22, P < 0.05 and r = 0.03, P = 0.69, respectively. Applying the ISE-PLS model to a large sample set with wide geographical coverage could help to identify the main drivers of Pox in the cultivated and natural systems in order to build more robust models.
In this study, the "pseudo-independent" validation approach, in which a randomly selected sample (30%) is used for validation in the modified bootstrap procedure or in leave-one-out cross-validation (LOOCV), yields apparently more accurate PLS models for Pox prediction but presents a limitation. A previous study on SOC prediction using the first-derivative Vis-NIRS PLS approach reported stable model accuracy from "pseudo-independent" validation (random selection of non-independent test samples), but the prediction models failed when applied to each site through site-hold validation (using samples from one site for validation and the samples from the remaining sites for model calibration) [66]. We attempted FS-PLS with site-hold cross-validation across the seven studied sites and found very poor results (data not shown). This may be due to the mixture of sites and land-use systems with a small number of samples. This suggests building models from sample sets with wide geographical coverage and relatively dispersed sampling for regional application.
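The difference between the two validation schemes comes down to how the test indices are chosen. The sketch below shows the site-hold split described above, in which every sample from one site is held out in turn. The function name and the site labels are hypothetical, and any model can be fitted on the returned index lists.

```python
def site_hold_splits(site_labels):
    """Yield (held_out_site, train_indices, test_indices), one split per site."""
    for held_out in sorted(set(site_labels)):
        test = [i for i, site in enumerate(site_labels) if site == held_out]
        train = [i for i, site in enumerate(site_labels) if site != held_out]
        yield held_out, train, test


# Hypothetical site labels for eight samples from three sites.
labels = ["A", "A", "B", "B", "B", "C", "C", "A"]
splits = list(site_hold_splits(labels))
for held_out, train, test in splits:
    print(held_out, train, test)
```

With a random 30% test set, samples from the same site usually appear on both sides of the split, which is why that scheme is called "pseudo-independent" and can look more accurate than a site-hold test.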

Conclusions
Soil P is an important limiting nutrient for plant growth. An accurate assessment of available P is essential for effective fertilizer management in agriculture and sustainable management of ecosystems. Vis-NIRS is a simple and nondestructive method that can be used to predict several soil properties. This study demonstrates that Vis-NIRS, in combination with ISE-PLS regression, can successfully predict soil oxalate-extractable phosphorus (Pox) in soil samples from natural and cultivated systems in Madagascar. Together, these methods estimated soil Pox in both systems with high accuracy (R² = 0.90, RPD > 3) using fewer than 21% of the wavelengths in the Vis-NIRS region. ISE-PLS regression outperformed FS-PLS regression. However, model accuracy for cultivated systems was affected by some samples with high Pox values. The effective wavebands for the two land-use systems were associated with Fe and Al oxides and with organic components. The accuracy of Pox prediction was related to its significant correlation with soil organic carbon and iron content. The "pseudo-independent" validation used in the current study can overestimate the prediction accuracy when applied at the site scale, suggesting the use of larger and geographically more dispersed sample sets to build a robust model in the future. The Vis-NIRS approach has potential as a tool for rapid soil P evaluation and may be useful for soil management. Further investigations using large numbers of soil samples for external validation of the Vis-NIRS approach are required to enable application at regional and national scales.