Estimating Soil Organic Matter (SOM) Using Proximal Remote Sensing: Performance Evaluation of Prediction Models Adjusted at Local Scale in the Brazilian Cerrado

The quantification of soil organic matter (SOM) has increased over the years, especially in the Brazilian Cerrado region, one of the most important areas for grain production in the country. In this area, SOM content tends to be low, which directly impacts the physical, chemical and biological quality of soils. Thus, spectroradiometry has been widely evaluated as a faster, more reliable and cheaper means of meeting the demand for SOM estimation. In this context, the objective of the present paper was to evaluate the performance of a local spectral model for SOM prediction generated through the spiking strategy. The research was conducted in the municipality of Passos, Minas Gerais State, located in the Brazilian Cerrado. Soil samples (0–0.2 m and 0.2–0.4 m depths) were collected in a zigzag pattern and split into a set for calibrating the local models, from a test area (90 soil samples), and a set for recalibration and validation, from a target area (46 soil samples). The SOM contents were then determined in a laboratory, and the spectral responses (350–2500 nm) of each soil sample were collected. From the target area, 10, 25 and 50% of soil spectra were selected for recalibration of the local models generated for the test area. Although only moderate results were observed after recalibration, owing to the type of sample selected and the relative similarity among the spectral curves of both areas, improvement was observed for all statistical indices, especially when using 50% (23) of the samples for recalibration of the local models, reaching r² = 0.43, RMSEP = 2.34 g dm⁻³ and RPIQ = 4.58. These results are important for SOM estimation in the Brazilian Cerrado, considering its importance to food security and socioeconomic activities. However, given the lack of similar research in the study area, it is necessary to further investigate the development of spectral models on a local scale and their contribution to improving the identification of SOM spatial variability.


Introduction
The measurement of soil organic matter (SOM) content by spectroradiometry has been widely tested due to its relation to the physical, chemical and biological properties of soil [1][2][3]. For years, researchers have studied the relationship between soil organic matter content and spectral response and have concluded that it is one of the attributes with the greatest influence on soil reflectance [4][5][6].
Soil organic matter is a primary determinant of soil color, showing a close relationship with reflected energy and influencing the shape and albedo of the spectral curve throughout the entire optical spectrum. In the literature, different spectral intervals are used in algorithms to predict its content in the soil [7]. Analyses employing the near-infrared region (NIR) are successful for estimating soil organic carbon due to its sensitivity to the functional groups C-H, O-H and N-H [8] that dominate organic matter. Although SOM is more frequently estimated from the visible (Vis) and NIR regions, the shortwave infrared region (SWIR) has also shown satisfactory results.
SOM content and composition are factors that influence soil spectral reflectance, having the ability to mask the absorption features of other constituents, as demonstrated by Heil and Schmidhalter [9]. Evaluating highly weathered soils, Madeira Neto [10] found alterations in the spectral curves after the removal of SOM. Demattê et al. [11] observed an increase in reflectance in the spectral range from 350 to 2500 nm (Vis-NIR-SWIR) after the removal of organic matter from soil samples collected in a Brazilian tropical environment.
The concentration of organic matter is inversely related to soil reflectance [12,13]. Baumgardner et al. [14] reported that this property influences the spectral response of the soil when its content is above 2% (20 g kg⁻¹). According to Viscarra Rossel et al. [7], high levels of organic matter cause an intense decline in reflectance across the spectrum, masking other soil attributes. On the other hand, when the content is below 20 g kg⁻¹, other soil constituents, such as 1:1 and 2:1 clay minerals and Fe and Al oxides, become more influential in the spectral behavior than organic matter.
Mathews et al. [15] observed a significant decrease in reflectance in the region from 500 to 1150 nm in soil samples with high SOM content (128 g kg⁻¹), but this behavior was not observed in samples containing between 20 and 30 g kg⁻¹ of organic matter. When evaluating the best wavelengths to predict SOM content, Krishnan et al. [16] concluded that the visible region provided the best correlations, with maximum correlation coefficients of 0.98 for the 564 and 623 nm bands.
In a more recent study, Chicati et al. [17], using an imaging sensor (600-1100 nm), found a correlation coefficient of 0.65 for the prediction of SOM. On the other hand, Nanni et al. [18] obtained greater responses for the wavelengths of 580, 1401, 1900, 1940, 2180 and 2200 nm, reaching a determination coefficient of 0.90 when using a hyperspectral imaging sensor to estimate organic matter.
Reis et al. [19], using a hyperspectral imaging sensor, obtained a SOM determination coefficient of 0.75, with greater responses at wavelengths close to 600 and 900 nm. In contrast, Cezar et al. [20], using a non-imaging sensor associated with chemometrics, did not obtain satisfactory results for the prediction of organic matter. It should be noted that these researchers worked with a spectral model that is considered large, but the spatial variability of the samples was very high and was not captured by the calibrated model.
In turn, Lazaar et al. [6], using a non-imaging sensor to estimate soil organic matter from two spectral reading protocols, obtained an average determination coefficient above 0.85. These researchers found that the wavelengths that contributed most to the prediction were those in the near-infrared region. The authors concluded that Vis/NIR/SWIR spectroscopy associated with partial least-squares regression (PLSR) is a useful tool to analyze and predict soil organic matter.
Guerrero et al. [21], after employing several model calibration strategies based on the spiking technique, observed a notable improvement in the accuracy of soil organic carbon prediction using the SWIR region.
Qiao et al. [22], studying the estimation of soil organic matter using a multispectral sensor associated with several spectral data pre-processing techniques, concluded that the applied techniques can significantly improve the quality of prediction models. In that study, an r² value of 0.98 and a high correlation between organic matter content and wavelengths located at 417, 1853, 1000 and 2412 nm were observed. However, as already highlighted by Liu et al. [8], although different results and different algorithms have been reported for estimating soil organic matter from spectroscopy, a key issue to be evaluated concerns the efficiency of global, state (large) and local (small) models for predicting this and other chemical and physical attributes of the soil.
Currently, the use of spectral libraries has gained strength in Europe [23], Australia [24] and Brazil [25,26]. However, their use does not guarantee satisfactory prediction, since models calibrated with spectral data may not be robust enough to estimate soil attributes in new areas with samples external to the spectral library, a common condition that may generate inaccurate or biased results [27].
In addition, such libraries are created using thousands of soil samples, which can be expensive since, in addition to obtaining the spectral curves, laboratory analytical results (wet chemistry) are necessary for calibrating the models for the spectral prediction of soil attributes.
In this context of uncertainty, more economical and operationally feasible alternatives have become necessary. Developing prediction models on a local and regional scale seems to be a plausible alternative. However, reducing the number of samples used in the calibration of prediction models requires attention, as it can lead to a reduction in accuracy. According to Shi et al. [28], to adequately describe the spatial variability of soil properties, a sufficient number of samples must be collected for spectroscopic modeling.
Considering that the ideal number of samples is unknown, as it can vary from region to region due to soil characteristics, use, geology, terrain geomorphology and SOM content, among others, a way to overcome this limitation would be to select some spectral samples obtained from new areas and introduce them into local, regional or global models (spiking). According to Wetterlind et al. [29] and Guerrero et al. [30], this process tends to improve the prediction of soil attributes.
In light of the current limitations, the objective of this research was to evaluate the performance of a local model for predicting organic matter, recalibrated with soil samples selected from a target area located in the Brazilian Cerrado. It is expected that the spiking technique could help expand the prediction potential of the recalibrated local models, thus allowing the use of smaller models, which would lead to cost reduction and faster determination of SOM.

Study Area
The two study areas (Figure 1) are located in the municipality of Passos, state of Minas Gerais, Brazil, and are currently used for agriculture, pasture and forest. The test area has 32 ha and is located at coordinates 20  [31]. The average annual temperature is 21.5 °C, and the rainfall is 1288 mm [32].
Regarding current use, the test area has been sporadically used for agriculture and pasture, spending most of the year fallow; on the other hand, the target area has been used for agriculture. The relief of both properties presents a slope that varies from 0 to 8%. The geology is formed by a predominance of silt-clay metasediments, represented by shales [33]. The soil of both areas is classified as Ferralsol with medium texture [34]. The test area has 320 g kg⁻¹ clay, 90 g kg⁻¹ silt and 590 g kg⁻¹ sand at the depths of 0-0.20 m and 0.20-0.40 m. The target area has 310 g kg⁻¹ clay, 210 g kg⁻¹ silt and 480 g kg⁻¹ sand at 0-0.20 m, and 350 g kg⁻¹ clay, 200 g kg⁻¹ silt and 450 g kg⁻¹ sand at 0.20-0.40 m depth.
Both areas are within the Cerrado biome, where arable fields tend to have low levels of organic matter. This region demands thousands of soil analyses annually, as it has low natural fertility, which forces farmers to invest in physical and chemical soil analyses for knowledge and maintenance of the productive potential.

Soil Samples
Soil samples were collected at 0-0.2 m and 0.2-0.4 m depths, with free walking in a zigzag pattern. A total of 90 soil samples were collected from 45 points demarcated in the test area and 46 soil samples from 23 points demarcated in the target area. After collection, the samples were sent to the foliar and soil analysis laboratory at the State University of Minas Gerais for the determination of soil organic carbon.
Initially, all samples were dried in an oven at 45 °C, crushed and passed through a 2 mm mesh sieve (air-dried fine earth, TFSA). Organic carbon was determined following the methodology recommended by the Agronomic Institute of Campinas (IAC) [35]. The organic matter content was obtained by multiplying the total organic carbon by 1.724, since carbon accounts for 58% of the average humus composition [36].
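The conversion factor follows directly from the 58% carbon share (1/0.58 ≈ 1.724). A minimal sketch of this arithmetic is given below; the function name and the example value are illustrative assumptions, not part of the laboratory protocol.

```python
# Assumption stated in the text: carbon makes up 58% of average humus.
CARBON_FRACTION = 0.58


def organic_matter_from_carbon(organic_carbon):
    """Convert an organic carbon content to organic matter (same unit)."""
    return organic_carbon / CARBON_FRACTION


# The multiplicative factor implied by the 58% share.
factor = 1 / CARBON_FRACTION
print(round(factor, 3))  # 1.724

# Hypothetical example: 11.6 g dm-3 of organic carbon.
print(round(organic_matter_from_carbon(11.6), 2))  # 20.0
```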

Obtaining the Spectra of Soil Samples
After separating a small amount of soil from each sample described in the previous section, the soils were placed in Petri dishes measuring 9 cm in diameter by 1.5 cm in height for later reading. Spectral readings were taken in an environment with controlled humidity and light using a non-imaging spectroradiometer, ASD Fieldspec 3 JR, which covers the spectral range from 350 nm to 2500 nm and has a spectral resolution of 3 nm up to 700 nm and 10 nm from 700 nm to 2500 nm. The equipment was programmed to perform 50 readings per sample, which were averaged into a single spectral curve. Before collecting radiometric data, the spectroradiometer was optimized to eliminate internal noise. The sensors were calibrated using a standard white Spectralon plate with 100% reflectance [37], as performed by Rodrigues et al. [38].
The optical fiber reader was placed in a vertical position 8 cm away from the sample support platform, thus generating a reading area of approximately 2 cm². A 650 W lamp with a non-collimated beam was used as the light source, positioned 35 cm from the platform at an angle of 30° in relation to the horizontal plane [39]. A summarized scheme of the analyses can be observed in the flowchart (Figure 2).

Data Processing and Statistical Analysis
Raw data were pre-processed to improve the stability of the regression models, as described by Milos et al. [40]. Each spectral curve was corrected by the de-trending method, which removes non-linear trends in spectroscopic data [41].
The recalibrated models were built by partial least-squares regression (PLSR) using the Unscrambler version 10.3 software package (CAMO, Inc., Oslo, Norway). Their performance was evaluated following the methodology described by Bao et al. [42], using the coefficient of determination (r²), root mean square error of prediction (RMSEP), ratio of performance to interquartile range (RPIQ) and systematic error (BIAS). In these indices, n is the number of samples; yi is the observed organic matter value for sample i; ŷi is the predicted organic matter value for sample i; and ȳ is the mean organic matter value for all samples [42]. The RPIQ is the ratio of the interquartile range, i.e., the difference between the third and first quartiles (IQ = Q3 − Q1), to the RMSEP. The predictive power of the models was evaluated considering strong predictive ability when r² ≥ 0.75, acceptable ability when 0.5 ≤ r² < 0.75 and unacceptable ability when r² < 0.5 [43].
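For reference, the conventional forms of these indices, written with the symbols defined above, are sketched below. They are the standard definitions rather than a transcription of the original equations; in particular, the sign convention of BIAS (predicted minus observed) and the use of the coefficient of determination rather than the squared correlation for r² are assumptions.

```latex
\begin{align}
r^{2} &= 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}}
                  {\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^{2}} \\
\mathrm{RMSEP} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^{2}} \\
\mathrm{RPIQ} &= \frac{IQ}{\mathrm{RMSEP}}, \qquad IQ = Q_3 - Q_1 \\
\mathrm{BIAS} &= \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)
\end{align}
```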
The quality classification of the models based on the RPIQ followed the criteria defined by Veum et al. [44] and Thomas et al. [45], where RPIQ ≥ 2.70 represents models with good performance, 2.69 > RPIQ ≥ 1.89 represents models with moderate performance and RPIQ < 1.88 represents models with low performance. The BIAS was obtained by calculating the difference between the reference values and the values predicted from the spectral curves for the Vis/NIR/SWIR regions [46]. For the RMSE, there are no fixed value ranges for its classification, since it is expressed in the units of the measured attribute, but low values indicate good calibration of the predictive models [47].
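The indices and classification thresholds described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the quartiles use the "inclusive" interpolation of the standard library (Unscrambler may compute them differently), BIAS is taken as predicted minus observed, and the small gaps between the published RPIQ classes are closed by treating 2.70 and 1.89 as the class boundaries.

```python
import math
import statistics


def prediction_metrics(observed, predicted):
    """Return r2, RMSEP, RPIQ and BIAS for paired observed/predicted values."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = statistics.fmean(observed)
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmsep = math.sqrt(ss_res / n)
    # Quartiles of the observed values (interpolation method is an assumption).
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmsep": rmsep,
        "rpiq": (q3 - q1) / rmsep,
        "bias": sum(residuals) / n,
    }


def classify_r2(r2):
    """Predictive ability classes based on r2, as cited from [43]."""
    if r2 >= 0.75:
        return "strong"
    return "acceptable" if r2 >= 0.5 else "unacceptable"


def classify_rpiq(rpiq):
    """Performance classes based on RPIQ, as cited from [44,45]."""
    if rpiq >= 2.70:
        return "good"
    return "moderate" if rpiq >= 1.89 else "low"


# Values reported in the abstract for the model recalibrated with 50% of samples.
print(classify_r2(0.43), classify_rpiq(4.58))  # unacceptable good
```

Applied to the values reported for the best recalibrated model, the two criteria disagree, which is why both indices are discussed separately in the results.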

Selection of Samples from the Target Area for Recalibration of the Test Area Models
A total of 10%, 25% and 50% of samples from the target area were selected for recalibration of the local organic matter prediction models obtained for the test area. The selection was performed according to Cezar et al. [46], applying principal component analysis to the spectral curves in order to identify the most representative samples, capable of transferring the maximum existing variability in the target area to the main model. In this step, samples distributed in the center and at the edges of the spectral space were selected, considering the first two principal components (PC1 and PC2). As described in Section 2.4, the Unscrambler software was used in the selection process through its principal component analysis module.
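One possible way to automate a selection of this kind is sketched below. It assumes that the PC1 and PC2 scores have already been computed elsewhere and that "center and edges" means the samples closest to and farthest from the centroid of the score plot, split roughly in half; the function name and this rule are illustrative assumptions, not the procedure implemented in Unscrambler.

```python
import math


def select_spiking_samples(scores, n_select):
    """Pick sample indices from the centre and the edges of the PC1-PC2 space.

    scores: list of (pc1, pc2) tuples, one per target-area sample.
    n_select: number of samples to pick for recalibration.
    """
    n = len(scores)
    if not 0 <= n_select <= n:
        raise ValueError("n_select must be between 0 and the number of samples")
    # Centroid of the score plot.
    cx = sum(s[0] for s in scores) / n
    cy = sum(s[1] for s in scores) / n
    # Euclidean distance of each sample to the centroid.
    dist = [math.hypot(x - cx, y - cy) for x, y in scores]
    order = sorted(range(n), key=lambda i: dist[i])  # centre first, edges last
    n_center = n_select // 2
    n_edge = n_select - n_center
    chosen = order[:n_center] + order[n - n_edge:]
    return sorted(chosen)


# Hypothetical scores for six samples.
example_scores = [(0, 0), (1, 0), (-1, 0), (0, 5), (0, -5), (0.1, 0.1)]
print(select_spiking_samples(example_scores, 3))  # [0, 3, 4]
```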

Evaluation of Local Prediction Models Adjusted for the Test Area
Initially, a local prediction model was generated using the test area dataset (not recalibrated). This model was fitted with 89 soil samples from this area and validated with 46 soil samples from the target area. In a second stage, a second local prediction model was generated using the spiking technique, which consists of selecting the spectra of some samples of interest from the target area and introducing them into the original calibration matrix of the test area. This may allow the recalibrated model to capture most of the existing variability in the target area, thus enabling better estimates of soil attributes [48][49][50].
Therefore, 94 (89 + 5), 101 (89 + 12) and 112 (89 + 23) soil samples were used for recalibration, while the models were validated with the remaining 41, 34 and 23 samples, respectively, referred to as independent samples (Figure 3). To assess whether sample selection and recalibration were efficient in improving the models created on a local scale, statistical parameters were compared before and after the recalibration process, as presented in Section 2.4.
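The sample counts above follow from applying each percentage to the 46 target-area samples. The short sketch below reproduces them; rounding up to the next whole sample is an assumption that happens to match the reported numbers, and the function name is illustrative.

```python
def spiking_split(n_calibration, n_target, percentages):
    """Return (percentage, recalibration size, validation size) tuples."""
    rows = []
    for pct in percentages:
        # Ceiling division: round the spiking subset up to a whole sample.
        n_spike = -((-n_target * pct) // 100)
        rows.append((pct, n_calibration + n_spike, n_target - n_spike))
    return rows


for pct, n_recal, n_valid in spiking_split(89, 46, [10, 25, 50]):
    print(pct, n_recal, n_valid)
# 10 94 41
# 25 101 34
# 50 112 23
```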

Remote Sens. 2023, 15, x FOR PEER REVIEW

Descriptive Statistics
The statistics of soil organic matter content in both study areas point to a higher average value in the target area (Table 1). The maximum SOM value found for the target area was close to 50 g dm⁻³, i.e., 5%, a content considered high for soils in the Brazilian Cerrado. The histograms showed that only the target area had a normal distribution (Figure 4). The Shapiro-Wilk test for the target dataset gave W = 0.97 and p = 0.24, that is, p > 0.05. On the other hand, the test area showed an asymmetrical distribution for organic matter, with W = 0.96 and p = 0.006.

When working with soil attributes, it is common to obtain a non-normal distribution due to their complexity [51], which is accentuated in the study region, given the wide variation in relief and in the crop management adopted by rural producers. Thus, it was decided not to transform the data, as this would change the real scale of soil organic matter values, compromising their relationship with the spectral curves.

Description of Spectral Curves
The spectral curves obtained for the test area showed a relatively similar pattern in terms of absorption bands for depths from 0 to 0.20 m and from 0.20 to 0.40 m (Figure 5).
In the spectral region from 450 to 480 nm, a characteristic peak of the presence of goethite was observed. From 850 to 900 nm, absorption features characteristic of the iron oxides hematite and goethite were detected, while at 1400 nm and 1900 nm, absorption occurred due to the presence of water and OH⁻ ions [52]. According to Ten Caten et al. [53], when absorption bands occur simultaneously at 1400 nm and 1900 nm, as detected in this research, this characterizes the presence of water bound to the soil matrix. On the other hand, if they occur only at 1400 nm, this indicates the presence of hydroxyl in soil minerals.
In the regions centered at 2200 nm and 2265 nm, absorption features characteristic of the presence of kaolinite and gibbsite, respectively, were observed, as discussed by Demattê et al. [54], Poppiel et al. [55] and Rodrigues et al. [56]. Similar results for absorption bands and points of greatest reflectance were observed for the spectral curves of the target area (Figure 6). Likewise, the reflectance factor intensities were concentrated between 0.20 and 0.50, with the higher values associated with the 0.20-0.40 m depth.

Statistical Indices of Predictive Models to Organic Matter
The results obtained during the calibration, cross-validation and prediction phases are shown in Table 2.
The results showed the highest accuracies when 112 samples were used for calibration and cross-validation. In all cases, the increase in the number of samples in the recalibrated set was reflected in a reduction in error (RMSEC, RMSECV, RMSEP) as well as in BIAS (Table 2) when compared with the model without recalibration (89 samples). In turn, the RPIQ values followed the opposite trend, reaching a maximum value of 4.58 for the model generated with 112 soil samples.

Analysis of Spectral Curves
The relatively similar pattern in terms of absorption bands for the test area at 0-0.20 m and 0.20-0.40 m depths (Figure 5) occurred because the soil in the study area was formed from silt-clay metasediments, which tend to present mineralogy and texture with little variation along the profile (see Section 2.1).
Regarding the reflectance intensity, the highest responses mostly occurred for the spectral curves of the samples collected at 0.20-0.40 m depth. In this layer, the average organic matter content (16.33 g dm⁻³) was lower than at 0-0.20 m depth (22.56 g dm⁻³), allowing the spectral response of the sand fraction (590 g kg⁻¹) to predominate and resulting in higher reflectance, in agreement with Demattê et al. [52], Nanni et al. [39] and Heil and Schmidhalter [9].
The similarities observed in the intensity of certain spectral curves between different depths were associated with soil tillage in the area, which was used sporadically to control weeds and relieve soil compaction. In this case, a change in the concentration of organic matter between the surface and subsurface layers was detected due to remobilization.
The target area had similar spectral behavior to the test area. The average SOM content at 0.20-0.40 m depth (15.65 g dm⁻³) was lower than that found at 0-0.20 m (21.06 g dm⁻³). The similarity between the spectral curves of both areas was associated with the parent material, which was the same; with the texture, which varied only slightly (see Section 2.1); and with the organic matter contents, which, despite having distant average values, presented ranges (minimum and maximum values) differing by about 1% (Table 1).

Local Soil Organic Matter Prediction Models
The recalibrated prediction models (using the spiking technique) for the test area, containing 94, 101 and 112 soil samples, showed slightly better results than the non-recalibrated model built from 89 samples (Table 2). The relative improvement of most statistical indices after recalibration is linked to the increase in the number of samples used in the recalibration process, as highlighted by Hong et al. [50].
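The calibration set sizes above follow from adding 10, 25 and 50% of the 46 target-area samples to the 89 test-area samples. The sketch below reproduces that bookkeeping. It assumes that fractional counts are rounded up and that the spiking subset is drawn at random; the random draw is purely illustrative and is not necessarily the selection strategy used in this study, and the function name is hypothetical.

```python
import math

import numpy as np


def spiking_split(n_target, fraction, seed=0):
    """Split target-area sample indices into a spiking subset and a validation set."""
    rng = np.random.default_rng(seed)
    # Assumption: fractional sample counts are rounded up (e.g. 11.5 -> 12).
    n_spike = math.ceil(fraction * n_target)
    # Assumption: random selection, used here only for illustration.
    shuffled = rng.permutation(n_target)
    return shuffled[:n_spike], shuffled[n_spike:]


N_TEST_AREA = 89    # calibration samples from the test area
N_TARGET_AREA = 46  # samples available in the target area

summary = []
for fraction in (0.10, 0.25, 0.50):
    spike_idx, valid_idx = spiking_split(N_TARGET_AREA, fraction)
    summary.append((N_TEST_AREA + len(spike_idx), len(valid_idx)))
    print(f"{fraction:.0%}: calibration n = {summary[-1][0]}, validation n = {summary[-1][1]}")
```

Under these assumptions the recalibrated sets contain 94, 101 and 112 samples, leaving 41, 34 and 23 target-area samples for prediction, which matches the counts given in Figure 9.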
However, despite this improvement, the r² results indicate that the models recalibrated with 10, 25 and 50% of the selected samples have low predictive capacity [43], even though the RPIQ values place the models in the good performance class [44].
It was expected that using 50% of the samples selected from the target area to recalibrate the local model would be enough to significantly improve its accuracy, as discussed by Guerrero et al. [49], leading to higher-quality SOM prediction models. However, this did not happen, since the samples selected from the target area were not able to convey all of the existing soil variability to the local models, as highlighted by Guerrero et al. [21]. A similar result was obtained by Cezar et al. [46] when evaluating strategies for estimating organic matter using the spiking technique, which leads us to believe that the type of sample selected, as well as the selection strategy, significantly influences the recalibration of the local model, in agreement with Nawar and Mouazen [57].
The aforementioned statement can be confirmed through analysis of the regression coefficients (β) of the SOM prediction models (Figure 7), which demonstrate the influence of each spectral band on the PLSR models [38]. Similar patterns can be seen in the PLSR models for all datasets, indicating the same structure even after recalibration. The non-recalibrated local model (89 soil samples) and the models recalibrated using the spiking methodology (94, 101 and 112 soil samples) were similar in terms of the intensity and spectral location of significant bands or intervals. The most important bands in all situations were those centered at 552, 760, 1064, 1408, 1718, 2193 and 2268 nm, in agreement with Milos et al. [40]. This behavior is related to the overtones and combinations of fundamental vibrations and reflects the stretching and bending of chemical bonds, such as O-H, C-H and N-H [8,58], present in the structures of organic matter.
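To make the role of the regression coefficients concrete, the sketch below fits a single-response PLS model with the NIPALS algorithm and ranks bands by the absolute value of β. This is only an assumed, generic PLS1 formulation; the software and exact PLSR variant used in this study are not described in this section, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; return regression coefficients (beta) and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # mean-centered working copies
    n_bands = X.shape[1]
    W = np.zeros((n_bands, n_components))  # weights
    P = np.zeros((n_bands, n_components))  # X loadings
    q = np.zeros(n_components)             # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                  # scores for this component
        tt = t @ t
        p = Xr.T @ t / tt
        q[a] = yr @ t / tt
        Xr = Xr - np.outer(t, p)    # deflate X
        yr = yr - q[a] * t          # deflate y
        W[:, a], P[:, a] = w, p
    beta = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ beta
    return beta, intercept


def top_bands(beta, wavelengths, k):
    """Return the k wavelengths with the largest absolute coefficients, sorted."""
    order = np.argsort(np.abs(beta))[::-1][:k]
    return np.sort(np.asarray(wavelengths)[order])


# Synthetic example with four bands; not spectra from the study.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 4))
true_beta = np.array([2.0, 0.0, -1.0, 0.5])
y_demo = X_demo @ true_beta + 3.0
beta, intercept = pls1_fit(X_demo, y_demo, n_components=4)
print(np.round(beta, 3), top_bands(beta, [552, 760, 1064, 1408], k=2))
```

Comparing β vectors fitted before and after spiking, band by band, is one way to check whether recalibration changed the structure of the model or only its sample size.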
Another explanation for why the post-recalibration results were lower than expected is that, although the samples selected from the target area had chemical and physical properties different from those of the test area, they did not show marked spectral variation. Thus, both sets of data occupied the same spectral space, as observed by Nawar and Mouazen [57], so the recalibration lacked a significant effect, in agreement with Cezar et al. [20].
The principal component analysis showed strong similarity between the spectral sets of the target and test areas (Figure 8), except for three samples. These three samples fell outside the ellipse generated by the Hotelling's T² test at the 1% probability level, which indicates the presence of outliers [59].
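The outlier screening described above can be sketched as follows: project the spectra onto the first two principal components and flag samples whose Hotelling's T² exceeds the limit at the chosen probability level. The limit used here, A(n − 1)/(n − A) · F(A, n − A) with A = 2 components, is one common chemometric convention and is an assumption; the exact formula applied by the software used in this study is not stated. For A = 2 the F quantile has a closed form, so no statistics library is needed. All names and data are hypothetical.

```python
import numpy as np


def pca_scores(spectra, n_components=2):
    """Scores of mean-centered spectra on the first principal components (via SVD)."""
    centered = spectra - spectra.mean(axis=0)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def hotelling_t2_limit(n_samples, alpha=0.01):
    """Assumed T2 limit for two components: 2(n-1)/(n-2) * F_alpha(2, n-2)."""
    m = n_samples - 2
    # Closed-form upper quantile of F(2, m): P(F > x) = (1 + 2x/m) ** (-m/2).
    f_crit = (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)
    return 2.0 * (n_samples - 1) / m * f_crit


def flag_outliers(scores, alpha=0.01):
    """Return a boolean mask of samples outside the T2 ellipse, plus the T2 values."""
    if scores.shape[1] != 2:
        raise ValueError("this sketch supports exactly two components")
    t2 = np.sum(scores ** 2 / scores.var(axis=0, ddof=1), axis=1)
    return t2 > hotelling_t2_limit(scores.shape[0], alpha), t2


# Synthetic spectra (135 samples x 50 bands) with one shifted sample; not study data.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(135, 50))
spectra[0] += 10.0
flagged, t2 = flag_outliers(pca_scores(spectra), alpha=0.01)
print(np.flatnonzero(flagged), round(hotelling_t2_limit(135), 2))
```

Samples flagged this way lie outside the common spectral space of the two areas; the remaining samples, which overlap, add little new variability when used for spiking.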
These results indicate that the recalibration of models on a local scale, using the spiking technique, will have a greater effect when the samples selected from the target area show a significant level of spectral difference and spatial variability in relation to the dataset from the test area. Thus, the predictive capacity of the recalibrated models might be enhanced when applied to new areas, since the range of soil attributes is more likely to match the ranges found for the samples used in the recalibration, as pointed out by Nawar and Mouazen [60].
Since the taxonomic classification of the soil class of both areas was similar, as was their geology, with differences only in granulometry and soil use, low spectral variability was observed (Figures 5 and 6). According to Ramirez Lopes et al. [61], spectral similarities may reflect similarity in soil composition. It should also be considered that the average SOM content was below 2% for the test area and close to 3% for the target area (Table 1), without a significant overlap of the effects of organic matter on the other soil attributes or large spectral variability [11,14].
Although the areas have been managed differently over the years, it should be noted that, as they are located in a tropical region, the accumulation and maintenance of SOM is very slow, requiring more than 10 years to achieve significant increments in its concentration [62]. Thus, large variations in the concentration of this attribute are not expected for agricultural soils in the Brazilian Cerrado, except in places where a mix of crop types, associated with crop rotation over many years, is adopted, which does not occur in the study area.
Despite what was observed, it should be noted that, when compared to work carried out by other researchers in other countries (but mainly in Brazil), the results obtained here are encouraging for estimating organic matter in the Cerrado environment by remote sensing; however, many challenges must be overcome. Nanni et al. [39], using this technique to estimate organic matter in soils of Paraná State (not Cerrado), obtained an r² value of 0.31, an RMSEP of 6.88 g dm−3 and a bias of 4.26, while our research obtained an r² value of 0.43, an RMSEP of 2.34 g dm−3 and a bias of −0.27. Cezar et al. [20], when using remote sensing associated with the spiking technique to estimate soil organic matter in a subtropical environment in Brazil (not Cerrado), reached r², RMSEP and bias values of 0.41, 4.6 g dm−3 and 0.49, respectively.
On the other hand, Reis et al. [19], using an AisaFenix hyperspectral sensor (Specim, Finland) to estimate soil organic matter in a subtropical environment, obtained superior results, with r², RMSEP and bias values of 0.75, 3.44 g dm−3 and 0.58, respectively. In this case, the positive result may be linked to the sensor, which has a higher resolution and the ability to capture small variations in soil organic matter content, allowing for more effective modeling. Using the same AisaFenix imager (Specim, Finland), Nanni et al. [18] achieved results superior to those of our study for the prediction of organic matter, reaching an r² value of 0.67, an RMSEP of 2.16 g dm−3 and a bias of −0.22. Paz-Kagan et al. [63], also using the AisaFenix hyperspectral sensor to estimate soil attributes in Israel and Germany, achieved good results for organic matter, reaching an r² value of 0.61 for the estimation in Israel and 0.95 in Germany.
In all these cases where the results were superior, the authors used a new hyperspectral sensor, which may indicate that it is also necessary to change the sensor to test the possibility of obtaining more robust data modeling in the Cerrado biome. Figure 9 presents the scatterplots of the predicted against the reference values of SOM, corroborating the aforementioned statements. The relationship between the predicted and reference values shows that many points lie far from the regression line. However, the estimated values are within the upper and lower limits of the confidence interval set at the 95% probability level.
Finally, it should be noted that by adopting 50% of the samples from the target area for recalibration of the Vis/NIR/SWIR spectral models, the results were superior to those of the other models (Table 2), in agreement with Shi et al. [28]. This demonstrates that the selection and insertion of some samples from the target area in the spectral prediction model are important to improve the estimation, which has also been highlighted by Wetterlind et al. [29], Guerrero et al. [30] and Guy et al. [48].
These results are important for SOM estimation in the Brazilian Cerrado, since demand has grown over the years, requiring a faster and cleaner methodology for estimating this important soil attribute, which is linked to soil physical quality, fertility and grain productivity in a region naturally formed by poor soils. However, considering the lack of similar research in the study area, it is necessary to further investigate the development of spectral models on a local scale and their contribution to improving the identification of SOM spatial variability.

Conclusions
The use of the spiking technique improved the predictive capacity of the recalibrated spectral models for the Cerrado by 12% compared to the non-recalibrated model. The use of local models for predicting organic matter in the Brazilian Cerrado showed potential when associated with the spiking technique without using spectral libraries. The development of local spectral models for estimating SOM is a potential alternative for areas in the Cerrado, since the use of a model that is considered small will contribute to reduced costs in relation to models generated with large numbers of soil samples. Complementary studies should be carried out, taking into account new areas, uses and vegetation cover, as well as variations in the type and quantity of samples selected for recalibration of spectral models for estimating SOM in the Brazilian Cerrado.

Figure 1. Location of the test and target areas in the municipality of Passos, State of Minas Gerais, Brazil.

Figure 2. Flowchart of the methodology for assessing organic matter using Vis/NIR/SWIR hyperspectral sensors and traditional laboratory analysis.

Figure 3. Scheme used to represent the experiment. (a) Unspiked initial calibration (IC) constructed only with the test area dataset; (b) initial calibration spiked with a spiking subset (SS) selected in the target site (TS). Source: adapted from Guerrero et al. [21].

Figure 4. Representative histograms from the soil organic matter datasets for the test and target areas.

Figure 5. Spectral curves of soil samples collected in the test area at depths from 0 to 0.20 m and from 0.20 m to 0.40 m.

Figure 6. Spectral curves of soil samples collected in the target area at depths from 0 to 0.20 m and from 0.20 m to 0.40 m.

Figure 7. Representative regression coefficients of the non-recalibrated local model (A) and of the recalibrated local models using the spiking technique (B-D).

Figure 8. Principal component (PC) similarity maps of the test and target area datasets. Red scores were obtained by the calibration model using test area spectra. Green scores were obtained by the calibration model using target area spectra. The ellipse is the limit for the Hotelling's T² test (p-value of 1%).

Figure 9. Scatterplots obtained during the prediction phase. (A) Unspiked local model; (B) spiked local model tested with 41 samples; (C) spiked local model tested with 34 samples; (D) spiked local model tested with 23 samples. Regression line (solid line); confidence interval (red line).
°46′34.10″ S latitude and 46°31′46.79″ W longitude, while the target area has 23 ha and is located at coordinates 20°46′29.41″ S latitude and 46°31′50.82″ W longitude, both referenced to the WGS 84 datum. The climate of both areas is classified as Cwa (humid temperate climate with dry winter and hot summer) according to the Köppen climate classification

Table 1. Descriptive statistics obtained for soil organic matter in the study areas.
n of test area: 89 soil samples (one sample was withdrawn due to a discrepant value); n of the target area: 46 soil samples; Swilk prob: Shapiro-Wilk test p-value (0.05).

Table 2. Statistical indices obtained during the generation phase of the recalibrated and non-recalibrated models.