Spectral Heterogeneity Analysis and Soil Organic Matter Inversion across Differences in Soil Types and Organic Matter Content in Dryland Farmland in China

Abstract: Soil organic matter (SOM) plays an important role in agricultural production and arable land quality improvement. Hyperspectral technology enables frequent surveys over large areas. In this study, we explored the spectral heterogeneity associated with differences in soil types and SOM content, and proposed a method for estimating SOM content over large areas using spectroscopy. The results indicate regional variations in the factors affecting soil spectral absorption peaks, with noticeable latitudinal disparities. The first-order differential partial least squares method provided the best prediction for the SOM inversion. The coefficient of determination (R²) for the SOM inversion model was 0.93, and the root mean square error (RMSE) was 3.42, with an 8.49 g/kg difference in the SOM content. When the difference in SOM content fell between 8 and 15 g/kg, the inversion model performed best; the optimal model R² exceeded 0.95, and the RMSE was less than 5. The comprehensive analysis showed that the difference in organic matter content was an important factor affecting the SOM content estimate and must be considered in practice. In addition, it is crucial to categorize soil samples on the basis of distinct soil types while maintaining a consistent range of SOM content within the same soil type, ideally between 8 and 15 g/kg. The first-order differential partial least squares method can then be applied. These results are expected to contribute to the acquisition of high-quality information on variations in the SOM of complex large-scale areas.


Introduction
Soil organic matter (SOM) is an important soil component for preserving ecological equilibrium and agricultural sustainability [1]. It has a profound effect on soil texture, structure, and water retention, and plays a major role in carbon and nutrient cycling [2]. Consequently, a comprehensive understanding of the distribution and content of SOM is imperative for effective land resource management, agricultural productivity, and food security [3]. Although conventional soil sampling and analytical techniques have been valuable in research, they have certain limitations [4]. These methods typically require significant time and resources, particularly when collecting and analyzing soil samples across extensive areas [5]. Consequently, there is a pressing demand for a new approach that assesses SOM content efficiently and accurately to advance agricultural modernization and enhance overall agricultural efficiency [6].
Numerous studies have provided evidence of the viability of hyperspectral techniques for the quantitative monitoring of SOM content [7,8]. Bowers and Hanks demonstrated a marked correlation between organic matter content and the visible light band of soil [9]. In a separate study, correlations were observed between SOM content and specific bands, including the ultraviolet (UV) band at 376.8 nm, the visible band at 616.5 nm, and the near-infrared (NIR) band at 724.1 nm [10]. Laamrani et al. assessed the optimal hyperspectral bands for detecting organic matter content in agricultural soils using visible near-infrared (VNIR) and shortwave infrared (SWIR) bands [11]. Despite the valuable insights gained from these findings regarding the hyperspectral inversion of SOM, certain limitations remain. The first is the small monitoring area. Many studies focus on small areas, and the inversion models developed are often localized [12,13]. Although these models may be effective within particular limits, their applicability to other areas requires further investigation. Consequently, they cannot accurately predict the SOM content over a larger area. Furthermore, monitoring SOM through remote sensing is a complex process influenced by numerous factors. Several studies have overlooked the impact of significant soil variations across diverse geographical regions on the spectral characteristics of organic matter [14]. In addition, most SOM inversion models are statistical models, and their performance is highly dependent on the selection of modeling samples [15]. Significant fluctuations in the SOM content can substantially influence the efficacy of model inversion. While scholars have examined the spectral characteristics of diverse SOM from various perspectives, they have yet to analyze the underlying causes of these phenomena [16], and no corresponding solutions have been proposed for some of the problems identified in the process of SOM inversion.
Numerous scholars have also examined the use of multispectral data, such as Landsat 8 and Sentinel-2, to predict SOM content [17,18]. Despite this approach's potential for extensive monitoring, multispectral remote sensing images have few bands and limited spectral resolution, and therefore do not provide sufficiently detailed spectral information on SOM. Thus, the resulting SOM inversion is not ideal. A key problem in current research is how to realize hyperspectral remote sensing mapping of SOM in complex large regions while guaranteeing accuracy.
Several scholars have extensively studied the spectral characteristics of soil in complex large-scale areas [19,20]. Condit et al. carefully selected 160 soil samples from 36 states across the United States [21]. They determined the spectral characteristics of the near-ultraviolet and visible bands and subsequently established an empirical regression equation to describe the reflectance characteristics of the soil. Stoner et al. conducted spectral reflectance measurements on 485 soil samples from the United States and Brazil. These samples were subsequently classified on the basis of the influence of their organic matter and iron elements [22]. Similarly, Wang Qianlong et al. collected 1661 soil samples from 13 provinces in China to develop a hyperspectral inversion model for determining soil nitrogen content. To validate their model, they tested it on 104 rice soil samples collected from Zhejiang Province [23]. Goge et al. used over 2000 sets of spectral data from various regions across France. They employed a local partial least squares model following neighborhood selection to estimate parameters such as organic carbon and cation exchange capacity [24]. However, there is currently a lack of research on achieving high-precision SOM inversion across a large geographical area encompassing differences in soil types and SOM content.
Given that the dryland farmland in China possesses fertile soils and serves as a crucial food production hub, it is imperative to promptly and accurately determine the distribution of organic matter within these areas. Sun et al. conducted a comprehensive examination of the primary determinants of organic carbon density in various spatial regions within dry farmland [25]. Huang et al. elucidated the spatial and temporal evolution of key indicators of cropland in dryland farming by rigorously defining the scope and content of this practice [26]. Dryland farmland in China encompasses two distinct climatic zones, and the spectral response of SOM exhibits noticeable latitudinal zonal variations. Consequently, the application of existing small-area modeling or multispectral imaging techniques for accurately monitoring organic matter content in the dryland farmland is challenging.
In view of the above problems, this study aims to provide a reference for SOM mapping in complex large-scale areas. The objectives of this study were to (1) further clarify the spectral response characteristics associated with differences in soil types and SOM content, (2) establish a methodology for hyperspectral inversion modeling of SOM in large-scale areas, and (3) analyze the spatial distribution pattern of SOM in dryland farmland in China.

Study Area
Dry farmland was delineated as areas with a topographic gradient below 5° and a proportion of dryland and cultivated land exceeding 40% within a 1 km² grid. The dry farmland studied here is situated in the eastern region of China with an average elevation of 150 m above sea level, and the topography is relatively flat. The major climate types in the region are temperate and subtropical monsoon climates, typically involving four distinct seasons with rain and heat over the same period and sufficient light [27,28]. The study area was located from 112 The main soil type in the northern part of the study area is black soil, which is dark in color, fertile, and high in organic matter. In the southern part, it is predominantly loess, with a low organic matter content and a loose texture.

Data Acquisition
In the study area, the sampling points were uniformly distributed, resulting in 399 sampling locations (Figure 1). The sampling depth ranged from 0 to 10 cm. The collected soil samples were placed in a dry and ventilated place for natural air drying and sieved to a size of <100 µm. After drying and sieving, the soil samples were divided into two parts to determine the SOM content and soil spectral reflectance. We analyzed the SOM content using the potassium dichromate volumetric method [29]. A FieldSpec4 portable field spectrometer was used to measure the spectral reflectance over wavelengths from 350 to 2500 nm. We chose 12 V, 50 W halogen lamps as the light source. The fiber optic probe was positioned 7 cm vertically above the soil sample, with the light source inclined at a 45-degree angle and at a direct distance of 60 cm from the soil sample [30,31]. We performed a white reference calibration and allowed the instrument to stabilize before use. Each sample was measured at 1 nm intervals, the measurement was repeated 10 times, and the average was taken as the sample's spectral reflectance.
To more accurately depict the impacts of differences in soil types and SOM content on spectral characteristics, we categorized soil samples within the research region into seven distinct soil types, as defined by the China Soil Classification System [32,33], published by the National Soil Census Office in 1993. Given the large geographical span of the study area in both longitude and latitude, we stratified the samples into five grades (I-V) based on the disparity in SOM content between the highest and lowest levels. Comprehensive details are provided in Tables 1 and 2.

Research Methods
We used the Kennard-Stone (K-S) algorithm to uniformly select samples with clear differences based on the Euclidean distance between spectral variables. This method ensures that the modeling samples for the SOM inversion model are reasonably distributed [34,35]. The SOM content was inverted using the partial least squares regression (PLSR) and random forest regression (RFR) methods. The coefficient of determination (R²) and root mean square error (RMSE) were used as precision indices to measure the predictive performance of the model. The coefficient of determination reflects the goodness of fit and stability of the model: the closer R² is to one, the greater the stability of the model. The RMSE reflects the ability of the model to estimate reality; that is, the smaller the value, the better the model's estimation [36,37].
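The sample selection and the two precision indices described above can be sketched as follows. This is a minimal illustration in standard-library Python, not the authors' implementation; the function names (`kennard_stone`, `rmse`, `r_squared`) are hypothetical, and the selection uses the commonly described Kennard-Stone rule (start from the two most distant spectra, then repeatedly add the sample farthest from those already selected).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(spectra, n_select):
    """Return indices of n_select samples chosen by the Kennard-Stone rule."""
    n = len(spectra)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    dist = [[euclidean(spectra[i], spectra[j]) for j in range(n)] for i in range(n)]
    # Start with the two most distant samples.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    while len(selected) < n_select:
        # Add the candidate whose nearest selected neighbour is farthest away.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

The selected indices would form the modeling set, with the remaining samples held out for validation. For several hundred spectra the full distance matrix is small enough that this direct form is adequate.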

Spectral Data Preprocessing
To reduce noise arising from the experimental environment and instrumentation, we first excluded the spectral bands of 350-399 and 2401-2500 nm. Second, we applied the multiplicative scatter correction (MSC) and Savitzky-Golay (S-G) convolution smoothing techniques to refine the original spectral data. The smoothed spectral data were transformed using first-order differential reflectance (FDR), second-order differential reflectance (SDR), and continuum removal (CR). The experiments revealed that the combination of the S-G convolution smoothing method and the FDR method yielded the most effective spectral transformation [38][39][40]. To mitigate data redundancy and minimize data storage requirements, we employed a wavelet neural network for spectral data compression [41,42].
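The first steps of this preprocessing chain (band trimming, MSC, S-G smoothing, and the first-order differential) can be sketched as below. The sketch is illustrative only: the 5-point quadratic S-G window is an assumption (the window length used in the study is not stated), the helper names are hypothetical, and plain Python is used to stay dependency-free, whereas in practice a library routine such as SciPy's `savgol_filter` would normally be preferred.

```python
def trim_bands(wavelengths, spectrum, lo=400, hi=2400):
    """Keep bands within [lo, hi] nm, dropping 350-399 and 2401-2500 nm."""
    kept = [(w, r) for w, r in zip(wavelengths, spectrum) if lo <= w <= hi]
    return [w for w, _ in kept], [r for _, r in kept]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum."""
    n_bands = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_bands)]
    ref_mean = sum(ref) / n_bands
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        # Least-squares fit of s = a + b * ref, then remove offset and slope.
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / ref_ss
        a = s_mean - b * ref_mean
        corrected.append([(v - a) / b for v in s])
    return corrected


# Assumed 5-point quadratic Savitzky-Golay coefficients (window length is hypothetical).
SG_SMOOTH = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)
SG_DERIV = (-0.2, -0.1, 0.0, 0.1, 0.2)  # first derivative per band step


def apply_window(spectrum, kernel):
    """Slide the kernel over the spectrum; edge bands without a full window are dropped."""
    half = len(kernel) // 2
    return [
        sum(k * spectrum[i + j - half] for j, k in enumerate(kernel))
        for i in range(half, len(spectrum) - half)
    ]


def first_derivative_spectra(spectra):
    """MSC, then S-G smoothing, then S-G first derivative for each spectrum."""
    return [
        apply_window(apply_window(s, SG_SMOOTH), SG_DERIV) for s in msc(spectra)
    ]
```

Each pass of `apply_window` shortens the spectrum by two bands at either end, so the wavelength axis has to be trimmed by the same amount before bands are matched to wavelengths.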

Partial Least Squares Regression (PLSR)
Compared with conventional models, PLSR allows the use of fewer samples than variables and tolerates multicollinearity among the variables [43,44]. This modeling method has been widely used in the field of spectral data processing.
To create the model, we established a spectral matrix X of size n × m and an organic matter content matrix Y of size n × 1, in which m is the number of spectral bands and n is the number of samples, as given in Equations (1) and (2): where P and Q are the loadings, E and F are the residual matrices from the PLSR decomposition, and U and T are the score matrices. Equation (3) gives the regression coefficient matrix B, obtained from the linear regression between U and T, and the prediction formula for the organic matter content as follows:
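The equations themselves are not reproduced in this text. For orientation, the standard PLSR decomposition that matches the symbols named above can be written as follows; this is the textbook form, taken as an assumption rather than the authors' exact notation, with X arranged so that samples are rows (n × m).

```latex
\begin{align}
X &= T P^{\mathsf{T}} + E \tag{1}\\
Y &= U Q^{\mathsf{T}} + F \tag{2}\\
U &= T B, \qquad \hat{Y} = T B Q^{\mathsf{T}} \tag{3}
\end{align}
```

Here the first two lines decompose the spectra and the SOM content into scores and loadings, and the third links the two score matrices through B, which yields the predicted organic matter content.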

Random Forest Regression (RFR)
RFR is a machine learning method based on classification and regression decision trees. Its main principle is bootstrap aggregation (bagging): an ensemble of weak learners is built as a set of decision trees, each grown from a random resample of the data [30,45]. The best classification result is selected through voting. The basic steps of random forest construction are as follows: k sample sets are drawn from the samples using bagging resampling, and k decision trees are constructed from the k sample sets; the data not drawn in each resample form the k out-of-bag datasets. In the process of decision tree generation, if there are n features, m features (m < n) are randomly drawn at each node [46,47].
The final classification decision is given in Equation (4): where H(x) is the combined classification model, h_i is a single decision tree, Y is the target variable, and I() is the indicator function.
The margin function measures the extent to which the average number of votes for the correct class exceeds the average number of votes for any other class. The larger the margin, the more reliable the classification, as given in Equation (5): where h_i(x) is a single decision tree classifier, (Y, X) is a randomly distributed dataset, and I() is the indicator function.
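Equations (4) and (5) are not reproduced in this text. In the usual random forest formulation, which is assumed here and uses the symbols defined above with k trees, they take the following form; the averaging operator av and the class index j are notation introduced for this sketch.

```latex
\begin{align}
H(x) &= \arg\max_{Y} \sum_{i=1}^{k} I\bigl(h_i(x) = Y\bigr) \tag{4}\\
mg(X, Y) &= \operatorname{av}_{i} I\bigl(h_i(X) = Y\bigr)
          - \max_{j \neq Y} \operatorname{av}_{i} I\bigl(h_i(X) = j\bigr) \tag{5}
\end{align}
```

These expressions describe the classification case. When a random forest is used for regression, as for SOM content, the forest output is conventionally the mean of the individual tree predictions rather than a majority vote.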

SOM Mapping
To effectively predict the spatial variability of SOM in dry farmland, the optimal SOM inversion model was employed, in conjunction with the geostatistical method, to optimally interpolate and visualize the spatial distribution of SOM. Additionally, the SOM content was classified according to China's soil-nutrient-grading standards.
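The mapping step can be illustrated with a small sketch that interpolates predicted SOM values and assigns a grade. Two assumptions are made loudly here: inverse distance weighting (IDW) is used as a simpler stand-in for the geostatistical interpolation, which the text does not specify in detail, and the grade limits are the six-level SOM thresholds commonly cited for China's soil nutrient grading, which are assumed rather than taken from this paper. Function and constant names are hypothetical.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted estimate at target from known (x, y) points."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return v  # target coincides with a sampling point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


# Assumed lower limits (g/kg) for grades 1-5; anything at or below 6 g/kg is grade 6.
ASSUMED_GRADE_LIMITS = [(40.0, 1), (30.0, 2), (20.0, 3), (10.0, 4), (6.0, 5)]


def som_grade(som_g_per_kg):
    """Map an SOM content in g/kg to an assumed nutrient grade (1 = richest)."""
    for lower, grade in ASSUMED_GRADE_LIMITS:
        if som_g_per_kg > lower:
            return grade
    return 6
```

A kriging-based workflow would replace `idw` with a variogram-driven estimator; the grading function would be unchanged.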

Soil Attributes Descriptive Statistics
The tables show that the distribution of different soil types in the study area is relatively scattered. The range of the SOM content was 4.96 to 85.77 g/kg. With regard to spatial distribution, the SOM content and soil types showed clear latitudinal zonal disparities. The amount of organic matter in the soils of the three northeastern provinces was higher than that in the other areas. The SOM content of the different soil types also varied considerably; dark-brown earth, black soils, and chernozems had a higher organic matter content, and these soils were distributed mainly in Heilongjiang and Jilin. The organic matter content in yellow-cinnamon soils and brown earth was the most deficient and was distributed mainly in the southern provinces of Henan.
We constructed the mean spectral reflectance curves for different soil types (Figure 2). The results showed that the general trend of the soil spectral reflectance curves was the same, but the reflectance values differed among the soil types. The values for chernozems, black soils, and dark-brown earth, which have a greater organic matter content, were typically lower, indicating that spectral reflectance decreased as the amount of organic matter in the soil increased. The spectral curves of dark-brown earth, black soils, and chernozems were relatively smooth, whereas the other soil types had clear spectral absorption peaks.

Analysis of Soil Spectral Characteristics
Figure 3a,b show that the first-order differential curves of the spectra followed generally the same trend. The spectral absorption bands were around 800, 1000, 1400, 1900, and 2200 nm. With a decrease in latitude and a relative increase in the amount of organic matter, the first-order differential reflectance values in the absorption bands became more pronounced, and the clearest absorption peaks were those of chernozems and black soils. Figure 3c shows that the correlation coefficient curves of the first-order differential spectra of the different soil types exhibited significant fluctuations. The curves displayed minor fluctuations in the 400-900 nm range, with most regions showing negative correlations. In the 900-2400 nm range, the curves exhibited more pronounced fluctuations, and the correlation coefficients varied between positive and negative values. Figure 3d shows a noticeable grading phenomenon in the correlation curve within the 400-800 nm range. As the difference in the organic matter content increased, the degree of negative correlation also increased.
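Correlation curves of this kind can be computed band by band, as in the following sketch. It assumes Pearson correlation between the transformed reflectance at each band and the measured SOM content, which is the usual choice but is not stated explicitly in the text; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def band_correlations(spectra, som):
    """Correlation between each band (column of spectra) and SOM content."""
    n_bands = len(spectra[0])
    return [pearson([s[j] for s in spectra], som) for j in range(n_bands)]


def most_sensitive_band(wavelengths, spectra, som):
    """Wavelength with the largest absolute correlation, and that correlation."""
    r = band_correlations(spectra, som)
    j = max(range(len(r)), key=lambda k: abs(r[k]))
    return wavelengths[j], r[j]
```

Running `band_correlations` separately for each soil type or content grade gives one curve per group, which is how the grading pattern described for the 400-800 nm range could be inspected.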

SOM Inversion Using Hyperspectral Data

Modeling and Analysis
The validation results of the prediction models with the highest accuracy, judged by the lowest RMSE and the highest R² values, are summarized in Tables 3 and 4. There was no significant difference between the two models for the full sample. For the individual soil types, except for chernozems, the PLSR method was superior to the RFR method; the RMSE showed no significant difference, but R² was generally higher by more than 0.2. Compared with the model for the full sample, R² generally increased by more than 0.4; the inversion of SOM content in yellow-cinnamon soil was the best, with R² and RMSE of 0.98 and 0.52, respectively. The results in Figure 4a show that the influence of different soil types on the inversion effect should be considered when estimating the SOM content using hyperspectral techniques. Figure 4b shows that the difference in the organic matter content influenced the inversion effect of the two methods. For the RFR method, model R² and RMSE gradually increased as the difference in content increased. For the PLSR method, the model worked best at the grade II difference in SOM content; R² and RMSE were 0.93 and 3.42, respectively.
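A PLSR fit of the kind compared in this section can be sketched for a single response variable (SOM content) with the NIPALS-style PLS1 procedure. This is a minimal illustration in plain Python under the assumption of one response column; it is not the authors' implementation, and the names `pls1_fit` and `pls1_predict` are hypothetical. In practice an established routine such as scikit-learn's `PLSRegression` would typically be used.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit PLS1: X is a list of spectra (rows), y the measured SOM contents."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centred spectra
    f = [v - y_mean for v in y]                                # centred response
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between E and f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [_dot(E[i], w) for i in range(n)]                  # scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = _dot(f, t) / tt
        # Deflate spectra and response before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict the response for one spectrum by replaying the deflation steps."""
    e = [v - mu for v, mu in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = _dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p)]
    return y_hat
```

For the comparison described here, the rows of `X` would be first-order differential spectra of the modeling samples, and the number of components would be chosen by validation on held-out samples using R² and RMSE.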

Effect of Content Differences on Inversion of SOM of Fluvo-Aquic Soil
To avoid the impact of varying soil types, fluvo-aquic soil was used as an example to investigate the influence of the difference in content on SOM inversion. The samples were divided into five grades (I-V), and the SOM inversion models were established using the first-order differential PLSR method (Table 5, Figure 5).
As shown in Figure 5, the inversion of the fluvo-aquic soil samples responded to the difference in SOM content in the same way as that of all the other samples. The coefficient of determination first increased and then decreased. The model precision of grades II to V was the highest; R² was greater than 0.95, and the R² of grade II was the highest, at 0.98. The RMSE increased as the difference in content increased. The maximum RMSE was 6.66 for the grade V difference, at which the prediction ability of the model was poor. The predictive ability and stability of the model both reached a good range when the difference in SOM content was at grade II. Generally, the effects of differences in soil types and SOM content should be fully considered when inverting SOM in complex large-scale areas.

Spatial Distribution of SOM in Dry Farmland
The optimal SOM inversion model was employed in conjunction with geostatistical techniques to interpolate and visually represent the spatial distribution of SOM in dry farmland (Figure 6). The SOM content in the dry farmland is generally high in the north and low in the south, and more than 70% of the area has an organic matter content of less than the medium level. The highest organic matter content was found in the northern part of the region, especially in Heilongjiang Province, where the organic matter content was above the medium level. The SOM contents in the central and southern regions are generally low, with the lowest organic matter content in the western hills of Liaoxi and the central Huanghuaihai Plain. The SOM inversion model based on hyperspectral analysis combined with the PLSR method of first-order differentiation can invert the spatial distribution trend and classification characteristics of SOM in complex large-scale areas.

Feasibility of SOM Inversion Using Hyperspectral Data
In this study, soil sampling sites were uniformly obtained from dry farmland in China, focusing on the effects of differences in soil types and SOM content due to latitudinal zonal disparities, to assess the feasibility of hyperspectral techniques in inverting SOM over complex large-scale areas. Some studies have shown that differential spectral transformation can effectively solve these problems [48,49]. The spectral bands sensitive to soil organic matter were isolated using various spectral transforms in conjunction with correlation analysis (Figure 3). The extracted correlation peaks aligned with the findings of Chang et al. [50]. Furthermore, the correlation coefficients showed a strong association between these bands and the SOM content. Variations in soil composition and organic matter concentration did not influence the location of spectral absorption peaks, but they did affect the correlation coefficients. This observation shows that the spectral sensitivity of organic matter differs among soil samples [37,51]. Inverse modeling of the SOM using hyperspectral data was promising (Tables 3 and 4). However, model performance, as measured using R² and RMSE, varies significantly across different studies because of variations in sample collection methods, spectral processing techniques, and model construction approaches. These variations can significantly impact the accuracy and reliability of spectral modeling [52,53]. The PLSR model can separate system information from noise and permits regression modeling even when the number of sample points is less than the number of variables. This model has various applications in spectral inversion [54,55]. On the other hand, Sun et al. concluded that the sample segmentation method affects the estimation accuracy [56]. A similar performance can also be observed in the estimation of the SOM content with GA-PLSR. This study yielded similar findings, highlighting significant disparities in the inversion results derived from diverse modeling sample sets based on differences in soil types and SOM content. Regrettably, such discrepancies are frequently overlooked in numerous studies. On the basis of the analysis of several soil samples, we concluded that the duplicated division of soil samples based on differences in soil types and SOM content is beneficial for reducing the chance of an abnormal estimation accuracy. Considering the inversion effect, model accuracy, and other factors together, the use of hyperspectral data is a promising method for inverting the SOM content of complex large-scale areas.

Influence of Large Areas on SOM Hyperspectral Inversion
The results revealed that the accuracy of the SOM hyperspectral inversion models based on different samples varied considerably. Taken together with the research results of other scholars, the main factors affecting the model inversion effect are soil Fe and Mn compounds and SOM content [45,57]. First, this study showed clear absorption peaks for the spectral curves at 800, 1000, 1400, and 1900 nm. The results of previous studies showed that the characteristics of the spectral curves at 800-1000 nm were mainly due to electronic transitions in metal ions (Fe²⁺, Fe³⁺, and Mn²⁺) [58]. The reflectance values of dark-brown soil, black soil, and chernozems, which have a higher amount of organic matter, were generally lower, indicating that the higher the amount of organic matter, the lower the hyperspectral reflectance. This agrees with the results of Qinhong Liao and Zhou Shi et al. [59]. The effects of Fe-Mn compounds on the hyperspectral model of SOM arise mainly in two ways. On the one hand, Fe-Mn compounds are important coloring agents in soil that can change its color, and color is a decisive factor affecting the spectral reflectance of soil. On the other hand, Fe-Mn compounds have a strong masking effect on the expression of soil spectral characteristics, as has been shown in many studies [60,61].
Table 6 shows the difference in the SOM content and the Fe-Mn compounds of different soil types. Combined with the results of model inversion, it can be seen that soil with a high content of Fe-Mn compounds generally has a poor modeling effect. Establishing SOM content inversion models by soil type is therefore an effective approach. Compared with sample-wide modeling, R² and RMSE increased by approximately 0.6 and 0.5, respectively. This difference not only manifested for different soil types but also showed a regional pattern. Because of the difference in soil types and the natural environment, the SOM content in the northern region was higher than that in the southern region (Figure 1, Table 1) [62]. Organic matter was the main factor in the northern part of the study area, whereas ferromanganese compounds were the main factor in the southern part, similar to the results of related research [63]. Based on this study, we recommend that when employing hyperspectral techniques to estimate SOM content, soil samples should be categorized according to soil type while keeping the difference in organic matter content within the same soil type within the range of 8-15 g/kg. Further research is required to determine how to increase the precision of an SOM inversion model when the concentration of Fe-Mn compounds is high. In addition, many other factors affect soil spectral properties. Lv et al. suggested that soil moisture is the main cause of pseudospectra and noise changes [64]. Meanwhile, due to the complexity of the soil formation process in the study area, the mineral composition of different soils may also be an important influencing factor [65]. Therefore, the influence of other factors on the spectral characteristics of soil, and the combined effect and regularity of the various factors, require further study.
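The sampling recommendation above can be checked mechanically before modeling. The sketch below groups samples by soil type and tests whether the spread of SOM content in each group (maximum minus minimum) lies within 8-15 g/kg. The function names and the use of max minus min as the "difference" are assumptions made for illustration.

```python
from collections import defaultdict


def som_range_by_soil_type(samples):
    """samples: iterable of (soil_type, som_g_per_kg); returns max - min per type."""
    groups = defaultdict(list)
    for soil_type, som in samples:
        groups[soil_type].append(som)
    return {soil_type: max(v) - min(v) for soil_type, v in groups.items()}


def within_recommended_range(samples, lo=8.0, hi=15.0):
    """Flag, per soil type, whether the SOM difference lies in [lo, hi] g/kg."""
    return {
        soil_type: lo <= diff <= hi
        for soil_type, diff in som_range_by_soil_type(samples).items()
    }
```

Groups flagged as out of range could be split into narrower content grades before a separate model is fitted to each.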

Conclusions
The rapid acquisition of precise regional SOM maps is crucial for safeguarding land resources and devising pertinent policies. In this paper, we propose a reproducible workflow for SOM inversion in complex large-scale areas using hyperspectral data. We further improved the accuracy of SOM maps of dry farmland in China. Our study revealed significant variations in the spectral characteristics of soils across regions. It is therefore crucial to account for differences in soil types and organic matter content when conducting research in this field, as these factors can greatly affect the accuracy of inversion results. In the northern section of the dry farmland, SOM was the prevailing determinant, whereas in the south-central region, iron, manganese, and other heavy metals were dominant. The spectrally sensitive bands differed among soil types: the most sensitive bands were at 429 nm for yellow-brown soils and at 2262 nm for black calcium soils, with correlation coefficients of 0.91 and 0.82, respectively. When inversion results were compared across differences in SOM content, the best model was obtained with a content difference of 8.49 g/kg, yielding an R² of 0.93 and an RMSE of 3.42. Our methodology and workflow can provide a reference for soil mapping in complex large-scale areas.
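As an illustration of how sensitive bands such as those reported above might be screened, the following minimal sketch computes a first-order differential of each spectrum by finite differences and correlates each differential band with SOM content. It covers only this screening step, not the PLSR model itself, and it is not the authors' implementation. All function names and the sample spectra are hypothetical, and each differential value is assigned to the lower wavelength of its interval as a simplifying assumption.

```python
import math

def first_difference(reflectance, wavelengths):
    """Finite-difference approximation of the first-order differential spectrum."""
    return [
        (reflectance[i + 1] - reflectance[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(reflectance) - 1)
    ]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def most_sensitive_band(spectra, wavelengths, som):
    """Return (wavelength, r) for the differential band with the largest |r|."""
    derivs = [first_difference(s, wavelengths) for s in spectra]
    best_idx, best_r = 0, 0.0
    for j in range(len(wavelengths) - 1):
        r = pearson([d[j] for d in derivs], som)
        if abs(r) > abs(best_r):
            best_idx, best_r = j, r
    return wavelengths[best_idx], best_r

# Hypothetical spectra (one row per sample) for illustration only.
wavelengths = [400, 410, 420, 430]
spectra = [
    [0.10, 0.12, 0.15, 0.16],
    [0.10, 0.12, 0.14, 0.16],
    [0.10, 0.12, 0.13, 0.17],
]
som = [10.0, 20.0, 30.0]  # SOM content in g/kg, hypothetical

band, r = most_sensitive_band(spectra, wavelengths, som)
print(band, round(r, 3))
```

In practice, the differential spectra rather than a single band would typically be passed to the regression model; the single-band screen here only mirrors the correlation analysis summarized in the text.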
However, there remains a significant challenge in elucidating and quantifying the spectral characteristics of diverse soils through a more comprehensive delineation of factors that contribute to soil differentiation. Subsequent research endeavors should strive to enhance and streamline the process of SOM inversion and develop meticulous inversion techniques tailored to specific soil classifications, thereby enabling continuous and comprehensive monitoring of SOM over extended periods.

Figure 1. Study area location and sampling point distribution map.

Figure 2. Comparison of the mean spectral curves of different soil types in the studied area.

Figure 3. Curves for spectral first-order differential and correlation coefficient between SOM content and spectral reflectance: (a) FDR of different soil types; (b) FDR of difference in SOM content; (c) correlation coefficient curve of different soil types; and (d) correlation coefficient curve for grades I-V (difference in SOM content).

Figure 4. SOM inversion method verification (comparison of actual measured values and values predicted from spectral data) based on the PLSR model for (a) difference in soil types and (b) difference in SOM content (Grade I to V). Black solid lines represent one-to-one bisectors. The black dotted line is the trend line for all soil samples.

Figure 5. SOM estimation method verification (comparison of actually measured values and predicted values from spectral data) of fluvo-aquic soil with difference in SOM content based on the PLSR method.
°33′54.8″ E to 135°3′55.953″ E longitude and from 32°8′24.506″ N to 48°27′21.127″ N latitude. The study was conducted in 7 provinces and 694 counties, covering a total land area of 57.68 million square kilometers.

Table 1. SOM content of the soil types and sites tested. Given are the maximum, minimum, and mean content.

Table 2. Classification of SOM content (Grade I to V) based on the difference between maximum and minimum content.

Table 3. SOM inversion results based on the RFR and PLSR models for different soil types.

Table 4. SOM inversion results based on difference in SOM content (Grade I to V).

Table 5. Classification (Grades I-V) based on difference in SOM content of fluvo-aquic soil.

Table 6. Statistics of average SOM, iron, and manganese in the different soil types tested.