Visible Near-Infrared Hyperspectral Soil Organic Matter Prediction Based on Combinatorial Modeling

: Non-destructive, fast, and accurate prediction of soil organic matter content in farmland is of great significance for soil fertility assessment and rational fertilization. In the process of soil organic matter prediction, it is important to give full play to the advantages of different prediction models and to integrate different prediction models to innovatively construct a combined prediction model of soil organic matter content so as to improve the prediction accuracy and generalization ability of the model. In this study,


Introduction
Soil organic matter plays a vital role in soil health and crop growth and is a key factor in measuring soil fertility, maintaining soil ecosystem balance, and promoting sustainable agriculture [1].
When determining the organic matter content in traditional laboratories, it is often necessary to collect a large number of samples, which is destructive, costly, inefficient, discrete, etc.In the process of determining the organic matter content, it is necessary to carry out ablation treatment to destroy the complex structure of the soil and release the organic matter, and a large amount of waste liquid is generated in the process of ablation, which has a certain impact on the environment, and a large number of chemical reagents are used which pose a certain danger to the experimenter.The use of a large number of chemical reagents, to a certain degree, is harmful to the health of laboratory personnel and the environment [2].
Despite the continuous progress of IoT sensor technology, it is still difficult to accurately determine the content of soil organic matter.IoT sensors generate large amounts of real-time data, requiring powerful data processing and analysis capabilities and increasing cost and complexity.Sensors require stable communication connections with high energy and communication requirements.IoT sensors are costly to deploy and maintain and require regular calibration to ensure their accuracy, and soil organic matter data may contain sensitive information that requires appropriate encryption and security measures to protect data security and privacy [3].
The rapid development and effective application of hyperspectral technology provide favorable conditions for non-destructive, rapid, low-cost, and high spatial resolution determination of organic matter content [4].Scholars conducted a great deal of research in the use of hyperspectral data to predict soil organic matter content, for one, by using different hyperspectral data, mainly high-resolution hyperspectral satellite/airborne data [5,6], visible, near-infrared hyperspectral data [7,8].Secondly, due to the large number of hyperspectral data bands and the redundancy of information, there have been studies using different spectral transforms for the feature selection; spectral transforms mainly include S-G denoising, differential transforms, wavelet transforms, scattering correction, etc. [9][10][11][12], and feature selection or dimensionality reduction and other methods to screen the sensitive bands, mainly Competitive Adaptive Reweighted Sampling (CARS), Successive Projection Algorithms (SPA), Particle Swarm Optimization algorithms (PSO), Recursive Feature Elimination (RFE), Random Forest importance feature selection, PCA, and other dimensionality reduction [13][14][15][16][17][18].Third, different prediction methods are used for the calculation of relevant parameters and model construction, mainly using a combination of various optimization algorithms and machine learning models, mainly genetic optimization, PSO, and other algorithms combined with machine learning and deep learning algorithms, such as Partial Least Squares Regression (PLSR), Random Forests (RF), Support Vector Machines (SVM), Decision Trees, Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), Ridge regression, ensemble method and other machine learning and deep learning algorithms for model construction [16][17][18][19][20][21][22][23][24].
Studies have been conducted in the soil organic matter content prediction model selection mostly using a single model; each model has its advantages and disadvantages and may be affected by random factors in order to reduce the possible bias of a single model so as to obtain more accurate and stable prediction results.Researchers have explored prediction models constructed based on integrated learning methods such as Bagging, Boosting, Stacking, and fusion models constructed based on weighting methods to achieve more accurate prediction results by ensembling the advantages of different models.For example, Zhang Xiuquan et al. [16] demonstrated better accuracy in predicting soil organic matter content with a stacked generalization model constructed using four single-prediction models.Although the integrated learning model utilizes the degree of difference in structure and prediction of multiple single models, it does not consider the degree of contribution of multiple models in the integrated stacked model.Currently, common combined prediction models are constructed with the criterion that the sum of the squares of the absolute errors or the sum of the absolute values of the deviations is minimized [24], and because different time series sometimes have different magnitudes or different values that makes the sum of squares of absolute errors or the sum of absolute values of deviations not comparable.Therefore, this study proposes a combined forecasting model, which is built on the basis of multiple single forecasting models, constructs a combined model that considers the standard deviation of forecasting accuracy and takes the maximum of the constructed forecasting effectiveness model as the goal, and uses the planning and solving idea to obtain the weights of each single forecasting model, and then constructs a combined forecasting model with the weights weighted by these weights.

Sample Collection and Measurement
Soil samples were collected from the test site, which has a warm temperate continental climate with four distinct seasons, an altitude of 776 m, an average annual temperature of 11.5 • C, an average annual rainfall of 456.7 mm, an annual sunshine hour of 2540.7 h, and a frost-free period of 176 days.The soil type of this test field is cinnamon soil, and the planting crop is soybean-corn strip compound planting; the corn variety is Qiangsheng 199, and the soybean variety is Zhonghuang 45, and the area of the test field is 3.333 ha, with a flat topography, and the organic matter content of the surface and deep soil varies greatly.The sampling time was in mid-October 2022, after the harvesting of farmland crops.During the sampling process, soil samples were collected at the relative center of each sampling unit in accordance with the principles of equal quantity, randomization, and 5-point mixing, and a total of 312 soil samples were collected.All the samples were brought back to the laboratory to remove debris, dried naturally, ground, sieved, and then divided into two parts, which were used for the determination of soil organic matter content and the collection of visible and near-infrared spectroscopic data, respectively.
The potassium dichromate external heating method was used for soil organic matter content determination [25].Under the condition of oil bath (the temperature of oil bath was 170~180 • C, boiling for 5 min), a certain concentration of potassium dichromatesulfuric acid solution was added to the soil samples; the carbon in the soil organic matter was oxidized to carbon dioxide by the potassium dichromate, and at the same time, the hexavalent chromium in the potassium dichromate was reduced to trivalent chromium; the remaining potassium dichromate was titrated with the standard solution of divalent iron; the amount of ferrous sulfate was calculated according to the potassium dichromate consumption before and after the oxidization of organic carbon, and the amount of ferrous sulfate consumed by potassium dichromate was calculated.Based on the amount of ferrous sulfate consumed by potassium dichromate before and after the organic carbon was oxidized, the organic carbon content was calculated; the measured result could only oxidize 90% of the organic carbon, so the measured organic carbon was multiplied by a correction factor of 1.1 to calculate the amount of organic carbon; ultimately, the organic matter content of the soil was obtained by multiplying the organic carbon content by a conversion factor of 1.724 [26].
The statistical results of its soil organic matter content are shown in Table 1; the minimum value of the organic matter content of 312 samples was 1.976 g/kg; the maximum value was 32.228 g/kg; the mean value was 12.885 g/kg; the standard deviation was 5.441 g/kg, and the coefficient of variation was 42.227%; 312 samples were divided into five intervals according to the organic matter content from low to high (0.000, 5.000], (5.000, 10.000], (10.000, 15.000], (15.000, 20.000], (20.000, 32.228]; the number of samples in the five intervals were 31, 67, 101, 92, 21, and the mean values of organic matter content were 3.676 g/kg, 7.901 g/kg, 12.897 g/kg, 17.068 g/kg, 23.989 g/kg, and the coefficients of variation were 24.266%, 16.897%, 11.111%, 7.822%, and 13.710%, respectively.The spectral data acquisition of soil samples was performed with a hyperspectral imager manufactured by Headwall Photonics, Fitchburg, MA, USA.The object distance of the platform was set to 20 nm; the moving speed was 15.55 nm/s; the exposure time was 0.9 ms; the spectral range was 379~1705 nm, and the spectral resolutions of the visible and near-infrared were 0.727 nm and 4.715 nm, respectively.The scanned hyperspectral images were based on the ENVI5.3platform, and the desired soil sample part was cropped by selecting the region of interest; the spectral reflectance values were extracted by quick statistics from the cropped soil sample images, and the arithmetic mean was taken as the sample.The spectral reflectance value of the soil was extracted from the cropped soil sample image by fast statistics, and the arithmetic mean value was taken as the spectral curve of the sample, which was used as the basic data set for subsequent data processing.Due to the interference of instrumental noise, spectral scattering, and spectral covariance, the accuracy of predicting soil property content using raw spectral data is often low; in order to reduce the error caused by instrumental measurements, the spectral data between 390~1704 nm were selected; at the same time, in order to minimize the influence of noise, S-G smoothing was carried out.It has been shown that the spectral data of the S-G smoothing were effective, which could effectively remove high-frequency noise and interference signals and improve the quality of the data [12][13][14]16,22], and the denoising results are shown in Figure 1.As can be seen from Figure 1, the trends of spectral reflectance of soils with different contents of organic matter were similar.The size of the spectral reflectance of soil organic matter content increases with wavelength, but the growth rate of spectral reflectance decreases.In order to effectively eliminate or attenuate the effect of soil background and to highlight the subtle variations in the spectral curves over different organic matter contents, the S-G smoothing spectral curves were first-order differentiated, and the results of the first-order differential processing are shown in Figure 2. The difference features of the spectral curves after the first-order differential transform are more prominent and can be effectively used for subsequent spectral feature extraction.In order to solve the problem of many hyperspectral bands, large data volume, and redundancy, the arithmetic is reduced to enhance the generalization ability of the model.By adding the L1 paradigm as a penalty term to the loss function, the weight vectors in the model are sparsified, and the weights are compressed to 0, which helps to identify the features that contribute less to the model and, thus, removes redundant or irrelevant features.LASSO regression, by adjusting the strength of the L1 paradigm of the weight vectors, achieves a balance between complexity and overfitting [27,28].For this reason, this experiment incorporates LASSO regression using L1-paradigm regularization for feature selection.

Single-Prediction Model Construction
The spectral dataset was divided into three parts, training set, validation set, and test set, in the ratio of 7:2:1, with 70 spectral data selected by features as the independent variable and soil organic matter content as the dependent variable, using 8 algorithms, such as LASSO Regression (LASSO) (Model 1), Multilayer Perceptron (MLP) (Model 2), Random Forest (RF) (Model 3), Gaussian Kernel Regression (GKR) (Model 4), Ridge Regression (Model 5), Long Short-Term Memory (LSTM) (Model 6), Convolutional Neural Networks (CNN) (Model 7), Support Vector Regression (SVR) (Model 8), to construct a singleprediction model for soil organic matter content, and a superiority of the 8 single-prediction models was proposed.
The LASSO regression [29] is a linear regression using the L1 regularization method that fits the data by minimizing the sum of squares of the residuals with a penalty term, where the penalty term is the L1 norm of the weight vector.By adjusting the strength of the penalty term, LASSO regression can find a balance between complexity and overfitting.In its mathematical formulation, it consists of a linear model with a ℓ priori regular terms.Its minimized objective function is ‖ − ‖ + ‖‖ , where  is a constant, and ‖‖ is the norm of the parameter vector.Multilayer Perceptron (MLP) [30] is a feed-forward neural network, including at least three layers of nodes: input layer; hidden layer; and output layer.Its basic structure is shown in Figure 3.In addition to the input nodes, each node is a neuron with nonlinear activation.The regression training process consists of two steps: forward propagation; and backpropagation.The input data for forward propagation are fed to the input layer, which is then passed on to the hidden layer and ultimately generates the output of the output

Single-Prediction Model Construction
The spectral dataset was divided into three parts, training set, validation set, and test set, in the ratio of 7:2:1, with 70 spectral data selected by features as the independent variable and soil organic matter content as the dependent variable, using 8 algorithms, such as LASSO Regression (LASSO) (Model 1), Multilayer Perceptron (MLP) (Model 2), Random Forest (RF) (Model 3), Gaussian Kernel Regression (GKR) (Model 4), Ridge Regression (Model 5), Long Short-Term Memory (LSTM) (Model 6), Convolutional Neural Networks (CNN) (Model 7), Support Vector Regression (SVR) (Model 8), to construct a singleprediction model for soil organic matter content, and a superiority of the 8 single-prediction models was proposed.
The LASSO regression [29] is a linear regression using the L1 regularization method that fits the data by minimizing the sum of squares of the residuals with a penalty term, where the penalty term is the L1 norm of the weight vector.By adjusting the strength of the penalty term, LASSO regression can find a balance between complexity and overfitting.In its mathematical formulation, it consists of a linear model with a ℓ priori regular terms.Its minimized objective function is ‖ − ‖ + ‖‖ , where  is a constant, and ‖‖ is the norm of the parameter vector.Multilayer Perceptron (MLP) [30] is a feed-forward neural network, including at least three layers of nodes: input layer; hidden layer; and output layer.Its basic structure is shown in Figure 3.In addition to the input nodes, each node is a neuron with nonlinear activation.The regression training process consists of two steps: forward propagation; and backpropagation.The input data for forward propagation are fed to the input layer, which is then passed on to the hidden layer and ultimately generates the output of the output

Construction of the Model 2.2.1. Single-Prediction Model Construction
The spectral dataset was divided into three parts, training set, validation set, and test set, in the ratio of 7:2:1, with 70 spectral data selected by features as the independent variable and soil organic matter content as the dependent variable, using 8 algorithms, such as LASSO Regression (LASSO) (Model 1), Multilayer Perceptron (MLP) (Model 2), Random Forest (RF) (Model 3), Gaussian Kernel Regression (GKR) (Model 4), Ridge Regression (Model 5), Long Short-Term Memory (LSTM) (Model 6), Convolutional Neural Networks (CNN) (Model 7), Support Vector Regression (SVR) (Model 8), to construct a single-prediction model for soil organic matter content, and a superiority of the 8 singleprediction models was proposed.
The LASSO regression [29] is a linear regression using the L1 regularization method that fits the data by minimizing the sum of squares of the residuals with a penalty term, where the penalty term is the L1 norm of the weight vector.By adjusting the strength of the penalty term, LASSO regression can find a balance between complexity and overfitting.In its mathematical formulation, it consists of a linear model with a ↕ 1 priori regular terms.Its minimized objective function is min w where α is a constant, and ∥w∥ 1 is the norm of the parameter vector.
Multilayer Perceptron (MLP) [30] is a feed-forward neural network, including at least three layers of nodes: input layer; hidden layer; and output layer.Its basic structure is shown in Figure 3.In addition to the input nodes, each node is a neuron with nonlinear activation.The regression training process consists of two steps: forward propagation; and backpropagation.The input data for forward propagation are fed to the input layer, which is then passed on to the hidden layer and ultimately generates the output of the output layer.During the backpropagation training process, the outputs are compared to the desired outputs, and an error value is generated.This error is then backpropagated into the network, and the weights are updated accordingly.The MLP uses the mean squared error (MSE) as a loss function and train the model by minimizing this loss function.In the training process, in addition to adjusting the weights and bias terms, regularization techniques can be used to prevent overfitting and improve the generalization ability of the model.layer.During the backpropagation training process, the outputs are compared to the desired outputs, and an error value is generated.This error is then backpropagated into the network, and the weights are updated accordingly.The MLP uses the mean squared error (MSE) as a loss function and train the model by minimizing this loss function.In the training process, in addition to adjusting the weights and bias terms, regularization techniques can be used to prevent overfitting and improve the generalization ability of the model.Random Forest (RF) [31] regression is an algorithm based on integrated learning, which performs the regression task by constructing multiple decision trees and integrating their predictions.In Random Forest, each decision tree is independent and trained on a randomly selected subsample, which effectively reduces the risk of overfitting.Random Forests obtain the final regression result by averaging or weighted averaging the predictions of multiple decision trees.
Gaussian Kernel Regression (GKR) [32] is a nonlinear regression model based on the kernel function.It maps the input space to a high-dimensional feature space by introducing a Gaussian kernel function, thus realizing linear regression in a high-dimensional feature space.The Gaussian kernel function is a commonly used kernel function in the form of K(, ) = exp − ‖ ‖ , where  and  are the samples in the input space, and σ is the width parameter of the Gaussian kernel function, which determines the degree of smoothing of the function.
Ridge Regression [33] is based on the least squares estimation and constrains the size of the coefficient vectors by adding the L2 regularization term, thus solving the instability and inaccuracy of the least squares method on covariate data.The loss function of Ridge regression is as follows: Loss = Loss(, f(; )) +  (∑  ) , where  is the actual la- beled value; f(; ) is the predicted value of the model; w is the parameter of the model; βi is the ith element of the coefficient vector β, and α is the regularization strength parameter (also known as the ridge parameter or the penalty parameter); by adjusting the value of α, a model complexity by adjusting the value of α, a trade-off can be made between model complexity and prediction accuracy.
Long Short-Term Memory (LSTM) [34] introduces input gate, forget gate, and forget gate on the basis of Recurrent Neural Network (RNN).These gates can selectively control the flow of information, aiming to solve the problem of gradient vanishing and gradient explosion in traditional RNNs while being able to better capture long-term dependencies.Its general structure is shown in Figure 4.The input gate determines how much new information is added to the cell; the forget gate controls whether the information is forgotten at each moment, and the output gate determines whether the information is output at each moment, which is calculated by the following formula (Equation (1)): Random Forest (RF) [31] regression is an algorithm based on integrated learning, which performs the regression task by constructing multiple decision trees and integrating their predictions.In Random Forest, each decision tree is independent and trained on a randomly selected subsample, which effectively reduces the risk of overfitting.Random Forests obtain the final regression result by averaging or weighted averaging the predictions of multiple decision trees.
Gaussian Kernel Regression (GKR) [32] is a nonlinear regression model based on the kernel function.It maps the input space to a high-dimensional feature space by introducing a Gaussian kernel function, thus realizing linear regression in a high-dimensional feature space.The Gaussian kernel function is a commonly used kernel function in the form of , where x and y are the samples in the input space, and σ is the width parameter of the Gaussian kernel function, which determines the degree of smoothing of the function.
Ridge Regression [33] is based on the least squares estimation and constrains the size of the coefficient vectors by adding the L2 regularization term, thus solving the instability and inaccuracy of the least squares method on covariate data.The loss function of Ridge regression is as follows: Loss = Loss(y, f(x; w)) + α × ∑ β 2 i , where y is the actual labeled value; f(x; w) is the predicted value of the model; w is the parameter of the model; β i is the ith element of the coefficient vector β, and α is the regularization strength parameter (also known as the ridge parameter or the penalty parameter); by adjusting the value of α, a model complexity by adjusting the value of α, a trade-off can be made between model complexity and prediction accuracy.
Long Short-Term Memory (LSTM) [34] introduces input gate, forget gate, and forget gate on the basis of Recurrent Neural Network (RNN).These gates can selectively control the flow of information, aiming to solve the problem of gradient vanishing and gradient explosion in traditional RNNs while being able to better capture long-term dependencies.Its general structure is shown in Figure 4.The input gate determines how much new information is added to the cell; the forget gate controls whether the information is forgotten at each moment, and the output gate determines whether the information is output at each moment, which is calculated by the following formula (Equation ( 1)): In Figure 4 and Equation ( 1), w and b denote the weights and bias vectors of the above gates, respectively; C t denotes the memory cell; σ and tanh denote the sigmoid activation function and hyperbolic tangent activation function, respectively.F t is called a forget gate, indicating which features of C t−1 are used to calculate C t .F t is a vector located within the range of [0, 1], typically using sigmoid as the activation function.⊗ represents the unit multiplication relationship between F t and C t−1 .
∼ C t represents the updated value of the unit state, which is obtained from the input data x t and hidden node h t−1 through a neural network layer, The activation function for the updated value of the unit state is usually tanh.I t is called an input gate, which is a vector between [0, 1] intervals, calculated by x t and h t−1 through the activation function sigmoid.I t is used to control which features of In Figure 4 and Equation ( 1),  and  denote the weights and bias vectors of the above gates, respectively;  denotes the memory cell;  and ℎ denote the sigmoid activation function and hyperbolic tangent activation function, respectively. is called a forget gate, indicating which features of  are used to calculate  . is a vector located within the range of [0, 1], typically using sigmoid as the activation function.⊗ represents the unit multiplication relationship between  and  . represents the updated value of the unit state, which is obtained from the input data  and hidden node ℎ through a neural network layer, The activation function for the updated value of the unit state is usually ℎ. is called an input gate, which is a vector between [0, 1] intervals, calculated by  and ℎ through the activation function sigmoid. is used to control which features of  are used to update  .ℎ is obtained from the output gate  and the unit state  .Convolutional Neural Networks (CNN) [34] are mainly based on convolutional operation and backpropagation algorithm; each convolutional layer consists of a number of convolutional units, and the parameters of each convolutional unit are optimized by the backpropagation algorithm.In the training process, the input data are processed by the convolutional layer and activation function to obtain the output result, and then, the output result is compared with the real value to calculate the error.According to the size of the error, the weights and bias terms of the neurons are adjusted layer by layer by the backpropagation algorithm to reduce the error and gradually improve the prediction accuracy.
The goal of Support Vector Regression (SVR) [35] aims to find a suitable  − tube so that as many sample points as possible are within this tube, and a very small number of sample points outside the tube are considered outliers.When calculating the loss value, although there is a deviation between the sample points in the tube and the predicted value of the objective function, their loss is considered to be 0, which allows for an error of  between () and .When the absolute difference between the two is less than , the loss is considered to be 0. SVR problem based on the training data pairs is as follows: () = ,  + , where , x represents the dot product of the two vectors  and x.If the Convolutional Neural Networks (CNN) [34] are mainly based on convolutional operation and backpropagation algorithm; each convolutional layer consists of a number of convolutional units, and the parameters of each convolutional unit are optimized by the backpropagation algorithm.In the training process, the input data are processed by the convolutional layer and activation function to obtain the output result, and then, the output result is compared with the real value to calculate the error.According to the size of the error, the weights and bias terms of the neurons are adjusted layer by layer by the backpropagation algorithm to reduce the error and gradually improve the prediction accuracy.
The goal of Support Vector Regression (SVR) [35] aims to find a suitable ε − tube so that as many sample points as possible are within this tube, and a very small number of sample points outside the tube are considered outliers.When calculating the loss value, although there is a deviation between the sample points in the tube and the predicted value of the objective function, their loss is considered to be 0, which allows for an error of ε between f (x) and y.When the absolute difference between the two is less than ε, the loss is considered to be 0. SVR problem based on the training data pairs is as follows: f (x) = w, x + b, where w, x represents the dot product of the two vectors w and x.If the regression function f (x) can estimate all the training sample points within the precision ε, the minimization problem could be transformed into the following optimization problem (Equation ( 2)):

Combinatorial Predictive Modeling
The combined prediction model is constructed using a linear combination of singleprediction methods [36,37], which is based on the principle (Equation ( 3)): where f t is the predicted value of the combined prediction model; f it is the predicted value of the tth sample of the ith single-prediction method, a total of 8 (m = 8) single-prediction methods, 312 (t = 312) samples of the data selected for this experiment; w 1 , w 2 , • • • , w m for the weighting coefficients of the m single-prediction methods, the weighting coefficients are obtained through the planning of the solution to the combination of the model as a superior prediction model and satisfy (Equation ( 4)): The process of constructing the combined prediction model is shown in Figure 5.
Agronomy 2024, 14, x FOR PEER REVIEW 8 of 19 regression function f(x) can estimate all the training sample points within the precision ε, the minimization problem could be transformed into the following optimization problem (Equation ( 2)):

Combinatorial Predictive Modeling
The combined prediction model is constructed using a linear combination of singleprediction methods [36,37], which is based on the principle (Equation ( 3)): where  is the predicted value of the combined prediction model; fit is the predicted value of the tth sample of the ith single-prediction method, a total of 8 (m = 8) single-prediction methods, 312 (t = 312) samples of the data selected for this experiment;  ,  , ⋯ ,  for the weighting coefficients of the m single-prediction methods, the weighting coefficients are obtained through the planning of the solution to the combination of the model as a superior prediction model and satisfy (Equation ( 4)): The process of constructing the combined prediction model is shown in Figure 5.

Calculation of Accuracy Validation Metrics
The model prediction validity is used for the evaluation of prediction accuracy, and its validity is calculated as follows (Equations ( 5)-( 9)): Agronomy 2024, 14, 789 9 of 18

Calculation of Accuracy Validation Metrics
The model prediction validity is used for the evaluation of prediction accuracy, and its validity is calculated as follows (Equations ( 5)-( 9)): where M i is the predictive validity of the ith prediction method; the larger value indicates that the ith prediction method is more effective; E(A i ) is the mathematical expectation of the sequence of prediction accuracy of the ith prediction method; the larger the value indicates the higher prediction accuracy; σ(A i ) is the standard deviation of the sequence of prediction accuracy of the ith prediction method; the smaller value indicates that the prediction is more stable; i is the number of single-prediction methods, and this paper has used 8 prediction methods; t is the number of prediction samples, and the number of prediction samples in this paper is 312; x t is the measured value of t samples; x it is the predicted value of the tth sample of the ith prediction method; e it is the relative error of prediction of the tth sample of the ith prediction method; A it is the prediction accuracy of the ith prediction method in the tth sample.When A it = 0, it indicates that the prediction of the ith prediction method in the tth sample is invalid prediction.The combination-prediction model solution method and process are as follows (Equations ( 10)-( 14)): (1) Calculate the combined predicted value of xt where w 1 , w 2 , • • • , w m are the weighting coefficients of m single-prediction methods and satisfy (2) Calculate the relative error e t and the prediction accuracy A t of the combination prediction for the tth sample (3) Calculate the combination-prediction validity where E(A) is the mathematical expectation of the sequence of prediction accuracy of the combination of prediction methods; σ(A) is the standard deviation of the sequence of prediction accuracy of the combination of prediction methods, which is calculated using Equations ( 5) and ( 6), and E(A) and σ(A) are the functions of weighting coefficients w 1 , w 2 , • • • , w m of the various single-prediction methods, so that M is also a function of , and the larger value of M indicates that the combination-prediction method is more effective.The combination-prediction model is calculated and obtained as follows: denoted as M min and M max are the minimum and maximum prediction validity of m prediction methods, respectively, and M is the prediction validity of the combination model.Then when M < M min , the combination-prediction model is an inferior combination prediction; when M min < M < M max , the combination-prediction model is a non-inferior combination prediction; when M > M max , the combination-prediction model is a superior combination prediction; (4) Approximate solution of the combined prediction model.Since the objective function is not derivable, i.e., the model is non-frivolous nonlinear programming, coupled with the fact that the computational complexity of the model is larger when n and m are larger, the non-frivolous nonlinear programming is transformed into frivolous nonlinear programming to solve the problem.The combined prediction model is equivalent to the following model (Equation ( 15)): where α 0 ∈ [0, 1] is a constant.The degree of inconsistency of the sign of the relative error of each single-prediction method in the same sample t is taken as different α 0 .The more serious the degree of inconsistency of the sign of the relative error is, the smaller the value of α 0 , and if it is completely consistent, then the value of α 0 is taken as 1.In this paper, the magnitude of α 0 is determined according to the ratio of the positive number and the negative number of the relative error in each method and takes a value of 0.9; ρ ij is the correlation coefficient of the prediction accuracy sequence of the ith single-prediction method and the jth single-prediction method, and when ρ ij ∈ (−1,1), the optimal solution of the combination-prediction model corresponding to the combination-prediction method is the superiority combination prediction.The optimal solution of the combination-prediction model is obtained based on the idea of a nonlinear programming solution.

The Planning and Solving Algorithm for the Combinatorial Predictive Modeling
The optimal solution of the composite forecasting model is obtained based on the Nonlinear Generalized Reduced Gradient (GRG) algorithm [38].The optimization problem can be simplified as follows (Equation ( 16)): where In the solution process, first, all components of X are decomposed into two parts, i.e., X = [X B , X N ] T , where X B is the basis vector of dimension m, and X N is the non-basis vector of dimension n-m; correspondingly, According to the implicit function theorem, it is known that there exists a continuous mapping X B = V(X N ); hence, the objective function F(X) is transformed into (X) = F(X B , X N ) = f (X N ), making the original objective function F(X) of n variables into a function f (X N ) of n-m variables.Therefore, the gradient of f (X N ) at X k with respect to X N is the reduced gradient; then, the reduced gradient of f (X N ) with respect to X N is The reduced gradient is denoted briefly as T ; when x k m+j = l m+j and r j > 0 or when x k m+j = u m+j and r j < 0, then s k j = 0; in other cases, = 0 to obtain the optimized value.

L1-Paradigm Hyperspectral Feature Selection
Feature selection is carried out by the L1 paradigm; the number of feature selection is set from 10 to 110 in steps of 10 with 11 gradients, and the coefficient of determination R 2 and the root mean square error RMSE of the training, validation, and test sets are used as the accuracy validations to analyze the optimal number of feature selection, and the results are shown in Figure 6.With the increase in the number of features, the coefficient of determination R 2 of the training set, validation set, and test set shows an overall increasing trend, and the root mean square error RMSE shows a decreasing trend, but after the number of features reaches 70, the upward trend of the coefficient of determination R 2 slows down, and the coefficient of determination of the validation set and the test set shows a decreasing trend, and the root mean square error shows a decreasing trend instead of an obvious increase after the number of features is 70, and the model's performance is not as high as the number of features in the training set, showing the generalization ability of the model.Among the 70 bands selected, the sensitive bands are mainly located at 400-600 nm, 700-900 nm, 1000-1100 nm, and 1400-1600 nm, which is in general agreement with the results of many studies [3,5,16].
In the solution process, first, all components of X are decomposed into two parts, i.e.,  =  ,  , where  is the basis vector of dimension m, and  is the non-basis vector of dimension n-m; correspondingly,  =  ,  ;  =  ,  .
According to the implicit function theorem, it is known that there exists a continuous mapping  = ( ) ; hence, the objective function () is transformed into () = ( ,  ) = ( ), making the original objective function () of n variables into a function ( ) of n-m variables.Therefore, the gradient of ( ) at  with respect to  is the reduced gradient; then, the reduced gradient of ( ) with respect to  is The reduced gradient is denoted briefly as ∇( ) =  ,  , ⋯ ,  ; define S =  ,  , ⋯ , ; when  =  and  0 or when  =  and  0 , then  = 0; in other cases,  = − .Make  =  + αS , then let  =  and iterate using the following formula:  =  − ∇ (  ) ( ,  ) , where  = 1,2, ⋯ , finally obtaining  that satisfies ( ,  ) = 0 to obtain the optimized value.

L1-Paradigm Hyperspectral Feature Selection
Feature selection is carried out by the L1 paradigm; the number of feature selection is set from 10 to 110 in steps of 10 with 11 gradients, and the coefficient of determination R 2 and the root mean square error RMSE of the training, validation, and test sets are used as the accuracy validations to analyze the optimal number of feature selection, and the results are shown in Figure 6.With the increase in the number of features, the coefficient of determination R 2 of the training set, validation set, and test set shows an overall increasing trend, and the root mean square error RMSE shows a decreasing trend, but after the number of features reaches 70, the upward trend of the coefficient of determination R 2 slows down, and the coefficient of determination of the validation set and the test set shows a decreasing trend, and the root mean square error shows a decreasing trend instead of an obvious increase after the number of features is 70, and the model's performance is not as high as the number of features in the training set, showing the generalization ability of the model.Among the 70 bands selected, the sensitive bands are mainly located at 400-600 nm, 700-900 nm, 1000-1100 nm, and 1400-1600 nm, which is in general agreement with the results of many studies [3,5,16].

Results of Single-Prediction Model
Numerous studies have shown that the LASSO, Ridge, MLP, RF, LSTM, CNN, and SVR models used in this study have good prediction results in terms of soil nutrient content, but different spectral pre-processing, different selection of characteristic bands, and different settings of each model parameter in different soil types can affect the prediction accuracy of the model for soil nutrient content [12,17,39].For example, Zhong Liang et al. [40] explored the accuracy of SOM content of LeNet-5, MLP-5, RF, SVR, and CNN models under different pre-processing, and the results showed that CNN showed a good prediction effect, and SVR was poorer than the other models.Wang Haifeng et al. [41] investigated the organic matter content of desert soil based on the grayscale correlation-ridge regression, and the results showed the ridge regression model could realize a better inversion effect by using only 4% of the full-spectrum band after the transformation of standard normal variables.Deng Yun et al. [22] compared and analyzed the modeling effects of LSTM, PLSR, SVR, and time-convolution networks in different spectral pre-processing of the organic matter content of red soil.Adding self-attention layers to the residual structure of time-convolutional networks can improve learning ability.

Combined Prediction Model
Based on the prediction results of eight single-prediction models, the mathematical expectation E i , standard deviation σ i , and correlation coefficient ρ ij of the prediction accuracy series of each single-prediction model are calculated.Based on the principle of the combined prediction model and the solution process, the obtained mathematical expectation E i , standard deviation σ i , and correlation coefficient ρ ij are substituted into the optimization model to obtain the maximum prediction validity of the combined model.The objective function max M(w 1 , w 2 , • • • , w 8 ) model is solved using the nonlinear Generalized Reduced Gradient (GRG) programming, and the results are as follows: w The prediction results of the combined prediction model are shown in Table 3. From Table 3, it can be seen that the mathematical expectation for the prediction accuracy of the combination model is 0.893, which is greater than the maximum prediction accuracy of a single model.The prediction standard deviation of the combination model is 0.129, which is smaller than the minimum value of the single model prediction standard deviation.The predictive validity of the combination model is 0.778, which is greater than the maximum predictive validity of a single model.Therefore, this combination model is an optimization model, which improves the prediction accuracy compared to a single-prediction model.Further analysis was conducted on the fitting degree between the true and predicted soil organic matter content values of the eight established single and combination models, and the fitting degree was evaluated through Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Coefficient of determination (R 2 ) (Figure 7).From Figure 7, it can be seen that the MAEs of the eight models are 1.496, 1.410, 1.567, 1.232, 1.478, 1.231, 1.509, and 1.614, respectively.The MAE of the combined prediction model is 1.161, which is smaller than the minimum MAE in a single model and a 19.487% decrease compared to the average MAE of the eight single models.The RMSE of the eight models are 1.898, 2.365, 2.139, 1.887, 1.878, 1.818, 2.028, and 2.034, respectively.The RMSE of the combined prediction model is 1.601, which is smaller than the minimum RMSE in a single model and a decrease of 20.189% compared to the average MAE of the eight single models.The R 2 values of the eight models were 0.879, 0.818, 0.900, 0.904, 0.881, 0.898, 0.860, and 0.863, respectively.The R 2 value of the combined prediction model reached 0.923, which was better than the maximum R 2 value in a single model and an increase of 5.486% in the average determination coefficient R 2 compared to the eight single models.The combination model has improved prediction accuracy compared to single models, which is consistent with the results of existing combination models [24].

Residual Analysis
Randomness and unpredictability are key components of any regression model, and residual analysis is a statistical technique used to evaluate whether the regression model accurately predicts observed data.In regression analysis, the relationship between the inde-

Residual Analysis
Randomness and unpredictability are key components of any regression model, and residual analysis is a statistical technique used to evaluate whether the regression model accurately predicts observed data.In regression analysis, the relationship between the independent and dependent variables is modeled, and the model is used to predict the value of the dependent variable [16].Residual is the difference between observed values and model predictions, which can be used to evaluate the fit and prediction accuracy of the model.This study tested the rationality of the model and the reliability of the data through residual plots (Figure 8).From Figure 8, it can be seen that the residuals of the eight single-prediction models and combination-prediction models are all randomly distributed without heteroscedasticity features and do not contain trend information.Both single and combination models are reasonable prediction models, and the residuals of the combination-prediction model are less than eight single-prediction models; 94.23% of the standardized residuals fall between −2 and 2, and the standardized residuals follow a normal distribution.

Discussion
In this study, nine methods, L1-paradigm feature selection, L2-paradigm feature selection, tree model feature selection, Kendall feature selection, Pearson feature selection, PCA downscaling, SPE downscaling, t-SNE downscaling, and MDE feature selection, were firstly selected for prediction modeling based on the same dataset, and among the nine feature selection algorithms, L1-paradigm feature selection had the highest prediction accuracy, the main reason being that L1-paradigm minimization made the coefficients of some features become 0, automatically selecting the most important features and eliminating irrelevant or redundant features, and adopting efficient solution methods, such as coordinate descent method and minimum angle regression, etc.What is more, L1-paradigm regularization can efficiently prevent overfitting and improve the model's generalization ability.For this reason, in this study based on the L1-paradigm feature selection method to select the appropriate number of features according to the trend of the model determination coefficient and the root mean square error with the number of features, a total of 70 sensitive bands were selected, accounting for 8.43% of the total number of bands.This greatly reduces the band information and solves the problem of a large number of bands and large computation in organic matter content prediction.
From the results of the prediction accuracy of single-prediction models (Table 2), it can be seen that for the same set of data sets, the prediction accuracy of different prediction methods on the prediction set, validation set, and training set shows inconsistent phenomena, and the phenomena of underfitting and overfitting may occur; for this reason, it is necessary to take advantage of the advantages of the individual single-prediction models, and innovatively construct the combination model of single models.In the single-prediction models for hyperspectral prediction of organic matter content, the evaluation of prediction accuracy is mostly based on the coefficient of determination, mean absolute error,

Discussion
In this study, nine methods, L1-paradigm feature selection, L2-paradigm feature selection, tree model feature selection, Kendall feature selection, Pearson feature selection, PCA downscaling, SPE downscaling, t-SNE downscaling, and MDE feature selection, were firstly selected for prediction modeling based on the same dataset, and among the nine feature selection algorithms, L1-paradigm feature selection had the highest prediction accuracy, the main reason being that L1-paradigm minimization made the coefficients of some features become 0, automatically selecting the most important features and eliminating irrelevant or redundant features, and adopting efficient solution methods, such as coordinate descent method and minimum angle regression, etc.What is more, L1-paradigm regularization can efficiently prevent overfitting and improve the model's generalization ability.For this reason, in this study based on the L1-paradigm feature selection method to select the appropriate number of features according to the trend of the model determination coefficient and the root mean square error with the number of features, a total of 70 sensitive bands were selected, accounting for 8.43% of the total number of bands.This greatly reduces the band information and solves the problem of a large number of bands and large computation in organic matter content prediction.
From the results of the prediction accuracy of single-prediction models (Table 2), it can be seen that for the same set of data sets, the prediction accuracy of different prediction methods on the prediction set, validation set, and training set shows inconsistent phenomena, and the phenomena of underfitting and overfitting may occur; for this reason, it is of the prediction accuracy in the single model 0.883, which is a non-inferior model; the weight coefficients of RF and CNN single-prediction models in the superior model are 0, which indicates that these two single-prediction models are redundant prediction methods compared to the combined prediction model obtained by the inverse error method, which is a good choice for improving the accuracy and generalization ability of the combined prediction model and also simplifying the complexity of the model.And the combined prediction model is at least a non-inferior prediction when the correlation coefficient ρ ij ∈ (−1, 1) of the prediction accuracy series of any two single-prediction models, the combined prediction method is superior combined prediction method, and the condition of correlation coefficient ρ ij ∈ (−1, 1) is easy to satisfy in general so that the combined prediction model makes use of information provided by the individual prediction models to a greater extent.This study is only a preliminary analysis of the combined prediction model in the prediction of organic matter content with respect to the prediction effectiveness indexes, and all the sample data selected in the construction of the combined prediction model are obtained by calculation, and the number of single-prediction models selected is also limited; the processing of the spectral data is also only first-order differentiation, and the feature extraction is only the L1-paradigm feature algorithm, and different fractional orders can be introduced in the next step.The next step could be to introduce different fractional-order differential processing techniques and multiple feature extraction and to select more single-prediction models for benchmarking in order to establish a more comprehensive and appropriate combined prediction model to improve the accuracy and generalization of prediction.A more comprehensive and appropriate combined prediction model is needed in the future to improve the accuracy of SOM prediction.

Conclusions
Compared with the single model, the combined forecasting model adopted in this paper can integrate the forecasting results of multiple single models, and make the final forecasting results more accurate through certain weight distribution.Because different models may be affected by different factors, the combined forecasting model can make full use of the advantages of each model to reduce the deviation of the prediction results of a single model.
Compared with the single model, the combination forecasting model adopted in this paper can reduce the risk of the single model by integrating multiple models, making the final forecasting results more stable and reliable.
The combination-prediction model based on prediction effectiveness used in this article is at least a non-inferiority combination model, which can determine whether a single-prediction model used in constructing a composite model is redundant, achieving higher prediction accuracy and model stability with fewer single models.

Figure 2 .
Figure 2. Spectral curve of first−order differential transformation.

Figure 2 .
Figure 2. Spectral curve of first−order differential transformation.

Figure 2 .
Figure 2. Spectral curve of first−order differential transformation.

Figure 6 .
Figure 6.Accuracy verification of the number of different feature bands.Figure 6. Accuracy verification of the number of different feature bands.

Figure 6 .
Figure 6.Accuracy verification of the number of different feature bands.Figure 6. Accuracy verification of the number of different feature bands.

Agronomy 2024 , 19 Figure 7 .
Figure 7.The degree of fit and error analysis between the predicted and measured values of each model.

Figure 7 .
Figure 7.The degree of fit and error analysis between the predicted and measured values of each model.

Figure 8 .
Figure 8. Residual analysis results of the model.

Table 1 .
Statistics of Organic Matter content in soil.

Table 2 .
Prediction accuracy of single-prediction models.

Table 3 .
Prediction accuracy of combination-prediction model.