# Using Near-Infrared Spectroscopy and Stacked Regression for the Simultaneous Determination of Fresh Cattle and Poultry Manure Chemical Properties


## Abstract


## 1. Introduction

…P₂O₅), K as potassium oxide (K₂O), calcium (Ca) as calcium oxide (CaO), and magnesium (Mg) as magnesium oxide (MgO), among others.

…¹³C nuclear magnetic resonance (NMR) spectroscopy [10], solution NMR, X-ray absorption near-edge structure (XANES) spectroscopy [11], and near-infrared (NIR) spectroscopy [12]. However, each of these techniques has advantages and disadvantages. Solution ³¹P NMR, for instance, can provide relevant data on the organic P forms in animal manure but not on the inorganic P solid phases. Organic and mineral P fractions of manure have also been identified using XANES spectroscopy; because no liquid extraction is required, this method has an advantage over solution NMR [11]. Compared with AES, AAS is more specific for some elements and relatively easy to use. Its disadvantages are that it is not suitable for P analysis and that only one element can be determined per run. ICP-AES, on the other hand, enables rapid multi-element analysis. For many elements, its detection limits are comparable to or lower than those of AAS; however, its acquisition, operating, and maintenance costs are higher than those of AAS [9]. In general, using the abovementioned methods to assess the various chemical nutrients in animal manure requires significant time and other resources (e.g., training and funding).

…NH₄) (designated as NH₄ in this study), total N (designated as N in this study), P₂O₅, CaO, MgO, and K₂O. In doing so, we identified the most suitable techniques for the simultaneous determination of these chemical components. The results of this study show that stacked regression, which combines the predictions of the various abovementioned machine learning techniques, appears to be a robust machine learning algorithm for the simultaneous quantification of the seven chemical components in fresh cattle and poultry manure.

## 2. Materials and Methods

#### 2.1. Dataset

…NH₄ losses during storage and were homogenized in the laboratory by crushing them in their frozen state with a blender-cutter (Blixer Dito K45, Electrolux, Senlis, France) [22]. Further details about the materials, reagents, and equipment are provided by Gogé et al. [22]; a brief summary is given in the following sections.

#### 2.2. Equipment and Sample Analyses

…NH₄, N, P₂O₅, CaO, MgO, and K₂O. DM was determined by oven-drying the samples (initially at 40 °C) at 103 ± 2 °C to constant weight. Total NH₄ and total N were measured by the Kjeldahl method. P₂O₅, CaO, MgO, and K₂O, on the other hand, were measured by ICP (Element XR, Thermo Scientific, Waltham, MA, USA); only 158 of the 332 samples were analyzed for these four components. The descriptive statistics of the different chemical components are summarized in Table 1 [22].

#### 2.3. Data Preprocessing

…NH₄, and N) and 110 (for P₂O₅, CaO, MgO, and K₂O) for the training set, and 100 (for DM, NH₄, and N) and 48 (for P₂O₅, CaO, MgO, and K₂O) for the test set. The split was stratified by manure type (cattle and poultry) so that each set had the same distribution of cattle and poultry manure as the original dataset. The rsample package version 1.0.0 in R was used for data splitting [23].
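A stratified split of this kind can be sketched in plain Python. This is a minimal illustration, not the authors' code: the study used rsample in R (where `initial_split()` with its `prop` and `strata` arguments plays this role). The 0.7 proportion is inferred from 232/332 ≈ 0.70, and the 200 cattle / 132 poultry composition is hypothetical, since the per-type counts are not restated in this section.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[str], prop: float = 0.7,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Split sample indices so each stratum (manure type) keeps ~prop in training."""
    rng = random.Random(seed)
    by_stratum: Dict[str, List[int]] = defaultdict(list)
    for i, lab in enumerate(labels):
        by_stratum[lab].append(i)
    train: List[int] = []
    test: List[int] = []
    for lab in sorted(by_stratum):
        idx = by_stratum[lab][:]
        rng.shuffle(idx)                      # random order within the stratum
        n_train = round(prop * len(idx))      # same proportion in every stratum
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


# Hypothetical composition: 200 cattle + 132 poultry = 332 samples
labels = ["cattle"] * 200 + ["poultry"] * 132
train_idx, test_idx = stratified_split(labels, prop=0.7)
print(len(train_idx), len(test_idx))  # 232 100
```

Splitting each stratum separately is what keeps the cattle/poultry ratio the same in both sets; a plain random split of all 332 samples would only do so on average.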

…NH₄, and N in our training set. The training set was then randomly divided into 10 folds (sometimes called groups) of approximately equal size. In each split, k − 1 (i.e., 9) folds were used as the analysis set and the remaining fold (about 10% of the 232 samples) was used as the assessment (testing) set. With five repeats, this produced five sets of 10 splits, for a total of 50 splits.
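The resampling scheme above (10 folds, 5 repeats, 50 splits) can be illustrated with a short Python sketch. It assumes a simple shuffle-and-deal fold assignment and is not the study's implementation; in R, rsample's `vfold_cv(v = 10, repeats = 5)` provides this kind of resampling, with its own fold assignment.

```python
import random
from typing import List, Tuple


def repeated_kfold(n: int, v: int = 10, repeats: int = 5,
                   seed: int = 1) -> List[Tuple[List[int], List[int]]]:
    """Return (analysis, assessment) index lists for `repeats` x `v` folds."""
    rng = random.Random(seed)
    splits: List[Tuple[List[int], List[int]]] = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        # Deal the shuffled indices round-robin into v folds of near-equal size
        folds = [idx[k::v] for k in range(v)]
        for k in range(v):
            assessment = sorted(folds[k])          # the held-out fold (~10%)
            held_out = set(assessment)
            analysis = [i for i in range(n) if i not in held_out]  # the other v - 1 folds
            splits.append((analysis, assessment))
    return splits


splits = repeated_kfold(232)
print(len(splits))  # 50
```

Within each repeat, every sample lands in exactly one assessment fold, so each model gets one out-of-fold prediction per sample per repeat.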

#### 2.4. Individual Machine Learning and Stacked Regression Analyses

- (i) SVR is a technique in which a model learns the importance of each variable in characterizing the relationship between the input and output. It formulates an optimization problem to learn a regression function that maps the input predictor variables to the output responses. The solution is represented by support vectors (i.e., a small subset of the training samples), so it depends on the number of support vectors rather than on the dimensionality of the input data [26]. Linear (SVRLin), polynomial (SVRPoly), and radial basis function (SVRRad) kernels were used in this study. SVR with these kernels was performed using the ‘kernlab’ package version 0.9.30 in R [27,28].
- (ii) LASSO regression aims to identify the variables and corresponding regression coefficients that yield a statistical model minimizing the prediction error. This is achieved by imposing a constraint (an L1 penalty) on the model parameters, which shrinks the regression coefficients toward zero and sets some of them exactly to zero [29].
- (iii)
- (iv) ENET bridges LASSO and RIDGE by combining their penalties, thereby improving prediction accuracy by shrinking some of the regression coefficients to approximately zero as the strength of the penalty parameter increases [32,33]. LASSO, RIDGE, and ENET were fitted using the ‘glmnet’ package version 4.1.2 in R [34,35].
- (v) PLS is a data reduction technique that compresses a large number of measured collinear variables into a few orthogonal latent variables (i.e., PLS components). The optimal number of latent variables is then determined by minimizing the root mean square error (RMSE) between the predicted and observed response variables [36]. PLS was fitted using the ‘mixOmics’ package version 6.17.26 in R [37].
- (vi) RF builds an ensemble of decision trees, each grown in a randomly selected subspace of the data [38]. The random sampling and ensemble strategies used in this method enable it to achieve accurate predictions and better generalization [39]. The ‘randomForest’ package version 4.7.1.1 in R was used for RF analysis [40].
- (vii) RPART (recursive partitioning) is a tree-based regression method, often used to predict binary outcomes, that avoids assumptions of linearity [41]. It builds classification or regression models of a very general structure using a two-step process; the resulting models can be represented as binary trees. RPART was performed using the ‘rpart’ package version 4.1.16 in R [42].
- (viii) XGB is a highly effective and widely used machine learning technique that combines multiple decision trees into a more powerful model [43,44]. It builds trees serially, with each tree trying to correct the mistakes of the previous ones. Each tree can provide good predictions on only part of the data, so more and more trees are added to iteratively improve the performance of the model [44]. XGB was conducted using the ‘xgboost’ package version 1.5.1.1 in R [45].
- (ix) Stacked regression is an ensemble learning technique that combines the predictions of the individual machine learning techniques above to optimize model performance [48]. The R package ‘stacks’ version 0.2.3, part of the tidymodels ecosystem, was used for stacked regression. Individual models (e.g., support vector regression and penalized linear regression (LASSO, RIDGE, and ENET)) were first defined as candidate members of the ensemble (Level 1 models; e.g., SVRLin1, SVRPoly1, SVRRad1), each with different parameter values or model configurations, and all sharing the same resamples from repeated k-fold cross-validation. The Level 1 models were then combined into a data stack in tibble format, in which the first column contained the true outcome in the training set and the remaining columns contained the assessment-set predictions of each candidate member. A regularized model (elastic net) was then fitted on the candidate members’ predictions to determine how they can be combined to predict the true outcome (Level 2 modeling). In this stage, the stacking coefficients were estimated; candidates with non-zero coefficients were retained as members of the model stack and were then trained on the full training set. The final model stack was used to make the final predictions on the previously held-out test set, from which the performance metrics were determined (Figure 1).
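The Level 2 step of stacked regression, together with the penalties behind LASSO, RIDGE, and ENET, can be illustrated with a small pure-Python sketch. It minimizes the glmnet-style elastic-net objective

$$\min_{\beta_0,\boldsymbol\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-\mathbf{x}_i^{\top}\boldsymbol\beta\right)^2+\lambda\left[\frac{1-\alpha}{2}\lVert\boldsymbol\beta\rVert_2^2+\alpha\lVert\boldsymbol\beta\rVert_1\right]$$

(α = 1 gives LASSO, α = 0 gives RIDGE) by cyclic coordinate descent, where each column of **x** holds one candidate member's out-of-fold predictions. This is not the code of ‘stacks’ or ‘glmnet’: the non-negativity constraint, the penalty value, the lack of predictor scaling, and the toy data are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator used in LASSO / elastic-net coordinate updates."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_stack(preds: Sequence[Sequence[float]],  # preds[j][i]: candidate j, sample i
              y: Sequence[float], lam: float = 0.1,
              alpha: float = 1.0,  # 1 = LASSO, 0 = ridge, in between = elastic net
              non_negative: bool = True, max_iter: int = 10_000,
              tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Return (intercept, stacking coefficients) via cyclic coordinate descent."""
    n, p = len(y), len(preds)
    y_mean = sum(y) / n
    x_means = [sum(col) / n for col in preds]
    # Center predictors and response so the intercept stays unpenalized (no scaling).
    xc = [[v - m for v in col] for col, m in zip(preds, x_means)]
    resid = [v - y_mean for v in y]  # all coefficients start at zero
    sq = [sum(t * t for t in col) / n for col in xc]
    beta = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # a constant candidate carries no information
            # Correlation of candidate j with the partial residual that excludes j
            z = sum(a * r for a, r in zip(xc[j], resid)) / n + sq[j] * beta[j]
            new = soft_threshold(z, lam * alpha) / (sq[j] + lam * (1.0 - alpha))
            if non_negative:
                new = max(new, 0.0)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * a for r, a in zip(resid, xc[j])]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


def predict_stack(intercept: float, beta: Sequence[float],
                  preds: Sequence[Sequence[float]]) -> List[float]:
    """Blend candidate predictions with the fitted stacking coefficients."""
    return [intercept + sum(b * col[i] for b, col in zip(beta, preds))
            for i in range(len(preds[0]))]


# Hypothetical toy data: A and B are noisy copies of y, C is an inverted copy.
y = [i / 10 for i in range(100)]
noise = [1.0 if i % 4 in (0, 3) else -1.0 for i in range(100)]
cand_a = [t + e for t, e in zip(y, noise)]
cand_b = [t - e for t, e in zip(y, noise)]
cand_c = [9.9 - t for t in y]
b0, coefs = fit_stack([cand_a, cand_b, cand_c], y)
print([round(c, 3) for c in coefs])  # C should get a zero coefficient (not a member)
```

A candidate whose coefficient is driven to zero drops out of the model stack, which mirrors how only candidates with non-zero stacking coefficients were retained and refitted on the full training set.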

#### 2.5. Comparative Analysis of the Individual Machine Learning Techniques and Stacked Regression

…R²) for each chemical constituent in the training and testing datasets, for all machine learning techniques as well as for the stacked regression. An F-test was also used to compare the RMSE values of the individual regression techniques at a 95% confidence level. To assess whether two regression techniques differ significantly, we adapted the method of Payne and Wolfrum [50]. First, a standard error (SE) value was calculated for each of the two machine learning algorithms being compared (i.e., SE = RMSE²); these are variance measures. The ratio of the two SE values (i.e., ratio = SE₂/SE₁) was then computed with the larger SE in the numerator, so that the ratio is greater than 1.0. The critical F-value was calculated using the appropriate degrees of freedom (e.g., 231 for DM, NH₄, and N; 109 for P₂O₅, CaO, MgO, and K₂O) at a significance level of 0.05, and the calculated ratio was compared with this critical F-value. Critical F-values were obtained using Free Statistics Calculators version 4.0 [51]. If the calculated ratio is less than the critical F-value, the two RMSE values are not significantly different. Detailed calculations comparing the ratio of the standard errors of two machine learning techniques with the critical F-value are provided in Tables S4 and S5.
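The variance-ratio comparison above can be sketched in Python: the larger of RMSE₁² and RMSE₂² is divided by the smaller and compared with the upper 5% critical value of the F distribution with (df, df) degrees of freedom. This is a minimal sketch, not the authors' workflow (they used an online calculator [51]); here the critical value is computed from a pure-Python F distribution built on the regularized incomplete beta function (a Numerical Recipes-style continued fraction), and the RMSE values in the example are hypothetical.

```python
import math
from typing import Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 500, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(f: float, d1: float, d2: float) -> float:
    """CDF of the F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_critical(d1: float, d2: float, alpha: float = 0.05) -> float:
    """Upper-tail critical value F_alpha(d1, d2), found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, d1, d2) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def compare_rmse(rmse_1: float, rmse_2: float, df: int,
                 alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Variance-ratio test on two RMSEs; returns (ratio, F_crit, significant?)."""
    small, large = sorted((rmse_1 ** 2, rmse_2 ** 2))  # SE = RMSE^2, larger on top
    ratio = large / small
    crit = f_critical(df, df, alpha)
    return ratio, crit, ratio > crit


# Hypothetical RMSECVs of two algorithms on the 232-sample training set (df = 231)
ratio, crit, significant = compare_rmse(1.00, 1.20, df=231)
print(f"ratio={ratio:.3f}, F_crit={crit:.3f}, significant={significant}")
```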

## 3. Results

#### 3.1. Signal Pretreatment and Descriptive Statistics of the Chemical Components of Poultry and Cattle Manure

…NH₄ and N, while 158 samples were tested for P₂O₅, CaO, MgO, and K₂O. The raw NIR spectra of the training and testing sets, plotted as absorbance versus wavelength (500–2500 nm), are shown in Figure 2. Savitzky–Golay smoothing was applied during preprocessing to reduce high-frequency noise while preserving relevant spectral information (Figure 3). As with the descriptive statistics for the training set (Table 2) and testing set (Table 3), the descriptive statistics for the chemical components of the manure samples, in percent on a fresh-weight basis (Table 1), reveal a wide range of values for each chemical constituent.
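Savitzky–Golay smoothing replaces each spectral point with the value, at the window centre, of a low-order polynomial fitted by least squares to the surrounding window; this reduces to a fixed set of convolution weights. The sketch below computes those weights and applies them to one spectrum. The window size and polynomial order used in the study are not stated in this section, so the defaults here are hypothetical, and edge points without a full window are simply dropped (an assumption). In R, the prospectr package's `savitzkyGolay()` provides this filter.

```python
from typing import List, Sequence


def sg_coefficients(window: int, order: int) -> List[float]:
    """Savitzky-Golay smoothing weights from a local least-squares polynomial fit."""
    if window % 2 == 0 or window <= order:
        raise ValueError("window must be odd and larger than the polynomial order")
    m = window // 2
    offsets = list(range(-m, m + 1))
    size = order + 1
    # Normal-equation matrix J^T J, where J[t][k] = t**k
    a = [[float(sum(t ** (k + l) for t in offsets)) for l in range(size)]
         for k in range(size)]
    b = [1.0] + [0.0] * order  # solve (J^T J) x = e_0
    for col in range(size):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(size):
            if r != col:
                f = a[r][col] / a[col][col]
                for c in range(col, size):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]
    x = [b[k] / a[k][k] for k in range(size)]
    # Weight for offset t = entry t of row 0 of (J^T J)^-1 J^T
    return [sum(x[k] * t ** k for k in range(size)) for t in offsets]


def savgol_smooth(signal: Sequence[float], window: int = 11, order: int = 2) -> List[float]:
    """Smooth one spectrum; edge points without a full window are dropped."""
    w = sg_coefficients(window, order)
    m = window // 2
    return [sum(c * signal[i + j - m] for j, c in enumerate(w))
            for i in range(m, len(signal) - m)]


print(sg_coefficients(5, 2))  # the classic (-3, 12, 17, 12, -3) / 35 weights
```

Because the weights reproduce any polynomial up to the chosen order exactly, broad absorption bands are preserved while point-to-point noise is averaged out.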

#### 3.2. Root Mean Square Error of Cross-Validation (RMSECV) and R² Analyses of the Seven Chemical Components of Fresh Homogenized Samples in the Training Set

…K₂O components (RMSECV = 0.074% and R² = 0.786 for MgO; RMSECV = 0.252% and R² = 0.820 for K₂O), particularly in terms of RMSECV. SVRPoly performed best for DM (RMSECV = 4.543%, R² = 0.946). SVRRad, on the other hand, performed best for NH₄, N, P₂O₅, and CaO (RMSECV = 0.066% and R² = 0.943 for NH₄; RMSECV = 0.254% and R² = 0.946 for N; RMSECV = 0.176% and R² = 0.849 for P₂O₅; RMSECV = 0.232% and R² = 0.779 for CaO) (Tables 4 and 5). Overall, SVRPoly (average RMSECV = 0.817%, average R² = 0.851) and SVRRad (average RMSECV = 0.819%, average R² = 0.866) were the best-performing algorithms across all components in the training set. Based on the RMSECV values, SVRPoly is not significantly different from the other SVR variants (i.e., SVRLin and SVRRad), LASSO, or ENET. SVRPoly is, however, significantly different from RIDGE, PLS, RF, RPART, and XGB across all chemical components.

…(average RMSECV = 1.182%, average R² = 0.771) (Tables 4 and 5). RPART (average RMSECV = 1.614%, average R² = 0.656) and RIDGE (average RMSECV = 1.748%, average R² = 0.736) were the worst-performing techniques in the training set across all chemical components (Tables 4 and 5). For DM, SVRPoly is not significantly different from SVRRad based on the RMSECV values. However, SVRPoly is significantly different from SVRLin and all the other machine learning algorithms for the same chemical component.

…NH₄ and CaO, SVRRad is not significantly different from SVRPoly but is significantly different from SVRLin and all other machine learning techniques. For N, SVRRad is significantly different from all other algorithms.

…P₂O₅. For this chemical component, SVRRad was not significantly different from SVRLin but was significantly different from the rest of the machine learning algorithms. SVRLin was the best-performing machine learning technique for MgO; it was not significantly different from SVRPoly and SVRRad but was significantly different from the other machine learning algorithms. As with MgO, SVRLin was the best-performing algorithm in the training set for K₂O. For this chemical constituent, SVRLin is not significantly different from SVRPoly, SVRRad, LASSO, and ENET; however, it is significantly different from RIDGE, PLS, RF, RPART, and XGB.

#### 3.3. Root Mean Square Error of Prediction (RMSEP) and R² Analyses of the Seven Chemical Components of Fresh Homogenized Samples in the Testing Set

…RMSEP = 0.078% and R² = 0.837 for MgO) (Tables 6 and 7).

…NH₄, N, P₂O₅, CaO, and K₂O chemical constituents (RMSEP = 4.088% and R² = 0.965 for DM; RMSEP = 0.055% and R² = 0.966 for NH₄; RMSEP = 0.217% and R² = 0.965 for N; RMSEP = 0.269% and R² = 0.875 for P₂O₅; RMSEP = 0.309% and R² = 0.743 for CaO; RMSEP = 0.373% and R² = 0.736 for K₂O) (Tables 6 and 7). For DM and NH₄, stacked regression is significantly different from all other machine learning algorithms based on their RMSEP values. For N, on the other hand, stacked regression was not significantly different from SVRRad but was significantly different from all the other machine learning approaches. For P₂O₅ and MgO, stacked regression was not significantly different from the SVR kernels, LASSO, ENET, and PLS, but was significantly different from RIDGE, RF, RPART, and XGB based on their respective RMSEP values.

…K₂O and across all chemical components, stacked regression was not significantly different from any of the other machine learning algorithms. Using the calibration model developed with stacked regression, the predicted versus measured concentrations of the chemical constituents show good linearity in the test set (Figure 4a–g).

…K₂O in the testing set (1.044% wt) (Table 3) with the stacked-regression RMSEP for K₂O (0.373% wt) (Table 6) indicates a prediction error of about 36% relative to the mean K₂O content. This disparity between the RMSEP and the mean of the experimentally determined K₂O values may be primarily due to the skewed distribution of the K₂O measurements (Figure S10e) and the small size of the testing set (n = 48). It could be reduced by including K₂O measurements spanning a wider range of concentrations and by increasing the number of samples in the testing set, particularly poultry manure samples. It should also be noted that, when splitting the data into training and testing sets, we ensured that cattle and poultry manure samples were present in the same proportions in both sets to avoid bias. Such stratified random splitting may nonetheless produce a skewed distribution of the K₂O measurements, leading to fluctuations in the RMSEP relative to the mean experimentally determined K₂O value. This is an inherent disadvantage of data splitting: the predictive accuracy of the model depends largely on the sample size that results from the split [54]. The fluctuations in the RMSEP relative to the mean experimental values of the other components (e.g., P₂O₅, CaO, and MgO) can probably be explained in the same way (Figure S10d,f,g). The possible limitations of ICP for the analysis of P₂O₅, CaO, MgO, and K₂O, which might contribute to these disparities, are also worth exploring. Common limitations of ICP (particularly ICP-OES) include sample drift, poor precision, non-ideal limits of detection, and inaccurate identification, all of which may limit the accurate and precise analysis of the analyte of interest [55,56,57,58]. ICP-MS, on the other hand, may suffer from severe matrix effects [59].
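The ~36% figure is the stacked-regression RMSEP for K₂O expressed relative to the mean measured K₂O content of the testing set (values from Tables 3 and 6):

$$\frac{\mathrm{RMSEP}_{\mathrm{K_2O}}}{\bar{y}_{\mathrm{K_2O}}}=\frac{0.373\ \%\,\mathrm{wt}}{1.044\ \%\,\mathrm{wt}}\approx 0.357\approx 36\%$$

The same ratio can be formed for P₂O₅, CaO, and MgO from the corresponding table entries.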

…R²) can be observed between the stacked-regression predicted and measured concentrations for most of the chemical constituents, a lower R² value (0.743) was obtained for CaO (Table 7). This may be attributed to several factors, such as the skewed distribution of the CaO measurements (Figure S10d) and the small size of the testing set (n = 48). Thus, as mentioned earlier, this limitation could be addressed by increasing the size of the testing set and by expanding the data to include a wider range of measured CaO values [54,55,56,57,58,59].

#### 3.4. Ratio of Performance to Deviation (RPD) Analyses of the Testing Set

…NH₄, N, P₂O₅, and overall across all seven chemical constituents in the testing set (RPD = 4.745 for DM, 5.002 for NH₄, 5.062 for N, and 2.274 for P₂O₅; average RPD = 3.232) (Table 8). Fair models, on the other hand, were obtained for CaO and K₂O using stacked regression (RPD = 1.814 and 1.788, respectively). A fair model was also obtained for MgO using ENET (RPD = 1.988). Overall, the RPD analyses show that the models generated in the testing set across all chemical components and machine learning techniques can be categorized as either excellent or fair, with stacked regression performing the most robustly across all chemical components (average RPD = 3.232) (Table 8).
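RPD is commonly computed as the standard deviation of the reference (measured) values divided by the RMSEP. The sketch below assumes the sample standard deviation and the widely used cut-offs of RPD > 2.0 for excellent and 1.4 to 2.0 for fair models. These thresholds are an assumption; they are consistent with, but not stated in, the classifications reported above (e.g., 2.274 excellent, 1.988 fair). The function names are illustrative.

```python
import statistics
from typing import Sequence


def rpd(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Ratio of performance to deviation: SD of reference values / RMSEP."""
    n = len(measured)
    rmsep = (sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n) ** 0.5
    return statistics.stdev(measured) / rmsep  # sample SD (n - 1), an assumption


def rpd_category(value: float, excellent: float = 2.0, fair: float = 1.4) -> str:
    """Classify an RPD value using assumed cut-offs (> 2.0 excellent, 1.4-2.0 fair)."""
    if value > excellent:
        return "excellent"
    if value >= fair:
        return "fair"
    return "poor"


for name, value in [("P2O5", 2.274), ("MgO", 1.988), ("K2O", 1.788)]:
    print(name, value, rpd_category(value))
```

Because RPD scales the error by the spread of the reference data, a skewed or narrow test-set distribution (as discussed for CaO and K₂O) lowers RPD even when the absolute RMSEP is small.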

## 4. Discussion

…NH₄, N, P₂O₅, CaO, MgO, and K₂O contents of both cattle and poultry manure collected from livestock production. While previous studies have traditionally used PLS to analyze these chemical constituents with NIR systems [14,60,61], alternative machine learning algorithms may provide more accurate results.

…(average RPD = 2.073) (Table 8), its performance is inferior to that of the stacked regression technique.

…R² and RMSE values. That is, models (i.e., machine learning algorithms) with the lowest RMSE and highest R² values were ranked highest (Figures S1–S7). As is evident, PLS was not the top-ranked algorithm for any of the chemical components in the workflow ranking. Moreover, the top-ranked models were not guaranteed to be included in the Level 2 model.

## 5. Conclusions

…R² values were evaluated as excellent and outperformed several other machine learning techniques, including PLS. Therefore, our study supports the use of stacked regression as a stand-alone technique for analyzing poultry and cattle manure, providing a proof of principle for this machine learning approach.

## Supplementary Materials

…NH₄) analysis for the stacked regression; Figure S3: Workflow rank of the machine learning technique used in the total nitrogen (N) analysis for the stacked regression; Figure S4: Workflow rank of the machine learning technique used in the P₂O₅ analysis for the stacked regression; Figure S5: Workflow rank of the machine learning technique used in the CaO analysis for the stacked regression; Figure S6: Workflow rank of the machine learning technique used in the MgO analysis for the stacked regression; Figure S7: Workflow rank of the machine learning technique used in the K₂O analysis for the stacked regression; Figure S8: Histograms for the 332 samples; Figure S9: Histograms for the 232 samples; Figure S10: Histograms for the 110 samples; Table S1: Ranges of hyperparameters used in tuning for the best results of the various machine learning techniques. A space-filling design with a grid number of 100 was used: 100 equally spaced values between (and including) each hyperparameter’s minimum and maximum values were used for tuning. For hyperparameters that are meaningful only as integers, e.g., the number of latent variables (LV) in partial least squares (PLS), non-integer values were skipped during tuning; Table S2: Optimized parameters obtained from the different machine learning models; Table S3: The top 10 (or 7) highest-weighted (stacking coefficient) members of a stacked ensemble of different models with non-zero coefficients for each chemical component: dry matter (DM), total ammonium nitrogen (NH₄), total nitrogen (N), phosphorus pentoxide (P₂O₅), calcium oxide (CaO), magnesium oxide (MgO), and potassium oxide (K₂O); Table S4: Statistical significance table comparing the ratio of the standard errors of two algorithms with the critical F-value in the training set; Table S5: Statistical significance table comparing the ratio of the standard errors of two algorithms with the critical F-value in the testing set.

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

…**ASTI**) of the Philippine Department of Science and Technology’s (**DOST**) Computing and Archiving Research Environment (**COARE**). We would like to thank Thierry Morvan and Youssef Fouad for guiding us as we navigated and used their datasets.

## Conflicts of Interest

## References

- Farooqi, Z.U.R.; Sabir, M.; Zeeshan, N.; Naveed, K.; Hussain, M.M. Enhancing Carbon Sequestration Using Organic Amendments and Agricultural Practices; IntechOpen: London, UK, 2018; ISBN 978-1-78923-765-8.
- Rahman, F.; Rahman, M.M.; Rahman, G.K.M.M.; Saleque, M.A.; Hossain, A.T.M.S.; Miah, M.G. Effect of Organic and Inorganic Fertilizers and Rice Straw on Carbon Sequestration and Soil Fertility under a Rice–Rice Cropping Pattern. Carbon Manag. **2016**, 7, 41–53.
- Bhunia, S.; Bhowmik, A.; Mallick, R.; Mukherjee, J. Agronomic Efficiency of Animal-Derived Organic Fertilizers and Their Effects on Biology and Fertility of Soil: A Review. Agronomy **2021**, 11, 823.
- Jiaying, M.; Tingting, C.; Jie, L.; Weimeng, F.; Baohua, F.; Guangyan, L.; Hubo, L.; Juncai, L.; Zhihai, W.; Longxing, T.; et al. Functions of Nitrogen, Phosphorus and Potassium in Energy Status and Their Influences on Rice Growth and Development. Rice Sci. **2022**, 29, 166–178.
- MacDonald, J.M.; Ribaudo, M.; Livingston, M.; Beckman, J.; Huang, W. Manure Use for Fertilizer and for Energy: Report to Congress. Available online: http://www.ers.usda.gov/publications/pub-details/?pubid=42740 (accessed on 23 August 2022).
- Khoshnevisan, B.; Duan, N.; Tsapekos, P.; Awasthi, M.K.; Liu, Z.; Mohammadi, A.; Angelidaki, I.; Tsang, D.C.W.; Zhang, Z.; Pan, J.; et al. A Critical Review on Livestock Manure Biorefinery Technologies: Sustainability, Challenges, and Future Perspectives. Renew. Sustain. Energy Rev. **2021**, 135, 110033.
- Pagliari, P.; Wilson, M.; He, Z. Animal Manure Production and Utilization: Impact of Modern Concentrated Animal Feeding Operations. In ASA Special Publications; Waldrip, H.M., Pagliari, P.H., He, Z., Eds.; American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America: Madison, WI, USA, 2020; pp. 1–14. ISBN 978-0-89118-371-6.
- Kacprzak, M.; Malińska, K.; Grosser, A.; Sobik-Szołtysek, J.; Wystalska, K.; Dróżdż, D.; Jasińska, A.; Meers, E. Cycles of Carbon, Nitrogen and Phosphorus in Poultry Manure Management Technologies: Environmental Aspects. Crit. Rev. Environ. Sci. Technol. **2022**, 1–25.
- Peters, J.; Combs, S.; Hoskins, B.; Jarman, J.; Kovar, J.; Watson, M.; Wolf, A.; Wolf, N. Recommended Methods of Manure Analysis; State of Wisconsin Department of Agriculture, Trade and Consumer Protection: Madison, WI, USA, 2003; p. 62.
- He, Z.; Pagliari, P.H.; Waldrip, H.M. Applied and Environmental Chemistry of Animal Manure: A Review. Pedosphere **2016**, 26, 779–816.
- Pagliari, P.H.; Laboski, C.A.M. Investigation of the Inorganic and Organic Phosphorus Forms in Animal Manure. J. Environ. Qual. **2012**, 41, 901–910.
- Horf, M.; Vogel, S.; Drücker, H.; Gebbers, R.; Olfs, H.-W. Optical Spectrometry to Determine Nutrient Concentrations and Other Physicochemical Parameters in Liquid Organic Manures: A Review. Agronomy **2022**, 12, 514.
- Horf, M.; Gebbers, R.; Vogel, S.; Ostermann, M.; Piepel, M.-F.; Olfs, H.-W. Determination of Nutrients in Liquid Manures and Biogas Digestates by Portable Energy-Dispersive X-Ray Fluorescence Spectrometry. Sensors **2021**, 21, 3892.
- Feng, X.; Larson, R.A.; Digman, M.F. Evaluation of Near-Infrared Reflectance and Transflectance Sensing System for Predicting Manure Nutrients. Remote Sens. **2022**, 14, 963.
- Chen, L.; Xing, L.; Han, L. Review of the Application of Near-Infrared Spectroscopy Technology to Determine the Chemical Composition of Animal Manure. J. Environ. Qual. **2013**, 42, 1015–1028.
- Roggo, Y.; Chalus, P.; Maurer, L.; Lema-Martinez, C.; Edmond, A.; Jent, N. A Review of near Infrared Spectroscopy and Chemometrics in Pharmaceutical Technologies. J. Pharm. Biomed. Anal. **2007**, 44, 683–700.
- Kumaravelu, C.; Gopal, A. A Review on the Applications of Near-Infrared Spectrometer and Chemometrics for the Agro-Food Processing Industries. In Proceedings of the 2015 IEEE Technological Innovation in ICT for Agriculture and Rural Development (TIAR), Chennai, India, 10–12 July 2015; pp. 8–12.
- Huang, G.; Han, L.; Yang, Z.; Wang, X. Evaluation of the Nutrient Metal Content in Chinese Animal Manure Compost Using near Infrared Spectroscopy (NIRS). Bioresour. Technol. **2008**, 99, 8164–8169.
- Devianti, D.; Sufardi, S.; Mustaqimah, M.; Munawar, A.A. Near Infrared Technology in Agricultural Sustainability: Rapid Prediction of Nitrogen Content from Organic Fertilizer. Transdiscipl. J. Eng. Sci. **2022**, 13, 1–12.
- Devianti, D.; Yusmanizar, Y.; Syakur, S.; Munawar, A.A.; Yunus, Y. Organic Fertilizer from Agricultural Waste: Determination of Phosphorus Content Using near Infrared Reflectance. IOP Conf. Ser. Earth Environ. Sci. **2021**, 644, 012002.
- Guindo, M.L.; Kabir, M.H.; Chen, R.; Liu, F. Particle Swarm Optimization and Multiple Stacked Generalizations to Detect Nitrogen and Organic-Matter in Organic-Fertilizer Using Vis-NIR. Sensors **2021**, 21, 4882.
- Gogé, F.; Thuriès, L.; Fouad, Y.; Damay, N.; Davrieux, F.; Moussard, G.; Roux, C.L.; Trupin-Maudemain, S.; Valé, M.; Morvan, T. Dataset of Chemical and Near-Infrared Spectroscopy Measurements of Fresh and Dried Poultry and Cattle Manure. Data Brief **2021**, 34, 106647.
- Silge, J.; Chow, F.; Kuhn, M.; Wickham, H. Rsample: General Resampling Infrastructure. 2022. Available online: https://rsample.tidymodels.org/ (accessed on 1 August 2022).
- Stevens, A.; Ramirez-Lopez, L. An Introduction to the Prospectr Package. 2022. Available online: https://github.com/l-ramirez-lopez/prospectr (accessed on 1 August 2022).
- Schmid, M.; Rath, D.; Diebold, U. Why and How Savitzky–Golay Filters Should Be Replaced. ACS Meas. Sci. Au **2022**, 2, 185–196.
- Zhang, F.; O’Donnell, L.J. Support Vector Regression. In Machine Learning; Elsevier: Amsterdam, The Netherlands, 2020; pp. 123–140. ISBN 978-0-12-815739-8.
- Karatzoglou, A.; Smola, A.; Hornik, K.; National ICT Australia (NICTA); Maniscalco, M.A.; Teo, C.H. Kernlab: Kernel-Based Machine Learning Lab. 2022. Available online: https://CRAN.R-project.org/package=kernlab (accessed on 1 August 2022).
- Karatzoglou, A.; Smola, A.; Hornik, K.; Zeileis, A. Kernlab – An S4 Package for Kernel Methods in R. J. Stat. Softw. **2004**, 11, 1–20.
- Ranstam, J.; Cook, J.A. LASSO Regression. Br. J. Surg. **2018**, 105, 1348.
- McDonald, G.C. Ridge Regression. WIREs Comput. Stat. **2009**, 1, 93–100.
- Arashi, M.; Saleh, A.K.M.E.; Kibria, B.M.G. Theory of Ridge Regression Estimation with Applications; John Wiley & Sons: Hoboken, NJ, USA, 2019; ISBN 978-1-118-64452-2.
- Jin, B.; Lorenz, D.A.; Schiffler, S. Elastic-Net Regularization: Error Estimates and Active Set Methods. Inverse Probl. **2009**, 25, 115022.
- Ciaburro, G. Regression Analysis with R: Design and Develop Statistical Nodes to Identify Unique Relationships within Data at Scale; Packt Publishing Ltd.: Birmingham, UK, 2018; ISBN 978-1-78862-270-7.
- Friedman, J.H.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. **2010**, 33, 1–22.
- Simon, N.; Friedman, J.H.; Hastie, T.; Tibshirani, R. Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. J. Stat. Softw. **2011**, 39, 1–13.
- Kassambara, A. Machine Learning Essentials: Practical Guide in R; 2018; ISBN 978-1-986406-85-7. Available online: https://books.google.com.hk/books?hl=zh-TW&lr=&id=745QDwAAQBAJ&oi=fnd&pg=PP2&dq=Machine+Learning+Essentials:+Practical+Guide+in+R+-+Alboukadel+Kassambara+-+Google+Books&ots=5EOsxRV1Mu&sig=CndMacT8zaX4mFhoM25OsMP3eEY&redir_esc=y#v=onepage&q=Machine%20Learning%20Essentials%3A%20Practical%20Guide%20in%20R%20-%20Alboukadel%20Kassambara%20-%20Google%20Books&f=false (accessed on 28 August 2022).
- Rohart, F.; Gautier, B.; Singh, A.; Lê Cao, K.-A. MixOmics: An R Package for ‘omics Feature Selection and Multiple Data Integration. PLOS Comput. Biol. **2017**, 13, e1005752.
- Biau, G. Analysis of a Random Forests Model. J. Mach. Learn. Res. **2012**, 13, 33.
- Qi, Y. Random Forest for Bioinformatics. In Ensemble Machine Learning: Methods and Applications; Zhang, C., Ma, Y., Eds.; Springer: Boston, MA, USA, 2012; pp. 307–323. ISBN 978-1-4419-9326-7.
- Liaw, A.; Wiener, M. Classification and Regression by RandomForest. R News **2002**, 2, 18–22.
- Newman, T.B.; Mcculloch, C.E. Statistical Interpretation of Data. In Goldman’s Cecil Medicine; Elsevier: Amsterdam, The Netherlands, 2012; pp. e1–e6. ISBN 978-1-4377-1604-7.
- Therneau, T.; Atkinson, B.; Ripley, B. (producer of the initial R port; maintainer 1999–2017). Rpart: Recursive Partitioning and Regression Trees. 2022. Available online: https://cran.r-project.org/web/packages/rpart/ (accessed on 1 August 2022).
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 13–17 August 2016; pp. 785–794.
- Müller, A.C.; Guido, S. Introduction to Machine Learning with Python: A Guide for Data Scientists; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2016; ISBN 978-1-4493-6989-7.
- Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. Xgboost: Extreme Gradient Boosting. 2022. Available online: https://github.com/dmlc/xgboost (accessed on 1 August 2022).
- Kuhn, M.; Vaughan, D. Parsnip: A Common API to Modeling and Analysis Functions. 2022. Available online: https://parsnip.tidymodels.org/ (accessed on 1 August 2022).
- Kuhn, M.; Vaughan, D. Tidymodels. Available online: https://www.tidymodels.org/ (accessed on 1 August 2022).
- Couch, S.P.; Kuhn, M. Stacks: Stacked Ensemble Modeling with Tidy Data Principles. J. Open Source Softw. **2022**, 7, 4471.
- Faber, N.M. Estimating the Uncertainty in Estimates of Root Mean Square Error of Prediction: Application to Determining the Size of an Adequate Test Set in Multivariate Calibration. Chemom. Intell. Lab. Syst. **1999**, 49, 79–89.
- Payne, C.E.; Wolfrum, E.J. Rapid Analysis of Composition and Reactivity in Cellulosic Biomass Feedstocks with Near-Infrared Spectroscopy. Biotechnol. Biofuels **2015**, 8, 43.
- Free Critical F-Value Calculator: Free Statistics Calculators. Available online: https://www.danielsoper.com/statcalc/calculator.aspx?id=4 (accessed on 23 August 2022).
- Murphy, D.J.; O’Brien, B.; O’Donovan, M.; Condon, T.; Murphy, M.D. A near Infrared Spectroscopy Calibration for the Prediction of Fresh Grass Quality on Irish Pastures. Inf. Process. Agric.
**2022**, 9, 243–253. [Google Scholar] [CrossRef] - Chang, C.-W.; Laird, D.A.; Mausbach, M.J.; Hurburgh, C.R., Jr. Near-Infrared Reflectance Spectroscopy–Principal Components Regression Analyses of Soil Properties. Soil Sci. Soc. Am. J.
**2001**, 65, 480–490. [Google Scholar] [CrossRef] - Ette, E.I.; Williams, P.J. Pharmacometrics: The Science of Quantitative Pharmacology; John Wiley & Sons: Hoboken, NJ, USA, 2013; ISBN 978-1-118-67951-7. [Google Scholar]
- Levine, M. The Strengths and Limitations of ICP-OES Analysis. Available online: https://www.analyticalcannabis.com/articles/icp-oes-icp-chemistry-icp-oes-analysis-strengths-and-limitations-312835 (accessed on 28 August 2022).
- Nizio, K.D.; Harynuk, J.J. Analysis of Alkyl Phosphates in Petroleum Samples by Comprehensive Two-Dimensional Gas Chromatography with Nitrogen Phosphorus Detection and Post-Column Deans Switching. J. Chromatogr. A
**2012**, 1252, 171–176. [Google Scholar] [CrossRef] [PubMed] - Merson, S.; Evans, P. A High Accuracy Reference Method for the Determination of Minor Elements in Steel by ICP-OES. J. Anal. At. Spectrom.
**2003**, 18, 372–375. [Google Scholar] [CrossRef] - Jantzi, S.C.; Motto-Ros, V.; Trichard, F.; Markushin, Y.; Melikechi, N.; De Giacomo, A. Sample Treatment and Preparation for Laser-Induced Breakdown Spectroscopy. Spectrochim. Acta Part B At. Spectrosc.
**2016**, 115, 52–63. [Google Scholar] [CrossRef] - Olesik, J. ICP-OES Capabilities, Developments, Limitations, and Any Potential Challengers? Spectroscopy
**2020**, 35, 18–21. [Google Scholar] - Gogé, F.; Thuriès, L.; Fouad, Y.; Damay, N.; Davrieux, F.; Moussard, G.; Roux, C.L.; Trupin-Maudemain, S.; Valé, M.; Morvan, T. Performance of near Infrared Spectroscopy of a Solid Cattle and Poultry Manure Database Depends on the Sample Preparation and Regression Method Used. J. Near Infrared Spectrosc.
**2021**, 29, 226–235. [Google Scholar] [CrossRef] - Xing, L.; Chen, L.J.; Han, L.J. Rapid Analysis of Layer Manure Using Near-Infrared Reflectance Spectroscopy. Poult. Sci.
**2008**, 87, 1281–1286. [Google Scholar] [CrossRef] - Pirouz, D.M. An Overview of Partial Least Squares. SSRN Electron. J.
**2006**, 1–16. [Google Scholar] [CrossRef] [Green Version] - Trygg, J.; Wold, S. O2-PLS, a Two-Block (X-Y) Latent Variable Regression (LVR) Method with an Integral OSC Filter. J. Chemom.
**2003**, 17, 53–64. [Google Scholar] [CrossRef] - Xia, Y. Chapter Eleven—Correlation and Association Analyses in Microbiome Study Integrating Multiomics in Health and Disease. In Progress in Molecular Biology and Translational Science; Sun, J., Ed.; The Microbiome in Health and Disease; Academic Press: Cambridge, MA, USA, 2020; Volume 171, pp. 309–491. [Google Scholar]
- Solomon, K.R.; Brock, T.C.M.; Zwart, D.D.; Dyer, S.D.; Posthuma, L.; Richards, S.; Sanderson, H.; Sibley, P.; van den Brink, P.J. Extrapolation Practice for Ecotoxicological Effect Characterization of Chemicals; CRC Press: Boca Raton, FL, USA, 2008; ISBN 978-1-4200-7392-8. [Google Scholar]
- Willaby, H.W.; Costa, D.S.J.; Burns, B.D.; MacCann, C.; Roberts, R.D. Testing Complex Models with Small Sample Sizes: A Historical Overview and Empirical Demonstration of What Partial Least Squares (PLS) Can Offer Differential Psychology. Pers. Individ. Differ.
**2015**, 84, 73–78. [Google Scholar] [CrossRef] - Su, R.; Liu, X.; Xiao, G.; Wei, L. Meta-GDBP: A High-Level Stacked Regression Model to Improve Anticancer Drug Response Prediction. Brief. Bioinform.
**2020**, 21, 996–1005. [Google Scholar] [CrossRef] [PubMed] - Fu, P.; Meacham-Hensold, K.; Guan, K.; Bernacchi, C.J. Hyperspectral Leaf Reflectance as Proxy for Photosynthetic Capacities: An Ensemble Approach Based on Multiple Machine Learning Algorithms. Front. Plant Sci.
**2019**, 10, 730. [Google Scholar] [CrossRef] - Zhang, K.; Zhu, D.; Li, J.; Gao, X.; Gao, F.; Lu, J. Learning Stacking Regression for No-Reference Super-Resolution Image Quality Assessment. Signal Process.
**2021**, 178, 107771. [Google Scholar] [CrossRef] - Kessy, S.R.; Sherris, M.; Villegas, A.M.; Ziveyi, J. Mortality Forecasting Using Stacked Regression Ensembles. Scand. Actuar. J.
**2021**, 2022, 591–626. [Google Scholar] [CrossRef] - Cheng, Q.; Xu, H.; Fei, S.; Li, Z.; Chen, Z. Estimation of Maize LAI Using Ensemble Learning and UAV Multispectral Imagery under Different Water and Fertilizer Treatments. Agriculture
**2022**, 12, 1267. [Google Scholar] [CrossRef] - Seireg, H.R.; Omar, Y.M.K.; El-Samie, F.E.A.; El-Fishawy, A.S.; Elmahalawy, A. Ensemble Machine Learning Techniques Using Computer Simulation Data for Wild Blueberry Yield Prediction. IEEE Access
**2022**, 10, 64671–64687. [Google Scholar] [CrossRef] - Anbananthen, K.S.M.; Subbiah, S.; Chelliah, D.; Sivakumar, P.; Somasundaram, V.; Velshankar, K.H.; Khan, M.K.A.A. An Intelligent Decision Support System for Crop Yield Prediction Using Hybrid Machine Learning Algorithms. F1000Research
**2021**, 10, 1143. [Google Scholar] [CrossRef]

**Figure 1.** Outline of the steps in stacked regression. The candidate (Level 1) models were first defined, each with different parameter values. These candidate configurations were then stacked together, and a regularized model was fitted to the candidate members' predictions to determine which members receive non-zero coefficients; only those members are used for the final predictions (SVRLin = support vector regression with linear kernel; SVRPoly = support vector regression with polynomial kernel; SVRRad = support vector regression with radial kernel; LASSO = least absolute shrinkage and selection operator; RIDGE = ridge regression; ENET = elastic net regression; PLS = partial least squares; RF = random forests; RPART = recursive partitioning and regression trees; XGB = boosted trees) [48].
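The two-level procedure in the Figure 1 caption can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' pipeline. Ridge models with different penalties stand in for the ten candidate families. Level 1 predictions are generated out-of-fold by 5-fold cross-validation. The Level 2 "regularized model" is a LASSO with non-negative weights fitted by coordinate descent, which is similar in spirit to the default blending step of the R `stacks` package cited in the reference list. All function names (`ridge_fit`, `out_of_fold_predictions`, `nonneg_lasso`, `fit_stack`, `predict_stack`) are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ w, w


def out_of_fold_predictions(X, y, alphas, n_folds=5, seed=0):
    """Level 1: each candidate predicts only the rows held out of its training fold.

    Assumption: one ridge model per penalty in `alphas` stands in for the
    paper's SVR/LASSO/ENET/PLS/RF/RPART/XGB candidates.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    Z = np.zeros((len(y), len(alphas)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for j, alpha in enumerate(alphas):
            b0, w = ridge_fit(X[train], y[train], alpha)
            Z[held_out, j] = b0 + X[held_out] @ w
    return Z


def nonneg_lasso(Z, y, lam, n_iter=500):
    """Level 2: LASSO with non-negative weights, fitted by coordinate descent.

    Minimises (1/2n)||y - b0 - Z w||^2 + lam * sum(w) subject to w >= 0.
    """
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean
    n, m = Zc.shape
    col_sq = (Zc ** 2).sum(axis=0) / n
    w = np.zeros(m)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0:
                continue
            resid += Zc[:, j] * w[j]                 # remove member j from the fit
            rho = Zc[:, j] @ resid / n
            w[j] = max(0.0, rho - lam) / col_sq[j]   # soft-threshold, clipped at zero
            resid -= Zc[:, j] * w[j]                 # add it back with its new weight
    return y_mean - z_mean @ w, w


def fit_stack(X, y, alphas, lam):
    """Blend the candidates, then refit only the members with non-zero weight on all rows."""
    Z = out_of_fold_predictions(X, y, alphas)
    b0, weights = nonneg_lasso(Z, y, lam)
    members = {j: ridge_fit(X, y, alphas[j]) for j in np.flatnonzero(weights)}
    return b0, weights, members


def predict_stack(X_new, b0, weights, members):
    """Weighted sum of the retained members' predictions plus the blending intercept."""
    pred = np.full(X_new.shape[0], b0)
    for j, (c0, w) in members.items():
        pred += weights[j] * (c0 + X_new @ w)
    return pred


# Usage on synthetic data standing in for pre-processed spectra (hypothetical values).
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=80)
b0, weights, members = fit_stack(X[:60], y[:60], alphas=[0.01, 1.0, 100.0, 1e4], lam=0.01)
test_pred = predict_stack(X[60:], b0, weights, members)
```

Members whose blending weight is exactly zero are dropped and never refitted, which is the sense in which the regularized Level 2 model selects candidates; larger `lam` values generally leave fewer members.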

**Figure 2.** Near-infrared spectra of fresh homogenized cattle and poultry manure samples, shown in different line colors.

**Figure 3.** Near-infrared spectra of fresh homogenized cattle and poultry manure samples after standardization (mean = 0, standard deviation = 1) and Savitzky-Golay smoothing, shown in different line colors.
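The preprocessing named in the Figure 3 caption can be sketched in NumPy as follows. The caption does not say whether standardization was applied per spectrum or per wavelength. It also does not give the Savitzky-Golay window length, polynomial order, or edge handling. The `axis` switch, the defaults `window=11` and `polyorder=2`, and the edge padding below are therefore assumptions, and the function names are hypothetical. The smoothing weights come from a least-squares polynomial fit over a sliding window, which is the defining idea of the Savitzky-Golay filter.

```python
import numpy as np


def standardize(spectra, axis=1):
    """Scale to mean 0, sd 1 along `axis` (axis=1: each spectrum, as in SNV; axis=0: each wavelength)."""
    mean = spectra.mean(axis=axis, keepdims=True)
    sd = spectra.std(axis=axis, ddof=1, keepdims=True)
    return (spectra - mean) / sd


def savgol_coefficients(window, polyorder):
    """Weights that return the centre value of a least-squares polynomial fit over the window."""
    if window % 2 == 0 or polyorder >= window:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    positions = np.arange(-half, half + 1)
    vander = np.vander(positions, polyorder + 1, increasing=True)  # columns k^0, k^1, ...
    return np.linalg.pinv(vander)[0]  # row that yields the fitted value at k = 0


def savgol_smooth(spectra, window=11, polyorder=2):
    """Smooth each row (one spectrum per row); ends are padded by repeating the edge values (an assumption)."""
    coeffs = savgol_coefficients(window, polyorder)
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    # np.convolve flips its kernel, so pass the reversed weights to get a sliding weighted sum.
    return np.array([np.convolve(row, coeffs[::-1], mode="valid") for row in padded])


# Order given in the caption: standardize first, then smooth (raw_spectra is hypothetical):
# processed = savgol_smooth(standardize(raw_spectra), window=11, polyorder=2)
```

A quick sanity check is that a 5-point quadratic window reproduces the classic Savitzky-Golay weights (-3, 12, 17, 12, -3)/35, and that any quadratic signal passes through the filter unchanged away from the padded ends.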

**Figure 4.** Predicted vs. measured concentrations of chemical constituents, expressed as % wet weight of the fresh cattle and poultry manure, for (**a**) dry matter, (**b**) total ammonium nitrogen (NH₄), (**c**) total nitrogen (N), (**d**) P₂O₅, (**e**) CaO, (**f**) MgO, and (**g**) K₂O using the stacked regression ensemble approach.

**Table 1.** Descriptive statistics (fresh-weight basis, %) of the chemical components of poultry and cattle manure (n = number of samples, sd = standard deviation, min = minimum value, max = maximum value).

Chemicals | n | Mean | Median | sd | Min | Max |
---|---|---|---|---|---|---|
DM | 332 | 37.285 | 27.885 | 20.063 | 11.255 | 82.480 |
NH₄ | 332 | 0.262 | 0.095 | 0.276 | 0.001 | 1.086 |
N | 332 | 1.369 | 0.672 | 1.093 | 0.255 | 4.152 |
P₂O₅ | 158 | 0.477 | 0.224 | 0.585 | 0.091 | 3.020 |
CaO | 158 | 0.575 | 0.330 | 0.556 | 0.094 | 3.108 |
MgO | 158 | 0.227 | 0.156 | 0.186 | 0.062 | 1.054 |
K₂O | 158 | 1.022 | 0.826 | 0.648 | 0.187 | 3.845 |

**Table 2.** Descriptive statistics (fresh-weight basis, %) of the chemical components of poultry and cattle manure in the training set: 232 samples for dry matter (DM), total ammonium nitrogen (NH₄), and total nitrogen (N), and 110 samples for P₂O₅, CaO, MgO, and K₂O. (n = number of samples, sd = standard deviation, min = minimum value, max = maximum value.)

Chemicals | n | Mean | Median | sd | Min | Max |
---|---|---|---|---|---|---|
DM | 232 | 38.035 | 28.972 | 20.340 | 12.690 | 81.990 |
NH₄ | 232 | 0.262 | 0.091 | 0.277 | 0.001 | 0.968 |
N | 232 | 1.384 | 0.675 | 1.092 | 0.311 | 4.152 |
P₂O₅ | 110 | 0.468 | 0.225 | 0.575 | 0.098 | 3.020 |
CaO | 110 | 0.563 | 0.329 | 0.556 | 0.094 | 3.108 |
MgO | 110 | 0.222 | 0.147 | 0.190 | 0.068 | 1.054 |
K₂O | 110 | 1.013 | 0.862 | 0.642 | 0.187 | 3.845 |

**Table 3.** Descriptive statistics (fresh-weight basis, %) of the chemical components of poultry and cattle manure in the test set: 100 samples for dry matter (DM), total ammonium nitrogen (NH₄), and total nitrogen (N), and 48 samples for P₂O₅, CaO, MgO, and K₂O. These samples were set aside for the final evaluation of the different models. (n = number of samples, sd = standard deviation, min = minimum value, max = maximum value.)

Chemicals | n | Mean | Median | sd | Min | Max |
---|---|---|---|---|---|---|
DM | 100 | 35.546 | 27.240 | 19.396 | 11.255 | 82.480 |
NH₄ | 100 | 0.262 | 0.105 | 0.276 | 0.001 | 1.086 |
N | 100 | 1.334 | 0.663 | 1.099 | 0.255 | 3.650 |
P₂O₅ | 48 | 0.497 | 0.211 | 0.612 | 0.091 | 2.437 |
CaO | 48 | 0.602 | 0.401 | 0.561 | 0.120 | 2.503 |
MgO | 48 | 0.237 | 0.176 | 0.179 | 0.062 | 0.705 |
K₂O | 48 | 1.044 | 0.825 | 0.668 | 0.307 | 2.649 |

**Table 4.** Comparison of the root mean square error of cross-validation (RMSECV) (% wet weight) among the seven chemical components (i.e., dry matter (DM), total ammonium nitrogen (NH₄), total nitrogen (N), P₂O₅, CaO, MgO, and K₂O) of the fresh homogenized samples using various machine learning techniques (SVRLin = support vector regression with linear kernel; SVRPoly = support vector regression with polynomial kernel; SVRRad = support vector regression with radial kernel; LASSO = least absolute shrinkage and selection operator; RIDGE = ridge regression; ENET = elastic net regression; PLS = partial least squares; RF = random forests; RPART = recursive partitioning and regression trees; XGB = boosted trees). Best results are indicated in bold.

Algorithm | DM | NH₄ | N | P₂O₅ | CaO | MgO | K₂O | Average |
---|---|---|---|---|---|---|---|---|
SVRLin | 5.461 | 0.077 | 0.291 | 0.198 | 0.300 | 0.074 | 0.252 | 0.950 |
SVRPoly | 4.543 | 0.070 | 0.296 | 0.207 | 0.264 | 0.077 | 0.258 | 0.817 |
SVRRad | 4.656 | 0.066 | 0.254 | 0.176 | 0.232 | 0.079 | 0.269 | 0.819 |
LASSO | 5.930 | 0.087 | 0.315 | 0.218 | 0.313 | 0.090 | 0.283 | 1.034 |
RIDGE | 10.189 | 0.128 | 0.536 | 0.289 | 0.364 | 0.108 | 0.624 | 1.748 |
ENET | 5.928 | 0.087 | 0.315 | 0.218 | 0.312 | 0.089 | 0.284 | 1.033 |
PLS | 6.787 | 0.093 | 0.387 | 0.256 | 0.352 | 0.106 | 0.296 | 1.182 |
RF | 6.880 | 0.092 | 0.380 | 0.285 | 0.317 | 0.093 | 0.348 | 1.199 |
RPART | 9.338 | 0.126 | 0.540 | 0.373 | 0.365 | 0.112 | 0.446 | 1.614 |
XGB | 5.683 | 0.082 | 0.346 | 0.243 | 0.303 | 0.092 | 0.326 | 1.011 |
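The captions of Tables 4 and 6 do not restate how RMSECV and RMSEP are computed. Assuming the conventional definitions, with $y_i$ the laboratory reference value, $\hat{y}_i^{\mathrm{CV}}$ the prediction for training sample $i$ from a model fitted without that sample during cross-validation, and $\hat{y}_i$ the prediction for test sample $i$:

$$
\mathrm{RMSECV} = \sqrt{\frac{1}{n_{\mathrm{train}}}\sum_{i=1}^{n_{\mathrm{train}}}\left(\hat{y}_i^{\mathrm{CV}} - y_i\right)^2},
\qquad
\mathrm{RMSEP} = \sqrt{\frac{1}{n_{\mathrm{test}}}\sum_{i=1}^{n_{\mathrm{test}}}\left(\hat{y}_i - y_i\right)^2}
$$

Both are in the units of the response (% wet weight). The Average column matches an unweighted mean of the seven entries in each row; for SVRLin, $(5.461 + 0.077 + 0.291 + 0.198 + 0.300 + 0.074 + 0.252)/7 \approx 0.950$. That average is therefore dominated by the DM column, whose errors are an order of magnitude larger than those of the other components.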

**Table 5.** Comparison of R² in the training set among the seven chemical components: dry matter (DM), total ammonium nitrogen (NH₄), total nitrogen (N), P₂O₅, CaO, MgO, and K₂O of the fresh homogenized samples using various machine learning techniques (SVRLin = support vector regression with linear kernel; SVRPoly = support vector regression with polynomial kernel; SVRRad = support vector regression with radial kernel; LASSO = least absolute shrinkage and selection operator; RIDGE = ridge regression; ENET = elastic net regression; PLS = partial least squares; RF = random forests; RPART = recursive partitioning and regression trees; XGB = boosted trees). Best results are indicated in bold.

Algorithm | DM | NH₄ | N | P₂O₅ | CaO | MgO | K₂O | Average |
---|---|---|---|---|---|---|---|---|
SVRLin | 0.923 | 0.922 | 0.930 | 0.818 | 0.661 | 0.786 | 0.820 | 0.837 |
SVRPoly | 0.946 | 0.937 | 0.928 | 0.817 | 0.713 | 0.806 | 0.810 | 0.851 |
SVRRad | 0.945 | 0.943 | 0.946 | 0.849 | 0.779 | 0.817 | 0.783 | 0.866 |
LASSO | 0.910 | 0.900 | 0.918 | 0.796 | 0.652 | 0.743 | 0.776 | 0.814 |
RIDGE | 0.795 | 0.813 | 0.801 | 0.748 | 0.590 | 0.684 | 0.720 | 0.736 |
ENET | 0.910 | 0.901 | 0.918 | 0.797 | 0.653 | 0.748 | 0.775 | 0.815 |
PLS | 0.885 | 0.891 | 0.879 | 0.733 | 0.586 | 0.675 | 0.749 | 0.771 |
RF | 0.875 | 0.886 | 0.880 | 0.729 | 0.648 | 0.762 | 0.695 | 0.782 |
RPART | 0.775 | 0.789 | 0.757 | 0.584 | 0.529 | 0.645 | 0.509 | 0.656 |
XGB | 0.917 | 0.911 | 0.899 | 0.770 | 0.672 | 0.748 | 0.721 | 0.805 |

**Table 6.** Comparison of the root mean square error of prediction (RMSEP) (% wet weight) among the seven chemical components: dry matter (DM), total ammonium nitrogen (NH₄), total nitrogen (N), P₂O₅, CaO, MgO, and K₂O of the fresh homogenized samples using various machine learning techniques (SVRLin = support vector regression with linear kernel; SVRPoly = support vector regression with polynomial kernel; SVRRad = support vector regression with radial kernel; LASSO = least absolute shrinkage and selection operator; RIDGE = ridge regression; ENET = elastic net regression; PLS = partial least squares; RF = random forests; RPART = recursive partitioning and regression trees; XGB = boosted trees). Best results are indicated in bold.

Algorithm | DM | NH₄ | N | P₂O₅ | CaO | MgO | K₂O | Average |
---|---|---|---|---|---|---|---|---|
SVRLin | 6.909 | 0.075 | 0.343 | 0.306 | 0.322 | 0.096 | 0.399 | 1.207 |
SVRPoly | 5.158 | 0.078 | 0.346 | 0.307 | 0.410 | 0.091 | 0.398 | 0.970 |
SVRRad | 5.005 | 0.091 | 0.252 | 0.275 | 0.373 | 0.078 | 0.374 | 0.921 |
LASSO | 7.154 | 0.082 | 0.368 | 0.317 | 0.335 | 0.091 | 0.412 | 1.251 |
RIDGE | 9.307 | 0.102 | 0.505 | 0.390 | 0.394 | 0.107 | 0.472 | 1.611 |
ENET | 7.103 | 0.083 | 0.370 | 0.317 | 0.335 | 0.090 | 0.412 | 1.244 |
PLS | 8.647 | 0.097 | 0.449 | 0.320 | 0.349 | 0.092 | 0.441 | 1.485 |
RF | 5.987 | 0.091 | 0.339 | 0.449 | 0.380 | 0.121 | 0.471 | 1.120 |
RPART | 11.284 | 0.130 | 0.566 | 0.507 | 0.377 | 0.145 | 0.466 | 1.925 |
XGB | 5.642 | 0.082 | 0.279 | 0.458 | 0.407 | 0.133 | 0.414 | 1.059 |
Stack Reg | 4.088 | 0.055 | 0.217 | 0.269 | 0.309 | 0.092 | 0.373 | 0.772 |

**Table 7.** Comparison of the R² in the testing set among the seven chemical components: dry matter (DM), total ammonium nitrogen (NH₄), total nitrogen (N), P₂O₅, CaO, MgO, and K₂O of the fresh homogenized samples using various machine learning techniques (SVRLin = support vector regression with linear kernel; SVRPoly = support vector regression with polynomial kernel; SVRRad = support vector regression with radial kernel; LASSO = least absolute shrinkage and selection operator; RIDGE = ridge regression; ENET = elastic net regression; PLS = partial least squares; RF = random forests; RPART = recursive partitioning and regression trees; XGB = boosted trees). Best results are indicated in bold.

Algorithm | DM | NH₄ | N | P₂O₅ | CaO | MgO | K₂O | Average |
---|---|---|---|---|---|---|---|---|
SVRLin | 0.919 | 0.944 | 0.928 | 0.770 | 0.673 | 0.716 | 0.656 | 0.801 |
SVRPoly | 0.948 | 0.946 | 0.924 | 0.772 | 0.470 | 0.766 | 0.658 | 0.783 |
SVRRad | 0.950 | 0.896 | 0.951 | 0.852 | 0.564 | 0.837 | 0.689 | 0.820 |
LASSO | 0.892 | 0.924 | 0.900 | 0.781 | 0.649 | 0.761 | 0.624 | 0.790 |
RIDGE | 0.812 | 0.868 | 0.808 | 0.660 | 0.527 | 0.678 | 0.517 | 0.696 |
ENET | 0.894 | 0.923 | 0.899 | 0.781 | 0.648 | 0.771 | 0.624 | 0.792 |
PLS | 0.839 | 0.894 | 0.851 | 0.742 | 0.611 | 0.748 | 0.557 | 0.749 |
RF | 0.915 | 0.897 | 0.913 | 0.568 | 0.676 | 0.709 | 0.677 | 0.765 |
RPART | 0.683 | 0.787 | 0.752 | 0.351 | 0.594 | 0.433 | 0.539 | 0.591 |
XGB | 0.924 | 0.916 | 0.937 | 0.658 | 0.552 | 0.612 | 0.729 | 0.761 |
Stack Reg | 0.965 | 0.966 | 0.965 | 0.875 | 0.743 | 0.792 | 0.736 | 0.863 |

**Table 8.** Comparison of the ratio of performance to deviation (RPD) in the testing set among the seven chemical components: dry matter (DM), total ammonium nitrogen (NH₄), total nitrogen (N), P₂O₅, CaO, MgO, and K₂O of the fresh homogenized samples using various machine learning techniques (SVRLin = support vector regression with linear kernel; SVRPoly = support vector regression with polynomial kernel; SVRRad = support vector regression with radial kernel; LASSO = least absolute shrinkage and selection operator; RIDGE = ridge regression; ENET = elastic net regression; PLS = partial least squares; RF = random forests; RPART = recursive partitioning and regression trees; XGB = boosted trees). Best results are indicated in bold.

Algorithm | DM | NH₄ | N | P₂O₅ | CaO | MgO | K₂O | Average |
---|---|---|---|---|---|---|---|---|
SVRLin | 2.807 | 3.545 | 3.202 | 2.000 | 1.745 | 1.855 | 1.674 | 2.404 |
SVRPoly | 3.761 | 3.545 | 3.175 | 1.993 | 1.370 | 1.965 | 1.676 | 2.498 |
SVRRad | 3.875 | 3.042 | 4.355 | 2.223 | 1.503 | 2.293 | 1.783 | 2.725 |
LASSO | 2.711 | 3.357 | 2.985 | 1.931 | 1.677 | 1.962 | 1.621 | 2.321 |
RIDGE | 2.084 | 2.704 | 2.175 | 1.570 | 1.423 | 1.664 | 1.415 | 1.862 |
ENET | 2.731 | 3.348 | 2.973 | 1.930 | 1.675 | 1.988 | 1.620 | 2.324 |
PLS | 2.243 | 2.841 | 2.450 | 1.913 | 1.609 | 1.938 | 1.514 | 2.073 |
RF | 3.240 | 3.054 | 3.241 | 1.363 | 1.475 | 1.478 | 1.417 | 2.181 |
RPART | 1.719 | 2.124 | 1.942 | 1.208 | 1.487 | 1.236 | 1.432 | 1.593 |
XGB | 3.438 | 3.357 | 3.943 | 1.336 | 1.380 | 1.347 | 1.612 | 2.345 |
Stack Reg | 4.745 | 5.002 | 5.062 | 2.274 | 1.814 | 1.938 | 1.788 | 3.232 |
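Table 8 does not restate how RPD is computed. The tabulated values are consistent with the common definition RPD = SD / RMSEP. Here SD is the standard deviation of the reference values in the test set (Table 3), and RMSEP comes from Table 6. The short Python check below recomputes the Stack Reg row from those two tables. Discrepancies below 1% are expected because every input is rounded to three decimals. The check also confirms that the Average column is an unweighted mean over the seven components.

```python
import numpy as np

components = ["DM", "NH4", "N", "P2O5", "CaO", "MgO", "K2O"]
sd_test = np.array([19.396, 0.276, 1.099, 0.612, 0.561, 0.179, 0.668])      # Table 3, sd column
rmsep_stack = np.array([4.088, 0.055, 0.217, 0.269, 0.309, 0.092, 0.373])   # Table 6, Stack Reg row
rpd_reported = np.array([4.745, 5.002, 5.062, 2.274, 1.814, 1.938, 1.788])  # Table 8, Stack Reg row

# RPD = standard deviation of the test-set reference values / RMSEP
rpd = sd_test / rmsep_stack

for name, recomputed, reported in zip(components, rpd, rpd_reported):
    print(f"{name:5s} recomputed {recomputed:.3f}  reported {reported:.3f}")

# The Average column matches an unweighted mean over the seven components.
print(f"average RPD {rpd_reported.mean():.3f}")
```

The same ratio can be applied to any other row of Table 6 to reproduce the corresponding row of Table 8.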

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cobbinah, E.; Generalao, O.; Lageshetty, S.K.; Adrianto, I.; Singh, S.; Dumancas, G.G.
Using Near-Infrared Spectroscopy and Stacked Regression for the Simultaneous Determination of Fresh Cattle and Poultry Manure Chemical Properties. *Chemosensors* **2022**, *10*, 410.
https://doi.org/10.3390/chemosensors10100410

**AMA Style**

Cobbinah E, Generalao O, Lageshetty SK, Adrianto I, Singh S, Dumancas GG.
Using Near-Infrared Spectroscopy and Stacked Regression for the Simultaneous Determination of Fresh Cattle and Poultry Manure Chemical Properties. *Chemosensors*. 2022; 10(10):410.
https://doi.org/10.3390/chemosensors10100410

**Chicago/Turabian Style**

Cobbinah, Elizabeth, Oliver Generalao, Sathish Kumar Lageshetty, Indra Adrianto, Seema Singh, and Gerard G. Dumancas.
2022. "Using Near-Infrared Spectroscopy and Stacked Regression for the Simultaneous Determination of Fresh Cattle and Poultry Manure Chemical Properties" *Chemosensors* 10, no. 10: 410.
https://doi.org/10.3390/chemosensors10100410