Determination of Bio-Based Fertilizer Composition Using Combined NIR and MIR Spectroscopy: A Model Averaging Approach

Application of bio-based fertilizers is considered a practical solution to enhance soil fertility and maintain soil quality. However, the composition of bio-based fertilizers needs to be quantified before their application to the soil. Non-destructive techniques such as near-infrared (NIR) and mid-infrared (MIR) spectroscopy are generally used to quantify the composition of bio-based fertilizers quickly and cost-effectively. However, the prediction performance of these techniques needs to be quantified before deployment. To this end, this study investigates the potential of these techniques to characterize a diverse set of bio-based fertilizers for 25 different properties, including nutrients, minerals, heavy metals, pH, and EC. A partial least squares model with wavelength selection is employed to estimate each property of interest. A model averaging approach is then tested to examine whether combining the model outcomes of NIR with those of MIR could improve the prediction performance of these sensors. In total, 17 of the 25 properties could be predicted well using the individual spectral methods. Combining the model outcomes of NIR with MIR increased the number of properties that could be predicted from 17 to 21. The most notable improvements in prediction performance were observed for Cd, Cr, Zn, Al, Ca, Fe, S, Cu, EC, and Na. It was concluded that the combined use of NIR and MIR spectral methods can be used to monitor the composition of a diverse set of bio-based fertilizers.


Introduction
With an ever-increasing world population, the demand for food is growing at an alarming rate [1]. To meet this demand, agricultural production has relied on chemical fertilizers during the past few decades to improve soil fertility and increase crop yield [2]. However, the use of chemical fertilizers is not without drawbacks. Chemical fertilizers are based on limited natural resources (e.g., phosphorus, P) or energy-consuming chemical processes (e.g., nitrogen, N). The excessive use of chemical fertilizers can also negatively impact the environment through eutrophication and acidification [3,4].
To address the negative impacts of synthetic fertilizer, modern agriculture is exploring the application of bio-based fertilizers. Bio-based fertilizers, including compost, manure, and bio-solids, have a long history of use in agriculture. However, during the post-World War II (WWII) era, the focus shifted increasingly to chemical fertilizers because of their effectiveness in raising crop production through rapid release of nutrients, their abundant and low-cost availability, and their ease of application [5]. Most bio-based fertilizers, on the other hand, have slow-release characteristics that may reduce the risk of environmentally deleterious losses. Bio-based fertilizers contain not only major nutrients (nitrogen (N), phosphorus (P), and potassium in non-homogeneous samples, and the size of the sample greatly affects its prediction capability [35]. Rourke et al. [36] and Javadi et al. [37] concluded that NIR and MIR spectroscopy techniques individually have the potential to predict certain elements with improved prediction accuracy while not predicting others. Hence, a potentially improved multi-sensor fusion approach could be realized by taking advantage of both NIR and MIR spectroscopy to quantify a range of nutrients in bio-based fertilizers.
To exploit the information available from two or more spectral sensors, approaches such as spectral fusion and model averaging can be used. Spectral fusion means combining the spectra from two or more sensors and is expected to improve prediction accuracy. Previously, Wang et al. [31] showed that spectral fusion through concatenation of portable X-ray fluorescence (pXRF) with visible near-infrared (Vis-NIR) spectra can improve the prediction capability for total carbon (TC) and total nitrogen (TN). A similar approach has been proposed by Aldabaa et al. [38] and Chakraborty et al. [39] for soil analysis, where the fused spectral model outperformed the models that used individual spectral information.
The fusion of spectral information from NIR and MIR can increase the useful information about a particular component, but it can also amplify redundant and unwanted signals. The inclusion of unwanted signals therefore sometimes reduces prediction capability and can make the prediction model more complex [36,40]. The redundant and unwanted signals can be eliminated by using wavelength selection techniques. In these techniques, only those wavelengths in the NIR and MIR spectral range that are highly correlated with the response variables are selected, and the unwanted signals are discarded. This selection of useful wavelengths is expected to improve the individual predictions from both sensors as well as reduce the redundancy in information.
Alternatively, the model averaging technique proposed by Granger et al. [41] can be used to overcome the problems associated with spectral fusion [36,42]. In model averaging, the individual prediction results from different sensors are combined, which is expected to enhance prediction because of the complementary nature of the sensors used.
For the use of bio-based fertilizers, a non-destructive measurement technique is needed that can determine a wide range of components, such as nutrients, their plant-available forms, and some minerals and heavy metals, with sufficient accuracy and precision. A literature review revealed that (1) NIR- and MIR-based techniques could be used to that end but will not be sufficient on an individual basis, so improved performance is expected from the fusion of information from both sensors; and (2) so far, NIR- and MIR-based techniques have been investigated for only a very limited set of chemical constituents in a limited set of bio-based fertilizers.
Thus, this work contributes by estimating 25 properties of different bio-based fertilizers (manure, bio-solids, plant residues, and composts) using NIR and MIR spectroscopy. Furthermore, to improve the estimates, a wavelength selection method followed by the model averaging technique is investigated to gain the benefits of fusing the results from the NIR and MIR sensors.

Materials and Methods
A dataset of 85 amendments was taken from Farrel et al. [43], described in Baldock et al. [27]. The data set describes 85 different bio-based fertilizers, including 50 composts from different composting facilities across Australia, 6 manure samples from different animals (cow, pig), 10 fresh plant residues derived from the major Australian crop species and some alternative species, and 19 biosolids obtained from a range of urban and rural wastewater plants. These spectral datasets enabled the testing of the robustness of prediction models using NIR and MIR spectra and model averaging.

Chemical Analysis
Data on the nutrient content and other chemical properties of the 85 bio-based fertilizers were obtained using standardized chemical analysis [43]. Briefly, pH and electrical conductivity (EC) were quantified using standard electrodes in a solid:water slurry (1:5 w/v).
Total nitrogen (N) was quantified by high-temperature combustion analysis (Leco CNS 2000, Leco Corporation, St Joseph, MI, USA). The available free amino acid N (FAA-N) and ammonium-N (NH4-N) were quantified in 1:5 w/v water extracts by fluorimetry and colorimetry on a multimode plate reader (Synergy MX, Biotek, Winooski, VT, USA) using the methods of Jones et al. [44], Miranda et al. [45], and Mulvaney et al. [46], respectively. Total major and minor elements were quantified by inductively coupled plasma-mass spectroscopy (ICP-MS; 7500cx, Agilent Technologies, CA, USA) following HClO4/HNO3 digestion in open digestion tubes in a heated block [43]. A summary of the different properties of the 85 bio-based fertilizer samples is shown in Figure 1. The summary shows that the concentrations of the heavy metals and trace elements Ni, Zn, Mn, Se, Pb, Mo, Cr, Cu, Cd, and As are very low (less than 1000 mg kg⁻¹) in the selected samples. The low concentrations of these elements might make them difficult to measure through NIR and MIR, as suggested by Wu et al. [30].

Figure 1. (a-x) Different properties of the 85 bio-based fertilizer samples, including total elements derived through chemical analysis, from bio-solids (red circles), composts (orange circles), manure (green circles), and plant residues (blue circles). The y-axis shows the concentration of each element, with all units in mg kg⁻¹ except N, which is in mg g⁻¹; the number of each element is shown on the x-axis.

Sample Characterization
The 25 properties of interest presented in this paper belong to three broad categories.
The total data set (n = 85) was divided into a training set (80%, i.e., n = 67) and a test set (20%, i.e., n = 18) according to Table 1 to ensure homogeneity over the training and test sets. The chemical distribution shown in Figure 1 indicates that the nutrient content varies gradually across the sample groups, which makes the data set representative of bio-based fertilizers in practical applications. The distribution of the element concentrations is skewed, and the high variation in concentration across the data set enabled us to analyze why some properties are predicted better than others and how the concentration of an element affects the prediction capability of the model.
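As an illustration of a group-wise split, the following sketch holds out roughly one sample in five from each fertilizer type. It is only an assumption about how such a split could be built; the actual assignment in Table 1 is not reproduced here, and the function name `stratified_split`, the `one_in` parameter, and the random seed are hypothetical.

```python
import numpy as np

def stratified_split(groups, one_in=5, seed=0):
    """Hold out ceil(n / one_in) samples of every group as the test set."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    train, test = [], []
    for g in np.unique(groups):
        # Shuffle the sample indices that belong to this fertilizer type.
        idx = rng.permutation(np.flatnonzero(groups == g))
        n_test = (idx.size + one_in - 1) // one_in  # integer ceiling of n / one_in
        test.extend(idx[:n_test].tolist())
        train.extend(idx[n_test:].tolist())
    return np.array(sorted(train)), np.array(sorted(test))

# Group sizes as reported for the 85-sample data set.
groups = (["compost"] * 50 + ["manure"] * 6
          + ["plant residue"] * 10 + ["biosolid"] * 19)
train_idx, test_idx = stratified_split(groups)
print(len(train_idx), len(test_idx))  # 67 18
```

With the reported group sizes, rounding the 20% hold-out up within each group happens to give the same 67/18 totals as the study, although the individual samples chosen will differ.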

Model Development
Partial least squares (PLS) regression is one of the most widely used multivariate prediction methods in chemometric analysis. PLS projects the spectral data onto latent variables that explain the variance within the spectral data. Given a spectral matrix X and the corresponding matrix of reference values Y, PLS finds the scores (T and U), the loadings (P and Q), and the error matrices (F_X and F_Y) from the decomposition of X and Y, as given in Equations (1) and (2):
$$X = T P^{\top} + F_X \quad (1)$$
$$Y = U Q^{\top} + F_Y \quad (2)$$
while the relation in the original space is:
$$Y = X B + E$$
where matrix B contains the regression coefficients and E is the residual matrix. After the selection of characteristic wavelengths, the partial least squares regression model was established. The model was created using the optimal number of latent variables (lvs). The calibration set was used to find the optimal number of latent variables, and the resulting model was used to predict the prediction set.
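The following sketch illustrates how a single-response PLS model can be fitted and how the number of latent variables can be chosen by cross-validated MSE. It is a minimal NumPy implementation of the standard NIPALS procedure on synthetic data, not the software used in this study; the function names `pls1_fit`, `cv_mse`, and `best_n_lv`, the interleaved five-fold scheme, and the upper limit of 10 latent variables are all assumptions made for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit single-response PLS (NIPALS); return coefficients B and intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)      # weight vector
        t = Xr @ w                     # scores
        p = Xr.T @ t / (t @ t)         # X loadings
        c = (yr @ t) / (t @ t)         # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - c * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, y_mean - x_mean @ B

def cv_mse(X, y, n_lv, n_folds=5):
    """Mean square error of interleaved k-fold cross-validation."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for k in range(n_folds):
        test = idx[k::n_folds]
        train = np.setdiff1d(idx, test)
        B, b0 = pls1_fit(X[train], y[train], n_lv)
        sq_err += np.sum((X[test] @ B + b0 - y[test]) ** 2)
    return sq_err / len(y)

def best_n_lv(X, y, max_lv=10):
    """Return the number of latent variables with the lowest CV MSE."""
    scores = [cv_mse(X, y, k) for k in range(1, max_lv + 1)]
    return int(np.argmin(scores)) + 1, scores

# Synthetic stand-in for a calibration set (67 samples, 40 wavelengths).
rng = np.random.default_rng(0)
X = rng.normal(size=(67, 40))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=67)
n_lv, scores = best_n_lv(X, y)
print("selected latent variables:", n_lv)
```

When as many latent variables are used as there are (full-rank) predictors, this procedure reduces to ordinary least squares, which gives a convenient sanity check.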

Data Preprocessing
Pre-processing of NIR and MIR spectra is considered an important part of any quantitative or qualitative analysis [47,48]. Spectroscopic measurements in the laboratory or in the field are often affected by noise. This noise can reduce the signal-to-noise ratio (SNR) of the spectral information and, therefore, negatively affect the accuracy of a calibration model. Other challenges associated with NIR/MIR spectra include complex backgrounds and baselines, which introduce unwanted variation in the spectra and complicate model calibration [49]. To deal with these problems, the spectra are often pre-processed before any analysis. In the present study, the data set was mean-centered and corrected for baseline offset, followed by a second-order polynomial de-trending algorithm. No further pre-processing was performed, as this might have a negative impact on the prediction performance [50,51].
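The pre-processing chain can be sketched as follows. The exact definitions used in the study are not given, so this example assumes that baseline offset correction subtracts each spectrum's minimum, that de-trending removes a least-squares second-order polynomial fitted along the wavelength axis, and that mean-centering is applied last across samples; the function names and the synthetic spectra are hypothetical.

```python
import numpy as np

def baseline_offset(spectra):
    """Shift every spectrum (row) so that its minimum is zero."""
    return spectra - spectra.min(axis=1, keepdims=True)

def detrend_poly(spectra, wavelengths, order=2):
    """Remove a least-squares polynomial trend from every spectrum."""
    # Standardize the axis so the polynomial fit is well conditioned.
    x = (wavelengths - wavelengths.mean()) / wavelengths.std()
    V = np.vander(x, order + 1)                       # columns x^2, x, 1
    coef, *_ = np.linalg.lstsq(V, spectra.T, rcond=None)
    return spectra - (V @ coef).T

def mean_center(spectra):
    """Subtract the mean spectrum; return it so new samples can reuse it."""
    mean_spectrum = spectra.mean(axis=0)
    return spectra - mean_spectrum, mean_spectrum

def preprocess(spectra, wavelengths):
    corrected = detrend_poly(baseline_offset(spectra), wavelengths, order=2)
    return mean_center(corrected)

# Synthetic spectra: a curved baseline plus one absorption band of varying height.
wavelengths = np.linspace(1000.0, 2500.0, 200)
x = (wavelengths - 1750.0) / 750.0
peak = np.exp(-0.5 * ((wavelengths - 1900.0) / 30.0) ** 2)
rng = np.random.default_rng(0)
spectra = np.array([0.5 + a * x + b * x**2 + h * peak
                    for a, b, h in rng.uniform(0.1, 1.0, size=(5, 3))])
processed, mean_spectrum = preprocess(spectra, wavelengths)
print(processed.shape)
```

Returning the mean spectrum allows the same centering to be applied to test samples, so that no information from the test set leaks into the calibration.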

Optimal Wavelength Selection
In NIR and MIR spectroscopy, it is difficult to identify in advance which wavelength bands contain most of the information about the response variable [52]. Therefore, all wavelengths in the full NIR and MIR range are measured. Measuring the whole range inevitably includes irrelevant or less informative wavelengths, which can degrade the prediction ability of the model, make it unnecessarily complex [42,53,54], and make it harder to interpret. Hence, measuring the full NIR and MIR range and then identifying and selecting the combination of wavelengths that contains information about the response variables (nutrient content) is expected to improve prediction performance [55]. Wavelength selection can also help interpret the fingerprint regions in the NIR and MIR spectral data that correspond to each response variable.
A simple method proposed by Frenich et al. [56], based on the PLS regression coefficients (B), is used in this paper to select the characteristic wavelengths. The method proposes that the value of B can be used as a measure of the importance of an individual wavelength in predicting the response variable, similar to the interpretation of parameters in linear regression. A high absolute value of B indicates that the corresponding wavelength λ_i is more important and has a high correlation with the response variable, and vice versa [42].
The wavelength selection method based on B is implemented in three steps. First, the PLS model is fitted on the entire spectrum and optimized to find the optimum number of latent variables (lvs). The number of latent variables is optimized by observing the mean square error (MSE), as shown in Figure 2; the optimum number of lvs is the one at which the MSE of cross-validation is minimal. In the second step, the wavelengths (for each spectrum) are sorted by the absolute value of their PLS regression coefficients.
In the third step, wavelengths that have a low B value and a low correlation with the response variable are discarded using Algorithm 1. The algorithm iterates, discarding one wavelength at a time (the one with the lowest absolute value of the associated regression coefficient), rebuilding the calibration model, and evaluating the mean square error (MSE) of the cross-validation set. At some point, removing wavelengths increases the MSE, and that is the stopping criterion for the optimization algorithm. The remaining wavelengths were selected, and it was expected that they would improve the prediction performance of the model. The wavelengths selected for the nitrogen content in the NIR spectra are shown in Figure 5.

Algorithm 1
    ...
    initialize i = 1
    Discard one wavelength at a time, λ(i)
    Fit PLS on remaining wavelengths
    Find MSE of CV
    if MSE(i) ≤ MSE(i − 1) then
    ...
    Print all the discarded wavelengths
    Print all the remaining wavelengths
    Selected wavelengths = remaining wavelengths
Note: The number of remaining wavelengths must be greater than or equal to the optimized number of lvs.
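The three steps can be sketched in code as follows. This is one possible reading of the procedure under stated assumptions, not the authors' implementation: wavelengths are ranked once from the full-spectrum model, the lowest-ranked wavelength is dropped while the cross-validated MSE does not increase, and at least as many wavelengths as latent variables are retained. The helper names `pls1_fit`, `cv_mse`, and `select_wavelengths`, the five-fold scheme, and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Single-response PLS (NIPALS); returns coefficients B and intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        p = Xr.T @ t / (t @ t)
        c = (yr @ t) / (t @ t)
        Xr, yr = Xr - np.outer(t, p), yr - c * t   # deflate X and y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, y_mean - x_mean @ B

def cv_mse(X, y, n_lv, n_folds=5):
    """Mean square error of interleaved k-fold cross-validation."""
    idx = np.arange(len(y))
    sq_err = 0.0
    for k in range(n_folds):
        test = idx[k::n_folds]
        train = np.setdiff1d(idx, test)
        B, b0 = pls1_fit(X[train], y[train], n_lv)
        sq_err += np.sum((X[test] @ B + b0 - y[test]) ** 2)
    return sq_err / len(y)

def select_wavelengths(X, y, n_lv):
    """Drop low-|B| wavelengths one at a time until the CV MSE increases."""
    B, _ = pls1_fit(X, y, n_lv)            # step 1: full-spectrum model
    order = np.argsort(np.abs(B))          # step 2: least important first
    keep = np.arange(X.shape[1])
    best = cv_mse(X, y, n_lv)
    for j in order:                        # step 3: backward elimination
        if keep.size <= n_lv:              # keep at least n_lv wavelengths
            break
        trial = keep[keep != j]
        mse = cv_mse(X[:, trial], y, n_lv)
        if mse > best:                     # stopping criterion
            break
        keep, best = trial, mse
    return keep, best

# Synthetic example: only the first three "wavelengths" carry information.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=60)
keep, best = select_wavelengths(X, y, n_lv=3)
print(len(keep), "wavelengths kept, CV MSE =", round(best, 4))
```

Ranking the wavelengths only once keeps the sketch close to the three-step description; re-ranking after every refit would be an alternative design with a higher computational cost.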

Model Averaging
In the model averaging method, the results of the NIR and MIR spectral analyses are combined to improve the prediction results, as proposed by Granger et al. [41] and shown in Figure 6. The method uses ordinary least squares (OLS) regression to exploit the covariance structure in the prediction errors, and the weights attributed to the prediction results of each sensor do not necessarily sum to one [36]. The wavelength selection algorithm is applied individually to the NIR and MIR spectra to select the characteristic wavelengths, and the prediction results from each sensor are then combined using Equation (3). Weights are assigned to the predictions obtained from each individual sensor according to their performance on the training set: the NIR results receive a higher weight if they have a lower RMSE than the MIR results, and vice versa.
$$Y_i = W_0 + W_1 Y_{NIR} + W_2 Y_{MIR} \quad (3)$$
where Y_i is the vector of observed values of the response variable (element of interest), W_0 is the intercept, Y_NIR and Y_MIR are the individual prediction results of the NIR and MIR spectral models, and W_1 and W_2 are the weights assigned to the NIR and MIR predictions, respectively. OLS regression is used to find the values of W_0, W_1, and W_2. For model development, the NIR and MIR prediction results for the training and test data sets were concatenated, resulting in a two-column feature matrix.
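A minimal sketch of this combination step is given below. It fits the intercept and the two weights by ordinary least squares on synthetic training-set predictions and then applies them to new predictions; the function names `fit_model_average` and `apply_model_average` and the simulated sensor errors are assumptions for illustration only.

```python
import numpy as np

def fit_model_average(y_obs, y_nir, y_mir):
    """Estimate W0, W1, W2 of Equation (3) by ordinary least squares."""
    A = np.column_stack([np.ones_like(y_nir), y_nir, y_mir])
    w, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
    return w  # array([W0, W1, W2])

def apply_model_average(w, y_nir, y_mir):
    """Combine the NIR and MIR predictions with the fitted weights."""
    return w[0] + w[1] * y_nir + w[2] * y_mir

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic training-set predictions from two imperfect "sensors".
rng = np.random.default_rng(0)
y_train = rng.uniform(0.0, 10.0, size=67)
nir_train = y_train + rng.normal(0.0, 1.0, size=67)
mir_train = 0.8 * y_train + 1.0 + rng.normal(0.0, 1.5, size=67)

w = fit_model_average(y_train, nir_train, mir_train)
combined = apply_model_average(w, nir_train, mir_train)
print("weights:", np.round(w, 3))
print("RMSE NIR / MIR / combined:",
      round(rmse(nir_train, y_train), 3),
      round(rmse(mir_train, y_train), 3),
      round(rmse(combined, y_train), 3))
```

Because using either sensor alone is a special case of Equation (3), the combined RMSE on the data used to fit the weights cannot exceed that of the better individual sensor; any gain on a test set has to be checked separately.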

Model Assessment Criteria
For model assessment, the root mean square error (RMSE), the coefficient of determination (R²), and the ratio of performance to deviation (RPD) were used as performance parameters. R² shows the goodness of fit between the predicted and experimental values. As proposed by Saeys et al. [57], an R² between 0.66 and 0.80 indicates approximate quantitative predictions, whereas an R² between 0.81 and 0.90 reveals good prediction. Calibration models with R² > 0.90 are considered excellent. RPD is defined as the standard deviation of the actual values divided by the RMSE and is a measure of the effectiveness and overall predictability of the regression model. According to Saeys et al. [57] and Zornoza et al. [58], RPD < 2 is considered insufficient for applications, whereas an RPD between 2 and 2.5 makes approximate quantitative predictions possible. For values between 2.5 and 3, predictions can be classified as good, and an RPD > 3 indicates an excellent prediction. RMSE measures the deviation between the predicted and experimental values; a smaller RMSE indicates a smaller deviation. These parameters are calculated as follows:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
$$RPD = \frac{STD(Y_i)}{RMSE}$$
Here, \hat{y}_i and y_i are the predicted and actual values of the response variable, \bar{y} is the mean of the actual values, n is the number of samples, and STD(Y_i) is the standard deviation of the actual values of the response variable.
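The three criteria can be computed as in the following sketch. The text does not state whether the standard deviation in the RPD is the sample or the population form, so the sample form (ddof = 1) is assumed here; the function names and the four-point example are hypothetical, and `rpd_class` simply encodes the RPD thresholds quoted above.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rpd(y_true, y_pred):
    # Assumption: sample standard deviation (ddof=1) of the actual values.
    return float(np.std(y_true, ddof=1) / rmse(y_true, y_pred))

def rpd_class(value):
    """Map an RPD value to the qualitative classes quoted in the text."""
    if value < 2.0:
        return "insufficient"
    if value < 2.5:
        return "approximate"
    if value <= 3.0:
        return "good"
    return "excellent"

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(round(rmse(y_true, y_pred), 4))   # 0.1581
print(round(r2(y_true, y_pred), 4))     # 0.98
print(round(rpd(y_true, y_pred), 3))    # 8.165
print(rpd_class(rpd(y_true, y_pred)))   # excellent
```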

Near-Infrared (NIR) and Mid-Infrared (MIR) Predictions
The prediction results before and after characteristic wavelength selection for each sensor (NIR and MIR) are presented in Table 2. The full-spectrum predictions from both NIR and MIR were relatively good for N, NH4-N, Al, P, and EC; however, for the metal and mineral contents, the results were not satisfactory in the current study. The results with wavelength selection outperformed the results based on the full spectrum for all elements. The essential plant nutrients (N, P, and plant-available forms of nitrogen) are predicted relatively better than the rest of the elements. Nitrogen has the highest R² (0.94), followed by aluminum (R² = 0.92), phosphorus, and ammonium, while K was predicted more poorly. The predictions of N, FAA-N, NO3, pH, Cr, Cu, Se, Ca, Mn, and P were better in the NIR range, while the predictions of NH4-N, EC, As, Cd, Zn, Al, Fe, K, Mg, Na, and S were better in the MIR range in both cases (with and without wavelength selection). For Co, Mo, Ni, and Pb, the NIR and MIR predictions were comparable, though the MIR results were slightly better. The ranking in Table 2 was established by observing the RMSE, R², and RPD values for each sensor. The sensor with the lowest RMSE and the highest R² and RPD values is preferred for the prediction of a particular nutrient. The table thus shows the ability of the individual sensors and the ranking of each sensor in predicting the nutrient contents. For Na, Zn, Ni, Mo, Cr, Co, Cd, As, and Mn, the NIR and MIR predictions did not reach an acceptable range (i.e., R² < 0.7) even with wavelength selection. This is because these elements are featureless in the NIR and MIR range [30]; they are mostly predicted indirectly using NIR and MIR spectroscopy.
In terms of predictive performance with R² > 0.7, 8 elements were predicted to an acceptable range using the NIR spectral method, while 9 elements reached an acceptable range using the MIR spectral method. The results in Table 2 suggest that MIR performed better for Al and Fe, consistent with the better prediction performance for metal contents in the MIR range [36].
If a result with R² > 0.7 is considered an acceptable prediction for a particular response variable, then a total of 17 out of 25 elements were sufficiently predicted with wavelength selection, as shown in Table 2. A further improvement is expected from combining the NIR and MIR results using model averaging, as proposed by Rourke et al. [36]; these results are presented in the next section.

Table 2. Goodness of fit for the essential plant nutrients (N, P, K, and plant-available forms of nitrogen) and total elements derived from near-infrared (NIR) and mid-infrared (MIR) spectra with and without wavelength selection. All units are g kg⁻¹, except Cd, As, Cr, Se, Mo, and Ni, which are in mg kg⁻¹.

Prediction of Elements Using Model Averaging NIR and MIR Results
The combined results from the NIR and MIR predictions using model averaging are shown in Table 3. The wavelength selection algorithm is applied individually to the NIR and MIR spectral data, and the prediction results are then combined using Equation (3). The percent improvement over both the NIR and MIR individual results indicates that model averaging is an effective technique for combining the prediction results. Table 3 shows that model averaging substantially improved the prediction of Zn, Al, Cr, Cd, Ca, and Fe in terms of RMSE, R², and RPD relative to both individual results. An improvement in prediction was observed for all properties compared with the individual sensor predictions.
For the elements Pb, K, Cu, Cr, Mn, As, Cd, and Co, 0.75 ≤ R² < 0.81 was observed using model averaging. The predictions for Cr, Co, Cd, As, and Mn using model averaging reached an acceptable range (R² > 0.7). The major and trace elements Ni, Zn, Mo, and Na were difficult to predict using the individual sensor results, and model averaging could not improve their prediction to an acceptable range. The unreliable predictions of Ni, Zn, Mo, and Na present a barrier to the combined use of NIR and MIR for quantifying the composition of bio-based fertilizers. If a result with R² > 0.7 is considered an acceptable prediction, as proposed by Saeys et al. [57], then a total of 21 out of 25 elements are predicted with wavelength selection and model averaging, as shown in Table 3. Overall, the reasonable to good prediction of most nutrients, trace elements, and metal contents in the current study using model averaging of the NIR and MIR results suggests that measurement of the full composition of bio-based fertilizers might be possible if other sensors are combined with NIR and MIR.

Discussion
The potential of NIR and MIR spectroscopy for estimating the composition of bio-based fertilizers was investigated using both the full spectral range and selected wavelengths. The results based on the full range of the NIR and MIR spectra were encouraging for some properties (N, NH4-N, Al, and P) but not for other elements. With the full NIR and MIR range, a total of 13 properties were predicted to an acceptable range (R² > 0.70) [57]. The poor results might be due to irrelevant information included in the spectral range [42,53,59], which makes the calibration model complex. Accordingly, wavelength selection for each response variable improved the predictions from both NIR and MIR relative to the full range. The improvement in prediction performance can be seen in terms of RMSE, R², and RPD, as shown in Table 2. Predictions from the selected wavelengths enabled 17 of the 25 elements to be predicted to an acceptable range in the current study. The prediction results and the corresponding ranking in Table 2 suggest that NIR can produce better results for certain elements, while MIR can be useful for others. Combining NIR and MIR using model averaging outperformed both individual results, as shown in Table 3.
Model averaging of the results obtained from the individual sensors improved the prediction for each response variable. The maximum improvement in terms of RMSE, R², and RPD was observed for Cd, Co, Cr, and Mn, which were not predicted to an acceptable range (R² > 0.7) by the individual sensors. Substantial improvements were also observed for Al, Ca, Fe, Mg, S, and NH4-N. The predictions for Ni, Zn, Mo, and Na did not reach an acceptable range (R² > 0.70) [57] with either the individual sensors or model averaging, although they improved substantially over the individual sensor predictions. The poorer prediction of metal contents was expected, as metals are spectrally inactive in the NIR and MIR range. Their prediction is only possible by linking them with other properties that show more features in the NIR and MIR range [59]. As proposed by Wang et al. [30], Ni, Zn, Mo, and other metal contents can be predicted indirectly only if their concentration is not less than 1000 mg kg⁻¹. Thus, the low concentration of these elements in the current study might be another reason for their poor predictions.
The set of samples in the current study is diverse and comes from four different sources, so the correlation between metal content and the spectrally active constituents might differ between sources. This can affect the indirect prediction of some of the metal elements and result in their poor prediction. To overcome this limitation of NIR and MIR, alternative sensors can be investigated in future studies [36]. Alternative sensors, namely X-ray fluorescence (XRF) and Fourier transform infrared photoacoustic spectroscopy (FTIR-PAS), might be more effective in predicting these properties [60,61].
Overall, model averaging improved the prediction of all the elements of interest in the current study. The results shown in Table 3 demonstrate that 21 out of 25 properties were predicted using the model averaging strategy. This improvement in the detection of nutrients and other elements can be compared with the results obtained for NIR and MIR in the literature. Huang et al. [25] evaluated different nutrients and elements (N, Fe, Mg, Ca) in manure using NIR; in comparison, model averaging performed better in terms of R² in the current study, despite the fact that the samples in the current study comprise four types of material (manure, bio-solids, plant residues, and composts).

Conclusions
In this study, a wide range of nutrients, their plant-available forms, minerals, and heavy metal contents were quantified using NIR and MIR spectroscopy in a diverse set of bio-based fertilizers. A wavelength selection technique was applied to select the characteristic wavelengths, and the individual prediction capabilities of NIR and MIR were then investigated for the quantification of nutrient contents. A model averaging technique that combines the model outcomes derived from NIR and MIR was then used, which improved the prediction performance and allowed 21 out of 25 nutrients and other properties to be predicted. The most notable improvement in prediction was obtained for Cd, Cr, Zn, Al, Ca, Fe, S, Cu, EC, and Na. However, for Ni, Zn, Mo, and Na, the prediction results from model averaging could not reach the acceptable range (R² > 0.70), although they improved substantially over the individual sensor predictions. Therefore, combining the results from NIR and MIR spectral methods using model averaging is well placed to replace traditional wet chemical analysis methods for the analysis of bio-based fertilizer composition. In the future, we plan to investigate the potential of NIR and MIR combined with other sensors (XRF and FTIR-PAS) to provide more comprehensive coverage of bio-based fertilizer composition.