Predicting Forage Quality of Warm-Season Legumes by Near Infrared Spectroscopy Coupled with Machine Learning Techniques

1 Department of Plant and Soil Sciences, Oklahoma State University, 371 Agricultural Hall, Stillwater, OK 74078, USA
2 Department of Computer Science, Oklahoma State University, 219 MSCS, Stillwater, OK 74078, USA
3 USDA-ARS, Southeast Area Branch, 114 Experiment Station Road, Stoneville, MS 38776, USA
4 USDA-ARS, Grazinglands Research Laboratory, 7207 W. Cheyenne St., El Reno, OK 73036, USA
* Author to whom correspondence should be addressed.
Sensors 2020, 20(3), 867; https://doi.org/10.3390/s20030867
Received: 20 December 2019 / Revised: 3 February 2020 / Accepted: 4 February 2020 / Published: 6 February 2020
(This article belongs to the Special Issue Emerging Sensor Technology in Agriculture)

Abstract

Warm-season legumes have been receiving increased attention as forage resources in the southern United States and other countries. However, the near infrared spectroscopy (NIRS) technique has not been widely explored for predicting the forage quality of many of these legumes. The objective of this research was to assess the performance of NIRS in predicting the forage quality parameters of five warm-season legumes—guar (Cyamopsis tetragonoloba), tepary bean (Phaseolus acutifolius), pigeon pea (Cajanus cajan), soybean (Glycine max), and mothbean (Vigna aconitifolia)—using three machine learning techniques: partial least square (PLS), support vector machine (SVM), and Gaussian processes (GP). Additionally, the efficacy of global models in predicting forage quality was investigated. A set of 70 forage samples was used to develop species-based models for concentrations of crude protein (CP), acid detergent fiber (ADF), neutral detergent fiber (NDF), and in vitro true digestibility (IVTD) of guar and tepary bean forages, and CP and IVTD in pigeon pea and soybean. All species-based models were tested through 10-fold cross-validations, followed by external validations using 20 samples of each species. The global models for CP and IVTD of warm-season legumes were developed using a set of 150 random samples, including 30 samples for each of the five species. The global models were tested through 10-fold cross-validation, and external validation using five individual sets of 20 samples each for different legume species. Among techniques, PLS consistently performed best at calibrating (R2c = 0.94–0.98) all forage quality parameters in both species-based and global models. The SVM provided the most accurate predictions for guar and soybean crops, and global models, and both SVM and PLS performed better for tepary bean and pigeon pea forages. 
The global modeling approach that developed a single model for all five crops yielded sufficient accuracy (R2cv/R2v = 0.92–0.99) in predicting CP of the different legumes. However, the accuracy of IVTD predictions for the different legumes was variable (R2cv/R2v = 0.42–0.98). Machine learning algorithms like SVM could help develop robust NIRS-based models for predicting forage quality with a relatively small number of samples, and thus need further attention in different NIRS-based applications.
Keywords: partial least square; support vector machine; Gaussian processes; soybean; pigeon pea; guar; tepary bean

1. Introduction

Perennial warm-season grasses, such as bermudagrass (Cynodon dactylon), old world bluestems (Bothriochloa spp.), and bahiagrass (Paspalum notatum), serve as major summer forage resources for grazing stocker cattle in the southern United States (US). While capable of producing large amounts of biomass, these perennial grasses often show a decline in forage quality with their maturation towards the mid-late summer growing season and do not meet the nutritional needs of grazing stocker cattle for the entire season [1,2]. Legumes, being high-quality forages, can be adopted to offset the summer slump in forage quality, and enhance the efficiency of forage-based beef production systems. Further, the continued increase in the cost of nitrogen fertilizers has added to the interest of producers in utilizing legume crops as forage in many regions across the US. In response, extensive research in the southern US over the last decade has focused on evaluating warm-season annual legumes as summer forage resources that can be grown in rotation with winter-wheat (Triticum aestivum L.) [3,4,5,6]. In more recent years, several legumes have received increased attention due to their capabilities of generating high amounts of biomass under the limited moisture conditions that prevail in the southern US [7].
Quantifying the quality of forage in pastures is crucial for both agricultural research and forage management, including cattle grazing and harvesting. However, the determination of the different parameters of forage quality, such as crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), and in vitro true digestibility (IVTD), by classical analytical techniques is time-consuming and expensive, especially when numerous samples are required. Advances in computing and multivariate statistical techniques have enabled the use of near infrared spectroscopy (NIRS) in assessing the quality parameters of many forages. The NIRS method is quick, inexpensive, and facilitates timely decision-making related to grazing periods. The technique is based on interactions between light in the wavelength range of 750–2500 nm and organic compounds in the plant biomass, measured as reflectance [8]. Applying NIRS to predict forage quality involves analyzing a particular forage with both traditional lab analysis and NIRS, and then developing a predictive equation by pairing the two sets of information in a calibration dataset (Figure 1). NIRS has been widely used in forage quality predictions of crops including alfalfa (Medicago sativa) [9], maize (Zea mays) [10], ryegrass (Lolium multiflorum) [11], tall fescue (Festuca arundinacea) [12], and other species. However, the technique has been underutilized for predicting the forage quality of many warm-season legumes.
As developed for other forage crops, well-calibrated NIRS species-based models for warm-season legumes could be useful tools to quickly assess the forage quality of different legume species grown under a range of environmental or management settings, or harvested at different growth stages or cutting or grazing heights. Therefore, it is necessary to examine the effectiveness of NIRS in predicting forage characteristics of some important warm-season legumes. This work includes species such as guar (Cyamopsis tetragonoloba), tepary bean (Phaseolus acutifolius), soybean (Glycine max), and pigeon pea (Cajanus cajan), given past research and the potential for expanded use of these species across the southern US and other similar environments. There are also other warm-season legumes that may be capable of providing high-quality forage for summer grazing [3,13,14]. However, developing NIR calibration equations for every species can become challenging for public or private laboratories that test forage quality. Generally, accurate chemical analyses of a large number of samples are not readily available or feasible for developing calibrations, especially when novel legume species are involved. In response to the challenges of developing species-based calibrations, global models developed from samples of a range of different warm-season legumes can prove useful, if such calibrations provide sufficiently accurate predictions.
Several calibration techniques are known to perform well in the application of NIRS to estimating forage quality and are generally available in most chemometric packages [15]. Partial least squares (PLS) is among the most commonly used methods, in which least squares algorithms are used to compute regressions [16]. In contrast, a comparatively novel and robust machine learning algorithm, support vector machine (SVM), has been gaining attention for NIRS calibrations [15]. Further, Gaussian processes (GP) have provided better calibration results than PLS and SVM in some cases [17,18]. However, tests of these calibration techniques on wide ranges of common and more novel legumes are required to establish how well they perform.
The combination of NIRS and machine learning calibration techniques could serve as an effective tool to streamline the monitoring efforts in warm-season legumes by eliminating the need for classical forage analytical methods. Therefore, the objective of this research was to evaluate the performance of NIRS in predicting the forage quality of four warm-season legumes (guar, tepary bean, pigeon pea, and soybean), using three different calibration techniques—PLS, SVM, and GP—on individual species bases. Additionally, the efficacy of global calibrations of these techniques, developed by combining datasets of all four species and mothbean (Vigna aconitifolia), was tested using different independent datasets of five species.

2. Materials and Methods

2.1. Materials

Forage samples used in the study (n = 410) were collected as parts of two different field experiments conducted at the USDA-ARS Grazinglands Research Laboratory near El Reno, Oklahoma, US (35.57° N, 98.03° W, elevation 414 m). Ninety samples each of guar and tepary bean, and 50 mothbean samples, were collected from field experiments conducted during the summers of 2017 and 2018. An additional 90 samples each of soybean and pigeon pea were obtained from two long-term experiments (2001–2008) conducted in the same location [3,19]. In all three experiments, aboveground biomass was collected from randomly clipped 0.5 m row lengths from experimental plots at 15-day intervals, starting at 45 days after planting. Apart from whole plant samples, a major proportion of the biomass samples collected in these experiments were separated into leaf, stem, and pod fractions before laboratory analysis.

2.2. Laboratory and NIRS Analysis

All leaf, stem, pod, and whole plant samples were oven-dried at 60 °C to a constant weight. Dry samples were ground to pass a 2-mm filter using a Wiley grinding mill. Total nitrogen concentration in each sample was determined by the flash combustion method (Model Vario Macro, Elementar Americas, Inc., Mt. Laurel, NJ, USA) and then converted into CP by multiplying by a factor of 6.25 (Table 1). The IVTD was obtained for each sample by following the Daisy Digester procedures (ANKOM Technology, Macedon, NY, USA). The NDF and ADF concentrations were only determined in samples of guar and tepary bean, in accordance with the batch fiber analyzer techniques (ANKOM Technology, Macedon, NY, USA).
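The nitrogen-to-protein conversion described above is a single multiplication. The short sketch below illustrates it; the function name and the example nitrogen value are hypothetical and are not taken from the study's data.

```python
N_TO_CP_FACTOR = 6.25  # conversion factor stated in the text


def crude_protein(total_n_percent: float) -> float:
    """Convert total nitrogen (% of dry matter) to crude protein (%)."""
    if total_n_percent < 0:
        raise ValueError("nitrogen concentration cannot be negative")
    return total_n_percent * N_TO_CP_FACTOR


# Hypothetical sample with 4.0% total nitrogen.
print(crude_protein(4.0))  # 25.0
```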
Aliquots of ground samples were filled into ring cups to eliminate voids. Spectral reflectance (R) of monochromatic light, averaged over 10 spectra per sample, was collected with a scanning spectrophotometer (Model SpectraStar 2600 XT-R, Unity Scientific, Columbia, MD, USA). Spectral data were obtained as the logarithm of the inverse of reflectance [log(1/R)] at 1-nm intervals over the range of 680–2600 nm.
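As a rough illustration of the spectral transform, the sketch below converts reflectance values to log(1/R) and builds the 1-nm wavelength grid. It assumes a base-10 logarithm, which is the usual convention in NIRS but is not stated explicitly in the text; the function name and the example reflectance values are hypothetical.

```python
import math


def reflectance_to_log_inverse(reflectance):
    """Convert reflectance values R (R > 0) to log10(1/R).

    Base 10 is an assumption; the text only says "logarithm".
    """
    values = []
    for r in reflectance:
        if r <= 0:
            raise ValueError("reflectance must be positive")
        values.append(math.log10(1.0 / r))
    return values


# 680-2600 nm inclusive at 1-nm steps gives one predictor per wavelength.
wavelengths_nm = list(range(680, 2601))
print(len(wavelengths_nm))  # 1921

# Hypothetical reflectance readings at three wavelengths.
print(reflectance_to_log_inverse([1.0, 0.5, 0.1]))
```

Each sample therefore contributes 1921 predictor values, far more than the 70 calibration samples per species, which is why latent-variable and kernel methods are natural choices here.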

2.3. Calibration Techniques

Partial least squares (PLS) is an extensively used class of statistical methods, which includes regression, classification, and dimension reduction techniques. It uses latent variables, also called score vectors, to model the relationship between input and response variables. In the case of regression problems, PLS first generates the latent variables from the given data and uses them as new predictor variables. There are different types of PLS, based on the techniques employed to extract the latent variables. Two approaches are used to extend PLS for modeling non-linear relations among data. The first approach is to reformulate the linear relationship between the score vectors, $u$ and $v$, as a non-linear model:
$$v = g(u) + h = g(X, w) + h$$
where $g$ is the continuous function that models the existing non-linear relation. Generally, $g$ is modeled using artificial neural networks, smoothing splines, polynomials, or radial basis functions. The remaining variables $h$ and $w$ denote a residual vector and a weight vector, respectively.
The second PLS approach is to apply kernel-based learning. The kernel PLS method transforms the input space data to a higher dimensional feature space and linearly estimates PLS in that space. To avoid explicitly computing the mapping function $\Phi$ that projects data to the feature space, kernel PLS applies the kernel trick, which uses the fact that the inner product of two vectors $x$ and $y$ in feature space can be calculated using a kernel function $k(x, y)$ [20]:
$$k(x, y) = \Phi(x)^T \Phi(y)$$
By using the kernel function, the score vectors ($u$ and $v$) can be identified and used to define the non-linear relationship. The kernel PLS approach can model complex non-linear relations while remaining simple to implement and compute.
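The kernel identity above can be checked numerically on a small case. The sketch below uses a homogeneous degree-2 polynomial kernel on two-dimensional inputs, chosen only because its feature map is easy to write out; the kernel degree, the feature map, and the example vectors are illustrative assumptions and not the configuration used in the study.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def phi(v):
    """Explicit feature map for k(x, y) = (x . y)^2 on 2-D inputs."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Same inner product, computed without ever building phi."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(y))   # inner product in feature space
implicit = poly2_kernel(x, y)    # kernel evaluated in input space
print(math.isclose(explicit, implicit))  # True
```

Both routes give 121 for these vectors (up to floating-point rounding), but the kernel route never constructs the higher-dimensional vectors, which matters when the feature space is large.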
Gaussian processes (GP) are kernel-based, probabilistic, non-parametric regression models. A Gaussian process is a set of random variables such that every finite subset of those variables has a joint Gaussian distribution. A Gaussian process, $f(x)$, can be described using a mean function $m(x)$ and a covariance function $k(x, x')$. The covariance function defines the smoothness of responses, and the basis function $\Phi$ projects the input space vector $x$ to a higher dimension feature space vector $\Phi(x)$. A Gaussian process regression (GPR) model describes the response by using latent variables from a Gaussian process. A GPR model is represented as:
$$\Phi(x_i)^T w + f(x)$$
where $f(x) \sim GP(0, k(x, x'))$; that is, $f(x)$ is drawn from a zero-mean GP with covariance function $k(x, x')$ [21]. The covariance is specified by kernel parameters, which are also known as hyperparameters. GPR is a probabilistic model, and an instance of the response $y$ is modeled as:
$$P(y_i \mid f(x_i), x_i) \sim N(y_i \mid \Phi(x_i)^T w + f(x_i), \sigma^2)$$
GPR is non-parametric, as there is a latent variable $f(x_i)$ for each observation $x_i$. The noise variance $\sigma^2$, the basis function coefficients $w$, and the hyperparameters of the kernel can be estimated from the data while training the GPR model.
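To make the GPR idea concrete, the following is a minimal sketch of the standard GP predictive mean for one-dimensional inputs. It is deliberately simplified: it drops the basis-function term (zero mean), fixes the hyperparameters instead of estimating them, and uses an RBF kernel rather than the polynomial kernel configured in the study. All names and data values are hypothetical.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict_mean(x_train, y_train, x_new, noise_var=1e-6):
    """Predictive mean k_*^T (K + noise_var * I)^{-1} y of a zero-mean GP."""
    n = len(x_train)
    K = [[rbf_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y_train)
    return [sum(rbf_kernel(x, x_train[i]) * alpha[i] for i in range(n))
            for x in x_new]


# With very small noise, predictions at training inputs stay close to the targets.
print(gp_predict_mean([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], [0.0, 1.0, 2.0, 1.5]))
```

There is one weight in `alpha` per training observation, which reflects the non-parametric character noted above: the model grows with the data rather than having a fixed number of coefficients.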
Support vector machine (SVM) is a popular machine-learning algorithm used for identifying linear as well as non-linear dependency between input vectors and outputs. SVMs are non-parametric models, which means parameters are selected, estimated, and tuned in such a way that the model capacity matches the data complexity [21]. Generally, SVM starts by observing the multivariate inputs $X$ and outputs $Y$, estimates its parameters $w$, and then learns the mapping function $y = f(x, w)$, which approximates the underlying dependency between inputs and responses. The obtained function, also known as a hyperplane, must have a maximal margin (for classification) or a minimal error of approximation (for regression) to predict new data well. In the case of SVM regression, Vapnik's error (loss) function is used with ε-insensitivity. It finds a regression function $f(x)$ that deviates from the actual responses ($y$) by no more than ε and is as flat as possible at the same time.
For non-linear regression problems, SVM maps the input space to a feature space (a higher dimension space) using a mapping $\Phi(x)$ to find a linear regression hyperplane in that space. However, there is no need to know the mapping $\Phi$, as the kernel function $k(x_i, x_j)$, which is the inner product of the vectors $\Phi(x_i)$ and $\Phi(x_j)$, can be used to find the optimal regression hyperplane in the extended space. There are many kernel functions available to describe non-linear regressions, such as the polynomial kernel, RBF kernel, Gaussian kernel, normalized polynomial kernel, etc. The learning problem, in classification as well as in regression, leads to solving a quadratic programming (QP) problem. Sequential minimal optimization (SMO) is considered the most popular optimizer for solving SVM problems [22]. It divides the large QP problem into a set of small QP problems and solves them analytically.
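The ε-insensitive loss mentioned above can be written in a few lines. The sketch below shows only the loss function, not an SVM solver; the function name, the example values, and the choice of ε = 0.5 are hypothetical.

```python
def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float) -> float:
    """Vapnik's epsilon-insensitive loss: zero inside the epsilon tube,
    linear in the excess deviation outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# A prediction within 0.5 units of the observed value costs nothing.
print(epsilon_insensitive_loss(10.0, 10.3, 0.5))  # 0.0
# A prediction 2.0 units away is penalized only for the 1.5 units beyond the tube.
print(epsilon_insensitive_loss(10.0, 12.0, 0.5))  # 1.5
```

Because small residuals are ignored, only the observations on or outside the tube (the support vectors) determine the fitted function.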

2.4. Performance Evaluation

Apart from calibration, 10-fold cross-validations and external validations were conducted to assess the performance of the calibration techniques. Ten-fold cross-validation is a statistical method for evaluating machine learning models in which ten hold-out runs are performed and their results averaged. In each run, the model is trained with 90% of the data points and tested with the remaining 10%, and thus every data point is used nine times for training and once for testing the model. For each species-based model, the original dataset of 90 samples for each species was split into two subsets (Figure 2). A subset of 70 samples was used for running calibration and 10-fold cross-validation. The other subset of 20 remaining samples was used only for external validation and was used in neither calibration nor cross-validation of any model. For the global model, the original dataset consisted of 250 samples, involving 50 samples each of guar, tepary bean, soybean, pigeon pea, and mothbean. These samples were divided into six subsets (Figure 2). One random subset of 150 samples (30 samples per species) was employed for calibration as well as 10-fold cross-validation. Each of the remaining five subsets, comprising 20 samples of an individual species, was used for external validation.
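The fold structure described above can be sketched as follows for a 70-sample calibration set. This is a generic illustration of 10-fold splitting, not the partitioning code of the software used in the study; the function name, the shuffling seed, and the round-robin fold assignment are assumptions.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train_counts, test_counts = Counter(), Counter()
for train, test in k_fold_indices(70, k=10):
    train_counts.update(train)
    test_counts.update(test)

# Every sample is used nine times for training and once for testing.
print(all(train_counts[i] == 9 and test_counts[i] == 1 for i in range(70)))  # True
```

With 70 samples, each run trains on 63 samples and tests on 7; the 20 external-validation samples never enter this loop.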
Coefficients of determination, being upper-bounded by 1.0, are often adopted for meaningful comparisons across different models and were therefore used here as estimates of prediction accuracy. Specifically, the coefficients of determination in calibration (R2c), cross-validation (R2cv), and external validation (R2v) were used to quantify the variance in the data captured by each model at calibration, cross-validation, and external validation, respectively. Additionally, root mean squared errors were presented for comparing models and were termed RMSEc, RMSEcv, and RMSEv for calibration, cross-validation, and external validation, respectively.
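For reference, the two statistics can be computed as in the sketch below. It assumes the common definition of the coefficient of determination as one minus the ratio of residual to total sum of squares; some software instead reports the squared correlation coefficient, and the text does not say which form was used. The observed and predicted values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical laboratory CP values (%) and model predictions (%).
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
print(round(r_squared(observed, predicted), 2))  # 0.96
print(round(rmse(observed, predicted), 3))       # 1.633
```

The same two functions apply to calibration, cross-validation, and external validation; only the set of observed and predicted pairs changes.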

2.5. Software

Regression models were calibrated, cross-validated, and externally validated using the Weka software, version 3.8 [23]. Weka is a suite of machine learning algorithms and is widely used for data mining. For implementing PLS, we used the PLS classifier package in Weka, which uses the prediction capabilities of PLSFilter. The PLSFilter runs the PLS regression on the given set of data and computes the beta matrix for prediction. By default, missing values are replaced, and the data are centered. For GP implementation, the Gaussian classifier for regression without hyperparameter tuning was used. The kernel for the Gaussian classifier was configured as a polynomial. By default, missing values were replaced by the global mean. The SMOReg classifier was used to implement SVM in Weka. The classifier used the polynomial kernel and the RegSMOImproved optimizer to learn SVM for regression. All remaining parameters (such as batch size, debugging, the option to skip capability checks, filter type, and noise) were kept at their defaults.

3. Results and Discussion

The prediction accuracy of calibrated models is discussed by comparing their cross-validation (R2cv) and external validation (R2v) results to a scale proposed for NIRS calibrations [24]. According to the scale, the performance of a model is considered excellent if the R2 of validations is greater than 0.95, and the resultant model can be used in any application. A model is assumed satisfactory with R2 ranging from 0.9–0.95 and would be usable for most applications involving quality assurance. Models with R2 ranging between 0.8–0.9 are considered moderately successful and can be used with caution for most applications, including research.
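The scale can be expressed as a simple lookup, sketched below. The handling of the exact boundary values (0.80, 0.90, 0.95) and the label for values below 0.8 are assumptions, since the text gives only the ranges; the function name is hypothetical.

```python
def rate_calibration(r2: float) -> str:
    """Map a validation R2 to the qualitative tiers described in the text."""
    if r2 > 0.95:
        return "excellent"
    if r2 >= 0.90:
        return "satisfactory"
    if r2 >= 0.80:
        return "moderately successful"
    return "below the tiers described"  # not covered by the quoted scale


for value in (0.97, 0.92, 0.86, 0.42):
    print(value, rate_calibration(value))
```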

3.1. Guar

The chemical analysis of guar samples showed wide variability in the parameters that define forage quality for different components (leaf, stem, or pod) of plants sampled at different growth stages (Table 1). The CP content for all 90 (70 + 20) guar samples ranged from 3.7% to 34.9%, while NDF concentrations ranged from 16.8% to 75.8%, ADF concentrations ranged from 8.9% to 62.9%, and IVTD from 40.3% to 95.2%.
Among the three techniques, PLS performed best at calibrating each of the four forage quality parameters in guar with R2c of 0.98–0.99, though the calibration results of SVM (R2c = 0.94–0.98) were comparable (Table 2). GP had a comparatively lower calibration accuracy, with R2c ranging between 0.88–0.91 for IVTD, NDF, and ADF, and R2c of 0.95 for CP of guar samples. Although PLS provided the best calibrations of the three, SVM gave better prediction accuracy in both cross-validation and external validation of all four indices of forage quality for guar. The GP approach generated the lowest R2cv for all four parameters and the lowest R2v for NDF and ADF.
Among forage quality parameters, the greatest prediction accuracy was recorded for CP by all three techniques with R2cv of 0.93–0.97 and R2v of 0.93–0.98 (Table 2). In comparison, only the SVM technique resulted in a satisfactory prediction accuracy (R2cv = 0.92; R2v = 0.94) for NDF, based on the proposed scale [24]. Both the SVM and PLS techniques showed excellent accuracy at predicting ADF with R2cv and R2v between 0.94–0.96, while GP produced R2cv of 0.86. All three techniques resulted in relatively low prediction accuracy for IVTD, with R2cv ranging from 0.81–0.83. Overall, the performance of SVM was the most satisfactory among the three calibration methods, and it can be employed in NIRS-based prediction of CP, ADF, and NDF of guar. In contrast, use of IVTD predictions of guar would require caution, depending on the type of application.
While currently a minor crop in the southern US, guar has a proven potential to serve as a multi-purpose legume and has potential for expansion in use. Guar is a common crop in regions of the Indian subcontinent, Africa, North and South America, and Australia [25]. Guar has been gaining attention as a forage resource in the southern US due to its capability of producing high N biomass under limited water conditions [3,5]. Therefore, this first report investigating the application of NIRS in guar would encourage the utilization of the technique in its research and forage management.

3.2. Tepary Bean

Results from the laboratory analysis of tepary bean samples showed high variability in all four of the quality indices, though the observed ranges were narrower than those for guar (Table 1). The concentration of CP varied from 4.5–31.1%, while NDF ranged from 22.9% to 71.6%. In contrast, ADF and IVTD ranged between 15.3–59.2% and 55.9–93.2%, respectively. The best calibration results for tepary bean were recorded using the PLS technique, with R2c of 0.98–0.99 (Table 3). However, neither SVM nor PLS was clearly superior across all quality indices in cross-validation and external validation.
Among the forage quality characteristics, all calibration techniques showed their best results at predicting CP in tepary bean samples, with R2cv or R2v above 0.90 (Table 3). The SVM technique resulted in the lowest RMSEcv value (1.74) for cross-validation of CP, whereas PLS had the lowest RMSEv of 1.35 for external validation among the three techniques. In contrast, PLS showed the lowest RMSEcv values of 5.09 and 3.97 and SVM had the lowest RMSEv of 4.03 and 2.23 for NDF and ADF, respectively. Compared to GP, both PLS and SVM produced satisfactory results at predicting ADF concentration in tepary bean with R2cv of 0.86–0.89 and R2v of 0.92–0.95, while all three techniques had comparatively low performance at predicting NDF in tepary bean with R2cv and R2v of 0.72–0.84 and 0.75–0.84, respectively.
In comparison to ADF, the NDF concentration of tepary bean samples was less accurately predicted by all three techniques (Table 3). Similar differences between the prediction accuracy of ADF and NDF were also noticed for guar in this study, and were reported earlier in NIRS studies involving Brassica napus [26], Lolium multiflorum [11], and Oryza sativa [27]. Though PLS performed better at predicting IVTD in tepary bean than the other two techniques, all three resulted in relatively low prediction accuracy with R2cv and R2v ranging between 0.75–0.79 and 0.75–0.88, respectively. Overall, both PLS and SVM performed well among the three tested techniques and hence can be employed for satisfactory predictions of CP and ADF in tepary bean. Prediction results for NDF and IVTD, however, would need some caution if calibrations are developed with sample sizes similar to that used in this study (n = 70).
Tepary bean is a vining, warm-season legume species that originated in the southwestern United States and northwestern Mexico and may have value for multiple uses in dryland agricultural systems. Due to its spreading growth habit and its ability to generate high N biomass with limited soil moisture, tepary bean could be an ideal summer forage for the Southern Great Plains [14]. This first study investigating the application of NIRS to attributes of forage quality in tepary bean showed that the technique could aid in quantifying its role in meeting animal nutrition needs.

3.3. Soybean

All three techniques (PLS, SVM, and GP) gave excellent accuracies at calibrating CP and IVTD of soybean samples, with PLS again performing the best of the three with R2c greater than 0.98 (Table 4). Among the three techniques, SVM performed best at predicting CP with RMSEcv and RMSEv of 1.85 and 1.78, respectively, followed closely by PLS. All three calibration techniques produced better predictions of IVTD in soybean (R2cv > 0.84 and R2v > 0.89), compared to the prediction accuracies obtained for guar and tepary bean. As observed for CP, SVM performed better than the other techniques in cross-validation (R2cv = 0.89) of IVTD, while the other two techniques performed better in external validation (R2v of 0.92–0.93). All three techniques can be employed for rapid NIR-based predictions of CP and IVTD in soybean forage samples, with SVM being the best choice.
Soybean was initially introduced as a forage into the US in the 19th century, but is now one of the most widely grown grain legumes in the Southern Great Plains [14]. In the last two decades, there has been increased interest from researchers in utilizing soybean as a summer forage in the US [28,29,30]. Hence, there is a need for rapid and low-cost techniques for estimating its forage quality. The NIRS technique has rarely been exploited for forage quality predictions in soybean. A single report investigated modified PLS and multiple scatter correction methods for NIR predictions of CP, NDF, and ADF concentrations, using 353 soybean samples collected at one (R6) growth stage [31]. In comparison, the calibrations developed in the present study used data on IVTD and CP from just 70 soybean samples collected across a range of different growth stages. Thus, our observed ranges for CP (4.1–39.7%) and IVTD (42.4–99.3%) were more diverse (Table 1). The accuracies (R2cv or R2v > 0.92) obtained in predicting CP in soybean forage by all three techniques were higher than the values reported [31], despite large differences in the sample sizes (n = 70 vs. 353) used for developing calibrations. Therefore, this study showed that machine learning algorithms could develop robust NIRS calibrations for precise analysis of forage quality of soybean with small sample sizes.

3.4. Pigeon Pea

Laboratory analyses for the current study showed wide variability in both CP (4.5–32.5%) and IVTD (30.7–91.1%) for forage samples of pigeon pea (Table 1). The CP concentration in pigeon pea was accurately calibrated (R2c > 0.95) by each of the three techniques (Table 5). All three techniques resulted in CP predictions with R2cv and R2v greater than 0.96. Both PLS and SVM also showed high accuracies in predicting IVTD of pigeon pea with R2cv and R2v ranges of 0.91–0.92 and 0.96–0.97, respectively. Although lower than that of PLS and SVM, the performance (R2cv = 0.86) of GP-based calibrations was moderately satisfactory in IVTD predictions, following the proposed scale [24]. Overall, both PLS and SVM would provide excellent options for NIR predictions of CP and IVTD in pigeon pea.
Pigeon pea is another legume species that has seen the development of a range of cultivars for different uses in its home range and in areas of greater cultivation. This includes research on the value of cultivars of pigeon pea in the US for forage, grain, and pasture productivity [4,32]. Pigeon pea has a high degree of heat and drought tolerance, and the capacity for high levels of forage production in the US and other tropical and sub-tropical regions.
While pigeon pea is a broadly grown crop in much of the world, there was only one preliminary report that discussed the possible use of NIRS techniques to predict forage quality of pigeon pea [33]. That report used a limited number of samples (n = 48), involving leaves and branches, that were mostly collected at one growth stage for calibrations of CP, NDF, and ADF concentrations; however, no validations were performed [33]. In contrast, the present study undertook both calibrations and validations using 90 (70 + 20) pigeon pea samples, involving leaves, stems, or seed pods, collected at different growth stages during a long-term experiment. Further, we investigated the NIR-based predictions of IVTD, which is considered an important quality parameter in pigeon pea forage [4]. Therefore, this study confirms that NIRS techniques could be effective tools for predicting forage quality of pigeon pea.

3.5. Global Calibrations

Global calibrations for CP and IVTD of warm-season legumes were developed with 150 samples, which included 30 samples each of guar, tepary bean, soybean, pigeon pea, and mothbean (Figure 2). As observed with the species-based calibrations for the four different legumes, the PLS technique performed best of the three techniques for global calibrations of both CP and IVTD (R2c of 0.97 and 0.94, respectively), while the GP technique was the least accurate (Table 6). In comparison, cross-validation of the global models showed that the SVM approach provided the greatest prediction accuracy for both CP (R2cv = 0.94) and IVTD (R2cv = 0.86), followed closely by PLS. Therefore, based on cross-validation results, the performance of global calibrations developed using SVM and PLS was satisfactory at predicting CP, and moderately satisfactory for IVTD.
When the globally calibrated models were validated using different external datasets for each of the five legume species, the predictions for CP by all three techniques resulted in sufficient accuracies with R2v ranging between 0.91–0.97 (Table 6). The SVM technique showed higher accuracy compared to the others in predicting CP, with the exception of guar, where the PLS approach provided slight improvements. Among species, the best CP predictions were noted for pigeon pea (R2v values of 0.98–0.99) for all three techniques. In contrast, IVTD predictions were not consistently accurate across all five species. The greatest accuracy was observed for IVTD predictions in pigeon pea with R2v of 0.97–0.98 under SVM and PLS. The lowest accuracy in predicting IVTD was noted for mothbean (R2v between 0.65–0.69 by SVM and PLS; 0.42 by GP). The best prediction accuracies for IVTD of soybean (R2v = 0.82–0.86 for all three techniques) and guar (PLS; R2v = 0.81) were moderately satisfactory. However, the performance of all three techniques was satisfactory at predicting IVTD in tepary bean (R2v of 0.91–0.92), which was better than that of the species-specific models developed for tepary bean (Table 3).
Overall, global-calibrated models for CP have the potential to offer sufficient prediction accuracies that are comparable to species-based calibration models. Diverse calibration sets that contain different legume species may allow the creation of robust, generalized models that provide predictions similar to species-based models. In some cases, global models may be capable of providing more accurate predictions, as was observed for IVTD predictions of tepary bean in this study.
The application of accurate globally calibrated models would be extremely useful for a broad range of end-users. Such models would reduce or eliminate the large amounts of time and other resources required to perform chemical analyses or to develop and use separate calibration sets for every species. However, adopting the global calibration approach for IVTD may not provide satisfactory predictions for all species. The lower performance of the IVTD calibrations may partly reflect variability associated with laboratory techniques that rely on rumen fluid [34]. Therefore, further investigations are required to compare the performance of global calibrations for IVTD of warm-season legumes when the reference values are derived using rumen fluid and cellulose degradation methods.

4. Conclusions

The statistics obtained for calibration, cross-validation, and external validation in this study demonstrated that NIRS techniques could be effective for supplying rapid and accurate predictions of most attributes of forage quality (cell wall fractions, crude protein) for different warm-season legumes. Further, the applications of the NIRS technique to guar, tepary bean, and mothbean represent the first reports of such tools providing estimates of forage quality for these species. The SVM technique, though similar to PLS in performance, performed consistently well in predicting quality parameters of the five warm-season legumes under both species-based and global calibration strategies. The global calibration approach can be useful for predicting CP in warm-season legumes and can reduce the time and resources required for traditional chemical analysis or for developing separate calibration equations for each species. However, the global model for IVTD was not accurate for all species. Further model development based on other analytical procedures may improve the consistency and reliability of the global approach. Machine learning algorithms like SVM could also allow the development of robust models with a relatively small number of samples. Additional research is required to refine the SVM approach for different NIRS applications.

Author Contributions

Conceptualization, G.S.B., H.K.B. and P.H.G.; methodology, G.S.B., H.K.B., P.H.G. and J.P.T.; formal analysis, G.S.B. and H.K.B.; data curation, G.S.B., B.K.N. and S.C.R.; writing—original draft preparation, G.S.B. and H.K.B.; writing—review and editing, B.K.N., P.H.G., H.S. and J.P.T.; visualization, G.S.B. and H.K.B.; supervision, P.H.G. and J.P.T.; project administration, P.H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially funded by the Cooperative Agreement with USDA-ARS Grazinglands Research Laboratory and Oklahoma Agriculture Experiment Station Hatch project OKL03132.

Acknowledgments

The authors would like to acknowledge ARS technicians Cindy Coy, Delmar Shantz, Kory Bollinger, and Jeff Weik for their assistance in collecting, processing, and analyzing forage samples.

Conflicts of Interest

The authors declare no conflict of interest.

Disclaimer

Mention of trademarks, proprietary products, or vendors does not constitute guarantee or warranty of products by USDA and does not imply its approval to the exclusion of other products that may be suitable. All programs and services of the USDA are offered on a nondiscriminatory basis, without regard to race, color, national origin, religion, sex, age, marital status, or handicap.

Abbreviations

PLS, partial least square; SVM, support vector machine; GP, Gaussian processes; CP, crude protein; ADF, acid detergent fiber; NDF, neutral detergent fiber; IVTD, in vitro true digestibility; R2c, determination coefficient in calibration; R2cv, determination coefficient in cross-validation; R2v, determination coefficient in external validation; RMSEc, root mean square error in calibration; RMSEcv, root mean square error in cross-validation; RMSEv, root mean square error in external validation.

References

  1. Phillips, W.; Coleman, S. Productivity and economic return of three warm season grass stocker systems for the Southern Great Plains. J. Prod. Agric. 1995, 8, 334–339.
  2. Williams, M.; Hammond, A. Rotational vs. continuous intensive stocking management of bahiagrass pasture for cows and calves. Agron. J. 1999, 91, 11–16.
  3. Rao, S.C.; Northup, B.K. Capabilities of four novel warm-season legumes in the southern Great Plains: Biomass and forage quality. Crop Sci. 2009, 49, 1096–1102.
  4. Rao, S.C.; Northup, B.K. Pigeon pea potential for summer grazing in the southern Great Plains. Agron. J. 2012, 104, 199–203.
  5. Rao, S.C.; Northup, B.K. Biomass production and quality of indian-origin forage guar in Southern Great Plains. Agron. J. 2013, 105, 945–950.
  6. Foster, J.; Adesogan, A.; Carter, J.; Sollenberger, L.; Blount, A.; Myer, R.; Phatak, S.; Maddox, M. Annual legumes for forage systems in the United States Gulf Coast region. Agron. J. 2009, 101, 415–421.
  7. Baath, G.S.; Northup, B.K.; Gowda, P.H.; Turner, K.E.; Rocateli, A.C. Mothbean: A potential summer crop for the Southern Great Plains. Am. J. Plant Sci. 2018, 9, 1391.
  8. Rushing, J.B.; Saha, U.K.; Lemus, R.; Sonon, L.; Baldwin, B.S. Analysis of some important forage quality attributes of Southeastern Wildrye (Elymus glabriflorus) using near-infrared reflectance spectroscopy. Am. J. Anal. Chem. 2016, 7, 642.
  9. Brogna, N.; Pacchioli, M.T.; Immovilli, A.; Ruozzi, F.; Ward, R.; Formigoni, A. The use of near-infrared reflectance spectroscopy (NIRS) in the prediction of chemical composition and in vitro neutral detergent fiber (NDF) digestibility of Italian alfalfa hay. Ital. J. Anim. Sci. 2009, 8, 271–273.
  10. Volkers, K.; Wachendorf, M.; Loges, R.; Jovanovic, N.; Taube, F. Prediction of the quality of forage maize by near-infrared reflectance spectroscopy. Anim. Feed Sci. Technol. 2003, 109, 183–194.
  11. Yang, Z.; Nie, G.; Pan, L.; Zhang, Y.; Huang, L.; Ma, X.; Zhang, X. Development and validation of near-infrared spectroscopy for the prediction of forage quality parameters in Lolium multiflorum. PeerJ 2017, 5, e3867.
  12. Hill, N.; Cabrera, M.; Agee, C. Morphological and climatological predictors of forage quality in tall fescue. Crop Sci. 1995, 35, 541–549.
  13. Muir, J.P.; Pitman, W.D.; Dubeux Jr, J.C.; Foster, J.L. The future of warm-season, tropical and subtropical forage legumes in sustainable pastures and rangelands. Afr. J. Range Forage Sci. 2014, 31, 187–198.
  14. Baath, G.S.; Northup, B.K.; Rocateli, A.C.; Gowda, P.H.; Neel, J.P. Forage potential of summer annual grain legumes in the southern great plains. Agron. J. 2018, 110, 2198–2210.
  15. Agelet, L.E.; Hurburgh, C.R., Jr. A tutorial on near infrared spectroscopy and its calibration. Crit. Rev. Anal. Chem. 2010, 40, 246–260.
  16. Roggo, Y.; Chalus, P.; Maurer, L.; Lema-Martinez, C.; Edmond, A.; Jent, N. A review of near infrared spectroscopy and chemometrics in pharmaceutical technologies. J. Pharm. Biomed. Anal. 2007, 44, 683–700.
  17. Wang, K.; Chi, G.; Lau, R.; Chen, T. Multivariate calibration of near infrared spectroscopy in the presence of light scattering effect: A comparative study. Anal. Lett. 2011, 44, 824–836.
  18. Cui, C.; Fearn, T. Comparison of partial least squares regression, least squares support vector machines, and Gaussian process regression for a near infrared calibration. J. Near Infrared Spectrosc. 2017, 25, 5–14.
  19. Rao, S.; Mayeux, H.; Northup, B. Performance of forage soybean in the southern Great Plains. Crop Sci. 2005, 45, 1973–1977.
  20. Rosipal, R.; Kramer, N. Subspace, latent structure and feature selection techniques. Lect. Notes Comput. Sci. Chap. Overv. Recent Adv. Part. Least Sq. 2006, 2940, 34–51.
  21. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2.
  22. Huang, C.-L.; Wang, C.-J. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. 2006, 31, 231–240.
  23. Platt, J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classif. 1999, 10, 61–74.
  24. Frank, E.; Hall, M.A.; Witten, I.H. The WEKA Workbench; Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, 4th ed.; Morgan Kaufmann: Cambridge, MA, USA, 2016.
  25. Malley, D.; Martin, P.; Ben-Dor, E. Application in analysis of soils. In Near-Infrared Spectroscopy in Agriculture, 1st ed.; Roberts, C.A., Workman, J., Jr., Reeves, J.B., III, Eds.; American Society of Agronomy; Crop Science Society of America; Soil Science Society of America: Madison, WI, USA, 2004; pp. 729–783.
  26. Baath, G.S.; Kakani, V.G.; Gowda, P.H.; Rocateli, A.C.; Northup, B.K.; Singh, H.; Katta, J.R. Guar responses to temperature: Estimation of cardinal temperatures and photosynthetic parameters. Ind. Crop. Prod. 2019.
  27. Wittkop, B.; Snowdon, R.J.; Friedt, W. New NIRS calibrations for fiber fractions reveal broad genetic variation in Brassica napus seed quality. J. Agric. Food Chem. 2012, 60, 2248–2256.
  28. Kong, X.; Xie, J.; Wu, X.; Huang, Y.; Bao, J. Rapid prediction of acid detergent fiber, neutral detergent fiber, and acid detergent lignin of rice materials by near-infrared spectroscopy. J. Agric. Food Chem. 2005, 53, 2843–2848.
  29. Nielsen, D.C. Forage soybean yield and quality response to water use. Field Crop. Res. 2011, 124, 400–407.
  30. Beck, P.; Hubbell, D., III; Hess, T.; Wilson, K.; Williamson, J.A. Effect of a forage-type soybean cover crop on wheat forage production and animal performance in a continuous wheat pasture system. Prof. Anim. Sci. 2017, 33, 659–667.
  31. Asekova, S.; Han, S.-I.; Choi, H.-J.; Park, S.-J.; Shin, D.-H.; Kwon, C.-H.; Shannon, J.G.; Lee, J.D. Determination of forage quality by near-infrared reflectance spectroscopy in soybean. Turk. J. Agric. For. 2016, 40, 45–52.
  32. Rao, S.; Coleman, S.; Mayeux, H. Forage production and nutritive value of selected pigeonpea ecotypes in the southern Great Plains. Crop Sci. 2002, 42, 1259–1263.
  33. Berardo, N.; Dzowela, B.; Hove, L.; Odoardi, M. Near infrared calibration of chemical constituents of Cajanus cajan (pigeon pea) used as forage. Anim. Feed Sci. Technol. 1997, 69, 201–206.
  34. Roberts, C.A.; Stuth, J.; Flinn, P. Analysis of forages and feedstuffs. In Near-Infrared Spectroscopy in Agriculture, 1st ed.; Roberts, C.A., Workman, J., Jr., Reeves, J.B., III, Eds.; American Society of Agronomy; Crop Science Society of America; Soil Science Society of America: Madison, WI, USA, 2004; pp. 231–267.
Figure 1. Illustration of the procedure used for applying near infrared spectroscopy (NIRS) technique in forage quality predictions.
Figure 2. Diagram of the datasets, calibration, and different validation processes used in two calibration strategies.
Table 1. Summary statistics of lab datasets used for calibration, cross-validation and external validation of crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), and in vitro true digestibility (IVTD) of four warm season legumes.
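All values are in %. Cal./CV, calibration and cross-validation set (n = 70); Ext., external validation set (n = 20).

| Species | Parameter | Cal./CV Min | Cal./CV Max | Cal./CV Mean | Cal./CV SD | Ext. Min | Ext. Max | Ext. Mean | Ext. SD |
|---|---|---|---|---|---|---|---|---|---|
| Guar | CP | 3.94 | 34.87 | 17.66 | 8.66 | 3.69 | 33.56 | 15.07 | 9.50 |
| Guar | NDF | 16.83 | 70.80 | 37.57 | 16.95 | 22.95 | 75.78 | 45.94 | 17.82 |
| Guar | ADF | 8.90 | 58.39 | 27.19 | 15.29 | 12.79 | 62.93 | 34.70 | 16.57 |
| Guar | IVTD | 40.35 | 95.22 | 79.27 | 14.11 | 42.96 | 94.37 | 73.38 | 16.08 |
| Tepary bean | CP | 4.50 | 31.12 | 15.76 | 7.78 | 5.94 | 30.25 | 19.35 | 8.13 |
| Tepary bean | NDF | 22.90 | 71.57 | 48.34 | 12.31 | 25.52 | 60.95 | 43.90 | 10.21 |
| Tepary bean | ADF | 15.32 | 59.16 | 34.92 | 11.99 | 17.08 | 48.16 | 30.36 | 10.39 |
| Tepary bean | IVTD | 55.88 | 93.16 | 75.50 | 10.83 | 60.23 | 92.56 | 81.34 | 8.56 |
| Soybean | CP | 4.15 | 39.75 | 21.16 | 11.03 | 6.31 | 36.12 | 19.73 | 8.94 |
| Soybean | IVTD | 42.45 | 99.30 | 78.25 | 16.28 | 57.66 | 98.31 | 80.21 | 12.38 |
| Pigeon pea | CP | 4.52 | 32.48 | 16.30 | 8.77 | 6.24 | 28.64 | 15.62 | 7.41 |
| Pigeon pea | IVTD | 30.71 | 91.08 | 61.55 | 19.28 | 33.31 | 82.89 | 59.76 | 16.40 |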
n, number of samples; Min, minimum value; Max, maximum value; SD, standard deviation.
Table 2. Calibration, cross-validation, and external validation statistics obtained for crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), and in vitro true digestibility (IVTD) in guar using three calibration techniques.
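Calibration (n = 70): R2c, RMSEc. Cross-validation (n = 70): R2cv, RMSEcv. External validation (n = 20): R2v, RMSEv.

| Parameter | Method | R2c | RMSEc | R2cv | RMSEcv | R2v | RMSEv |
|---|---|---|---|---|---|---|---|
| CP | GP | 0.95 | 1.84 | 0.93 | 2.20 | 0.96 | 2.12 |
| CP | PLS | 0.99 | 0.78 | 0.95 | 1.97 | 0.93 | 2.52 |
| CP | SVM | 0.98 | 1.23 | 0.97 | 1.56 | 0.98 | 1.27 |
| NDF | GP | 0.90 | 5.53 | 0.84 | 6.73 | 0.90 | 6.98 |
| NDF | PLS | 0.98 | 2.17 | 0.85 | 6.66 | 0.93 | 5.52 |
| NDF | SVM | 0.94 | 3.98 | 0.91 | 5.08 | 0.94 | 4.67 |
| ADF | GP | 0.91 | 4.79 | 0.86 | 5.77 | 0.92 | 6.02 |
| ADF | PLS | 0.99 | 1.18 | 0.95 | 3.36 | 0.94 | 4.23 |
| ADF | SVM | 0.97 | 2.46 | 0.95 | 3.51 | 0.96 | 3.78 |
| IVTD | GP | 0.88 | 4.92 | 0.81 | 6.10 | 0.93 | 5.63 |
| IVTD | PLS | 0.98 | 2.15 | 0.81 | 6.69 | 0.87 | 5.66 |
| IVTD | SVM | 0.94 | 3.51 | 0.83 | 5.88 | 0.94 | 4.19 |

GP, Gaussian processes; PLS, partial least square; SVM, support vector machine; R2c, determination coefficient in calibration; RMSEc, root mean square error in calibration; R2cv, determination coefficient in cross-validation; RMSEcv, root mean square error in cross-validation; R2v, determination coefficient in external validation; RMSEv, root mean square error in external validation.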
Table 3. Calibration, cross-validation, and external validation statistics obtained for crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), and in vitro true digestibility (IVTD) in tepary bean using three calibration techniques.
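Calibration (n = 70): R2c, RMSEc. Cross-validation (n = 70): R2cv, RMSEcv. External validation (n = 20): R2v, RMSEv.

| Parameter | Method | R2c | RMSEc | R2cv | RMSEcv | R2v | RMSEv |
|---|---|---|---|---|---|---|---|
| CP | GP | 0.94 | 1.89 | 0.90 | 2.42 | 0.94 | 2.20 |
| CP | PLS | 0.99 | 0.68 | 0.93 | 2.03 | 0.98 | 1.35 |
| CP | SVM | 0.97 | 1.35 | 0.95 | 1.74 | 0.94 | 1.94 |
| NDF | GP | 0.85 | 4.96 | 0.75 | 6.22 | 0.75 | 5.10 |
| NDF | PLS | 0.98 | 1.64 | 0.84 | 5.09 | 0.75 | 5.53 |
| NDF | SVM | 0.94 | 2.97 | 0.72 | 7.01 | 0.84 | 4.03 |
| ADF | GP | 0.87 | 4.60 | 0.78 | 5.62 | 0.86 | 3.90 |
| ADF | PLS | 0.98 | 1.47 | 0.89 | 3.97 | 0.92 | 3.34 |
| ADF | SVM | 0.96 | 2.45 | 0.86 | 4.52 | 0.95 | 2.23 |
| IVTD | GP | 0.87 | 4.02 | 0.75 | 5.39 | 0.75 | 4.25 |
| IVTD | PLS | 0.98 | 1.55 | 0.79 | 5.00 | 0.88 | 2.89 |
| IVTD | SVM | 0.93 | 2.86 | 0.75 | 5.70 | 0.82 | 3.82 |

GP, Gaussian processes; PLS, partial least square; SVM, support vector machine; R2c, determination coefficient in calibration; RMSEc, root mean square error in calibration; R2cv, determination coefficient in cross-validation; RMSEcv, root mean square error in cross-validation; R2v, determination coefficient in external validation; RMSEv, root mean square error in external validation.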
Table 4. Calibration, cross-validation, and external validation statistics obtained for crude protein (CP) and in vitro true digestibility (IVTD) in soybean using three calibration techniques.
Calibration (n = 70): R2c, RMSEc. Cross-validation (n = 70): R2cv, RMSEcv. External validation (n = 20): R2v, RMSEv.

| Parameter | Method | R2c | RMSEc | R2cv | RMSEcv | R2v | RMSEv |
|---|---|---|---|---|---|---|---|
| CP | GP | 0.92 | 4.63 | 0.87 | 5.78 | 0.92 | 3.78 |
| CP | PLS | 0.98 | 2.16 | 0.84 | 6.92 | 0.93 | 3.46 |
| CP | SVM | 0.94 | 3.92 | 0.89 | 5.28 | 0.89 | 4.09 |
| IVTD | GP | 0.96 | 2.14 | 0.94 | 2.71 | 0.92 | 2.53 |
| IVTD | PLS | 0.99 | 0.80 | 0.96 | 2.05 | 0.94 | 2.24 |
| IVTD | SVM | 0.99 | 1.26 | 0.97 | 1.85 | 0.96 | 1.78 |

GP, Gaussian processes; PLS, partial least square; SVM, support vector machine; R2c, determination coefficient in calibration; RMSEc, root mean square error in calibration; R2cv, determination coefficient in cross-validation; RMSEcv, root mean square error in cross-validation; R2v, determination coefficient in external validation; RMSEv, root mean square error in external validation.
Table 5. Calibration, cross-validation, and external validation statistics obtained for crude protein (CP) and in vitro true digestibility (IVTD) in pigeon pea using three calibration techniques.
Calibration (n = 70): R2c, RMSEc. Cross-validation (n = 70): R2cv, RMSEcv. External validation (n = 20): R2v, RMSEv.

| Parameter | Method | R2c | RMSEc | R2cv | RMSEcv | R2v | RMSEv |
|---|---|---|---|---|---|---|---|
| CP | GP | 0.98 | 1.37 | 0.96 | 1.73 | 0.96 | 1.69 |
| CP | PLS | 1.00 | 0.43 | 0.97 | 1.46 | 0.98 | 1.02 |
| CP | SVM | 0.99 | 0.84 | 0.98 | 1.17 | 0.98 | 1.12 |
| IVTD | GP | 0.95 | 4.51 | 0.86 | 7.18 | 0.97 | 2.95 |
| IVTD | PLS | 0.99 | 1.93 | 0.92 | 5.49 | 0.96 | 3.09 |
| IVTD | SVM | 0.97 | 3.31 | 0.91 | 5.86 | 0.97 | 2.85 |

GP, Gaussian processes; PLS, partial least square; SVM, support vector machine; R2c, determination coefficient in calibration; RMSEc, root mean square error in calibration; R2cv, determination coefficient in cross-validation; RMSEcv, root mean square error in cross-validation; R2v, determination coefficient in external validation; RMSEv, root mean square error in external validation.
Table 6. Calibration, cross-validation, and external (species) validation statistics of global models obtained for crude protein (CP) and in vitro true digestibility (IVTD) in warm-season legumes using three calibration techniques.
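Calibration (n = 150): R2c, RMSEc. Cross-validation (n = 150): R2cv, RMSEcv. External validation (n = 20 per species): R2v, RMSEv.

| Parameter | Method | R2c | RMSEc | R2cv | RMSEcv | Guar R2v | Guar RMSEv | Tepary bean R2v | Tepary bean RMSEv | Soybean R2v | Soybean RMSEv | Pigeon pea R2v | Pigeon pea RMSEv | Mothbean R2v | Mothbean RMSEv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CP | GP | 0.92 | 2.15 | 0.89 | 2.46 | 0.93 | 2.42 | 0.95 | 2.72 | 0.91 | 3.95 | 0.98 | 2.21 | 0.94 | 3.36 |
| CP | PLS | 0.97 | 1.15 | 0.92 | 2.02 | 0.94 | 2.36 | 0.94 | 2.49 | 0.94 | 2.47 | 0.98 | 2.03 | 0.94 | 3.10 |
| CP | SVM | 0.96 | 1.48 | 0.94 | 1.87 | 0.92 | 2.77 | 0.95 | 2.36 | 0.94 | 3.16 | 0.99 | 1.29 | 0.97 | 2.54 |
| IVTD | GP | 0.86 | 5.09 | 0.81 | 5.84 | 0.65 | 6.16 | 0.91 | 4.41 | 0.82 | 7.93 | 0.91 | 4.75 | 0.42 | 5.40 |
| IVTD | PLS | 0.94 | 3.28 | 0.85 | 5.28 | 0.81 | 5.00 | 0.90 | 5.53 | 0.88 | 5.19 | 0.98 | 2.21 | 0.69 | 4.50 |
| IVTD | SVM | 0.91 | 3.98 | 0.86 | 4.98 | 0.77 | 5.12 | 0.92 | 4.74 | 0.86 | 5.60 | 0.97 | 2.77 | 0.65 | 4.29 |

GP, Gaussian processes; PLS, partial least square; SVM, support vector machine; R2c, determination coefficient in calibration; RMSEc, root mean square error in calibration; R2cv, determination coefficient in cross-validation; RMSEcv, root mean square error in cross-validation; R2v, determination coefficient in external validation; RMSEv, root mean square error in external validation.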