Effect of Sample Preparation Methods on the Prediction Performances of Near Infrared Reflectance Spectroscopy for Quality Traits of Fresh Yam (Dioscorea spp.)

High-throughput techniques for phenotyping quality traits in root and tuber crops are useful in breeding programs, where thousands of genotypes are screened at the early stages. This study assessed the effects of sample preparation on the prediction accuracy of dry matter, protein, and starch content in fresh yam using Near-Infrared Reflectance Spectroscopy (NIRS). Fresh tubers of Dioscorea rotundata (D. rotundata) and Dioscorea alata (D. alata) were prepared using three sample preparation techniques: blending, chopping, and grating. The spectra and reference data of each sample were used to develop calibration models using Modified Partial Least Squares (MPLS). The performance of the model developed from the blended yam samples was tested on a new set of yam samples (N = 50) by comparing their wet laboratory results with the values predicted by NIRS. Blended samples had the highest coefficient of prediction (R²pre) for dry matter (0.95) and starch (0.83), though it was very low for protein (0.26), while grated samples had the lowest R²pre, at 0.87 for dry matter and 0.50 for starch. Blended samples therefore gave better predictions than the other preparation methods. The results highlight the feasibility of NIRS for predicting dry matter and starch content in fresh yam.


Introduction
Yam is a starch-rich staple food commonly grown in Africa, the Caribbean, the South Pacific, the Americas, and Asia. In West Africa, yam contributes over 200 dietary calories per day for more than 300 million people [1]. Statistics show that 97% of the 48.7 million tons of yam produced worldwide in 47 countries originate from sub-Saharan Africa, and Nigeria accounts for 68% of global production [2]. Yam also has a better nutritional value than cassava. Its vitamin C content ranges from 40 to 120 mg/g of edible portion, crude protein from 4% to 14%, and starch from 70% to 80% on a dry weight basis [3]. It is an excellent source of carbohydrates, vitamins, protein, and essential amino acids such as arginine, leucine, isoleucine, and valine. Its complex carbohydrates help to limit sharp rises in blood sugar levels; hence, yam is categorised as a low glycemic index food. According to the USDA National Nutrient Database, fresh yam tubers also provide approximately 29% of the daily value for vitamin C, which helps to boost the immune system [4].
Yam is a key staple crop for smallholders in Africa. It is consumed in various forms depending on the processing approach: boiling, drying, frying, milling, pounding, roasting, or steaming [5]. Furthermore, yam is a valuable source of farm employment and income in Africa; it is also culturally important and features in many life events. In Nigeria and neighbouring West African countries, yam plays an essential role in marriage and fertility ceremonies [6][7][8][9].
Quality components of yam cultivars are essential to their acceptability for cultivation and consumption. Traits such as starch and sugar content, tuber flesh colour, and oxidation are routinely measured by breeding programs because they influence the acceptability of new varieties [10,11]. Breeding programs in root and tuber crops have used visual selection to distinguish varieties; however, this approach to quality trait selection is quite challenging [12]. Conventional methods of assessing yam tuber quality traits are time-consuming and costly, and they involve hazardous chemicals. They include oven drying the fresh samples for 72 h and then grinding the dried samples to a fine flour (0.05 mm particle size) before chemical analysis, which adds further time. Analysing fresh samples by NIRS eliminates the drying and grinding steps of the conventional method. Therefore, techniques that generate data quickly on tuber quality will be of great significance in yam improvement programs. Near-infrared spectroscopy (NIRS) is among the high-throughput techniques for rapid assessment of yam tuber quality traits. NIRS is based on the interaction of chemical groups such as C-H, O-H, and N-H in a sample with electromagnetic radiation at specific wavelengths. Near-infrared spectra consist of broad bands of overlapping absorptions arising from overtones and combinations of the vibrational modes of chemical bonds within the wavelength range of 780 to 2500 nm [13]. NIRS models have been developed for the rapid characterisation of essential quality traits in various Dioscorea spp. [12,14,15], and NIRS has been successfully applied to many crops, including potato, cassava, and sweet potato [16][17][18][19].
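The 780 to 2500 nm range quoted above is often expressed in wavenumbers (cm⁻¹) in the spectroscopy literature. The short sketch below shows the conversion, relying only on the identity 1 cm = 10⁷ nm; the function name is illustrative and not part of any NIRS software.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a wavenumber in cm^-1.

    Because 1 cm = 1e7 nm, wavenumber (cm^-1) = 1e7 / wavelength (nm).
    """
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return 1e7 / wavelength_nm


if __name__ == "__main__":
    # Limits of the NIR region quoted in the text
    for nm in (780.0, 2500.0):
        print(f"{nm:.0f} nm -> {nm_to_wavenumber(nm):.0f} cm^-1")
```

The NIR region therefore spans roughly 12,821 to 4,000 cm⁻¹; longer wavelengths correspond to smaller wavenumbers.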
Quantification of dry matter and protein in yam bean using NIRS was reported with an excellent prediction performance (0.94) and an SEP of 1.2 [20]. This technology could be a useful tool in breeding improved yam varieties, where thousands of accessions are maintained. The physicochemical characteristics of the tubers are vital traits because they allow the identification of parents and the screening of progenies for suitable tuber quality.
Because it is rapid, cost-effective, and non-destructive, NIRS will be beneficial for detecting tuber quality attributes in yam. Sample preparation is a prerequisite for predicting quality traits by NIRS and can be a major challenge when large populations of breeding materials are evaluated. The effects of sample preparation on NIRS prediction performance have been reported for other materials, such as livestock slurry and forages [21][22][23]. However, few, if any, studies have examined the effects of different sample preparation techniques on NIRS prediction of nutritional traits in yam. The current study evaluated different sample preparation methods to assess the feasibility of using NIRS as an alternative to wet chemical analysis for predicting quality parameters in fresh yam tubers.

Sample, Sampling, and Sample Preparation
Two hundred clones representing accessions of D. rotundata and D. alata, grown at two locations (Ibadan and Ubiaja), were used for this study. The samples (200 of each species) were split randomly into two sets: calibration and validation. The calibration set consisted of 150 clones, and the cross-validation set contained 50 clones. Each clone was represented by three tubers (big, medium, and small). The sampling protocol reported by Alamu et al. [14] was adopted for sample preparation and homogenisation. The sampled tubers were washed, air-dried, and peeled. Each peeled tuber was washed, dried with soft tissue, and cut longitudinally into four portions from the proximal to the distal end. Two opposite portions from each tuber were cut into smaller pieces, then pooled and mixed manually.
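The random 150/50 partition of clones described above can be sketched as follows. The source does not say how the randomisation was done, so the use of a NumPy permutation, the `seed` parameter, and the clone IDs in the example are hypothetical.

```python
import numpy as np


def split_calibration_validation(clone_ids, n_calibration=150, seed=None):
    """Randomly partition clone IDs into calibration and validation sets.

    seed is a hypothetical parameter added so that the split is reproducible;
    the source does not state how the random split was generated.
    """
    ids = np.asarray(clone_ids)
    if not 0 < n_calibration < len(ids):
        raise ValueError("n_calibration must leave both sets non-empty")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(ids))  # random order of indices, no repeats
    return ids[order[:n_calibration]], ids[order[n_calibration:]]


if __name__ == "__main__":
    clones = [f"clone_{i:03d}" for i in range(200)]  # hypothetical IDs
    cal, val = split_calibration_validation(clones, n_calibration=150, seed=42)
    print(len(cal), len(val))
```

Splitting at the clone level, rather than at the tuber level, keeps the three tubers of a clone in the same set.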
Samples were prepared using three different methods: chopping, grating, and blending (Figure 1).

a. Chopped: a portion of the cut pieces was subsampled, further reduced in size with a knife, and packed into an appropriately labelled sample bag.
b. Grated: the yam was reduced to shreds with a 2 mm grater, and the shreds were packed into a labelled sample bag.
c. Blended: the cut pieces were first reduced in size and then blended with an electric blender.
Subsequently, subsamples were taken from the prepared samples for wet chemical analysis of dry matter, starch, and protein. These analyses generated the reference data used to develop the calibrations.

Collection of Spectral Data
The prepared samples were homogenised, thoroughly mixed, and loaded into the sample compartment of a benchtop NIRS instrument (FOSS XDS Rapid Content Analyzer) using a quartz sample cell. Spectral data for all samples were collected in duplicate by measuring diffuse reflectance over 400–2498 nm with a NIRS monochromator and a continuously moving cell. Each spectrum was recorded as absorbance, log(1/R), at 0.5 nm increments (see Plate 1). Spectra were collected with the ISI Scan software, and an average of 60 scans per sample was taken within 60 s. Spectral data were exported to Win ISI software (version 4.9.0) for pre-treatment and subsequent development of the calibration model.
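The acquisition settings above imply a fixed wavelength axis and a conversion from reflectance R to absorbance log10(1/R). The sketch below builds that axis and applies the conversion. Averaging the two replicate spectra point by point is an assumption for illustration: the source states that spectra were collected in duplicate but not how the replicates were combined.

```python
import numpy as np

# Wavelength axis described in the text: 400-2498 nm in 0.5 nm steps
START_NM, STOP_NM, STEP_NM = 400.0, 2498.0, 0.5
N_POINTS = int(round((STOP_NM - START_NM) / STEP_NM)) + 1
WAVELENGTHS_NM = np.linspace(START_NM, STOP_NM, N_POINTS)


def reflectance_to_absorbance(reflectance):
    """Convert diffuse reflectance R (0 < R <= 1) to apparent absorbance log10(1/R)."""
    r = np.asarray(reflectance, dtype=float)
    if np.any(r <= 0):
        raise ValueError("reflectance values must be strictly positive")
    return np.log10(1.0 / r)


def average_duplicates(spectrum_a, spectrum_b):
    """Average two replicate spectra of the same sample point by point (assumed rule)."""
    a = np.asarray(spectrum_a, dtype=float)
    b = np.asarray(spectrum_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("replicate spectra must have the same shape")
    return (a + b) / 2.0
```

Each spectrum on this axis has 4197 points, which is the width of the matrix handed to the pre-treatment and regression steps.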

Spectral Data Pre-Treatment
Many uncontrollable physical variations, such as non-homogeneous particle distribution, changes in refractive index, and particle size distribution, cause samples to differ in spectral information during measurement. These variations also cause light scattering, producing additive or multiplicative effects that are unrelated to the chemical response and lead to inaccurate results. To address these effects, the spectra of each sample set were pre-treated to minimise background noise (Figure 2). The pre-treatment applied the standard normal variate (SNV) transformation, a first derivative calculated with a Savitzky-Golay filter (first-order polynomial, 15-point smoothing window), and linear detrending, as reported by Soldado et al. [24].
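The pre-treatment chain described above (SNV, Savitzky-Golay first derivative with a first-order polynomial over 15 points, then linear detrending) can be sketched in NumPy as below. This is an illustration under stated assumptions, not the Win ISI implementation: the edge padding rule, the use of the point index as the detrending axis, and `ddof=1` in SNV are choices made here.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    x = np.asarray(spectra, dtype=float)
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1, keepdims=True)
    return (x - mean) / std


def savgol_first_derivative(spectra, window=15, polyorder=1, delta=1.0):
    """First derivative by Savitzky-Golay: local polynomial fit in a sliding window.

    Edges are handled by repeating the end values (an assumption).
    """
    x = np.asarray(spectra, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix of the local polynomial; row 1 of its pseudo-inverse
    # gives the least-squares slope at the window centre.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    coeffs = np.linalg.pinv(design)[1] / delta
    padded = np.pad(x, ((0, 0), (half, half)), mode="edge")
    return np.array([np.correlate(row, coeffs, mode="valid") for row in padded])


def linear_detrend(spectra):
    """Remove a least-squares straight line (over the point index) from each spectrum."""
    x = np.asarray(spectra, dtype=float)
    idx = np.arange(x.shape[1], dtype=float)
    coef = np.polyfit(idx, x.T, deg=1)        # shape (2, n_spectra)
    trend = (np.vander(idx, 2) @ coef).T      # fitted line for each spectrum
    return x - trend


def pretreat(spectra, window=15, polyorder=1):
    """SNV, then Savitzky-Golay first derivative, then linear detrend."""
    return linear_detrend(savgol_first_derivative(snv(spectra), window, polyorder))
```

With a first-order polynomial, the derivative filter reduces to the least-squares slope in each 15-point window, which also smooths high-frequency noise.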

Calibration Model Development
The 200 selected samples were used to construct the NIRS calibration models with the Win ISI 4 Project Manager, and modified partial least squares (MPLS) regression with cross-validation was used to relate the spectral data to the laboratory reference values. Before spectra were collected, the performance and wavelength stability of the spectrophotometer were checked using the system diagnostics performance tests. MPLS calibration was performed on the calibration data set (N = 100) for each sample presentation (chopped, blended, and grated) using the Win ISI software over the 400–2498 nm range.
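To make the calibration step concrete, the sketch below fits an ordinary PLS1 model with the NIPALS algorithm and picks the number of latent variables with the lowest cross-validated error. It is a stand-in, not Win ISI's MPLS: MPLS additionally standardises the spectral residuals at each wavelength before extracting each new factor, which is not done here. The fold count and `seed` are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Ordinary PLS1 regression via NIPALS (a stand-in for MPLS).

    Returns (x_mean, y_mean, coef) so that y_hat = (X - x_mean) @ coef + y_mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centred working copies
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                           # weight: covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                              # scores of the latent variable
        tt = t @ t
        p = Xr.T @ t / tt                       # X loadings
        q = (yr @ t) / tt                       # y loading
        Xr = Xr - np.outer(t, p)                # deflate X and y
        yr = yr - q * t
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)      # regression vector in X space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def secv_curve(X, y, max_components=10, n_folds=5, seed=0):
    """SECV (RMS of held-out residuals) for 1..max_components latent variables."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    curve = []
    for k in range(1, max_components + 1):
        resid = np.empty(len(y))
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
            model = pls1_fit(X[train_idx], y[train_idx], k)
            resid[test_idx] = y[test_idx] - pls1_predict(model, X[test_idx])
        curve.append(np.sqrt(np.mean(resid ** 2)))
    return np.array(curve)


def best_n_components(X, y, **kwargs):
    """Number of latent variables giving the lowest SECV."""
    return int(np.argmin(secv_curve(X, y, **kwargs))) + 1
```

In practice `X` would be the pre-treated spectra (one row per sample) and `y` the wet chemistry values for one trait; one model is fitted per trait and per sample presentation.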
Cross-validation was conducted on 50 randomly selected samples that were not part of the calibration set; these were used to optimise the PLS equation for each trait and to choose the number of latent variables giving the lowest standard error of cross-validation (SECV). SECV is considered the best statistic for assessing the robustness of a calibration model. The t-outlier limit was set at t > 2.0, and the GH-outlier limit, which is based on the Mahalanobis distance, was set at GH > 4.0. Two outlier-elimination passes were performed, and samples with t > 2.0 were dropped from the calibration file. The statistical parameters of the calibration models (the coefficient of determination for cross-validation (R²pre), the standard error of calibration (SEC), the standard error of cross-validation (SECV), and the standard deviation) are presented in Figures 2–4. The ratio of performance to deviation (RPD), that is, the ratio of the standard deviation of the reference data to the SECV, is also presented.
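The two outlier limits above can be illustrated with a short sketch. The exact Win ISI formulas are not given in the source, so the definitions here are assumptions: GH is taken as the squared Mahalanobis distance of principal component scores divided by the number of components (so that an average sample scores about 1), and t as a sample's residual divided by the model's standard error.

```python
import numpy as np


def gh_values(X, n_components):
    """Standardised Mahalanobis distance (GH) of each spectrum in PCA score space.

    Assumed definition: squared Mahalanobis distance of the scores divided by
    the number of components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    score_var = s[:n_components] ** 2 / (len(X) - 1)   # variance of each score
    d2 = np.sum(scores ** 2 / score_var, axis=1)
    return d2 / n_components


def t_values(y_ref, y_pred, standard_error):
    """Residual of each sample divided by the model's standard error (assumed t definition)."""
    return (np.asarray(y_ref, dtype=float) - np.asarray(y_pred, dtype=float)) / standard_error


def flag_outliers(t, gh, t_limit=2.0, gh_limit=4.0):
    """Boolean masks for t-outliers (|t| > t_limit) and GH-outliers (GH > gh_limit)."""
    return np.abs(t) > t_limit, np.asarray(gh) > gh_limit
```

In a two-pass scheme, flagged samples are removed, the model is refitted, and the check is repeated once more on the reduced set.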

Determination of Dry Matter
The dry matter content of fresh samples was determined using approved method 925.09 of the Association of Official Analytical Chemists [25]. Ten grams of fresh sample was weighed into a pre-weighed aluminium dry matter can and dried in an air convection oven (Memmert UN 55, GmbH) at 105 °C for 16 h, until constant weight was attained.
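The gravimetric calculation behind this method is simple arithmetic on three weighings. The sketch below shows it; the example weights and the 0.001 g constant-weight tolerance are hypothetical values, not taken from the source.

```python
def dry_matter_percent(can_g, can_plus_fresh_g, can_plus_dry_g):
    """Oven dry matter (%) = dry sample mass / fresh sample mass x 100."""
    fresh = can_plus_fresh_g - can_g
    dry = can_plus_dry_g - can_g
    if fresh <= 0 or dry < 0:
        raise ValueError("check the recorded weights")
    return 100.0 * dry / fresh


def reached_constant_weight(weighings_g, tolerance_g=0.001):
    """True if the last two successive weighings differ by no more than tolerance_g.

    tolerance_g is a hypothetical value; the source does not state one.
    """
    return len(weighings_g) >= 2 and abs(weighings_g[-1] - weighings_g[-2]) <= tolerance_g


if __name__ == "__main__":
    # Hypothetical: 20.00 g can, 10.00 g fresh yam, 23.70 g after drying
    print(round(dry_matter_percent(20.00, 30.00, 23.70), 2))
```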

Determination of Crude Protein
Crude protein was determined using a Kjeltec™ Model 2300 following the procedures described in the FOSS manual [26], the operating guide for the equipment. Samples were digested at 420 °C for 1 h to convert organically bound nitrogen to ammonium sulphate. The ammonia in the digest was then distilled into a boric acid receiver solution, which was titrated with standard hydrochloric acid to obtain the total nitrogen content. Crude protein (%) was calculated from total nitrogen using a conversion factor of 6.25.
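The titration result converts to crude protein through the nitrogen mass equivalent of the acid used. The sketch below uses the standard Kjeldahl arithmetic; the blank correction, HCl normality, and example volumes and mass are assumptions for illustration, since the source gives only the 6.25 factor.

```python
N_ATOMIC_MASS = 14.007  # g/mol, i.e. mg of nitrogen per mmol


def crude_protein_percent(titrant_ml, blank_ml, hcl_normality, sample_mg, factor=6.25):
    """Crude protein (%) from a Kjeldahl titration.

    (titrant - blank) mL x normality gives mmol of acid = mmol of NH3 = mmol of N;
    multiplying by 14.007 gives mg of N, which is expressed as % of the sample.
    The blank correction and normality are assumed inputs.
    """
    nitrogen_pct = (titrant_ml - blank_ml) * hcl_normality * N_ATOMIC_MASS * 100.0 / sample_mg
    return nitrogen_pct * factor


if __name__ == "__main__":
    # Hypothetical titration: 5.0 mL sample, 0.2 mL blank, 0.1 N HCl, 1000 mg sample
    print(round(crude_protein_percent(5.0, 0.2, 0.1, 1000.0), 4))
```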

Determination of Starch Content
A colourimetric method described by Onitilo et al. [27] was used to determine starch content. Briefly, 0.02 g of sample was weighed into a clean centrifuge tube, and 1 mL of ethanol, 2 mL of distilled water, and 10 mL of boiling ethanol were added. The mixture was vortexed and centrifuged at 2000 rpm for 10 min. The residue was hydrolysed with perchloric acid to determine starch content, and the supernatant was used to estimate sugar. Phenol-sulphuric acid reagent was used for colour development, a glucose standard was used to build a calibration curve for quantification, and absorbance was read at 490 nm with a Genesys 101S UV-Vis spectrophotometer.

Table 1 presents the wet chemistry results for dry matter, protein, and starch content in fresh tuber samples of white Guinea yam (D. rotundata). Method and location had a significant effect (p < 0.05) on the starch content of D. rotundata, whereas dry matter and protein were affected only by location (p > 0.05 for method). For the Ibadan location, the average dry matter, protein, and starch contents (%) were 36.5 ± 4.24, 1.58 ± 0.36, and 28.75 ± 3.96 in chopped samples; 36.60 ± 4.18, 1.62 ± 0.34, and 27.47 ± 3.84 in grated samples; and 37.14 ± 4.87, 1.60 ± 0.33, and 26.97 ± 3.88 in blended samples, respectively. For tubers from the Ubiaja location, chopped samples averaged 37.98 ± 6.06% dry matter, 1.69 ± 0.45% protein, and 30.97 ± 1.90% starch. Likewise, average dry matter was 38.17 ± 5.19% (grated) and 38.75 ± 5.49% (blended), protein was 1.69 ± 0.53% (grated) and 1.84 ± 0.37% (blended), and starch was 31.35 ± 1.57% (grated) and 31.49 ± 1.57% (blended). Table 2 describes the dry matter, protein, and starch contents of D. alata. Method and location had a significant effect (p < 0.05) on the starch content of D. alata, whereas dry matter and protein were affected only by location (p > 0.05 for method).
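The colourimetric starch determination described above relies on a glucose standard curve read at 490 nm. The sketch below fits that curve, inverts it for an unknown, and scales glucose to starch. The extract volume and dilution factor are hypothetical inputs, and the 0.9 glucose-to-starch factor (162/180, anhydroglucose/glucose) is a common convention that the source does not state.

```python
import numpy as np


def fit_standard_curve(glucose_ug_per_ml, absorbance_490):
    """Least-squares line A = slope * C + intercept through the glucose standards."""
    slope, intercept = np.polyfit(glucose_ug_per_ml, absorbance_490, deg=1)
    return slope, intercept


def glucose_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to read glucose concentration (ug/mL)."""
    return (np.asarray(absorbance, dtype=float) - intercept) / slope


def starch_percent(glucose_ug_per_ml, extract_volume_ml, sample_mass_g,
                   dilution_factor=1.0, glucose_to_starch=0.9):
    """Starch (% of sample) from the glucose released by acid hydrolysis.

    extract_volume_ml and dilution_factor are hypothetical inputs; the 0.9
    factor is an assumed convention, not stated in the source.
    """
    glucose_g = glucose_ug_per_ml * dilution_factor * extract_volume_ml * 1e-6
    return 100.0 * glucose_to_starch * glucose_g / sample_mass_g
```

With the 0.02 g sample mass from the protocol, for example, 100 µg/mL glucose in a hypothetical 50 mL extract corresponds to 22.5% starch.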
For the Ibadan location, chopped samples had dry matter of 20.61%–44.86% (mean ± SD, 29.99 ± 4.52%), protein of 0.68%–3.11% (1.52 ± 0.33%), and starch of 19.25%–36.93% (21.59 ± 1.68%). Blended samples averaged 21.38 ± 1.70% starch and 1.51 ± 0.31% protein, and grated samples averaged 21.69 ± 1.79% starch and 1.58 ± 0.27% protein. The dry matter (%) was estimated as 31.14 ± 5.17 in blended samples and as 30

Figures 2–4 show the parameters of the calibration models developed for dry matter, protein, and starch content of fresh yam using the different sample presentation methods (chopping, grating, and blending).

Discussion
The sample preparation methods did not differ significantly (p > 0.05) in the dry matter and protein content of the fresh yam tubers. However, method and location had significant effects (p < 0.05) on the starch content of D. rotundata and D. alata (Tables 1 and 2). Therefore, any of the sample preparation protocols could be used for dry matter and protein determination, but starch analysis is method dependent. Yam tubers from Ubiaja had an average protein content ranging from 1.04% to 2.79% (mean 1.84 ± 0.37%) in the blended samples, which was significantly different (p < 0.05) from the results for chopped and grated samples. The average protein content of the blended yam samples in the present study agrees with the findings of Polycarp et al. [28], who reported average protein contents of 4% to 6% on a dry weight basis. Starch content varied significantly among the three sample preparation procedures (p < 0.05). The highest starch content was obtained from blended samples, with an average value of 29.93 ± 3.88%, which is about 72.62% on a dry weight (dw) basis; the content in chopped samples, with an average of 28.75 ± 3.96% (78.62% dw), was higher than that in grated samples, with a mean of 27.47 ± 3.84% (75.05% dw), from Ibadan (Table 1). These differences could be due to variation in the particle size produced by the different preparation methods. The starch content of D. rotundata (Table 1) reported in this study is consistent with average values reported in the literature [29][30][31]. Similarly, the average starch content of chopped D. alata from Ubiaja, 28.41 ± 2.31% (Table 2), was higher than that of grated (27.93 ± 2.78%) and blended (23.32 ± 2.32%) samples.
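The dry-weight percentages quoted above follow from dividing a fresh-weight content by the dry matter fraction of the same samples. A minimal sketch, using the grated Ibadan means from Table 1 (27.47% starch, 36.60% dry matter) as a check against the 75.05% dw stated in the text:

```python
def fresh_to_dry_basis(content_fresh_pct, dry_matter_pct):
    """Re-express a fresh-weight content (%) on a dry-weight basis (%)."""
    if not 0 < dry_matter_pct <= 100:
        raise ValueError("dry matter must be in (0, 100] %")
    return 100.0 * content_fresh_pct / dry_matter_pct


if __name__ == "__main__":
    # Grated D. rotundata samples from Ibadan (Table 1 means)
    print(round(fresh_to_dry_basis(27.47, 36.60), 2))
```

The same function applied to a protein mean, with its matching dry matter mean, gives the dry-basis protein figure used in comparisons with the literature.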

NIRS Calibration
Calibration models were developed for dry matter, starch, and protein from the laboratory data obtained with standard wet chemistry methods and the near-infrared spectra of the samples. Model performance was evaluated with the coefficient of prediction (R²pre), which indicates the proportion of variation in the reference values accounted for by the spectral model, together with other statistics: the standard error of calibration (SEC), the standard error of cross-validation (SECV), the standard error of prediction (SEP), and the ratio of performance to deviation (RPD = SD/SECV). For fresh chopped samples, R²pre values of 0.78 for starch, 0.61 for dry matter, and 0.04 for protein were obtained. Grated samples had R²pre values of 0.15 for starch, 0.84 for dry matter, and 0.50 for protein, while blended samples had an excellent prediction performance for dry matter (R²pre = 0.95), an R²pre of 0.83 for starch, and a very poor R²pre of 0.26 for protein (Figures 2–4).
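The calibration statistics defined above can be computed as in the sketch below. The RPD follows the text (SD of the reference values divided by SECV); the degrees-of-freedom correction in SEC (n minus the number of PLS terms minus 1) is a common convention assumed here, not stated in the source.

```python
import numpy as np


def r_squared(y_ref, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_ref, y_pred = np.asarray(y_ref, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def sec(y_ref, y_fitted, n_terms):
    """Standard error of calibration with n - n_terms - 1 degrees of freedom (assumed)."""
    resid = np.asarray(y_ref, float) - np.asarray(y_fitted, float)
    return np.sqrt(np.sum(resid ** 2) / (len(resid) - n_terms - 1))


def secv(y_ref, y_cv_pred):
    """Standard error of cross-validation: RMS of the held-out residuals."""
    resid = np.asarray(y_ref, float) - np.asarray(y_cv_pred, float)
    return np.sqrt(np.mean(resid ** 2))


def rpd(y_ref, secv_value):
    """Ratio of performance to deviation, RPD = SD(reference) / SECV."""
    return np.std(np.asarray(y_ref, float), ddof=1) / secv_value
```

Because RPD divides the spread of the reference data by the model error, a trait with little natural variation (such as protein here) can show a low RPD even when its absolute errors are small.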

Prediction Performance of the Developed NIRS Calibration Models
The performance of the model developed from the blended yam samples was tested on a new set of yam samples (N = 50) by comparing their wet laboratory results with the values predicted by NIRS. Figure 5a–c shows the coefficients of determination (R²) for dry matter (0.95), protein (0.26), and starch (0.83), respectively. The SEP of 1.4 for dry matter (DM), with an R² of 0.95, indicates excellent model performance for dry matter prediction. The R² of 0.83 for starch is also adequate for screening. The R² of 0.26 for protein might be considered too low, but the SEC (0.26), SECV (0.27), and SEP (0.36) values were close (Figure 5b). This implies that a model with a poor R² but close SEC, SECV, and SEP values could still be used and improved. The prediction of starch content also showed that the equation is suitable for screening purposes, with an R² of 0.80 and an SEP of 3.38. Moreover, the SEC (2.2) and SECV (2.10) for starch in the blended samples are similar, indicating the robustness of the model.
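External validation compares NIRS predictions with laboratory values on samples the model has not seen. The sketch below computes the usual summary: bias, the bias-corrected standard error of prediction (SEP), the root mean squared error of prediction (RMSEP), and R² taken as the squared Pearson correlation. The residual sign (NIRS minus laboratory) and the R² convention are assumptions.

```python
import numpy as np


def validation_stats(y_lab, y_nirs):
    """Summary statistics for an external validation set.

    Residuals are NIRS prediction minus laboratory value. SEP is bias-corrected;
    R2 is the squared Pearson correlation (one common convention, assumed here).
    """
    y_lab, y_nirs = np.asarray(y_lab, float), np.asarray(y_nirs, float)
    e = y_nirs - y_lab
    bias = e.mean()
    sep = np.sqrt(np.sum((e - bias) ** 2) / (len(e) - 1))
    rmsep = np.sqrt(np.mean(e ** 2))
    r2 = np.corrcoef(y_lab, y_nirs)[0, 1] ** 2
    return {"bias": bias, "SEP": sep, "RMSEP": rmsep, "R2": r2}
```

A constant offset between NIRS and laboratory values inflates RMSEP but not SEP, which is why bias and SEP are usually reported separately.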
In general, models with an R² of 0.66 to 0.81 can be used for screening purposes, models with an R² of 0.83 to 0.90 are suitable for many applications, and an R² of 0.92 to 0.96 is ideal for most applications, including quality control [32]. Therefore, the R² values of 0.95 for dry matter and 0.83 for starch in blended samples (Figure 5a,c) make this form of sample preparation the most suitable for rapid prediction of these traits, though not for protein. Screening stages of breeding can also adopt chopped samples for the determination of starch (R²cal = 0.84) and dry matter (R²cal = 0.64) (Figure 2), though R² for protein was poor (R² = 0.04). Grated samples had a similarly good R² of 0.88 for dry matter and an R² of 0.62 for starch, but a low R² of 0.43 for protein (Figure 4). Figure 4 shows the SEC (1.94) and SECV (1.88) for dry matter in the blended samples. Blended yam samples had high RPD values of 2.93 for dry matter and 2.16 for starch, and a low RPD of 1.03 for protein (Figure 3). RPD values below 1.5 are considered unusable, values between 2.0 and 3.0 are suitable for screening purposes, and values above 2.5 to 3.0 are considered excellent for quantitative prediction [33,34].
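The interpretation ranges quoted above can be encoded as simple lookups, which makes it easy to label every trait and preparation method consistently. This is a sketch with assumptions: the quoted R² ranges leave gaps (0.81 to 0.83, 0.90 to 0.92), so each range's lower bound is used as a cut-off, and 3.0 is taken as the RPD threshold for "excellent" because the cited guidance overlaps between 2.5 and 3.0.

```python
def interpret_r2(r2):
    """Label an R2 value using the lower bounds of the ranges quoted in the text."""
    if r2 >= 0.92:
        return "most applications, including quality control"
    if r2 >= 0.83:
        return "many applications"
    if r2 >= 0.66:
        return "screening"
    return "below the quoted ranges"


def interpret_rpd(rpd):
    """Label an RPD value; 3.0 as the 'excellent' cut-off is an assumption."""
    if rpd < 1.5:
        return "not usable"
    if rpd < 2.0:
        return "not classified by the quoted ranges"
    if rpd < 3.0:
        return "screening"
    return "excellent for quantitative prediction"


if __name__ == "__main__":
    # Blended-sample values reported in the text
    for trait, r2, rpd in [("dry matter", 0.95, 2.93), ("starch", 0.83, 2.16),
                           ("protein", 0.26, 1.03)]:
        print(trait, "|", interpret_r2(r2), "|", interpret_rpd(rpd))
```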

Conclusions
The results of the study demonstrate the potential of NIRS for the rapid prediction of dry matter, protein, and starch content in fresh yam samples, avoiding the tedious, costly, and time-consuming processes of wet chemical analysis. The model statistics showed high coefficients of prediction (R²pre) and low standard errors of cross-validation (SECV) for dry matter and starch. The highest R²pre for starch and dry matter was found in blended samples, followed by chopped samples. Grated samples had the lowest R²pre for most traits; this indicates that blended or chopped fresh yam tubers could give accurate NIRS predictions of dry matter and starch content. However, protein predictions were weak across all three sample presentation methods, with very low R²pre. Successful NIRS analysis of fresh tuber crops would significantly improve the output of breeding programs by eliminating tedious sample preparation steps, such as long hours of oven drying and milling. Sample preparation is a rate-limiting step when large sample populations are characterised for biophysical traits.