Development and Validation of Near-Infrared Reflectance Spectroscopy Prediction Modeling for the Rapid Estimation of Biochemical Traits in Potato

Potato is a globally significant crop, crucial for food security and nutrition. Assessing vital nutritional traits is pivotal for enhancing nutritional value. However, traditional wet lab methods for the screening of large germplasms are time- and resource-intensive. To address this challenge, we used near-infrared reflectance spectroscopy (NIRS) for rapid trait estimation in diverse potato germplasms. It employs molecular absorption principles that use near-infrared sections of the electromagnetic spectrum for the precise and rapid determination of biochemical parameters and is non-destructive, enabling trait monitoring without sample compromise. We focused on modified partial least squares (MPLS)-based NIRS prediction models to assess eight key nutritional traits. Various mathematical treatments were executed by permutation and combinations for model calibration. The external validation prediction accuracy was based on the coefficient of determination (RSQexternal), the ratio of performance to deviation (RPD), and the low standard error of performance (SEP). Higher RSQexternal values of 0.937, 0.892, and 0.759 were obtained for protein, dry matter, and total phenols, respectively. Higher RPD values were found for protein (3.982), followed by dry matter (3.041) and total phenolics (2.000), which indicates the excellent predictability of the models. A paired t-test confirmed that the differences between laboratory and predicted values are non-significant. This study presents the first multi-trait NIRS prediction model for Indian potato germplasm. The developed NIRS model effectively predicted the remaining genotypes in this study, demonstrating its broad applicability. This work highlights the rapid screening potential of NIRS for potato germplasm, a valuable tool for identifying trait variations and refining breeding strategies, to ensure sustainable potato production in the face of climate change.


Introduction
Potato (Solanum tuberosum) is an essential crop that is grown worldwide, with a production of 376.2 million tons spanning over an area of more than 16 million hectares [1]. Globally, India ranks second in potato production, with 54.23 million tons produced from an area of 2.24 million hectares [2]. Potatoes, along with cereals, provide food security to millions of people, owing to the high amount of carbohydrates, dietary fibers, vitamins, and minerals. The biochemical makeup of potatoes, which contain essential components such as starch (including amylose), protein, vitamin C, and antioxidants (phenols and carotenoids), is strongly linked to their nutritional value. These traits contribute to aesthetic appeal and offer numerous health benefits to humans [3].
The quality evaluation of potatoes is vital for crop development, nutritional research, and breeding programs. Traditional wet lab procedures have been used for a long time; however, these approaches have several drawbacks. They are often time-consuming, laborious, costly in terms of equipment and reagents, prone to human error, and limited in throughput and in their ability to provide real-time data, which hinders rapid and efficient research analysis [4]. Additionally, conventional techniques usually entail destructive sampling, which makes it difficult to study the same sample again or to monitor changes over time [5].
Hence, there is an increasing need for effective, rapid, and non-destructive methods to evaluate the nutritional properties of largely consumed non-cereal crops, such as potatoes. NIRS is one such method: a rapid and non-destructive analytical approach that has attracted significant interest in various food and feed industries. Its speed makes it possible to analyze a large number of samples in a short amount of time, with greater accuracy. In addition, NIRS model deployment requires very little sample preparation, eliminating the need for labor- and time-intensive processes. Because the technique is non-destructive, it facilitates multiple measurements on a single material and streamlines the monitoring of changes. Moreover, it aligns with environmentally responsible practices by minimizing the need for chemical reagents and reducing sample waste, thereby highlighting its eco-friendly application [6].
NIRS involves examining matter interaction with electromagnetic radiation, typically using one or more wavelength bands within the range of 780-2500 nm. This radiation is directed at a sample, penetrating it and interacting with the molecular bonds, particularly -CH, -NH, -OH, and -CO formations, causing the absorption of light at their distinct vibration frequencies [7]. The spectrum produced, as a consequence, gives insightful information about the molecular makeup of the material. In spectroscopy evaluation, a crucial approach is spectrum calibration, which is achieved through multivariate regression methods. These techniques are collectively known as chemometrics, which are essential for calibrating near-infrared (NIR) spectra to wet chemistry values [8]. Due to the extensive range of organic compounds present in biomaterials, a precise calibration is essential. During the calibration process, the biochemical information present in a substance's spectrum is separated from physical or chemical data, which are obtained through reference laboratory values [9]. The effectiveness of near-infrared spectroscopy (NIRS) relies on the complex connection between biochemical parameters and their corresponding absorption spectra. To establish accurate prediction models in NIRS, precise relationships must be established, which requires the pre-processing of spectral data using multivariate statistical analysis. Commonly used pre-treatment techniques include Multiplicative Scatter Correction (MSC) and Standard Normal Variate and Detrend (SNV-DT). To describe the relationship between biochemical components and spectral data, multivariate regression techniques such as partial least squares (PLS), modified PLS (MPLS), and Principal Component Regression (PCR) are employed [10].
Foods 2024, 13, 1655
The relevant body of work highlights the potential of NIRS for measuring the biochemical characteristics of several crops, including potatoes [11]. The first report of NIRS application in relation to potatoes was the evaluation of the moisture content in potato chips [12]. With advancements in NIRS techniques, researchers have investigated the use of NIRS in potatoes, to assess the starch and dry matter content [13,14], protein [15], phenol [9,16], carotenoids [3,17], fat [18], acrylamide [19,20], and moisture [21]. These studies provide insight into the adaptability of NIRS and demonstrate its potential to bring about a sea change in potato quality evaluation, to strengthen Sustainable Development Goals 2 (Zero Hunger) and 3 (Good Health and Well-being).
This investigation focused on the development and validation of NIRS techniques to provide an accurate and time-saving method for estimating the biochemical characteristics of potatoes. In addition, this study aimed to illustrate the potential benefits of NIRS in potato nutritional enrichment approaches and development by contrasting estimates obtained using NIRS with those obtained using more conventional wet lab approaches. Eight important nutritional traits, including vitamin C, total phenols, total carotenoids, anthocyanin, dry matter, starch, amylose, and protein, were assessed. These models can be used for the rapid screening of diverse potato germplasms, the use of nutritionally superior germplasms in potato breeding projects, and as an alternative to conventional wet lab methodology for the quality assessment of potatoes for sustainable production in the era of climate change.

E), following the augmented block design. Standard crop cultivation practices were adopted during the growing season. Mature tubers were harvested 90 days after planting. After harvesting, a marketable tuber (40-50 g, free from defects and damage) was preferred for nutritional evaluation.

Sample Preparation
The samples were thoroughly cleaned with tap water, without any contamination or loss of skin. Care was taken during sample selection for wet lab analysis, such as avoiding green color, external damage, and internal unwanted tuber flesh pigmentation. The composite samples were created by combining three to five tubers, with the number dependent on their size.
The tubers were peeled with a potato peeler, and the flesh of the selected tuber was quartered and placed in an NIR ring cup for scanning (Figure 1). The remaining flesh tissues of peeled tubers were used for the estimation of biochemical parameters, viz., vitamin C, total phenols, total carotenoids, anthocyanin, and dry matter. In addition, 50 g of tuber flesh from the same sample was kept in a hot air oven (103 ± 2 °C) for 12-24 h for uniform drying. The dried tuber tissue was then ground to a fine powder (1 mm sieve) using a FOSS Cyclotec. Later, the powder was subjected to wet lab analysis of the remaining biochemical parameters, that is, starch, amylose, and protein. In the present study, both potato flesh and flour were utilized for NIRS and wet chemistry analyses.

Selection of Samples for Assembling the NIRS Model
Using the FOSS NIRS DS-3 spectrophotometer (FOSS, Nils Foss Allé 1, DK-3400 Hilleroed, Denmark), 432 germplasms from both locations were scanned and their reflectance spectra were recorded within the wavelength range of 400 to 2400 nm. Hierarchical clustering by Ward's method, with a squared Euclidean distance metric, was performed on the normalized spectral data from the 432 germplasms. The resulting clusters were further analyzed to identify the major clusters and sub-clusters. From this analysis, a subset of 120 highly diverse germplasms, representing the entire spectrum of variability within the dataset, was subjected to wet lab analysis.

Vitamin C
Vitamin C estimation was performed using the Folin-phenol reagent method [22] on potato flesh. The absorbance at 630 nm was measured using a UV-VIS spectrophotometer and the results were expressed in milligrams per 100 g on a fresh weight basis.

Total Phenols
The Folin-Ciocalteu reagent assay, involving oxidation and reduction reactions, was used to assess the total phenolic content of potato flesh [23]. A UV-VIS spectrophotometer was used to measure the absorbance at 650 nm and the results are expressed in terms of gallic acid equivalents (GAEs) per 100 g of fresh weight.

Total Carotenoids
The total carotenoid content in the flesh of potatoes was assessed using the AOAC method [24] and the absorbance at 420 nm was measured. The findings were expressed in µg per 100 g on a fresh weight basis.

Anthocyanin
The total anthocyanin content in potato flesh was determined by the pH differential method, following AOAC Official Method 2005.02 [25]. The pH of the two solutions was set at 1.0 and 4.5 using concentrated hydrochloric acid, prior to taking the absorbance readings in multi-wavelength at 520 nm (HCl/KCl) and 700 nm (sodium acetate), respectively; the results were expressed in µg per g, as cyanidin-3-glucoside equivalents on a fresh weight basis.

Dry Matter
The dry matter content was determined using the AOAC method 925.10 [9]. Potato tuber tissue (50 g) was placed in an oven at 103 ± 2 °C until a constant weight was obtained. The dry matter percentage was calculated using the following formula: Dry matter (%) = [(final weight − crucible weight)/initial weight] × 100.
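The dry matter formula can be written as a short function. This is a minimal sketch; the function name and the example weights are hypothetical and are not taken from the study.

```python
def dry_matter_percent(final_weight_g, crucible_weight_g, initial_weight_g):
    """Dry matter (%) = [(final weight - crucible weight) / initial weight] * 100.

    final_weight_g    : crucible plus dried tissue at constant weight
    crucible_weight_g : empty crucible
    initial_weight_g  : fresh tissue placed in the oven
    """
    if initial_weight_g <= 0:
        raise ValueError("initial weight must be positive")
    return (final_weight_g - crucible_weight_g) / initial_weight_g * 100.0


# Hypothetical weighing: 50 g of fresh tissue in a 30 g crucible,
# weighing 40 g together after drying to constant weight.
example = dry_matter_percent(40.0, 30.0, 50.0)
print(f"Dry matter: {example:.2f}%")
```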

Starch
The Megazyme total starch assay kit, which employs α-amylase, amyloglucosidase, and glucose oxidase peroxidase, was used to assess the total starch concentration in accordance with AOAC 996.11 [26]. A UV-VIS spectrophotometer was used to measure the absorbance at 510 nm and the results were expressed as percentages on a dry weight basis.

Amylose
Amylose content from potato flour was estimated on a dry weight basis using a simplified method with a Continuous Flow Analyzer (CFA) in San++ Automated Wet Chemistry Analyzer Model 3000 (Skalar Analytical, Breda, The Netherlands) using the iodine method [27].

Protein
The protein content of potato flour on a dry weight basis was determined using the Dumas combustion method [28]. The protein content was determined by converting the nitrogen percentage (%N) using the Jones conversion factor of 6.25.

Spectroscopic Analysis
A FOSS NIRS DS-3 spectrophotometer, which was equipped with Win ISI Project Manager Software Version 1.50, was used for spectroscopic analysis of both potato flesh and flour. Peeled potatoes were sliced using a rolling disc slicer, giving a 0.5 mm slice thickness. A disc of 3.7 cm in diameter was taken from the center of a slice to fit in the NIRS cuvette. The total moisture content was determined using AOAC 934.01 [29] and a range of 73-82.3% was observed in the total of 120 potato germplasms. About 5 g of homogenized potato flour was filled in a cuvette and was scanned in a quartz window-equipped circular ring cup with a thickness of 1 mm and 3.8 cm. The check sample P/N:60053128, S/N:83924 provided by the instrument manufacturer, FOSS, was used for calibrating the instrument's optical performance. An average spectrum was generated by subjecting a sample to 32 scans within the 400-2500 nm range, resulting in the recording of log (1/R) values at 2 nm increments. Here, "R" signifies reflectance.
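The scan averaging and the log (1/R) transform described above can be illustrated as follows. This is a sketch under stated assumptions: the instrument software performs these steps internally, and whether it averages reflectance before or after the log transform is not stated in the text. The base-10 logarithm and both function names are assumptions for illustration.

```python
import math


def wavelength_grid(start_nm=400, stop_nm=2500, step_nm=2):
    """Wavelengths from start to stop (inclusive) at a fixed increment."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def mean_absorbance(scans):
    """Average repeated reflectance scans point by point, then return log10(1/R).

    `scans` is a list of equal-length lists of reflectance values in (0, 1].
    Averaging R before the log transform is an assumption of this sketch.
    """
    n_scans = len(scans)
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must have the same number of points")
    mean_r = [sum(scan[i] for scan in scans) / n_scans for i in range(n_points)]
    if any(r <= 0 for r in mean_r):
        raise ValueError("reflectance must be positive")
    return [math.log10(1.0 / r) for r in mean_r]


grid = wavelength_grid()
print(len(grid), "data points per spectrum")
```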

Development of Calibration and Validation Sets
A systematic approach was followed to create calibration and validation sets, involving a comprehensive assessment of 120 potato germplasms, consisting of local varieties, wild species, hybrids, CIP accessions, advanced breeding lines, and commercial varieties, all of which were evaluated for nutritionally relevant biochemical parameters using well-established biochemical protocols. Germplasms from both locations were considered in the calibration and validation sets for model development. The 120 germplasms were then carefully divided into a training set (calibration) comprising 80 samples and an external validation set comprising 40 samples. This categorization was based on the variations observed in the biochemical parameters of the samples. The values were meticulously organized using Microsoft Excel to ensure that both the training and validation sets had samples with similar variability and nearly equal minimum and maximum values. This strategic approach greatly facilitated the modeling process, with the validation set helping to guide predictions for the remaining germplasm.
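The split was organized in Microsoft Excel. One way to obtain a comparable 80/40 partition with similar ranges is sketched below. The every-third-sample rule and the function name are assumptions for illustration, not the authors' procedure.

```python
def split_calibration_validation(values, every=3):
    """Rank samples by reference value and send every `every`-th one to validation.

    Returns two lists of sample indices (calibration, validation). Sending the
    second sample of each consecutive group to validation keeps the lowest
    value in the calibration set and spreads both sets over the whole range.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    calibration = [idx for rank, idx in enumerate(order) if rank % every != 1]
    validation = [idx for rank, idx in enumerate(order) if rank % every == 1]
    return calibration, validation


# Hypothetical reference values for 120 samples (not data from the study).
reference = [10.0 + 0.1 * i for i in range(120)]
cal, val = split_calibration_validation(reference)
print(len(cal), "calibration samples;", len(val), "validation samples")
```

With 120 samples this gives 80 calibration and 40 validation samples, and the calibration set contains both the smallest and the largest reference value.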

Calibration and Validation of Equations
The calibration equation was developed using Win ISI III Project Manager Software Version 1.50. Multivariate analysis was employed to regress the spectral data against laboratory values. MPLS regression, along with cross-validation, was carried out in the software on the complete spectrum. Various mathematical approaches, including SNV-DT (SNV with detrend), were used for scatter correction and spectral data preprocessing for each biochemical parameter [8]. A combination of mathematical treatments, denoted as "2,4,4,1", "2,8,8,1", "3,4,4,1", and "3,4,3,2", was used in the model development process. In these notations, the first digit represents the order of the derivative, while the second digit signifies the gap. The third and fourth digits specify the number of data points employed in the first and second smoothing processes, respectively. The quality of the calibration equations was evaluated using various metrics, including the coefficient of determination (RSQ), the standard error of cross-validation [SEC(V)], the standard deviation (SD), and one minus the variance ratio (1-VR) [6]. Additionally, the RSQexternal, bias, SEP, SEP(C), and RPD values were used to gauge the model's accuracy. The RPD values were particularly insightful, with their ranges indicating the reliability and quality of predictions.
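The four-digit treatment notation can be made concrete with a small sketch. The parsing follows the description above (derivative order, gap, first smoothing, second smoothing). The gap difference and boxcar smoothing are simplified stand-ins chosen for illustration and are not claimed to reproduce the software's exact algorithm. All names are hypothetical.

```python
from collections import namedtuple

Treatment = namedtuple("Treatment", "derivative gap smooth1 smooth2")


def parse_treatment(code):
    """Parse a code such as "2,4,4,1" into its four named parts."""
    parts = [int(p) for p in code.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated integers")
    return Treatment(*parts)


def gap_difference(values, gap):
    """Finite difference between points that are `gap` positions apart."""
    return [values[i + gap] - values[i] for i in range(len(values) - gap)]


def moving_average(values, window):
    """Boxcar smoothing over `window` points; a window of 1 changes nothing."""
    if window <= 1:
        return list(values)
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def apply_treatment(spectrum, code):
    """Apply the derivative `derivative` times, then the two smoothing passes."""
    t = parse_treatment(code)
    out = list(spectrum)
    for _ in range(t.derivative):
        out = gap_difference(out, t.gap)
    out = moving_average(out, t.smooth1)
    return moving_average(out, t.smooth2)


print(parse_treatment("2,4,4,1"))
```

As a sanity check, a perfectly linear spectrum has a zero second derivative, so `apply_treatment(list(range(30)), "2,4,4,1")` returns only zeros.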

Statistical Analysis
The Win ISI III Project Manager Software Version 1.50 was utilized as the platform for all calibration and prediction operations, which employed mathematical methods based on spectral and analytical data. The developed equations were used to monitor the reference and predicted values using the software. The accuracy and predictive capacity of the model were assessed using comprehensive statistical parameters, such as RSQ, slope, bias, RPD, and SEP(C). The coefficient of determination (internal/external) was plotted outside the calibration software, using the R programming software for statistical computing [30]. Using the Jamovi statistical software program v2.4, a paired sample t-test with a 95% confidence interval was carried out in order to ensure the significance of the findings [31].

Estimation of Nutritional Traits
The evaluation of important nutritional traits for 120 diverse potato germplasms is summarized in Table 1, while the trait variability is illustrated in Figure 2 through violin plots. Our studied germplasm had a vitamin C content from 23.7 to 103.9 mg per 100 g on a fresh weight basis. This finding is in agreement with the vitamin C evaluation of forty-eight Indian potato varieties, which ranged from 19.4 to 58.4 mg per 100 g on a fresh weight basis [32]. The analyzed germplasms showed that the total phenol content varied between 14.5 and 108.7 mg per 100 g on a fresh weight basis. Previously, the total phenol content in four popular potato varieties was reported to range from 101.8 to 299.1 mg per 100 g [33]. The total carotenoid content among the germplasm ranged from 61.9 to 829.6 µg per 100 g on a fresh weight basis. Our reported range was wider than the results obtained by Tatarowska et al. for potato tubers from Poland [34]. However, the analysis of 152 potato germplasms in S. phureja exhibited a broad range of total carotenoid contents, from 103 to 2135 µg per 100 g on a fresh weight basis [3]. Our study evaluated germplasms for anthocyanin content and the trait ranged between 0.33 and 539 µg per g on a fresh weight basis. Similarly, an anthocyanin content of 11-174 mg per 100 g on a fresh weight basis was reported by Reyes et al. for purple- and red-fleshed potatoes from Texas, USA [35]. The dry matter content, an important biochemical attribute of potatoes, ranged from 11.95 to 24.11% among the studied germplasms. Similar findings, ranging from 14.1 to 35.2%, have been reported for potatoes [13,36]. Starch is also an important biochemical constituent and the results of our study, on a dry weight basis, showed variability from 55.7 to 86.1% among the germplasm. This range is consistent with the reported 83-90% in ten varieties and 69.4-72.3% in four varieties [37,38]. The range of protein content on a dry weight basis found in the germplasm was between 5.43 and 15.1%, confirming the reported range of 4.93 to 12.3% [36]. The amylose content varied from 17.9 to 26.7%, with several reports mentioning a range of 18-29% in potatoes on a dry weight basis [39,40].

NIRS Spectra Acquisition
The combined raw NIRS spectra of the 120 germplasms based on potato flesh are shown in Figure 3a, whereas the raw spectra of potato flour are shown in Figure 3c. The bands arise from the overlapping absorption that corresponds to the combination and overtones of vibrational modes N-H, O-H, and C-H found in proteins, fatty acids, and carbohydrates, respectively. Six primary absorption peaks in the fresh potato germplasm were detected at wavelengths of 978, 1188, 1444, 1784, 1924, and 2490 nm, as indicated in Figure 3b. However, for potato flour, eight absorption peaks were observed at wavelengths of 1199, 1460, 1766, 1932, 2100, 2290, 2310, and 2490 nm, as shown in Figure 3d. The spectral range between 2000 and 2222 nm revealed stretching vibrations related to C-O and N-H bonds, which are indicative of protein content [41]. In contrast, the bending and stretching of the O-H bonds in polysaccharides were detected around the peak at 1920 nm. This peak was due to the second overtone of O-H and C-O bending, associated with starch [42]. The 1650-1750 nm peak corresponds to the second overtone of O-H bending, which is primarily related to water. The first overtone of O-H stretching, related to hydroxyl phenol groups, was observed within the range of 1430-1470 nm. The peak in this range results from the O-H functional group in starch, whereas the N-H peak is attributed to protein stretching in its first overtone. The 1180-1200 nm peak originates from the second overtone of the C-H vibrations associated with aliphatic hydrocarbons. Similar peaks were found in hot-dried and cold-dried samples of sweet potato [43] and in the spectra of potato chips [18].

Calibration of the NIRS Model
The process typically begins with the creation of a calibration set, also known as a training set, which serves to instruct and train the construction of the model. Regression algorithms, including MPLS, PLS, and PCR, can be used for model development. In comparison with the PLS algorithm, MPLS is often considered to offer greater stability and accuracy [9], making it a good choice for this particular study. The MPLS technique utilizes both spectra and reference compositions to formulate equations, mitigating the influence of significant spectroscopic variations that may not be pertinent. Changes in absorption levels generally occur due to alterations in light scattering and path length, resulting from interactions between sample particles and light. Interpreting NIR spectra and creating a linear calibration becomes notably intricate because of these alterations [8]. Spectral pre-processing methods are applied to alleviate the compounding effects from particle size and scattering, specifically through scatter correction and derivatization techniques [21]. One such pre-processing technique, the Standard Normal Variate (SNV), involves mean removal from each spectrum, followed by normalizing each signal's value by the standard deviation of the entire spectrum, to center it around zero. Additionally, a detrend (DT) approach was incorporated with the SNV in this study to rectify any shifts in the signal baseline and to reduce the NIRS signal noise. Table 2 presents a summary of the calibration developed for various potato biochemical parameters using the MPLS method for vitamin C, total phenols, total carotenoids, anthocyanin, and dry matter from potato flesh, and for starch, protein, and amylose from homogenized potato flour. To develop calibration equations for these parameters, various mathematical treatments, including "2,4,4,1", "2,8,8,1", "3,4,4,1", and "3,4,3,2", were examined and finalized. The selection of the calibration equation was determined by the highest values of 1-VR and RSQinternal, as well as the lowest SEC(V) values. Improvements in the spectral resolution were accomplished using derivatives 2 and 3, which effectively removed baseline shifts and separated overlapping peaks. The use of gaps 4 and 8, as well as smoothing (S1, S2), helped to mitigate the impact of erratic high-frequency perturbations and to improve the signal-to-noise ratio within the specified spectral range. During the generation of the calibration equations, a small number of outliers (<10) resulting from scanning or analytical errors were identified and subsequently removed. RSQinternal values for different biochemical traits, as shown in Table 2, were obtained for vitamin C (0.920), total phenols (0.893), total carotenoids (0.902), anthocyanin (0.837), dry matter (0.794), starch (0.644), amylose (0.905), and protein (0.986) for specific mathematical treatments, including "2,4,4,1", "2,8,8,1", "3,4,4,1", "2,4,4,1", "3,4,4,1", "2,8,8,1", "3,4,3,2", and "3,4,4,1", respectively.
(Table 2 units: vitamin C and total phenol in mg per 100 g, total carotenoids in µg per 100 g, anthocyanin in µg per g, and dry matter in %, all on a fresh weight (FW) basis. Starch, amylose, and protein contents were expressed as percentages (%) on a dry weight basis. Abbreviations: N-number of samples, RSQ-coefficient of determination, SD-standard deviation, SECV-standard error of cross validation).
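The SNV and detrend steps described in this section can be sketched in a few lines. This is an illustration under stated assumptions rather than the calibration software's implementation: the sample standard deviation (n − 1) is used for SNV, and a straight-line detrend is used for brevity, whereas implementations often fit a second-order polynomial. Function names are hypothetical.

```python
import statistics


def snv(spectrum):
    """Standard Normal Variate: subtract the spectrum mean, divide by its SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD (n - 1); an assumption here
    if sd == 0:
        raise ValueError("a flat spectrum cannot be scaled")
    return [(value - mean) / sd for value in spectrum]


def detrend_linear(spectrum, wavelengths):
    """Remove a least-squares straight line fitted against wavelength."""
    x_mean = statistics.fmean(wavelengths)
    y_mean = statistics.fmean(spectrum)
    sxx = sum((x - x_mean) ** 2 for x in wavelengths)
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(wavelengths, spectrum))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in zip(wavelengths, spectrum)]


def snv_dt(spectrum, wavelengths):
    """SNV followed by detrending, applied to one spectrum."""
    return detrend_linear(snv(spectrum), wavelengths)


# Hypothetical five-point spectrum with a sloping baseline.
wl = [1100, 1102, 1104, 1106, 1108]
raw = [0.50, 0.53, 0.55, 0.58, 0.60]
print([round(v, 3) for v in snv_dt(raw, wl)])
```

Each spectrum is treated independently, so the correction does not depend on the other samples in the set.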

Validation of the NIRS Model
External validation statistics for the evaluated biochemical traits are presented in Table 3. No outliers were removed in the external validation, to achieve a higher prediction power and ensure the robustness of the developed models. The best-fit models were chosen based on their higher RSQexternal and RPD values, as well as their low SEP, SD, slope, and bias values. To authenticate the model's validity, the RPD value was used, which considers both SEP and variation in values and is more precise than SEP(C) [19]. Bedini et al. [10] found an RSQexternal of 0.90 for dry matter content, whereas our finding showed an RSQexternal of 0.89 (Table 3). The regression plot of the predicted values against the reference values for the studied traits is shown in Figure 4.
The RPD values for the models developed from potato flesh for vitamin C, total phenol, total carotenoids, anthocyanin, and dry matter were 1.857, 2.00, 1.619, 1.856, and 3.041, and those from potato flour for starch, amylose, and protein were 1.867, 1.662, and 3.982, respectively. The RPD values serve as a measure of precision in MPLS models. When the RPD value is below 1.5, it signifies that the model lacks reliability. In the range of 1.5 to 2.0, it suggests the model's capability to differentiate between high and low values. In the 2.0 to 2.5 range, it indicates an approximate ability for quantitative prediction. A value between 2.5 and 3.0 signifies a good-quality prediction and, if it exceeds 3.0, the prediction is considered excellent [44]. In this study, the RPD value for total phenol is in agreement with that reported (2.20) by Escuredo et al. [9] in potato. The RPD value for phenolics (2.00) indicates that the model is capable of differentiating between higher and lower values. The RPD value of dry matter (3.04) is higher than those reported (1.23-2.27) for various local and global models exclusively developed for dry matter [5]. The RPD value for protein (3.98) is in close agreement with that reported (3.99) by Bernhard et al. [36]. In our study, RPD values were better in fresh samples than in flour samples; the same results were observed when assessing leaf nitrogen content in wheat [45]. These differences between the RPD values in our study may be due to the differences between the homogenization of fresh and flour samples. The slope represents the change in the estimated values resulting from a unit change in the reference values. A slope value of 1 is ideal and any value close to 1 suggests an accurate model. The slope values for various traits in our study were vitamin C (0.85), total phenol (1.20), total carotenoids (0.80), anthocyanin (1.26), dry matter (0.89), starch (1.11), amylose (1.01), and protein (0.99). When assessing a model's accuracy, one crucial factor to consider is its bias, which measures the degree of similarity between the predicted and reference values of the model [10]. The ideal bias value is zero, which is attained when the reference and predicted values are equal. A negative bias signifies a model that underestimates and a positive bias signifies a model that overestimates [46]. The bias values for the different traits were vitamin C (−0.001), total phenol (0.003), total carotenoids (−0.158), anthocyanin (−13.66), dry matter (−0.548), starch (−0.661), amylose (−0.053), and protein (0.004); the developed models for six traits were therefore found to be underestimating and those for two traits were found to be overestimating.
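The validation statistics discussed above can be computed from paired reference and predicted values as sketched below. The definitions used here are common conventions and are assumptions of this sketch: bias as the mean of (predicted − reference), SEP as the root mean square of the differences, RPD as the SD of the reference values divided by SEP, and slope from regressing predicted on reference values. The calibration software may define them slightly differently. The RPD classes follow the thresholds cited in the text [44]; placing a value that falls exactly on a boundary in the lower class is also an assumption.

```python
import math
import statistics


def validation_stats(reference, predicted):
    """Return bias, SEP, RPD, and slope for paired reference/predicted values."""
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = statistics.fmean(diffs)
    sep = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    ref_mean = statistics.fmean(reference)
    pred_mean = statistics.fmean(predicted)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    sxy = sum((r - ref_mean) * (p - pred_mean)
              for r, p in zip(reference, predicted))
    rpd = statistics.stdev(reference) / sep if sep > 0 else float("inf")
    return {"bias": bias, "sep": sep, "rpd": rpd, "slope": sxy / sxx}


def rpd_category(rpd):
    """Map an RPD value to the interpretation classes given in the text."""
    if rpd < 1.5:
        return "not reliable"
    if rpd <= 2.0:
        return "distinguishes high from low values"
    if rpd <= 2.5:
        return "approximate quantitative prediction"
    if rpd <= 3.0:
        return "good prediction"
    return "excellent prediction"


# RPD values reported in this study for three of the traits.
for trait, rpd in [("protein", 3.982), ("dry matter", 3.041),
                   ("total phenol", 2.00)]:
    print(trait, "->", rpd_category(rpd))
```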
A paired t-test with a 95% confidence interval was used to assess whether the mean of the analytical (laboratory) values differed from the mean of the NIRS-predicted values for the investigated biochemical parameters [31]. The p-value was greater than 0.05 for every trait, indicating the precision and dependability of the models (Table 4). The p-values were as follows: vitamin C (0.429), total phenol (0.173), total carotenoids (0.115), anthocyanin (0.171), dry matter (0.190), starch (0.112), amylose (0.766), and protein (0.973). Therefore, the means of the NIRS method and the standard methods used to assess the traits were found to be not statistically different from one another.
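The statistic behind the paired t-test can be sketched as follows. The study used Jamovi for this analysis. The function below is a hypothetical illustration that returns only the t statistic and the degrees of freedom, which would then be compared with a tabulated critical value or passed to a statistics package to obtain the p-value.

```python
import math
import statistics


def paired_t_statistic(reference, predicted):
    """Return (t, degrees of freedom) for paired reference/predicted values.

    t = mean(d) / (sd(d) / sqrt(n)), where d = predicted - reference.
    """
    diffs = [p - r for r, p in zip(reference, predicted)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.fmean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical laboratory and NIRS-predicted values (not data from the study).
lab = [12.1, 15.4, 18.9, 20.3, 22.8]
nirs = [12.4, 15.1, 19.2, 20.0, 23.1]
t_value, dof = paired_t_statistic(lab, nirs)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

A t value close to zero, as expected when predictions scatter evenly around the laboratory values, corresponds to a large p-value and therefore to no significant difference between the two methods.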

Conclusions
This study aimed to create a rapid evaluation instrument for screening potato genetic resource collections and to construct prediction models based on near-infrared reflectance spectroscopy (NIRS). MPLS-based regression models were developed for vitamin C, total phenols, total carotenoids, anthocyanins, dry matter, starch, amylose, and protein, and were suitable for all traits. High RSQexternal and RPD values were found for most potato biochemical traits. The best models were developed for protein, followed by dry matter and total phenols, based on the RPD values. Compared with traditional wet lab techniques, these NIRS prediction models offer a more efficient, environmentally friendly, and less labor-intensive method to simultaneously evaluate the necessary components, providing desirable information about the biomolecules being studied. The combination of vibrational spectroscopy with multivariate methods in the models developed here has the potential to be a valuable analytical tool for potato breeding, the food industry, and regulatory agencies. It can be used to efficiently and accurately develop, process, monitor, and evaluate the nutritional quality of potato in a cost-effective manner. This screening process aims to pinpoint trait-specific germplasm and select desired chemotypes, regardless of their genetic background, for the purpose of improving nutritional quality in potato breeding programs. However, these models should be carried forward to applicability studies to verify their accuracy and precision.

Figure 1. Schematic representation of NIRS model development and wet lab biochemistry of potatoes.

Figure 2. A violin plot depicting the nutritional variability among the studied 120 potato germplasm.

Figure 3. (a) Combined raw NIRS spectra of potato flesh of 120 germplasms. (b) An average reflectance spectrum of potato flesh with peaks. (c) Combined raw NIRS spectra of 120 potato flour germplasms. (d) Average reflectance spectrum of homogenized potato flour with peaks.

Figure 4. Measured references versus NIRS in the prediction set.

Table 1. Descriptive analysis of potato biochemical traits.

Table 2. Calibration statistics of biochemical traits evaluated in potatoes.

Table 3. Validation statistics of biochemical traits evaluated in potatoes. (Abbreviations: N-number of samples, RSQ-coefficient of determination, SD-standard deviation, SEP-standard error of performance, RPD-ratio of performance to deviation).