FTIR-ATR Spectroscopy Combined with Multivariate Regression Modeling as a Preliminary Approach for Carotenoids Determination in Cucurbita spp.

Quantitative analysis of carotenoids has been extensively reported using UV-Vis spectrophotometry and chromatography, instrumental techniques that require complex extraction protocols with organic solvents. Fourier transform infrared spectroscopy (FTIR) is a potential alternative for simplifying the analysis of food constituents. In this work, the application of FTIR with attenuated total reflectance (ATR) was evaluated for the determination of total carotenoid content (TCC) in Cucurbita spp. samples. Sixty-three samples, belonging to different cultivars of butternut squash (C. moschata) and pumpkin (C. maxima), were selected and analyzed by FTIR-ATR. Three sample preparation protocols were followed: homogenization (A), freeze-drying (B), and solvent extraction (C). The recorded spectra were used to develop Partial Least Squares (PLS) regression models against TCC reference values determined by UV-Vis spectrophotometry. The PLS regression model obtained with the FTIR data from the freeze-dried samples, using the spectral range 920–3000 cm−1, had the best figures of merit (R²CAL of 0.95, R²PRED of 0.93, and RPD of 3.78), suggesting it is reliable for future application in agriculture. This approach for carotenoid determination in pumpkin and squash avoids the use of organic solvents. Moreover, these results provide a rationale for further exploring this technique for the assessment of specific carotenoids in food matrices.


Introduction
Carotenoids, a complex family of isoprenoids, are relevant in plant foods and agriculture as nutrients, antioxidants, pigments, and ripening indicators [1,2]. They are mainly responsible for the orange, red, and yellow colors of plant tissues, act as antioxidants, and confer benefits for human health, including provitamin A activity in some cases [3].
Hundreds of carotenoids have been found in food matrices, but a plant food usually has one to five predominant carotenoids, along with a series of carotenoids in trace amounts [4]. The most widely used techniques for the evaluation of carotenoids are UV-Vis spectrophotometry and HPLC coupled to diode array detectors (DAD) and mass spectrometers (MS) [3,5]. These techniques demand high solvent consumption and extensive sample preparation and extraction protocols. In this context, there is emerging interest in vibrational spectroscopy, more specifically Fourier transform infrared spectroscopy (FTIR), which, when coupled with multivariate statistical methods, is a powerful tool for the fast and simple analysis of compounds in food matrices [6]. The mid-infrared (MIR) region of the electromagnetic spectrum, from 4000 cm−1 to 450 cm−1, contains information arising from molecular vibrations, which is sensitive to the chemical and physical state of the sample. Accordingly, several FTIR-based methods have been developed to discriminate and determine various properties of food matrices [7][8][9][10], their main advantage over conventional analytical techniques being a considerable reduction in the consumption of chemicals and time. Moreover, attenuated total reflectance (ATR) devices simplify the application of FTIR to the qualitative and quantitative analysis of materials, including food matrices.
The capability of this technique for the determination of carotenoids in food matrices has been demonstrated in tomatoes, with good performance in lycopene quantification [11]. However, there are no previous studies of either FTIR-ATR or other MIR-based techniques for the evaluation of carotenoids in pumpkin or squash pulp, two of the richest sources of these compounds among fresh produce. Therefore, the aim of this work was to develop a methodology for the simple and fast quantification of total carotenoid content in Cucurbita spp. samples by FTIR-ATR spectroscopy combined with multivariate techniques.

Plant Material
For this study, 63 samples of Cucurbita spp. were obtained from different sources. Nearly half of the samples (n = 31), belonging to the cultivars Boloverde (n = 6), Dorado (n = 8), and Abanico 75 (n = 9) of Cucurbita moschata Duchesne and Mandarino (n = 8) of Cucurbita maxima, were harvested at the experimental center of the Universidad Nacional de Colombia in Palmira (Valle del Cauca) [3°32′05″N 76°17′44″W; 1001 m.a.s.l.]. Harvest times differed according to the characteristic ripening period of each cultivar: from planting to harvest, cv. Mandarino took 70-80 days, cv. Boloverde 120-150 days, and cv. Abanico 75 and cv. Dorado 90-100 days. The samples were sent to the laboratory within 2-3 days after harvesting; they were disinfected in a 200 mg/L sodium hypochlorite solution, rinsed, and kept in a freezing room at −30 °C until processing, within no more than two days after receipt. The genetic, morphological, and physiological traits of these cultivars have been described elsewhere [12,13]. The other half of the samples (n = 32) was purchased directly from suppliers from different regions of Colombia at the wholesale collection center Corporación de Abastos de Bogotá S.A. (Corabastos), on the same day of their arrival in Bogotá. These samples were whole butternut squashes (C. moschata) at commercial ripeness, harvested in the departments of Huila (n = 8), Tolima (n = 8), Valle del Cauca (n = 8), and Meta (n = 8). No information was available on the harvest timing of the commercial samples; however, under the climatic and agricultural conditions of these regions, harvest usually occurs about 90 days after flowering, or 90 to 150 days after planting [14]. The samples were disinfected in a 200 mg/L sodium hypochlorite solution, rinsed, and kept in a freezing room at −30 °C until processing, within no more than two days after purchase.
Three preparation protocols (A, B, and C) were followed, corresponding to increasing degrees of analyte (carotenoid) isolation. First, the shell and seeds were manually separated from the pulp, which was then portioned and processed with a knife blender (SharkNinja, Canada). The pulp after homogenization alone is referred to as fresh pulp and corresponds to the lowest degree of analyte isolation (A). The remaining portion of the fresh pulp was freeze-dried (−50 °C in the condenser, 30 °C in the heating chamber, 1 mbar, 48 h); it is referred to as freeze-dried pulp and corresponds to the intermediate degree of analyte isolation (B). Finally, freeze-dried samples were subjected to the extraction protocol described in Section 2.4 with hexane:acetone (1:1); these are referred to as extracts and correspond to the highest degree of analyte isolation (C).

Determination of Physicochemical Properties
Squash/pumpkin samples (homogenized fresh pulp) were characterized as follows: moisture content was determined gravimetrically by air drying, according to the standard method AOAC 925.09; total soluble solids were measured refractometrically, according to the standard method AOAC 932.12; pH was determined potentiometrically, according to the standard method AOAC 981.12, using a calibrated FP20 pH meter (Mettler-Toledo, Switzerland); and titratable acidity was determined by titration with standardized NaOH (0.1 N), according to the method AOAC 942.15 [15].

Determination of Total Carotenoid Content (TCC) by UV-Vis Spectrophotometry
Freeze-dried pulp was ground, and 100 mg was transferred to a 10 mL Falcon tube. An extractive solvent mixture (1 mL of hexane:acetone in a 1:1 ratio; Merck, USA) was added. The mixture was then shaken in a vortex for 20 s and centrifuged at 14,000× g rpm for 5 min at 4 °C. This procedure was repeated until the solvent mixture was no longer colored; the extracts were then filtered through Whatman 4 filter paper. Finally, the volume was made up to 10 mL with the extractive solvent in volumetric flasks. The absorbance of the extracts was read at 454 nm in a Genesys 10S spectrophotometer (Thermo Scientific, USA).
For the calibration curve, a stock solution was prepared by weighing 0.005 g of the β-carotene standard (Sigma-Aldrich, 99%) into a 50 mL volumetric flask and making up to volume with the hexane:acetone (1:1) solvent mixture. Standard solutions of 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, and 7.5 μg/mL were prepared, and their absorbance was read at 454 nm. The calibration curve, from 13 replicates, was calculated by least squares regression (intercept 0.0222 a.u.; slope 0.1592 a.u. per μg/mL; R² > 0.99). TCC was expressed as μg/g (μg of β-carotene equivalents per g), calculated according to the following equation:
$$\mathrm{TCC}\ (\mu\mathrm{g/g}) = \frac{A - b}{m} \times \frac{V \times DF}{W}$$
where A is the measured absorbance at 454 nm; b and m are the intercept and slope of the calibration curve, respectively; V is the extract volume (mL); W is the weight of the freeze-dried sample (g); and DF is the corresponding dilution factor. Total carotenoid content was expressed on a dry or wet basis according to the moisture content of the sample (freeze-dried or fresh), determined as described in Section 2.3.
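A minimal Python sketch of the TCC calculation described above, assuming the usual form for a linear calibration (concentration = (A − b)/m, then scaled by extract volume V, dilution factor DF, and sample weight W) and using the intercept and slope reported in the text. The basis-conversion helpers are illustrative assumptions, not part of the original protocol: they treat moisture as a mass fraction between 0 and 1. All function names (`tcc_ug_per_g`, `to_dry_basis`, `to_wet_basis`) are hypothetical.

```python
# Calibration constants reported in the text (absorbance at 454 nm vs. beta-carotene in ug/mL).
INTERCEPT_B = 0.0222  # a.u.
SLOPE_M = 0.1592      # a.u. per ug/mL


def tcc_ug_per_g(absorbance, volume_ml, weight_g, dilution_factor=1.0,
                 b=INTERCEPT_B, m=SLOPE_M):
    """Total carotenoids as ug beta-carotene equivalents per g of weighed sample."""
    concentration = (absorbance - b) / m  # ug/mL in the measured solution
    return concentration * volume_ml * dilution_factor / weight_g


def to_dry_basis(tcc_as_is, residual_moisture):
    """Assumption: divide by the dry-matter fraction of the weighed freeze-dried material."""
    return tcc_as_is / (1.0 - residual_moisture)


def to_wet_basis(tcc_dry, fresh_moisture):
    """Assumption: express a dry-basis value per g of fresh pulp."""
    return tcc_dry * (1.0 - fresh_moisture)
```

For example, an extract absorbance of 0.6590 with V = 10 mL, W = 0.1 g, and DF = 1 corresponds to about 4.0 μg/mL in the extract and therefore about 400 μg/g in the weighed freeze-dried sample.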

Fourier Transform Infrared Spectroscopy with Attenuated Total Reflectance (FTIR-ATR)
Portions of 20-30 mg of the fresh pulp (A) and the freeze-dried pulp (B), 50 μL volumes of the extracts (C), and about 2 mg of the β-carotene and lutein analytical standards (Sigma-Aldrich, 99%) were analyzed by FTIR-ATR. Spectral data were obtained with an FTIR-4100 spectrometer (Jasco, Japan) equipped with a diamond single-reflection attenuated total reflectance (ATR) device. A total of 24 spectra per sample were acquired, with 24 scans per spectrum, at a spectral resolution of 4 cm−1 over the interval 4000 to 450 cm−1. The spectra were recorded and pre-treated with the built-in water and CO2 elimination procedures of the Spectra Manager software (v. 2.7, Jasco, Japan).

Partial Least Squares Regression (PLS)
Regression models for total carotenoid content were developed by partial least squares (PLS), combining the spectral information (X) with the dependent variable (Y), i.e., TCC determined by UV-Vis. The spectra were truncated to a reduced spectral range (1300-3000 cm−1 or 920-3000 cm−1) and divided into two datasets by the Kennard-Stone algorithm: one dataset, containing around 70% of the spectra, was used for model calibration, and the remaining 30% was used to test the predictive ability of the model.
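The range truncation and the calibration/test split can be illustrated with a short, dependency-free Python sketch of the Kennard-Stone algorithm: it seeds the calibration set with the two most distant spectra and then repeatedly adds the spectrum farthest from those already selected. Using Euclidean distance on the truncated absorbances is an assumption (PLS Toolbox may apply its own preprocessing before splitting), and the helper names `truncate_range` and `kennard_stone` are hypothetical.

```python
import math


def truncate_range(wavenumbers, absorbances, low=920.0, high=3000.0):
    """Keep only the absorbances whose wavenumber (cm-1) lies within [low, high]."""
    return [a for w, a in zip(wavenumbers, absorbances) if low <= w <= high]


def kennard_stone(samples, n_calibration):
    """Split samples into (calibration_indices, test_indices) with Kennard-Stone."""
    n = len(samples)
    if not 2 <= n_calibration <= n:
        raise ValueError("n_calibration must be between 2 and len(samples)")
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Seed the calibration set with the two most distant samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    # Distance from each sample to its nearest already-selected sample.
    nearest = [min(dist[k][first], dist[k][second]) for k in range(n)]
    while len(selected) < n_calibration:
        remaining = [k for k in range(n) if k not in selected]
        chosen = max(remaining, key=lambda k: nearest[k])  # farthest from the set
        selected.append(chosen)
        nearest = [min(nearest[k], dist[k][chosen]) for k in range(n)]
    test = [k for k in range(n) if k not in selected]
    return selected, test
```

With about 70% of the spectra used for calibration, `n_calibration` would be `round(0.7 * len(spectra))`; because the algorithm favors extreme spectra, the calibration set tends to cover the edges of the spectral space.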
The model dimensionality, i.e., the number of latent variables, was determined by venetian blinds cross-validation. Statistical parameters such as the root mean square error of calibration (RMSEC) and of prediction (RMSEP), and the coefficient of determination (R²) of both calibration (R²CAL) and prediction (R²PRED), were used to identify the most appropriate model. Furthermore, the prediction ability of the obtained models was evaluated by comparing the RMSEP with the Standard Error of Laboratory (RMSEL), according to the procedure proposed by Shenk and Westerhaus [16]. The models were developed with PLS Toolbox (v. 8.5, Eigenvector Research, Inc., Seattle, WA, USA) running in the MATLAB environment (v. 2016a, MathWorks, Inc., Natick, MA, USA).
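To illustrate how the number of latent variables can be chosen by venetian blinds cross-validation, the sketch below implements a basic single-response PLS (PLS1) regression with the NIPALS algorithm in plain Python and assigns sample i to fold i mod k. It is not the PLS Toolbox implementation: mean-centering as the only preprocessing, the default of five folds, and picking the LV count with the lowest RMSECV are assumptions, and all function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS on mean-centred data."""
    x_mean = [_mean(col) for col in zip(*X)]
    y_mean = _mean(y)
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                               # y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [_dot(col, f) for col in zip(*E)]                 # weight direction X'f
        norm = math.sqrt(_dot(w, w))
        if norm < 1e-12:                                      # y already fully explained
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]                       # scores
        tt = _dot(t, t)
        p = [_dot(col, t) / tt for col in zip(*E)]            # X loadings
        q = _dot(f, t) / tt                                   # y loading
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]             # deflate X and y
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict one sample by replaying the NIPALS deflation on its spectrum."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = _dot(e, w)
        y_hat += q * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
    return y_hat


def venetian_blinds(n_samples, n_splits):
    """Sample i goes to fold i mod n_splits (blind thickness of one sample)."""
    return [list(range(start, n_samples, n_splits)) for start in range(n_splits)]


def rmsecv(X, y, n_components, n_splits=5):
    """Root mean square error of cross-validation for a given number of LVs."""
    sq_err = 0.0
    for fold in venetian_blinds(len(X), n_splits):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], n_components)
        sq_err += sum((pls1_predict(model, X[i]) - y[i]) ** 2 for i in fold)
    return math.sqrt(sq_err / len(X))


def choose_n_lv(X, y, max_lv, n_splits=5):
    """Number of latent variables (1..max_lv) with the lowest RMSECV."""
    return min(range(1, max_lv + 1), key=lambda a: rmsecv(X, y, a, n_splits))
```

In practice, analysts often prefer fewer latent variables when RMSECV improves only marginally, to limit overfitting; the strict minimum used here is the simplest rule.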

Total Carotenoid Content and Physicochemical Quality Indexes of Cucurbita spp. Samples
The TCC of the 63 samples of Cucurbita pulp is presented in Table A1 (Appendix A), along with the physicochemical quality parameters (moisture content, pH, titratable acidity, and total soluble solids). These results highlight that this vegetable, as harvested in Colombia, is an important source of carotenoids, with wide variability in total content (155.8-2137.3 μg/g on a dry basis) according to cultivar or geographical origin, as well as within each variety, thus providing a broad range of TCC values for regression model development. C. moschata from Tolima presented the highest TCC, with values exceeding 1000 μg/g (dry basis), probably associated with a higher maturity index, as suggested by its higher average pH and lower acidity [1]. The Dorado cultivar presented the lowest TCC, with values between 155.8 and 291.8 μg/g (dry basis).
Different TCC ranges have been reported for both butternut squash (C. moschata) and pumpkin (C. maxima); in general, the carotenoid content of the Colombian Cucurbita spp. samples analyzed in this study agrees with previous reports. In a study of C. moschata from Colombia (Valle del Cauca) [17], the author found TCC values ranging from 490.1 to 1365.8 μg/g. In a study with Brazilian pumpkins (C. moschata), TCC varied between 234.21 and 404.98 μg/g [18]. Another investigation reported values of 1.1-42.3 μg/g (wet basis) for six cultivars of C. moschata [19]. On the other hand, a study performed in Germany on seven cultivars of C. maxima reported TCC values from 17 to 683 mg/kg (dry basis) [20].

Spectral Analysis
Figure 1A shows the characteristic FTIR spectra of the fresh pulps. Overall, the spectra are similar regarding the position of the characteristic peaks. However, the intensity of various peaks, particularly those between 1200 and 500 cm−1, varies notably, which supports the feasibility of obtaining quantitative information on the chemical composition of Cucurbita species from these spectra. In the spectra of fresh pulps, the β-carotene and lutein bands may overlap with the peaks of functional groups of other molecules. However, some characteristic peaks of these compounds can be distinguished, such as those at 960.1 cm−1 and 963.1 cm−1, as observed in the spectra of the pure compounds (Figure 2). The peak around 1550-1600 cm−1 corresponds to the C=C stretching vibrations of β-carotene [20]. The region around 1450 cm−1 is associated with antisymmetric deformation vibrations of CH3 groups (changes in HCH angles) and scissoring vibrations of CH2 groups [20,21]. Likewise, the bands around 1360-1390 cm−1 likely result from the "umbrella" vibrations of the CH3 groups [21].
The most intense peak of the spectra, located in the range 950-980 cm−1, corresponds to the deformation vibrations of the C-H bonds in the polyene chain, whereas the band at approximately 520-530 cm−1 is associated with changes in the angles and deformation of the polyene chain [21,22].
These spectra also show a characteristic band around 1100 cm−1, which corresponds to the C-O stretching vibrations of alcohols and probably arises from xanthophylls such as lutein. Indeed, this band is recognizable in the FTIR spectrum of the pure compound (Figure 2). However, it might also be associated with other compounds bearing alcohol functional groups in the pulp of Cucurbita spp. [23][24][25].
The spectra in Figure 1A are remarkably similar to those reported for pectin-rich pumpkin extracts [26]. The same broad, intense peak between 3000 and 3600 cm−1 was observed, which is associated with the oscillations of the -OH groups of water. Furthermore, the band at 2926 cm−1 (functional groups with C-H bonds), the spectral region 1500-2000 cm−1 (oscillations of C=O bonds), and regions such as 1730-1760 cm−1 (COO-R groups) and 1600-1630 cm−1 (COO− groups) were identified in pectin-rich pumpkin extracts [26]. All of the above indicates that, besides carotenoids, the characteristic FTIR spectra of Cucurbita spp. pulp shown in Figure 1A are also shaped by the presence of pectin and possibly other polysaccharides.
In the freeze-dried pulp spectra (Figure 1B), the relative absorbance of the broad, intense peak with a maximum between 3200 and 3600 cm−1, which corresponds to oscillations of the -OH groups of water, is reduced as a result of the freeze-drying process.
These spectra show the same characteristic bands as those of the fresh pulps: 2926 cm−1, corresponding to the oscillation of groups with C-H bonds, and 1550 cm−1, associated with the C=C double bonds of β-carotene. One of the typical bands of β-carotene and lutein can also be observed around 968 cm−1 [27]. Figure 1C shows the spectra of the extracts in hexane/acetone. As in the spectra of the pure compounds (Figure 2), the characteristic band of β-carotene at 3000 cm−1 can be identified in the extracts. However, this band overlaps with the typical hexane band, also around 3000 cm−1, associated with C-H bonds. Likewise, the characteristic bands of acetone are observed in these spectra, including the intense band at approximately 1720 cm−1, associated with C=O bonds, and the band around 1220 cm−1, corresponding to C-C-C bending [28]. As previously stated, β-carotene produces a characteristic band around 968 cm−1; nevertheless, the absorbance of the solvent mixture predominates, thereby masking the corresponding carotenoid bands.

PLS Regression: Carotenoid Content Prediction by IR Spectral Data
Different spectral ranges and spectral pre-treatments were investigated to develop PLS models and predict TCC in Cucurbita spp. samples. Models were developed for all the preparation protocols investigated (blender homogenization (A), freeze-drying (B), and hexane/acetone extraction (C)). Cross-validation was performed for each model to confirm the calibration robustness and to select the number of latent variables that minimized the calibration and cross-validation errors (RMSEC and RMSECV). The details of the best models developed are reported in Table 1. For each sample preparation, the best model in the prediction stage was selected on the basis of the lowest error (RMSEP) and the highest coefficient of determination (R²).
Table 1 notes: LV, latent variables; TCC, total carotenoid content; min, minimum value of the prediction set; max, maximum value of the prediction set; S.D., standard deviation of the prediction set; R², coefficient of determination; RMSEC, root mean square error of calibration; RMSEP, root mean square error of prediction; RPD, residual predictive deviation. Errors for fresh pulp are expressed as μg/g on a wet basis, and errors for freeze-dried samples as μg/g on a dry basis.
The model obtained with fresh pulp showed low predictability (R²PRED of 0.66) relative to its R²CAL (0.72), indicating that the model is not stable when moving to the prediction dataset. Indeed, the RPD (residual predictive deviation) reached only 1.72, indicating intermediate model performance that would need improvement before further application [29]. This was probably due to the high water content, which dilutes the carotenoids, lowers their absorbance in the spectra, and increases the interference of water absorption bands. The RMSEP obtained was 25.66 μg/g on a wet basis, nearly four times the mean standard deviation (S.D.) of the reference data on a wet basis (6.6 μg/g) and about three times the Standard Error of Laboratory (RMSEL), which was 8.47 μg/g. According to Shenk and Westerhaus [16], a prediction model can be considered for real implementation when RMSEP < 2 × RMSEL; thus, the fresh-pulp model is not reliable except for screening purposes.
In contrast, the regression model obtained from the spectra of the freeze-dried samples improved considerably (R²PRED of 0.93, RMSEP of 193 μg/g on a dry basis), which is attributed to the enhancement of characteristic absorption bands that are no longer covered by water absorption bands. In this case, the RMSEL was 91.07 μg/g on a dry basis, roughly half of the RMSEP (RMSEP ≈ 2.1 × RMSEL), so the model lies at the limit of reliability for the implementation of FTIR-ATR spectroscopy, coupled with PLS regression, for the determination of carotenoids in freeze-dried pumpkin samples. Moreover, the RPD was greater than three (Table 1), suggesting that the model is reliable for agricultural applications [29].
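The figures of merit and the Shenk-Westerhaus check discussed above can be written out as a short Python sketch. Definitions vary across software: here R² is computed as 1 − SS_res/SS_tot and RPD as the sample standard deviation of the reference values divided by RMSEP; both choices are assumptions rather than the exact PLS Toolbox formulas. `meets_shenk_westerhaus` is a hypothetical helper encoding the RMSEP < 2 × RMSEL rule cited in the text.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error (RMSEC or RMSEP, depending on the dataset used)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Residual predictive deviation: sample SD of the reference values / RMSEP."""
    mean = sum(y_true) / len(y_true)
    sd = math.sqrt(sum((t - mean) ** 2 for t in y_true) / (len(y_true) - 1))
    return sd / rmse(y_true, y_pred)


def meets_shenk_westerhaus(rmsep, rmsel, factor=2.0):
    """True when RMSEP < factor * RMSEL."""
    return rmsep < factor * rmsel
```

With the values reported above, RMSEP/RMSEL is about 3.03 for the fresh-pulp model (25.66/8.47) and about 2.12 for the freeze-dried model (193/91.07).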
We expected that increasing the degree of analyte isolation, that is, extracting the total carotenoids in hexane/acetone, would improve the predictability of the model. However, the bands of the acetone and hexane functional groups overlapped with and masked the carotenoid bands, as discussed above, ultimately causing very poor predictability of the corresponding regression models. Therefore, only the models obtained with the fresh and freeze-dried samples are presented in Table 1.
The performance of PLS models developed in other studies for the determination of carotenoid content in fresh or minimally processed food products, using MIR and NIR (near-infrared) spectral data, is shown in Table 2. In comparison, the PLS model developed for freeze-dried samples showed good statistical performance, with figures of merit comparable to those previously reported. In summary, the PLS model obtained with FTIR-ATR spectral data from freeze-dried samples was the most appropriate. This model yielded TCC values comparable to those obtained by the conventional UV-Vis spectrophotometric method, with lower time consumption and without the need for extraction protocols and solvents. The comparison between both techniques is presented in Figure 3.

Conclusions
PLS regression models based on FTIR-ATR spectra of freeze-dried samples were successfully developed for the assessment of the total carotenoid content of Cucurbita spp., with R² values of 0.95 and 0.93 for calibration and prediction, respectively. A wide IR region between 920 cm−1 and 3000 cm−1 was appropriate for the PLS regression models, with absorption bands in the region 950-980 cm−1 providing useful quantitative information, probably associated with trans C=C groups of isoprenoids, such as β-carotene and lutein, which are known to be abundant in Cucurbita. Although the hexane/acetone extracts had the highest degree of analyte isolation, the corresponding regression models showed poor predictability, which can be attributed to the overlapping of carotenoid bands with solvent bands. By applying PLS regression models to spectra between 920 and 3000 cm−1, TCC could be determined in pumpkin and butternut squash through a fast methodology that requires neither solvents nor extraction protocols and needs minimal training, thus representing an advantage in terms of simplicity, cost, and environmental impact.

Funding:
This research was funded by Minciencias and Patrimonio Autónomo Fondo Nacional de Financiamiento para la Ciencia, la Tecnología y la Innovación Francisco José de Caldas (grant number FP44842-271-2018). The article processing charge was partially covered by the University of Milan.

Acknowledgments:
The authors thank Minciencias for the financial support; the Institute of Food Science and Technology, in particular Professor L.F. Gutiérrez, for the acquisition of the FTIR instrument and his kind guidance on the use of the freeze-dryer instrument, and also Mrs. Cristina Lizarazo and Mr. Jorge Sandoval, for their kind technical support in the data acquisition and sample processing.

Conflicts of Interest:
The authors declare no conflict of interest.