Identification and Quantification of Turmeric Adulteration in Egg-Pasta by Near Infrared Spectroscopy and Chemometrics

Abstract: "Egg pasta" is a kind of pasta prepared by adding eggs to the dough; the color of this product is often associated with its quality, as it is proportional to the quantity of egg present in the dough. A possible adulteration of this product is the addition of turmeric (not reported on the label) to the dough. The inclusion of this ingredient (in minimal amounts, given the strong coloring power of this spice) fraudulently accentuates the yellow color of the product, making it more attractive to the consumer. Given this scenario, the aim of the present work is to develop an analytical approach suitable for detecting the presence of turmeric as an adulterant in egg pasta. One hundred samples of traditional and adulterated egg pasta were analyzed by NIR spectroscopy and PLS-DA (Partial Least Squares Discriminant Analysis) in order to discriminate adulterated from compliant pasta. The classification model provided a total correct classification rate of 97.5% in external validation (40 samples). Finally, the adulterant was quantified by PLS. This strategy provided satisfactory results, achieving an RMSEP (Root Mean Square Error in Prediction) of 0.112 (%w/w) in external validation.


Introduction
Pasta is a staple food in several Mediterranean countries and it is consumed and appreciated all over the world [1]. "Egg pasta" is pasta prepared by adding eggs to the dough. This food has special physical, chemical and organoleptic characteristics, which differentiate it from common durum pasta [2][3][4]. From the physical point of view, one of the main differences between these two closely related products is the bright yellow color provided by the egg yolks. Since the presence of eggs positively influences the taste, flavor and texture of egg pasta, consumers perceive a bright yellow color as an indication of the quality of the product. Given these considerations, fraudulent pasta makers may add dyeing compounds (without declaring them on the label) to enhance the color of the product, in order to deceive colorimetric quality controls and to make it more attractive to consumers. In this context, a common adulterant is Curcuma longa, a spice of the Zingiberaceae family better known as turmeric, because even a small amount of this spice provides a bright yellow color, and it is non-toxic and relatively cheap. The active ingredient of this spice, curcumin, is a natural dye often used in different contexts, such as textiles and cosmetics, but also in the food industry (E100) [5].
In the light of these considerations, the aim of the present work is to develop a non-destructive analytical methodology to detect the fraudulent addition of turmeric to egg pasta. For this reason, law-conformant and adulterated samples of egg pasta were prepared in the lab (following a traditional accredited recipe) and analyzed by Near Infrared Spectroscopy (NIR). This technique was considered a suitable tool because it is non-destructive and, consequently, it avoids any loss of product. Spectra were handled by two different chemometric techniques: Partial Least Squares Discriminant Analysis (PLS-DA), with the aim of distinguishing adulterated from pure samples, and Partial Least Squares (PLS), in order to quantify turmeric in the adulterated samples. These strategies were chosen because both are very efficient and are often used for the analysis of food products [6][7][8][9], in particular in combination with NIR. Since this strategy is non-destructive, it is especially suitable for high-value-added food [10][11][12][13][14].

Preparation of Samples
Law-conformant egg pasta doughs were prepared following the traditional procedure used by egg-pasta masters. Samples were prepared by mixing 100 g of durum semolina and 1 egg. Adulterated samples were obtained by adding turmeric powder at 19 different percentages, from 0.01% to 1% (weight of turmeric powder/weight of final dough sample); the adulteration percentages are reported in Table 1. Four adulterated dough samples were prepared for each adulteration percentage (for a total of 19 × 4 = 76 adulterated doughs). In total, twenty-four law-conformant and seventy-six adulterated doughs were available for the analysis. Finally, each sample was spread by means of a clean rolling pin into strips of the desired thickness (resembling the "pappardelle" shape).
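As a worked illustration of the weight-on-weight definition above, the mass of turmeric needed for a target adulteration level can be computed as in the sketch below. The function name and the 99 g dough mass are hypothetical, since the mass of the unadulterated dough is not reported here.

```python
def turmeric_mass(dough_mass_g, target_percent):
    """Mass of turmeric (g) giving `target_percent` w/w in the final dough.

    The percentage is referred to the final sample:
        percent = 100 * m_turmeric / (m_dough + m_turmeric)
    Solving for m_turmeric gives the expression returned below.
    """
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be in [0, 100)")
    return target_percent * dough_mass_g / (100.0 - target_percent)


# Hypothetical example: 99 g of plain dough adulterated at 1% w/w.
print(turmeric_mass(99.0, 1.0))
```

Because the percentage is referred to the final dough, the required mass is slightly larger than a plain fraction of the unadulterated dough mass.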

NIR Analysis
NIR spectra were collected using a Nicolet 6700 FT-NIR (Thermo Scientific Inc., Madison, WI) connected to an integrating sphere (Thermo Scientific Inc., Madison, WI). The investigated spectral range was between 4000 and 10,000 cm⁻¹ (nominal resolution: 4 cm⁻¹). Each egg-pasta sample was analyzed in 6 replicates (collecting a spectrum at three different positions on each side of the strip), for a total of 600 NIR signals. Spectra were recorded by the OMNIC software (Thermo Scientific Inc., Madison, WI) and exported to MATLAB 2015b (The Mathworks, Natick, MA). Once imported into MATLAB, NIR signals (collected in reflection mode and registered as per cent reflectance, R) were transformed into pseudo-absorbance (log(1/R)); finally, replicates were averaged.
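The conversion and averaging steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: a base-10 logarithm is assumed, per cent reflectance is converted to a fraction before taking log(1/R), and the function names are hypothetical (the original processing was carried out in MATLAB).

```python
from math import log10


def pseudo_absorbance(reflectance_percent):
    """Convert a spectrum from per cent reflectance to log10(1/R).

    R is taken as a fraction (R% / 100), so log10(1/R) = log10(100 / R%).
    """
    return [log10(100.0 / r) for r in reflectance_percent]


def average_replicates(replicates):
    """Average replicate spectra (lists of equal length) point by point."""
    n = len(replicates)
    return [sum(column) / n for column in zip(*replicates)]


# Toy example: two replicate spectra with three spectral points each.
raw = [[10.0, 50.0, 100.0], [10.0, 25.0, 100.0]]
absorbance = [pseudo_absorbance(spectrum) for spectrum in raw]
sample_spectrum = average_replicates(absorbance)
```

In the study each sample spectrum would be the mean of its 6 replicates, which reduces the 600 recorded signals to 100 averaged spectra.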

Classification Approach
Partial Least Squares-Discriminant Analysis (PLS-DA) is a discriminant classifier developed in order to overcome the issues encountered by Fisher's Linear Discriminant Analysis when handling ill-conditioned data matrices [15]. This approach exploits PLS [16,17] and a so-called dummy Y (a response matrix encoding class membership in binary form) in order to convert a classification problem into a regression one (whose solution is achieved by PLS).
Very briefly, for a two-class problem (such as the one discussed in the present study), given a data matrix X (e.g., constituted by the N_tr instrumental measurements used as calibration set), the dummy y is a partitioned binary vector made of N_tr elements: each object belonging to Class 1 is represented by the value y_1 = 1, while samples belonging to Class 2 are encoded by y_2 = 0 [18].
Once the dummy response vector y is generated, the model is built by solving Equation (1) and estimating the regression coefficients b:
$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e} = \hat{\mathbf{y}} + \mathbf{e} \qquad (1)$$
ŷ being the vector of predicted responses and e collecting the residuals. After solving the regression problem in Equation (1) by PLS, a further step is needed to achieve classification, since the predicted response ŷ is no longer binary but real-valued: accordingly, a suitable classification rule has to be built on the values of ŷ. For a two-class problem, such as the one in the present study, this usually translates into setting a threshold y_thr on ŷ, so that if ŷ_i > y_thr the sample is assigned to Class 1, and otherwise it is predicted to be Class 2. In particular, in the present work the threshold was calculated based on the probabilistic approach proposed by Perez and coworkers [19].
When new, unknown samples have to be classified, their predicted responses ŷ_new are calculated based on the measurements X_new and the regression coefficients b estimated on the training set, and the classification rule described above is then applied to assign each individual to one of the categories under study.
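The encoding and decision steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the regression coefficients are assumed to have been estimated already, and a fixed threshold of 0.5 is used as a placeholder assumption in place of the probabilistic threshold of [19].

```python
def dummy_encode(labels, class_one):
    """Binary dummy vector: 1 for Class 1 objects, 0 for Class 2 objects."""
    return [1.0 if label == class_one else 0.0 for label in labels]


def predict_response(x, b, b0=0.0):
    """Real-valued predicted response for one spectrum x, given coefficients b.

    b0 is an optional offset (assumption: needed when the data were centered).
    """
    return b0 + sum(xj * bj for xj, bj in zip(x, b))


def assign_class(y_hat, threshold=0.5,
                 class_one="Conformant", class_two="Adulterated"):
    """Apply the rule: y_hat above the threshold means Class 1, else Class 2."""
    return class_one if y_hat > threshold else class_two


# Toy example with made-up numbers (not taken from the study).
y = dummy_encode(["Conformant", "Adulterated", "Adulterated"], "Conformant")
y_hat_new = predict_response([0.2, 0.4], b=[1.0, 1.5], b0=0.1)
print(assign_class(y_hat_new))
```

Keeping the threshold as a parameter makes it straightforward to replace the placeholder value with one estimated from the distribution of the training predictions.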

Regression Approaches: PLS
Partial Least Squares (PLS) is a well-known and widely applied regression method. It fits a response matrix (or vector) Y (or y) to a predictor matrix X through a small number of latent variables. In the present study, a PLS model was built on NIR data in order to quantify turmeric in the adulterated doughs. The reader is referred to [16,17] for more details on the PLS algorithm.
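For readers who prefer code to equations, a single-response PLS model (PLS1) can be sketched with the NIPALS scheme below. This is a didactic pure-Python version under stated assumptions (mean centering only, hypothetical function names); it is not the MATLAB implementation used in the study, and [16,17] remain the reference for the algorithm.

```python
from math import sqrt


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS on mean-centered data.

    Returns the column means of X, the mean of y and, for each latent
    variable, the weight vector w, the X-loading p and the y-loading q.
    """
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    f = [v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weights: direction of maximum covariance between E and f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to model
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next latent variable.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample x."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return y_hat
```

The prediction routine deflates the new sample component by component, which avoids the explicit matrix inversion needed to write the regression vector in closed form.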

Detection of Adulterated Egg Pasta Samples
NIR spectra were collected for all the available samples as described in Section 2.2; average spectra for "Class Conformant" (blue line) and "Class Adulterated" (red line) are shown in Figure 1. The mean raw spectra for the two categories look alike (Figure 1a), even after the 1st derivative (Figure 1b).
In order to perform external validation of the classification model, samples were divided into a calibration (training) set and a validation (test) set by the Duplex algorithm [20]; more details are reported in Table 2. Training samples were used to define the model parameters (number of latent variables (LVs) and the optimal data pretreatment) in a 7-fold cross-validation procedure; test objects were left out as an external validation set and used to estimate the reliability of the calibration model.
Different preprocessing approaches (aimed at removing uninformative variation) were tested; pretreated data were used to build PLS-DA models, and the most suitable solution was then selected on the basis of the Correct Classification Rate in Cross-Validation (%CV). The tested pretreatments were: mean centering, 1st and 2nd derivative (following the Savitzky-Golay approach with a 19-point window and a third-order interpolating polynomial [21]), Standard Normal Variate (SNV) [22], Multiplicative Scatter Correction (MSC) [23] and their combinations (see Table 3).
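The logic of a Duplex-type split can be sketched as below. This is a simplified illustration under stated assumptions: plain Euclidean distances on the raw variables, no preliminary standardization, and a hypothetical `n_test` argument that stops feeding the test set once it is full. It should not be read as the exact procedure of [20].

```python
def duplex_split(X, n_test):
    """Split sample indices into (train, test) with a Duplex-like scheme.

    Requires at least four samples and n_test >= 2.
    """
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    remaining = list(range(len(X)))

    def pop_farthest_pair():
        # The two remaining samples that are farthest from each other.
        i, j = max(
            ((a, b) for k, a in enumerate(remaining) for b in remaining[k + 1:]),
            key=lambda pair: dist2(X[pair[0]], X[pair[1]]),
        )
        remaining.remove(i)
        remaining.remove(j)
        return [i, j]

    def pop_farthest_from(selected):
        # The remaining sample whose nearest selected neighbor is farthest.
        best = max(remaining,
                   key=lambda r: min(dist2(X[r], X[s]) for s in selected))
        remaining.remove(best)
        return best

    train = pop_farthest_pair()
    test = pop_farthest_pair()
    while remaining:
        train.append(pop_farthest_from(train))
        if remaining and len(test) < n_test:
            test.append(pop_farthest_from(test))
    return train, test


# Toy one-dimensional example.
points = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0]]
train_idx, test_idx = duplex_split(points, n_test=3)
```

Alternating the selection between the two sets is what lets both of them span the same region of the data space, which is the reason for preferring this kind of split to a random one.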
From Table 3, it is evident that the PLS-DA model providing the highest correct classification rate in cross-validation was the one built on data preprocessed by MSC (Model VII), which provided a total classification rate of 97.65%. Consequently, this model was used to predict the test samples, and it provided correct classification rates of 100% and 97% for "Class Conformant" and "Class Adulterated", respectively.
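The two scatter corrections among the tested pretreatments can be sketched as follows. The sketch assumes the common formulations: SNV centers and scales each spectrum by its own mean and sample standard deviation, while MSC regresses each spectrum on a reference (here the mean spectrum of the set passed in) and removes the fitted offset and slope. Function names are hypothetical.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: row-wise autoscaling of one spectrum."""
    mu = mean(spectrum)
    sd = stdev(spectrum)
    return [(v - mu) / sd for v in spectrum]


def msc(spectra):
    """Multiplicative Scatter Correction against the mean spectrum."""
    n_var = len(spectra[0])
    ref = [mean([s[j] for s in spectra]) for j in range(n_var)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        # Least-squares fit of s = intercept + slope * ref.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected
```

When MSC is applied to new samples, the reference spectrum would normally be the one computed on the training set, so that test objects do not influence the correction.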
A graphical representation of these results is displayed in Figure 2, where the values of the predicted response for both the training and the test samples are shown. In the plot, the black dashed line is the delimiter between the two categories: a sample falling above it is assigned by the model to "Class Conformant", while the objects below the delimiter are predicted as belonging to "Class Adulterated". All the analyzed samples are displayed: both the calibration (empty symbols) and the validation (full symbols) objects. From this representation it is clear that all the law-conformant objects (blue squares) were properly classified, while only one adulterated test sample (full red diamonds) was assigned to the wrong category.
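The per-class and total correct classification rates quoted in this section follow from a simple count of hits, as sketched below. The split of the 40 test samples into 10 conformant and 30 adulterated objects is a hypothetical assumption made only for the example (the actual composition is given in Table 2); the single misclassified adulterated sample is taken from the text.

```python
def correct_classification_rates(true_labels, predicted_labels):
    """Per-class and total correct classification rates, in per cent."""
    rates = {}
    for c in sorted(set(true_labels)):
        idx = [i for i, t in enumerate(true_labels) if t == c]
        hits = sum(1 for i in idx if predicted_labels[i] == c)
        rates[c] = 100.0 * hits / len(idx)
    total_hits = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    rates["total"] = 100.0 * total_hits / len(true_labels)
    return rates


# Hypothetical test set: 10 conformant and 30 adulterated samples,
# with one adulterated sample predicted as conformant.
truth = ["Conformant"] * 10 + ["Adulterated"] * 30
predicted = ["Conformant"] * 10 + ["Conformant"] + ["Adulterated"] * 29
rates = correct_classification_rates(truth, predicted)
```

One error out of 40 test samples corresponds to the 97.5% total rate reported for external validation, whatever the class composition.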
In order to investigate the system under study in more depth, Variable Importance in Projection (VIP) [24] was used to identify which spectral variables provide the greatest contribution to the discrimination. Customarily, variables exhibiting VIP indices higher than 1 are considered relevant for the solution of the investigated problem [25]. A graphical representation of the outcome of the VIP analysis is reported in Figure 3.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 6 of 8
In the plot, the blue line represents the average spectrum; variables having VIP > 1 are highlighted in red. From the figure, it appears that the variables contributing the most to the discrimination are those between 7610 and 6738 cm⁻¹, ascribable to the 1st overtone of the O-H stretching; those between 5392 and 5097 cm⁻¹, associable with the 2nd overtone of the C=O stretching and with the 1st overtone of the C-H stretching; and those between 4387 and 4000 cm⁻¹, probably related to the C-H stretching and to the effects of C-H deformations [26,27].
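For a single-response PLS model, VIP indices are commonly computed from the weights, scores and y-loadings of the latent variables, as in the sketch below. The formula is the widely used one (each variable's squared normalized weights, summed over components and weighted by the y-variance each component explains); the function name and argument layout are hypothetical, and it is an assumption that the study used exactly this formulation.

```python
from math import sqrt


def vip_scores(weights, scores, q):
    """VIP index of each variable for a single-response PLS model.

    weights: one weight vector (length m) per latent variable
    scores:  one score vector (length n) per latent variable
    q:       one y-loading per latent variable
    """
    m = len(weights[0])
    # y-variance explained by each latent variable: q_a^2 * t_a't_a
    explained = [qa ** 2 * sum(t * t for t in ta) for qa, ta in zip(q, scores)]
    total = sum(explained)
    vip = []
    for j in range(m):
        acc = 0.0
        for wa, ssa in zip(weights, explained):
            norm2 = sum(w * w for w in wa)
            acc += ssa * wa[j] ** 2 / norm2
        vip.append(sqrt(m * acc / total))
    return vip


# Variables with VIP > 1 would be flagged as relevant.
vip = vip_scores(
    weights=[[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]],
    scores=[[1.0, -1.0, 2.0], [0.5, 0.5, -1.0]],
    q=[2.0, 1.0],
)
relevant = [j for j, v in enumerate(vip) if v > 1.0]
```

With this definition the squared VIP values average to one across variables, which is why 1 is the customary cut-off.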

Quantification of the Adulterant
Finally, the possibility of quantifying the adulterant in the analyzed samples was investigated. In this case, the study was restricted to the adulterated samples only; consequently, 76 samples were used for this part of the analysis. As before, spectra were divided into two sets: the calibration set, made of 43 objects, and the validation set, constituted by 33 samples. The same preprocessing approaches reported above were used for the PLS analysis; results from the different (7-fold cross-validated) models are reported in Table 4. On the basis of the RMSECV, Model IV was considered the optimal one and, consequently, it was applied to predict the adulterant percentages in the test samples (pretreated by SNV). Taking into account the low percentages of turmeric powder present in each sample, the results were acceptable: Model IV led to a Root Mean Square Error in Prediction (on the test set) of 0.112 (%w/w).
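The figure of merit used here is the root mean square of the differences between reference and predicted concentrations; it is called RMSECV when the predictions come from cross-validation and RMSEP when they come from the external test set. A minimal sketch follows, with a hypothetical function name and made-up numbers that are not results of the study.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


# Made-up turmeric percentages (%w/w) for three test samples.
reference = [0.10, 0.50, 1.00]
predicted = [0.20, 0.40, 1.00]
print(rmse(reference, predicted))
```

The error is expressed in the same units as the response, so an RMSEP of 0.112 reads directly as %w/w of turmeric.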

Discussion
The present work showed that NIR coupled with PLS-DA is a suitable tool for detecting egg pasta adulterated with turmeric. Additionally, the PLS analysis of the spectra allowed a satisfactory quantification of the adulterant even when it is present in very low quantity (<3% of the total weight). In conclusion, the proposed strategies proved to be effective tools for fraud detection in traditional egg pasta. Due to the nature of the analytical technique involved (NIR), these approaches can find different applications; among others, they could be used in factories in online systems for quality control.

Conclusions
The present work aimed at developing a NIR-based analytical methodology to detect the fraudulent addition of turmeric to egg pasta. Law-conformant and adulterated egg pasta samples were prepared in the lab and analyzed by Near Infrared Spectroscopy (NIR). Spectra were then handled by two different chemometric techniques: PLS-DA, in order to discriminate adulterated from pure samples, and PLS, to quantify turmeric in the adulterated samples. Both approaches performed well, demonstrating that they are suitable for the intended goal. In fact, PLS-DA provided correct classification rates of 100% and 97% for "Class Conformant" and "Class Adulterated", respectively (corresponding to 1 misclassified object out of 40 test samples), whereas PLS led to a Root Mean Square Error in Prediction (on the test set) of 0.112 (%w/w).