Hyperspectral Imaging for the Detection of Bitter Almonds in Sweet Almond Batches

A common fraud in the sweet almond industry is the presence of bitter almonds in commercial batches. The presence of bitter almonds not only causes unpleasant flavours but also problems in commercialisation and toxicity for consumers. Hyperspectral Imaging (HSI) has been proved to be suitable for the rapid and non-destructive quality evaluation of foods as it integrates the spectral and spatial dimensions. Thus, we aimed to study the feasibility of using an HSI system to identify single bitter almond kernels in commercial sweet almond batches. For this purpose, sweet and bitter almond batches, as well as different mixtures, were analysed in bulk using an HSI system which works in the spectral range 946.6–1648.0 nm. Qualitative models were developed using Partial Least Squares-Discriminant Analysis (PLS-DA) to differentiate between sweet and bitter almonds, obtaining a classification success of over 99%. Furthermore, data reduction, as a function of the most relevant wavelengths (VIP scores), was applied to evaluate its performance. Then, the pixel-by-pixel validation of the mixtures was carried out, correctly identifying between 61% and 85% of the adulterations, depending on the group of mixtures and the cultivar analysed. The results confirm that HSI, without VIP scores data reduction, can be considered a promising approach for classifying the bitterness of almonds analysed in bulk, enabling the identification of individual bitter almonds inside sweet almond batches. However, a more complex mathematical analysis is necessary before its implementation in the processing lines.


Introduction
Almonds are one of the most important tree nuts, containing high amounts of vegetable proteins and fat (mainly monounsaturated and polyunsaturated fatty acids), as well as other nutrients such as dietary fibre, vitamins, minerals and other bioactive constituents [1][2][3]. These fruits have been included in healthy eating guidelines because of their nutritional quality, and the evidence of health benefits associated with their consumption has, in turn, increased their commercialisation. Almonds are now the most consumed nut in the world, accounting for nearly one-third of all the tree nuts produced worldwide, with the USA, Spain and Australia as the three main exporters [4].
The most common fraud found in the sweet almond industry is the presence of bitter almonds in commercial batches. This presence not only causes unpleasant flavours but also problems in commercialisation, so the bitter almonds must be removed from these commercial batches [5]. Moreover, bitter almonds are associated with toxicity for consumers because they contain the cyanogenic compound amygdalin [6,7]. Therefore, ensuring the integrity of this product along the entire supply chain is a key aspect for the industry. Currently, high-performance liquid chromatography (HPLC) is the analytical technique used to measure the amygdalin content in almonds. This technique is complex, destructive, highly expensive and time-consuming and, therefore, does not provide the real-time responses that the almond processing industries need.
Different studies in the literature have demonstrated the potential of Near Infrared Spectroscopy (NIRS) to assure the integrity of sweet almond batches by identifying the presence of bitter kernels in them. In this sense, Torres et al. [8] detected the presence of bitter almonds in commercial batches of sweet almonds using two portable NIRS devices, and Vega-Castellote et al. [9] used NIRS technology as a non-targeted control method to detect non-compliant or adulterated sweet almond batches. However, although hyperspectral imaging (HSI) has been proved to be suitable for quality estimation and fraud detection in the almond sector [10][11][12][13][14], there are no published articles related to the use of this technique for the detection of bitter almonds in commercial sweet almond batches. HSI presents a fundamental advantage for this purpose as it acquires both spatial and spectral information from each sample, combining them to provide a unique fingerprint for each pixel of the image [15], which is a key feature in dealing with food heterogeneity. Thus, while NIRS technology provides a characteristic average spectrum of each sample or batch, HSI provides the spectrum of each individual pixel of the image. This makes it possible to know the spatial distribution of the different physicochemical characteristics, favouring the individual identification of the different compounds in the sample. Because of that, HSI could be an optimal alternative analytical method, not only because it enables each individual bitter almond kernel to be identified but also because it scans a greater surface of the sample, obtaining more representative information and, therefore, reducing the sampling error. This application is of great interest for detecting bitter almonds in sweet almond batches, as the presence of even one bitter kernel is a problem for the industry.
Thus, this study aimed to assess the feasibility of using an HSI system to identify single bitter almond kernels in commercial sweet almond batches containing bitter almonds in different percentages (from 5 to 20%). Furthermore, to reduce the amount of data, the most relevant wavelengths for the discrimination between sweet and bitter almond kernels were selected.
From these samples, mixtures with different percentages of bitter almonds were prepared using randomly selected sweet and bitter almond samples. Each mixture had a final weight of 500 g and four types of mixtures were prepared, varying the proportion of bitter almonds from 5 to 20% in increments of 5% (M5%, M10%, M15% and M20%). Finally, a total of 84 mixtures (21 each of M5%, M10%, M15% and M20%) were prepared.
On arrival at the laboratory, the almonds were immediately placed in dark, refrigerated storage. Prior to measurement, each sample was left to stabilise at the laboratory temperature of 20 °C. Furthermore, the amygdalin content of the sweet and bitter almond samples was determined by HPLC following the procedure described by Vega-Castellote et al. [7].

Hyperspectral Imaging Acquisition
Spectral images were acquired in reflectance mode using a laboratory-based pushbroom NIR system. The system was made up of a charge-coupled device (CCD) camera with a spatial resolution of 320 × 256 pixels (Xeva-FPA-1.7-320, Xenics, Leuven, Belgium), a C-mount objective lens (F1.4 25-mm compact lens, Schneider Optics, Hauppauge, NY, USA) and a line scan imaging spectrograph (Specim ImSpector V10E, Oulu, Finland), which covers the range of 946.6 to 1648.0 nm with a spectral resolution of 3.3 nm. Illumination was provided by two 250 W lamps located at a 45° angle to the sample, and the line-by-line images were collected by means of a conveyor belt system (Velmex, Inc., Bloomfield, NY, USA) that moved the sample across the line field of view of the camera. To image each sample, about 100 g of almonds were uniformly distributed on a black plastic plate measuring 12.5 × 17.5 cm. The number of lines was set at 450, with a distance between lines of 0.39 mm, so that a hypercube with dimensions of 450 × 320 × 212 was obtained for each sample. However, due to the presence of noise, only the spectral range between 1006 and 1594 nm (179 spectral bands) was considered for data modelling.
Reflectance calibration was performed each hour by taking dark and white reference images. The dark reference was collected by covering the camera lens with a black cap and, immediately afterwards, the white reference was collected using a 99% reflectance standard (Spectralon™, SRS-99-10, Labsphere, Inc., North Sutton, NH, USA).

Hyperspectral Image Processing
Data analysis was performed using WinISI II software package version 1.50 (Infrasoft International LLC, Port Matilda, PA, USA) [16] and Matlab v. 2018a, equipped with the PLS and the Image Processing toolboxes (The Mathworks, Inc., Natick, MA, USA).
Initially, the reflectance value of each sample was calculated as the difference between the intensity of the sample and dark reference divided by the difference between the white and dark references [17].
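The calibration described above can be sketched as follows. This is a minimal illustration rather than the software used in the study: the function name, the array shapes and the small `eps` safeguard against dead pixels are assumptions introduced here.

```python
import numpy as np

def calibrate_reflectance(raw, dark, white, eps=1e-12):
    """Relative reflectance R = (raw - dark) / (white - dark), element-wise."""
    raw = np.asarray(raw, dtype=float)
    dark = np.asarray(dark, dtype=float)
    white = np.asarray(white, dtype=float)
    denom = white - dark
    # Hypothetical safeguard: avoid division by zero where white == dark.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

# Toy hypercube: 2 lines x 3 pixels x 4 bands of raw counts.
raw = np.full((2, 3, 4), 600.0)
dark = np.full((3, 4), 100.0)     # one dark frame, broadcast over all lines
white = np.full((3, 4), 1100.0)   # one white frame, broadcast over all lines
reflectance = calibrate_reflectance(raw, dark, white)
```

In this toy case every element of `reflectance` equals (600 - 100) / (1100 - 100) = 0.5.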
After the images had been corrected, the background of each sample was removed to extract the Region of Interest (ROI). To this end, a binary mask image was generated from the difference between the images at 1009.80 and 1541.60 nm. The background was then removed by applying a simple threshold value of 0.55 to the resulting difference image, and the mask obtained was applied to each reflectance-corrected image. Subsequently, all the spectra extracted from the pixels not identified as background were averaged to obtain a mean spectrum per sample, giving a total of 158 spectra.
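A minimal sketch of this masking step is given below. The text does not state the order of the subtraction or on which side of the threshold the kernels fall, so the direction used here (first band minus second band, kernels above 0.55) is an assumption, as are the function name and the toy data.

```python
import numpy as np

def roi_mean_spectrum(cube, wavelengths, wl_a=1009.80, wl_b=1541.60, threshold=0.55):
    """Mask the background with a two-band difference and average the ROI spectra."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    # Nearest available bands to the two wavelengths named in the text.
    band_a = int(np.argmin(np.abs(wavelengths - wl_a)))
    band_b = int(np.argmin(np.abs(wavelengths - wl_b)))
    difference = cube[:, :, band_a] - cube[:, :, band_b]
    # Assumption: kernel pixels lie above the threshold, background below it.
    mask = difference > threshold
    mean_spectrum = cube[mask].mean(axis=0)
    return mask, mean_spectrum

wavelengths = np.array([1009.8, 1200.0, 1541.6])
cube = np.zeros((2, 2, 3))
cube[0, 0] = [0.9, 0.6, 0.2]   # kernel pixel, difference 0.7
cube[0, 1] = [0.8, 0.5, 0.2]   # kernel pixel, difference 0.6
cube[1, :] = [0.1, 0.1, 0.1]   # background pixels, difference 0.0
mask, mean_spectrum = roi_mean_spectrum(cube, wavelengths)
```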

Study of the Variability and Development of Classification Models
Firstly, the structure and variability of the population were studied using the CENTER algorithm [18]. This algorithm performs a principal component analysis (PCA) and calculates the global Mahalanobis distance (GH) of each sample to the centre of the population in a new n-dimensional space, which enables the sorting of the samples according to their GH distance. This algorithm was applied using Standard Normal Variate (SNV) and De-trending (DT) as mathematical pre-treatments for scatter correction [19], together with the 1,5,5,1 Norris derivative treatment [20]. The CENTER algorithm was individually applied to the sets of sweet almonds, bitter almonds and mixtures; those samples that displayed a GH > 4 were studied as potential outliers.
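The idea behind this outlier screening can be illustrated with a short sketch. It is not the WinISI implementation: the spectral pre-treatments (SNV, DT and derivative) are omitted, the number of principal components is a free parameter, and dividing the squared Mahalanobis distance by the number of components to obtain GH is an assumption about the standardisation.

```python
import numpy as np

def global_h(spectra, n_components):
    """Standardised Mahalanobis distance of each sample to the centre in PCA score space."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                        # centre on the population mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    score_var = scores.var(axis=0, ddof=1)         # variance along each PC
    d2 = np.sum(scores ** 2 / score_var, axis=1)   # squared Mahalanobis distance
    # Assumption: GH is D^2 divided by the number of PCs retained.
    return d2 / n_components

rng = np.random.default_rng(0)
spectra = rng.normal(0.0, 0.01, size=(30, 50))     # toy population of 30 spectra
spectra[7] += 5.0                                  # one grossly atypical sample
gh = global_h(spectra, n_components=2)
order = np.argsort(gh)                             # samples sorted by GH distance
outliers = np.flatnonzero(gh > 4.0)                # GH > 4, the limit quoted in the text
```

Because PCA scores are uncorrelated, the Mahalanobis distance reduces to a sum of squared scores scaled by their variances, which is why no covariance matrix needs to be inverted here.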
In the case of the sweet and bitter almonds datasets, after removing the spectral outliers, the structured selection of the training and validation sets was carried out following the procedure proposed by Shenk and Westerhaus [21]. The training set was made up of 70% of the samples and the remaining 30% constituted the validation set.
Partial least squares-discriminant analysis (PLS-DA) was used to develop classification models of almonds by bitterness, differentiating between sweet and bitter almonds. Venetian blinds for cross-validation (10 splits) were applied and a maximum number of 16 PLS terms was considered. Normalisation and detrend were used as spectral pre-processing treatments for scatter correction [19] and the first and second derivatives treatments were also tested.
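As an illustration of the cross-validation scheme, the sketch below builds venetian blinds folds under the usual convention that fold k holds out samples k, k + s, k + 2s and so on, for s splits. The function name is hypothetical, and the sample count of 117 is the training set size reported in the results.

```python
def venetian_blinds(n_samples, n_splits=10):
    """Yield (train_idx, test_idx) pairs; test fold k holds samples k, k + n_splits, ..."""
    indices = list(range(n_samples))
    for k in range(n_splits):
        test_idx = indices[k::n_splits]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx

folds = list(venetian_blinds(n_samples=117, n_splits=10))
```

Each sample is held out exactly once, so every spectrum contributes one cross-validated prediction.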
The performance of the discriminant models was assessed in terms of the sensitivity, specificity and non-error rate (NER). To maximise the success of the models, the optimal threshold value obtained from the ROC curves was considered. To test the robustness of the model, it was externally validated using the validation set.
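These figures of merit can be computed as in the following sketch. It assumes that the bitter class is treated as the positive class, that larger predicted values indicate bitterness, and that the NER is the mean of sensitivity and specificity (a common chemometric definition). The labels and scores are toy values, and 0.66 is the threshold reported later for the cross-validated model.

```python
def classification_metrics(labels, scores, threshold, positive="bitter"):
    """Sensitivity, specificity and non-error rate for a thresholded two-class model."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == positive and s >= threshold)
    fn = sum(1 for y, s in pairs if y == positive and s < threshold)
    tn = sum(1 for y, s in pairs if y != positive and s < threshold)
    fp = sum(1 for y, s in pairs if y != positive and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ner = (sensitivity + specificity) / 2   # assumed definition of the NER
    return sensitivity, specificity, ner

labels = ["sweet", "sweet", "sweet", "sweet", "bitter", "bitter", "bitter"]
scores = [0.10, 0.20, 0.70, 0.30, 0.90, 0.80, 0.70]   # toy predicted values
sensitivity, specificity, ner = classification_metrics(labels, scores, threshold=0.66)
```

Here one sweet sample exceeds the threshold, giving a sensitivity of 1.0, a specificity of 0.75 and an NER of 0.875.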
Next, global classification models were developed using the totality of the sweet and bitter samples available. For this purpose, the training and validation sets for the sweet and bitter almonds were merged and new models were developed following the procedure mentioned above. Once the best global classification model was selected, the total of mixtures (N = 84) was used to validate the model using the mean spectrum of each sample extracted from the ROI.
To determine which wavelengths were mainly responsible for the discrimination between sweet and bitter almonds, the VIP scores obtained from the global PLS-DA model were used to select the most representative bands. Then, new models were developed to differentiate between sweet and bitter almonds, following the same procedure and using only the selected bands.
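For reference, VIP scores can be obtained from the weights, scores and y-loadings of a fitted PLS model using the standard formula, as sketched below. The matrices here are random toy values rather than outputs of the models in this work, and the "greater than one" selection rule is a common convention, not a criterion stated in the text.

```python
import numpy as np

def vip_scores(W, T, Q):
    """VIP per variable from PLS weights W (p x A), scores T (n x A) and y-loadings Q (A,)."""
    W = np.asarray(W, dtype=float)
    T = np.asarray(T, dtype=float)
    Q = np.asarray(Q, dtype=float).ravel()
    p = W.shape[0]
    # Variance of y explained by each latent variable: q_a^2 * t_a' t_a.
    ss = Q ** 2 * np.sum(T ** 2, axis=0)
    # Squared weights after normalising each weight vector to unit length.
    w_norm2 = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())

rng = np.random.default_rng(1)
W = rng.normal(size=(179, 3))   # 179 bands, 3 latent variables (toy values)
T = rng.normal(size=(157, 3))
Q = rng.normal(size=3)
vip = vip_scores(W, T, Q)
selected_bands = np.flatnonzero(vip > 1.0)   # assumed selection rule
```

By construction, the squared VIP scores sum to the number of variables, so a value of one corresponds to a band of average importance.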

Pixel-by-Pixel Classification
Prior to the pixel-by-pixel validation, PCA was applied to the background-free images of the mixtures as an exploratory analysis. This algorithm reduces spectral dimensionality by converting the huge amount of data from the hypercube into a limited set of scores and loadings. The PC images and the loading vectors for the first three principal components (PC1, PC2, PC3) were analysed for their interpretation. Mean centring was applied as the pre-processing method [22].
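A sketch of this exploratory step is shown below: the ROI pixels are unfolded into a pixels-by-bands matrix, mean-centred, decomposed, and the scores are folded back into images. The function name and the toy hypercube are assumptions, and the background is simply left as NaN in the score images.

```python
import numpy as np

def pca_score_images(cube, mask, n_components=3):
    """Mean-centred PCA of the ROI pixels; returns score images, loadings and explained variance."""
    rows, cols, bands = cube.shape
    pixels = cube[mask]                          # (n_roi_pixels, bands)
    centred = pixels - pixels.mean(axis=0)       # mean centring
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components]                 # (n_components, bands)
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    score_images = np.full((rows, cols, n_components), np.nan)
    score_images[mask] = scores                  # background pixels stay NaN
    return score_images, loadings, explained

rng = np.random.default_rng(2)
cube = rng.random((6, 5, 8))                     # toy hypercube: 6 lines, 5 pixels, 8 bands
mask = np.ones((6, 5), dtype=bool)
mask[0, :] = False                               # pretend the first line is background
score_images, loadings, explained = pca_score_images(cube, mask)
```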
Then, once the PCA was carried out and the global model for the classification between sweet and bitter almonds was developed, to simulate the actual situation in the industry, where the identification of each individual bitter almond included in the processed batches is necessary, the pixel-by-pixel prediction of the mixtures was carried out. To test whether the bitter almonds had been correctly identified, 2 almonds for each category (sweet and bitter) were used as a control in each mixture. Finally, the accuracy of the models was calculated as the percentage of correctly classified mixtures.
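Conceptually, the pixel-by-pixel prediction applies the classification model to every ROI spectrum and thresholds the result. The sketch below assumes that the PLS-DA model can be expressed as a regression vector plus an intercept, that the pixel spectra have already received the same pre-treatment as the training spectra, and that larger predicted values indicate bitterness; all names and numbers are illustrative.

```python
import numpy as np

def classify_pixels(cube, mask, coefficients, intercept, threshold):
    """Return a class map: 1 = bitter, 0 = sweet, -1 = background."""
    predicted = cube[mask] @ coefficients + intercept   # one value per ROI pixel
    class_map = np.full(mask.shape, -1, dtype=int)
    class_map[mask] = (predicted >= threshold).astype(int)
    return class_map

cube = np.array([[[0.9, 0.1], [0.5, 0.3]],
                 [[0.0, 0.0], [0.8, 0.1]]])      # toy 2 x 2 image with 2 bands
mask = np.array([[True, True],
                 [False, True]])                 # lower-left pixel is background
coefficients = np.array([1.0, -1.0])             # hypothetical regression vector
class_map = classify_pixels(cube, mask, coefficients, intercept=0.0, threshold=0.64)
```

The resulting map can be displayed as an image so that the pixels assigned to each class are seen in their spatial context.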

Spectral Analysis
Mean absorbance and second derivative spectra of the ROI extracted for the three groups of almonds (sweet, bitter and mixtures) analysed using HSI are shown in Figure 1; as can be seen, the different groups of samples displayed similar spectral features. The main absorbance band region can be seen at around 1150-1200 nm, related to the second overtone of -CH stretching, associated with unsaturated fatty acids [23]. Furthermore, another important band can be seen near 1430 nm, which corresponds to the first overtone of the -NH stretching [24].
Moreover, in the second derivative spectra (Figure 1b), another peak can be seen at around 1128 nm that might be attributed to C-H bonds of aromatic compounds [23]. Although the spectral pattern is similar for the three groups, the wavelengths that appeared to have the greatest weight in differentiating between the sweet and non-sweet almonds are those around 1128, 1200 and 1350-1400 nm.

Classification Models of Almonds by Bitterness
After applying the CENTER algorithm, one sweet sample was identified as a spectral outlier. Then, the training and validation sets for the sweet and bitter categories were selected, which consisted of 117 samples (72 sweet and 45 bitter) and 40 samples (24 sweet and 16 bitter), respectively. Table 1 shows the results of the best PLS-DA models obtained to distinguish between sweet and bitter almonds analysed in bulk using the HSI system. The first derivative, together with normalisation and detrend as scatter correction, was the math treatment that yielded the best results in cross-validation. These results were obtained considering a threshold value of 0.66 provided by the ROC curve, which maximises the specificity and sensitivity and thus yields a higher NER value. As shown in Table 1, 99% (71/72) of the sweet almond samples were correctly classified in cross-validation, whereas for the bitter category, the percentage of correctly classified samples was 100% (45/45). As regards the external validation, 100% of the samples in both categories were correctly classified.
Once the viability of using HSI technology to discriminate almonds according to their bitterness had been demonstrated, new global models were developed by merging the training and validation sets in order to increase the variability covered by the model. The best classification model was obtained using the first derivative math treatment and a threshold value of 0.64, and the classification success obtained was 99.4% (99% in the sweet category and 100% in the bitter category).
Although there are no published articles based on the use of the HSI technology for the classification of almonds by bitterness, similar works have been carried out using NIR spectroscopy. Cortes et al. [5], analysing individual kernels, and Vega-Castellote et al. [7], analysing samples in bulk, used different NIRS instruments and reported similar results to those obtained in this work with NER of 100% for both categories in external validation.

Identification of Bitter Almonds in Adulterated Sweet Almond Batches
The results obtained demonstrate the feasibility of using HSI for the discrimination between sweet and bitter almond batches. However, it is important to test the applicability of this technology in the actual situation in the industry, where the aim is not to differentiate between batches of sweet and bitter almonds but to detect the presence of random individual bitter almonds in the processed sweet batches.
To identify the presence of bitter almonds in sweet almond batches, the first approach was to validate the mixtures using the mean spectrum obtained from each ROI [Data not shown]. The predictive capability obtained using this validation strategy was not high enough, as only 40.5% (34/84) of the mixtures (6 M5%, 13 M10%, 8 M15% and 7 M20%) were detected. This is to be expected because, when average spectra are used, some information might be lost due to the heterogeneity of the sample, which is reflected in the individual pixels but not in the calculated mean spectrum.
Although the M20% group could be expected to have the highest percentage of samples identified as adulterated, it is important to note that, from the total amount of mixture (500 g), only a fraction of approximately 100 g was analysed. Therefore, the percentage of bitter almonds present in the portion analysed may not be equivalent to the percentage included in the whole mixture, even when the sample was homogenised. This highlights the importance of the sampling procedure in acquiring representative information about the sample to be analysed [25]. Therefore, to prevent the sampling process from affecting the results obtained, and taking into account that the main advantage of HSI technology is the information provided by each pixel, each of the mixtures was analysed not only at a spectral level but also at a spatial level.
Firstly, PCA was performed to study the structure of the population. The first three PCs, which accounted for 99.97% of the explained variance, were analysed. Figure 2 shows the scores and the loading plot corresponding to PC2 (0.21%), as it exhibits the greatest distinction between the sweet and bitter almonds present in the mixture. The remaining PC images represent other features, but they are not useful for identifying bitter almonds: the image corresponding to PC1 makes the almonds stand out against the background, and PC3 highlights features related to the shape and size of the kernels. The graphical representation of the loadings corresponding to PC2 was analysed to study the dominant wavelengths, which appear to have higher discrimination power. As can be seen in Figure 2b, again, the most prominent peaks are shown at around 1200 nm, associated with lipids (C-H second overtone), and at 1400 nm, which corresponds not only to the first overtone of the O-H functional groups but also to the R-C≡N stretching [23].
Once the structure of the population and the most influential bands in the differentiation between sweet and bitter almonds had been studied, the pixel-by-pixel validation (or mapping) of the mixtures was carried out to identify each individual bitter almond present in the sweet almond batches. Figure 3 shows four mixtures, one for each percentage of bitter almonds, used for the validation to visualise the classification of each kernel. Some of the mixtures did not include the control almonds needed to test their correct classification, so they were not validated; therefore, 75 samples (17 M5%, 20 M10%, 20 M15% and 18 M20%) were evaluated.
The number of mixtures in which the bitter almonds were correctly identified was 56 (75%). The identification success for each of the groups M5%, M10%, M15% and M20% ranged between 61% and 85%. In the case of the M5% mixtures, 70.6% (12/17) of the samples were correctly classified. Three of the five misclassified samples were prepared with Belona cultivar almonds; this coincides with the results reported in previous NIR works by Torres et al. [8] and Vega-Castellote et al. [9], who found greater difficulty in identifying as non-compliant products those mixtures prepared using Belona, possibly due to the shape of these kernels. Another mixture was found to have been prepared with a sweet sample from the Vairon cultivar whose amygdalin content was 0.13 mg/g, higher than the range of this parameter for the remaining samples of this cultivar (0.03-0.09 mg/g); it could therefore be considered that this sample was underrepresented in the training set. Finally, the other misclassified M5% sample had been prepared using a bitter sample with a low amygdalin content (0.07 mg/g).
With regard to the M10% mixtures, 85% (17/20) of the samples were correctly classified. The misclassified samples corresponded to mixtures prepared with samples belonging to the Belona (N = 1) and Laureanne (N = 2) cultivars. Furthermore, one of these samples had been prepared with the previously mentioned bitter sample, whose amygdalin content was low. For the M15% mixtures, 80% (16/20) of the samples were correctly classified; again, three of the four misclassified samples corresponded to mixtures prepared using Belona and Laureanne.
Finally, in the case of the M20% mixtures, the percentage of correctly classified samples was lower, about 61% (11/18). As in the previous cases, the seven misclassified samples included mixtures containing sweet samples from the Belona (N = 2) and Laureanne (N = 2) cultivars, which would confirm the greater complexity of identifying bitter samples in batches of sweet almonds of these cultivars. Another of the misclassified samples had been prepared using sweet almonds of the Guara cultivar, which is considered a slightly bitter cultivar according to its genotype [26].
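The per-group success rates quoted above, and the overall figure of 56 correctly identified mixtures out of 75, can be checked with a few lines of arithmetic. The counts are taken directly from the text; only the variable names are introduced here.

```python
# (correctly classified, validated) mixtures per group, as reported in the text.
results = {"M5%": (12, 17), "M10%": (17, 20), "M15%": (16, 20), "M20%": (11, 18)}

rates = {group: 100 * correct / total for group, (correct, total) in results.items()}
total_correct = sum(correct for correct, _ in results.values())
total_validated = sum(total for _, total in results.values())
overall_rate = 100 * total_correct / total_validated
```

The group totals add up to the 75 validated mixtures, and the overall rate of about 74.7% matches the rounded 75% given in the text.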
These results are of great interest to the almond processing industry as they make it possible to identify each individual bitter almond mixed with sweet almonds, so it would not be necessary to discard the whole processed batch. It is also important to note that, as each pixel in the image is validated individually, the percentage of correctly classified samples is not influenced by the proportion of bitter almonds included in the mixture but by other key factors, such as the characteristics associated with each cultivar and the representativeness of each cultivar within the training set. In this sense, due to the great heterogeneity that almond samples present, the model would require periodic readjustments with samples from new seasons, different cultivars, or other changes in environmental conditions that affect the almonds' characteristics.
Once the viability of the pixel-by-pixel validation had been demonstrated, an object-wise segmentation approach was tested [Data not shown] in order to simulate the procedure required for the implementation of this system in the processing lines. For this purpose, the kernels were separated by applying different masks, and each individual kernel was identified as sweet or bitter depending on the class of most of its pixels. Although the results obtained allowed the identification of bitter almonds, more studies and a refinement of the methodology are required to improve this approach before its recommendation and adoption. In those cases where the number of kernels in contact or the overlap between them was high, the separation between the almonds was not successful, and this affected the accuracy of the method. These results agree with those reported by Barbedo et al. [27], who developed an algorithm based on mathematical morphology operations able to discriminate between clusters composed of up to three kernels of wheat. These authors arranged the wheat grains in an orderly manner, with a minimal contact surface between them, which is not a realistic situation for processing lines in the almond industry.
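The majority-vote rule for each kernel can be sketched as follows. The kernel label image is assumed to come from a separate segmentation step that is not shown, and the label convention (0 for background), the class codes and the tie-breaking rule are illustrative assumptions.

```python
import numpy as np

def classify_kernels(kernel_labels, class_map):
    """Assign each labelled kernel the class held by most of its pixels."""
    decisions = {}
    for kernel_id in np.unique(kernel_labels):
        if kernel_id == 0:                      # 0 marks background in this sketch
            continue
        pixel_classes = class_map[kernel_labels == kernel_id]
        bitter_fraction = np.mean(pixel_classes == 1)
        # Assumption: ties are resolved in favour of the sweet class.
        decisions[int(kernel_id)] = "bitter" if bitter_fraction > 0.5 else "sweet"
    return decisions

kernel_labels = np.array([[1, 1, 0, 2],
                          [1, 1, 0, 2],
                          [0, 0, 0, 2]])        # two kernels on a toy 3 x 4 image
class_map = np.array([[1, 1, -1, 0],
                      [1, 0, -1, 0],
                      [-1, -1, -1, 1]])         # 1 = bitter, 0 = sweet, -1 = background
decisions = classify_kernels(kernel_labels, class_map)
```

In this toy image, kernel 1 has three of four pixels predicted as bitter and is labelled bitter, whereas kernel 2 has one of three and is labelled sweet. When touching kernels receive a single label, their votes are pooled, which is one way to picture the loss of accuracy described above.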
It should be noted that the present work is only a first approach to studying the feasibility of using HSI technology for the identification of bitter almonds within processed batches, and further research is in progress. Although the validation carried out in this work allows a fairly accurate identification of individual bitter almonds, large flows of product are processed in the almond industry, so more complex mathematical approaches are required prior to its real implementation in the processing lines.

Data Reduction
To incorporate HSI technology in the industrial almond processing lines, it is important to select the most relevant wavelengths as it reduces the computational time. Therefore, the wavelengths of the greatest importance for classifying almonds according to their bitterness were selected using the VIP scores obtained from the global model (Table 1). Figure 4 shows the graphical representation of the VIP scores together with the most relevant wavelengths. As can be seen in the figure, the most relevant wavelengths for the discrimination between sweet and bitter almonds are at around 1112-1161, 1191, 1395 and 1428 nm. Table 2 shows the cross-validation results for the best classification model obtained using the wavelength range selected. The new classification models showed similar results to those obtained using the complete spectral range, with a classification success of 99% and 100% for the sweet and bitter categories, respectively. This is of great interest because, although the number of wavelengths was reduced by 86% (from 179 bands to 25), the classification capability of the model was not affected.
Next, pixel-by-pixel validation of the mixtures was carried out to test the robustness of the model [Data not shown]. In this case, the new model obtained did not enable the correct identification of the bitter almonds present in the mixtures, indicating that the selection of wavelengths is not feasible for this specific objective. It is important to note that each spectrum is a detailed profile or unique fingerprint, so reducing the number of bands could eliminate essential information.

Conclusions
The results obtained demonstrated the feasibility of using HSI as a non-destructive technology to classify the bitterness of almonds analysed in bulk. The potential application of HSI technology at a single-pixel level would enable us to identify each individual bitter almond present in the sweet almond batches. This approach could be extremely useful at an industrial level as it could eliminate each bitter almond included in the processed sweet product, reducing the cost and increasing the commercial value. However, further studies are needed, in particular, related to the use of other mathematical approaches to achieve the object-wise segmentation that enables the isolation of the bitter fruits.
It must also be considered that the percentage of correctly classified samples is influenced by key factors such as the characteristics associated with each cultivar (genotype, amygdalin content and physical features) and the representativeness of each cultivar within the training set.
Based on the results obtained, data selection using VIP scores makes it possible to differentiate between sweet and bitter almond batches, but it is not an appropriate method for identifying single bitter almonds in mixtures.