Comparison of Various Signal Processing Techniques and Spectral Regions for the Direct Determination of Syrup Adulterants in Honey Using Fourier Transform Infrared Spectroscopy and Chemometrics

Honey consumption has become increasingly popular worldwide. However, the increase in demand for honey has also caused an increase in its adulteration, a deliberate fraud that involves adding other substances to pure honey for economic gain. This practice not only lowers the quality of honey, but also poses potential health risks, including high blood sugar, increased risk of diabetes, and weight gain. Herein, we develop an easy-to-use and direct method of quantifying corn, cane, beet, and rice syrup adulterants in honey using Fourier transform infrared spectroscopy and chemometrics. Various signal processing techniques, including derivatives, moving average, binning, Savitzky–Golay, and standard normal variate, were compared using the entire spectral region (3996–650 cm−1) and a specific spectral region (1501–799 cm−1). First derivative signal processing gave the best results for both the entire and specific spectral regions, with the best overall performance obtained in the specific spectral region (1501–799 cm−1) (RMSECVaverage = 0.021, RMSEPaverage = 0.014, R2average = 0.859) across all syrup adulterants. An exploratory analysis to assess the utility of this specific spectral region in pattern recognition of samples based on their adulterant content shows that this region is effective in discriminating samples according to the presence or absence of syrup adulterants.


Introduction
Honey is a thick, sugary, concentrated nectar that is made up of approximately 18% water. Due to its sweet and sticky texture, honey has become a staple ingredient in many kitchens. Aside from its culinary applications, honey also has several medical applications, e.g., it has both anti-aging and anti-bacterial properties and can also be used as a remedy for sore throat. As the human population grows, the consumption of honey grows as well. However, honey safety and quality are not regularly monitored. Further, due to its growing demand, adulteration has also become more common. In the case of honey, adulteration is the addition of any substance to the pure honey. Adulterants most frequently added to honey include syrups of corn, cane, beet, and rice, which have economic and organoleptic consequences. In addition, adulterated honey poses risks to consumer health, such as higher blood sugar and weight gain. Due to the decline in honey quality, consumers have low confidence in the nutritional value of this product, making its marketing to the public more challenging [1]. Honey evaluation would therefore not only help to improve the quality of honey, but also help to improve its appeal to the public.

Materials and Methods
A training set of adulterated honey samples (n = 81) containing various levels of corn, cane, beet, and rice syrups was created using a full factorial design (Table S1). A full factorial design creates experimental points using all possible combinations of the levels of the factors. Thus, for four factors (i.e., components) with three levels each, a total of 3^4 = 81 experiments were carried out. Similarly, an independent test set consisting of adulterated honey samples (n = 32) was created using a central composite design of experiments (Table S2). The DoE.base and rsm packages in R were used to create the full factorial and central composite designs, respectively [14][15][16]. Pure Manuka honey was used as the base matrix; each syrup was weighed separately, then added to the pure honey and mixed thoroughly. A drop of each sample mixture was analyzed using a Bruker Tensor 27 ATR-FTIR with a ZnSe crystal. Background spectra were collected using air as a blank, and data were collected at a resolution of 2 cm−1. Each sample was analyzed using 40 scans. The ATR crystal was carefully cleaned between analyses using ethanol and then allowed to air dry. The cleaned crystal was then checked spectrally prior to each analysis to ensure that no residue from the previous sample was retained. All spectra were collected in triplicate, and the average was used for both the calibration and testing sets. A PLS predictive model was built from the full factorial training set using the 'pls' package in R [17]. The ultimate goal of PLS here is to develop predictive models that use the ATR-FTIR absorbance spectra to simultaneously predict the concentrations of corn, cane, beet, and rice syrups without any need for analytical separations. PLS is a powerful multivariate statistical method that has been applied successfully in many areas [18][19][20][21].
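To illustrate how a full factorial design with four factors at three levels yields 81 runs, the following minimal sketch enumerates every combination. The level values (0, 5, and 10% w/w) and the function name are hypothetical placeholders chosen for illustration; the levels actually used are those listed in Table S1, and the study generated its design with the DoE.base package in R rather than with this code.

```python
from itertools import product

# Hypothetical adulterant levels (% w/w); the real levels are given in Table S1.
LEVELS = (0.0, 5.0, 10.0)
FACTORS = ("corn", "cane", "beet", "rice")


def full_factorial(factors, levels):
    """Return every combination of the levels across the factors."""
    return [
        dict(zip(factors, combo))
        for combo in product(levels, repeat=len(factors))
    ]


design = full_factorial(FACTORS, LEVELS)
print(len(design))  # 3**4 = 81 runs
```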
Mathematically, PLS involves the decomposition of A (absorbance) and C (concentration) as follows:

$$A = B P^{\mathsf{T}} + E$$

$$C = D Q^{\mathsf{T}} + F$$

where B and D are the n × d score matrices; P is the p × d loading matrix of A; E is the n × p error (residual) matrix of A; Q is the m × d loading matrix of C; and F is the n × m error (residual) matrix of C. The B-coefficients (regression coefficients) are then computed as:

$$B_{\mathrm{PLS}} = W \left(P^{\mathsf{T}} W\right)^{-1} Q^{\mathsf{T}}$$

where W is the p × d matrix of PLS weights [22]. The central composite design testing set was developed as an independent dataset to test the accuracy of the predictive model created from the training data. All PLS analyses were performed using the R package 'pls' under RStudio [16]. The PLS1 algorithm was used throughout the PLS analysis. PLS1 optimizes the number of factors for only one component (i.e., one response) at a time [23]. Various signal processing techniques, including Savitzky-Golay, binning, moving average, first derivative, second derivative, and SNV, were applied through the 'prospectr' package to enhance the quality of the data by noise reduction [24].
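As an illustration of the PLS1 idea described above (one response modeled at a time), the following is a minimal NIPALS-style sketch in plain Python. It is not the implementation in the 'pls' package; the function names and the toy data are assumptions made for this example, and mean-centering of both the spectra and the response is assumed. Prediction is done by sequentially deflating the new sample, which avoids forming the coefficient matrix explicitly.

```python
def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model; returns (x_mean, y_mean, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X most covariant with the y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample by sequential deflation."""
    x_mean, y_mean, components = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat


# Toy data (made up): two "wavelengths", response exactly linear in them.
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
y_toy = [2.0 * a + 0.5 * b for a, b in X_toy]
toy_model = pls1_fit(X_toy, y_toy, n_components=2)
print(pls1_predict(toy_model, [6.0, 5.0]))
```

In the study, one such model would be fitted per adulterant (corn, cane, beet, rice), each with its own number of components.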
Savitzky-Golay is a preprocessing technique that enhances signal properties, resolves overlapping signals, and suppresses unwanted spectral features arising from nonideal instrument and sample properties [25]. It fits a local polynomial regression to the signal and requires equally spaced data points (constant bandwidth). Mathematically, it operates as a weighted sum of neighboring values [26]:

$$x_j^{*} = \frac{1}{N} \sum_{h=-k}^{k} c_h \, x_{j+h}$$

where $x_j^{*}$ is the new value, N is a normalizing coefficient, k is the number of neighboring values on each side of j, and $c_h$ are pre-computed coefficients that depend on the chosen polynomial order and degree [26][27][28]. A differentiation order of 1, a polynomial order of 2, and a window size of 5 were used in all Savitzky-Golay analyses.
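For the settings reported above (first derivative, second-order polynomial, five-point window), the weighted sum can be sketched as follows. The coefficients (-2, -1, 0, 1, 2) with a normalizer of 10 are the standard tabulated Savitzky-Golay values for this case, assuming unit spacing between data points; the function name is hypothetical, and edge points without a full window are simply dropped in this sketch.

```python
# Standard 5-point, quadratic, first-derivative Savitzky-Golay coefficients
# (unit spacing between points is assumed).
SG_COEFFS = (-2, -1, 0, 1, 2)
SG_NORM = 10


def savgol_first_derivative(signal):
    """Apply the weighted sum x*_j = (1/N) * sum_h c_h * x_(j+h)."""
    k = len(SG_COEFFS) // 2  # number of neighbors on each side of j
    result = []
    for j in range(k, len(signal) - k):
        weighted = sum(
            c * signal[j + h] for h, c in zip(range(-k, k + 1), SG_COEFFS)
        )
        result.append(weighted / SG_NORM)
    return result


# A quadratic signal x_j = j**2 has the exact derivative 2*j.
quadratic = [float(j * j) for j in range(7)]
print(savgol_first_derivative(quadratic))  # [4.0, 6.0, 8.0]
```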
Binning is another preprocessing technique that averages a signal in column bins [26]. A top-down splitting technique, it is based on a specified number of bins and is primarily used for data smoothing [29]. Binning smooths a sorted data value by consulting neighboring values around it. The sorted values are then distributed into a number of "buckets" [30].
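As a rough illustration of binning, the sketch below averages a spectrum over consecutive, non-overlapping bins. The function name is hypothetical, and the handling of a final partial bin (it is averaged over the points it contains) is an assumption of this example rather than a description of the 'prospectr' implementation.

```python
def bin_spectrum(spectrum, bin_size):
    """Average consecutive, non-overlapping groups of bin_size points."""
    binned = []
    for start in range(0, len(spectrum), bin_size):
        chunk = spectrum[start:start + bin_size]
        binned.append(sum(chunk) / len(chunk))
    return binned


print(bin_spectrum([1.0, 3.0, 5.0, 7.0, 9.0], bin_size=2))  # [2.0, 6.0, 9.0]
```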
Moving average, another robust signal processing technique, performs a column-wise operation by averaging contiguous wavelengths within a given window size [26]. A filter length of 1 was used for the moving average. First and second derivatives, on the other hand, were computed using the finite difference technique. In this method, the difference between subsequent data points is calculated, provided that the bandwidth is constant, according to:

$$x_i' = x_i - x_{i-1}$$

$$x_i'' = x_{i-1} - 2x_i + x_{i+1}$$

where $x_i'$ and $x_i''$ are the new data points (first and second derivative, respectively), and $x_i$, $x_{i-1}$, and $x_{i+1}$ are the subsequent data points [26].
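The moving average and finite-difference derivatives described above can be sketched in a few lines. Function names are hypothetical; the sketch assumes equally spaced points and drops edge positions that lack the required neighbors.

```python
def moving_average(spectrum, window):
    """Average each run of `window` contiguous points."""
    return [
        sum(spectrum[i:i + window]) / window
        for i in range(len(spectrum) - window + 1)
    ]


def first_derivative(spectrum):
    """Finite difference: x'_i = x_i - x_(i-1)."""
    return [spectrum[i] - spectrum[i - 1] for i in range(1, len(spectrum))]


def second_derivative(spectrum):
    """Finite difference: x''_i = x_(i-1) - 2*x_i + x_(i+1)."""
    return [
        spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    ]


signal = [1, 4, 9, 16]
print(moving_average(signal, 3))   # two 3-point averages
print(first_derivative(signal))    # [3, 5, 7]
print(second_derivative(signal))   # [2, 2]
```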
SNV is another signal processing technique that normalizes spectra by correcting for light scattering through a row-wise operation. Mathematically, it is given by:

$$x_i^{\mathrm{SNV}} = \frac{x_i - \bar{x}_i}{s_i}$$

where $x_i$ is the value of variable i, $\bar{x}_i$ is the average of the variable i, and $s_i$ is the standard deviation [26].
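A minimal sketch of the row-wise SNV operation on a single spectrum is given below. The function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption of this example.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Center one spectrum on its own mean and scale by its own std. dev."""
    centre = mean(spectrum)
    spread = stdev(spectrum)  # sample standard deviation (assumed)
    return [(value - centre) / spread for value in spectrum]


print(snv([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```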
After the training set was subjected to signal processing, PLS chemometric analysis was performed. The model generated from the training set (i.e., after cross-validation) was used to assess model performance on the validation data by predicting the sugar concentration (% w/w) in the test set. Root mean square error (RMSE) was calculated to determine the degree to which the predicted concentrations of adulterants in the samples deviate from their actual concentrations in both the training and testing sets using the formula:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(y_i - y_i'\right)^2}{N}}$$

where y and y' are the predicted and actual concentrations, respectively, and N is the number of samples [31]. The root mean square error of cross validation (RMSECV) and root mean square error of prediction (RMSEP) were calculated for the training and testing datasets, respectively. Further, to determine the model's goodness-of-fit between the measured and predicted adulterant concentrations, R2 values were also calculated. Lastly, PCA was used to look for patterns among samples in the specific spectral region harboring unique spectral peaks for the adulterants of interest. If a pattern was seen, then hierarchical cluster analysis was used to group together data points using the 'ggbiplot' R package [28]. In cluster analysis, the process starts with one piece of data and combines groups based on distances from one another in the principal component space [22]. The cluster analysis in this study was agglomerative hierarchical with Ward's method used for the distances. PCA was performed using the 'factoextra' R package [32,33].
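The two performance metrics can be sketched as follows. The RMSE function follows the definition given above; the R2 function assumes the usual coefficient of determination (1 minus the ratio of residual to total sum of squares), which the text does not spell out. Function names and the example values are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean square error between predicted and actual concentrations."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(actual))


def r_squared(predicted, actual):
    """Coefficient of determination (assumed definition of R2)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up concentrations (% w/w) for illustration only.
actual_conc = [0.0, 5.0, 10.0]
predicted_conc = [0.5, 5.0, 9.5]
print(rmse(predicted_conc, actual_conc))
print(r_squared(predicted_conc, actual_conc))
```

Applied to cross-validated predictions on the training set, the first function gives the RMSECV; applied to predictions on the independent test set, it gives the RMSEP.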

Results and Discussion
Subtle spectral differences were evident in the fingerprint region (1501-500 cm−1) in both the training and testing sets due to variations in the concentrations of the corn, cane, beet, and rice syrup adulterants (Figures 1 and 2). These subtle spectral variations are critical because they allow discrimination of the respective syrup adulterants [34][35][36]. Beet syrup shows strong and unique absorbance bands at 1101 cm−1, 1053 cm−1, 990 cm−1, and 920 cm−1. Cane syrup, on the other hand, has similar absorbance bands to beet syrup. Corn syrup has absorbance bands at 1147 cm−1, 1101 cm−1, 1074 cm−1, 1015 cm−1, 991 cm−1, and 920 cm−1 [37,38]. Rice syrup has similar absorbance bands to cane syrup except for the absence of the prominent band at 991 cm−1 [38]. Meanwhile, pure Manuka honey shows unique absorbance bands, particularly at 1053 cm−1, 1028 cm−1, and 946 cm−1 (Figure 3). In general, the region 1501-800 cm−1 corresponds to the absorption zones of the three major sugar constituents of honey, namely glucose, fructose, and sucrose. In particular, the region 900-750 cm−1 corresponds to the anomeric region and is characteristic of the saccharide configurations [39]. The C-O and C-C stretching modes, on the other hand, are assigned to the bands at 1153-904 cm−1, while the O-C-H, C-C-H, and C-O-H bending modes are assigned to the 1474-1199 cm−1 absorption bands [40]. Water has an intense, broad OH stretching peak at ~3300 cm−1 [41].

Analysis of the Entire Spectral Region (3996-650 cm−1)
In this study, we compared the performance of various signal preprocessing techniques using the entire spectral region (3996-650 cm−1) and the specific spectral region (1501-799 cm−1). We first performed a direct PLS analysis using the entire spectral region (3996-650 cm−1) without any prior signal preprocessing. For the training set, this yielded RMSECV values of 0.030, 0.015, 0.019, and 0.031 for the corn, cane, beet, and rice syrups, respectively (RMSECV average = 0.024) (Table 1). The corresponding R2 values gave an R2 average of 0.824 across these four syrup adulterants (Table 2). Further, using these calibration models for the individual syrups, RMSEP values of 0.022, 0.027, 0.019, and 0.034 for corn, cane, beet, and rice syrups, respectively, were obtained for the test set (RMSEP average = 0.026) (Table 3).
Signal processing followed by PLS modeling was then applied to the entire spectral region (3996-650 cm−1), which yielded slightly better results than the direct PLS analysis without signal processing. First derivative and second derivative analyses were implemented in the entire spectral region. The best RMSECV results for the entire spectral region were achieved using the second derivative (RMSECV average = 0.015) and the first derivative (RMSECV average = 0.020) across the four adulterant syrups (Table 1). The first derivative model also attained the best RMSEP (RMSEP average = 0.017) (Table 2). The highest R2 values were likewise obtained with the second derivative (R2 average = 0.932) and the first derivative (R2 average = 0.880) across the four adulterant syrups (Tables 2 and 3).
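As a quick check of the averages reported for the unprocessed full-spectrum model, and assuming that "average" denotes the arithmetic mean of the four per-syrup values, the arithmetic works out as follows:

```latex
\[
\mathrm{RMSECV}_{\mathrm{average}}
  = \frac{0.030 + 0.015 + 0.019 + 0.031}{4}
  = \frac{0.095}{4}
  = 0.02375 \approx 0.024
\]
\[
\mathrm{RMSEP}_{\mathrm{average}}
  = \frac{0.022 + 0.027 + 0.019 + 0.034}{4}
  = \frac{0.102}{4}
  = 0.0255 \approx 0.026
\]
```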
Binning and moving average techniques followed by PLS were also performed across the entire spectral region (3996-650 cm−1) and gave identical RMSECV, RMSEP, and R2 values across the four syrup adulterants (Tables 1-3). Several bin sizes were tested in order to enhance the performance of the binning technique, with the optimum result obtained at a bin size of 2. We then applied SNV preprocessing prior to PLS modeling across the entire spectral region (3996-650 cm−1), and the results did not improve over those of the moving average and binning techniques (RMSECV average = 0.027, RMSEP average = 0.019, R2 average = 0.754) across the four adulterant syrups (Tables 1-3). Savitzky-Golay processing of the entire spectral region (3996-650 cm−1) gave results similar to those of the SNV technique (RMSECV average = 0.028, RMSEP average = 0.018, R2 average = 0.732) across the four adulterant syrups (Tables 1-3).
Comparing the average RMSECV, RMSEP, and R2 values across the entire spectral region (3996-650 cm−1), the first derivative model provided the optimum results. Its average error decreased from the RMSECV (0.020) to the RMSEP (0.017), and its R2 (0.880) was the second highest across the four adulterant syrups (Tables 1-3).

Analysis of the Specific Spectral Region (1501-799 cm −1 )
We compared the results obtained above with those obtained by performing signal preprocessing and PLS analysis on a selected spectral region (1501-799 cm−1) harboring the spectral features of interest in the syrup adulterants. The RMSECV values were comparable among the different signal processing techniques (Table 1). Therefore, when choosing the best model, we also compared the RMSECV to the RMSEP values (Tables 1 and 2). Comparing the RMSECV (Table 1), RMSEP (Table 2), and R2 (Table 3) values for the various signal processing techniques across the specific spectral region (1501-799 cm−1), the first derivative gave the optimum results (RMSECV average = 0.021, RMSEP average = 0.014, R2 average = 0.859) (Tables 1-3). The average error across the four adulterant syrups decreased from the RMSECV (0.021) to the RMSEP (0.014), and the R2 value (0.859) indicates a good fit between the measured and predicted values (% w/w) for the respective syrup adulterants (Tables 1-3). The region that gave the best results was determined by comparing all results obtained using the various signal preprocessing techniques (Tables 1-3). Optimum results were achieved by restricting the preprocessing techniques to the specific spectral region (Tables 1-3). Applying the second derivative, moving average, binning, Savitzky-Golay, and SNV techniques to the specific spectral region (1501-799 cm−1) improved most RMSECV, RMSEP, and R2 values across the four syrup adulterants, with the first derivative technique remaining the best overall (Tables 1-3).
Using the first derivative signal processing technique, close examination of the RMSECV as a function of the number of components shows that four, four, five, and four components are sufficient to achieve low errors of cross-validation (i.e., RMSECV) in the calibration set for the syrup adulterants of corn, cane, beet, and rice, respectively (Figure 4).
Using the first derivative signal processing technique and the aforementioned numbers of components (Figure 4), good linearity is obtained between the predicted and measured concentrations (% w/w) of adulterants in the training and testing sets (Figure 5).

Exploratory Analysis of Syrup Adulterants and Honey Samples
To determine whether any pattern formed among our adulterated honey samples (i.e., calibration and testing sets), pure honey samples (i.e., standard Manuka honey), and selected adulterants, we performed PCA and cluster analysis using absorbance data points from the specific spectral region (1501-799 cm−1) harboring the sugars of interest (i.e., glucose, fructose, and sucrose, commonly found in the syrups and honeys used in this study) (Figure 6). Our analysis was able to discriminate samples according to the presence or absence of syrup adulterants. For example, cluster 1 (i.e., samples 119-121, 123-124) contains the standard pure Manuka honey samples (i.e., unadulterated honey samples). Cluster 1 also contains sample 73, a full factorial design training set sample consisting of only pure unadulterated Manuka honey (i.e., absence of any corn, cane, beet, and rice syrups) (Figure 6) (cf. Table S3).
Cluster 7 (i.e., samples 116-118, 122) grouped samples belonging to pure syrup adulterants. Specifically, sample 116 is a pure corn syrup, samples 117 and 122 are pure rice syrups, while sample 118 is a pure cane syrup. Of note are samples 114 and 115, a pure beet syrup and another pure cane syrup. These samples clearly fall outside any typical clustering, and their ingredients may warrant further investigation to explain this outcome (Figure 6) (cf. Table S3).
Cluster 6 shows samples containing very low amounts of any of the adulterants. Specifically, most of the samples within this cluster contain zero percent of one to three of the component syrup adulterants. Cluster 4, on the other hand, also shows samples containing low amounts of the syrup adulterants. Generally, cluster 4 has higher corn syrup levels (corn average = 6.28% w/w) than cluster 6 (corn average = 3.41% w/w). Further, cluster 4 also has higher rice syrup levels (rice average = 9.99% w/w) than cluster 6 (rice average = 6.23% w/w). The levels of cane and beet syrups, on the other hand, are higher in cluster 6 (cane average = 9.36% w/w; beet average = 6.11% w/w) than in cluster 4 (cane average = 5.37% w/w; beet average = 3.60% w/w) (Figure 6) (cf. Table S3). Per our PCA and cluster analyses of the specific spectral region, it can be concluded that samples having similar properties were grouped together.
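To make the grouping criterion more concrete, the sketch below computes the cost that Ward's method assigns to merging two clusters, i.e., the increase in within-cluster sum of squares, which agglomerative clustering minimizes at each step. The function names and the two-dimensional points (standing in for principal component scores) are hypothetical; this is an illustration of the criterion, not the R workflow used in the study.

```python
def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def ward_merge_cost(cluster_a, cluster_b):
    """Increase in within-cluster sum of squares if the clusters are merged."""
    n_a, n_b = len(cluster_a), len(cluster_b)
    centre_a, centre_b = centroid(cluster_a), centroid(cluster_b)
    squared_distance = sum((a - b) ** 2 for a, b in zip(centre_a, centre_b))
    return (n_a * n_b) / (n_a + n_b) * squared_distance


# Made-up scores on two principal components.
group_1 = [(0.0, 0.0), (0.0, 2.0)]
group_2 = [(4.0, 1.0)]
print(ward_merge_cost(group_1, group_2))
```

At each step, the pair of clusters with the smallest merge cost is combined, so samples with similar scores end up in the same cluster.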
A comprehensive comparison of the performance of various signal processing techniques using the entire spectral region (3996-650 cm−1) and a specific spectral region (1501-799 cm−1) for the direct quantification of syrup adulterants in honey has not been explored in previous studies. Further, while previous studies focused on the pattern recognition of honey syrup adulteration for one syrup (e.g., identification of rice adulterated honey vs. unadulterated honey), this study offers the advantage of simultaneously quantifying four syrup adulterants in honey using PLS, FTIR, and the first derivative signal processing technique [42]. As mentioned earlier, the first derivative signal processing technique gave the best results using the specific spectral region (RMSECV average = 0.021, RMSEP average = 0.014, R2 average = 0.859) across all syrup adulterants.

Figure 5. Predicted vs. measured syrup concentration (% w/w) using the first derivative signal processing technique in the specific spectral region for both the training and test sets: (a) corn syrup (R2 = 0.8529), (b) cane syrup (R2 = 0.7716), (c) beet syrup (R2 = 0.8646), (d) rice syrup (R2 = 0.9437).

An exploratory analysis to assess the utility of this specific spectral region in the pattern recognition of samples based on their adulterant contents shows that this region is effective in discriminating samples according to the presence or absence of syrup adulterants. These results can be used to assess the utility of first derivative processing focused on the specific spectral region harboring the functional groups of interest for the direct determination of specific analytes.

Conclusions
ATR-FTIR in conjunction with chemometric PLS analysis has proven useful in the development of a quick and facile method of quantifying corn, cane, beet, and rice syrup concentrations in honey. The specific spectral region (1501-799 cm−1) gave optimal results, with lower RMSECV and RMSEP and a higher R2 value in comparison to the entire spectral region. A comparison of the RMSECV, RMSEP, and R2 values of the specific spectral region revealed that first derivative signal processing yielded optimal results. Using the same region, PCA also allowed for the discrimination of various syrup adulterants, honey, and adulterated honeys. The study can provide direction as to the utility of the aforementioned region and the first derivative signal processing technique for both the quantitative and qualitative identification of syrup adulterants in honey.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/chemosensors10020051/s1, Table S1: Full factorial design (n = 81) of quaternary mixtures of corn, cane, beet, and rice syrup adulterants using Manuka honey as the standard matrices used in the training set, Table S2: Central composite design (n = 32) of quaternary mixtures of corn, cane, beet, and rice syrup adulterants using Manuka honey as the standard matrices used in the testing set, Table S3