Smart Detection of Faults in Beers Using Near-Infrared Spectroscopy, a Low-Cost Electronic Nose and Artificial Intelligence

Early detection of beer faults is an important assessment in the brewing process to secure a high-quality product and consumer acceptability. This study proposed an integrated AI system for smart detection of beer faults based on the comparison of near-infrared spectroscopy (NIR) and a newly developed electronic nose (e-nose) using machine learning modelling. For these purposes, a commercial lager beer was used as a base prototype, which was spiked with 18 common beer faults plus the control aroma. The 19 aroma profiles were used as targets for classification machine learning (ML) modelling. Four different ML models were developed: Models 1 (M1) and M2 based on NIR (100 inputs from 1596–2396 nm) and M3 and M4 based on the e-nose (nine sensor readings as inputs), with the 19 aroma profiles as targets for all models. A customized code automatically tested 17 artificial neural network (ANN) training algorithms, assessing performance and neuron trimming. Results showed that the Bayesian regularization algorithm was the most adequate for classification, rendering precisions of M1 = 98.9%, M2 = 98.3%, M3 = 96.8%, and M4 = 96.2% without statistical signs of under- or overfitting. The proposed system can be added to robotic pourers and the brewing process at low cost, which can benefit craft and larger brewing companies.


Introduction
In commercial settings, the assessment of beer faults is mainly the responsibility of the head brewer. Faults are usually determined from simple aroma profile assessment, from sensitivity sensory tests such as absolute, recognition, differential, and/or terminal threshold using a trained panel [1][2][3], or with specialized instrumentation such as gas chromatography-mass spectroscopy (GC-MS) [4]. Several types of faults (off-flavors/aromas) can develop in beers, with diverse origins and sensory perception thresholds, as shown in Table 1.
A drawback of common beer fault assessments is that they can be subjective. Instrumental methods may require expensive equipment and special skills for usage, data handling, and analysis. Sensory analysis requires a trained panel, which can also be cost-prohibitive and can assess only a few samples at a time to avoid increasing bias due to fatigue. The implementation of new and emerging technologies for beer analysis [14], such as artificial intelligence (AI) [15][16][17] using robotics [18], near-infrared spectroscopy (NIR) [19][20][21], integrated gas sensors or low-cost electronic noses (e-noses) [22][23][24], and machine learning [25], has recently gained traction for research and practical application purposes. Among those applications are the early detection of beer faults using e-noses [26,27] and beer classification [28].
This research focuses on implementing NIR spectroscopy and a recently developed low-cost e-nose using machine learning to create an integrated system for the smart detection of faults in beer. The proposed integrated system could be a major aid to brewing companies for the early assessment of faults at different manufacturing and processing stages to secure a high-quality product. These techniques can also be implemented for commercialization and authentication purposes to secure the provenance and consistency of quality for different markets.

Samples Description
A commercial Asahi Super Dry lager beer (Asahi Breweries, Sumida City, Tokyo, Japan) with 5% alcohol in 500 mL cans was used as the base sample for this study. This beer was selected because lager beers are less hoppy and less complex in aroma than other styles such as ales and lambics, which makes them suitable as a prototype for testing purposes. The samples were spiked with 18 different flavor/aroma faults (Siebel Institute, Chicago, IL, USA) that are commonly found in beer (Table 2). A total of 1 L of beer (two cans) was used for each fault and was spiked with two different concentrations, as shown in Table 2. Besides the spiked samples, 1 L of the original beer was measured as a control; as two batches of beer were used, the control was taken as two samples (one per batch). All samples were measured in triplicate (replicates = 3), giving 36 spiked samples (18 faults at two concentrations; n = 108 measurements) plus n = 6 control measurements (n = 3 per batch), for a total of n = 114.

Near-Infrared Measurements
All samples were measured in triplicate for each of the three replicates (n = 9) using a handheld near-infrared (NIR) spectroscopy device (MicroPHAZIR™ RX Analyzer; Thermo Fisher Scientific, Waltham, MA, USA). This device measures the chemical fingerprint within the 1596-2396 nm range. Samples were measured using the method described by Gonzalez Viejo et al. [19], which uses Whatman® Grade 3 filter paper (Whatman plc., Maidstone, UK). The filter paper was first measured dry and then soaked in the sample, using the white reference as background to avoid any noise from the environment. The absorbance values from the dry filter paper were then subtracted from those obtained with the sample to remove the paper components. For this study, both the raw absorbance values and the first derivative were used; the latter was obtained with the Savitzky-Golay method in The Unscrambler X ver. 10.3 (CAMO Software, Oslo, Norway) using a second-order polynomial and a symmetric kernel with the following smoothing parameters: number of left-side points: 1, number of right-side points: 1, number of smoothing points: 3.
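The preprocessing described above (dry-paper subtraction followed by a three-point, second-order Savitzky-Golay first derivative) can be sketched in Python with SciPy's `savgol_filter`. This is only an illustration and not the Unscrambler workflow used in the study: the function name `preprocess_nir`, the uniform wavelength spacing, the 100-point grid, and the synthetic spectra are all assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter


def preprocess_nir(wet, dry, spacing_nm=1.0):
    """Subtract the dry-paper spectrum, then take a first derivative.

    wet, dry   : absorbance arrays from the soaked and the dry filter paper
    spacing_nm : wavelength step between readings (assumed uniform)
    """
    wet = np.asarray(wet, dtype=float)
    dry = np.asarray(dry, dtype=float)
    corrected = wet - dry  # remove the filter-paper contribution
    # 3-point window (1 left + 1 right), 2nd-order polynomial, 1st derivative
    first_deriv = savgol_filter(corrected, window_length=3, polyorder=2,
                                deriv=1, delta=spacing_nm)
    return corrected, first_deriv


# Synthetic check (not study data): 100 points over 1596-2396 nm
wavelengths = np.linspace(1596.0, 2396.0, 100)
step = wavelengths[1] - wavelengths[0]
dry = np.full_like(wavelengths, 0.2)
wet = dry + 1e-6 * (wavelengths - 1596.0) ** 2
corrected, first_deriv = preprocess_nir(wet, dry, spacing_nm=step)
```

With a three-point window and a quadratic fit, the filter reduces to a central difference at interior points, so it sharpens spectral features without adding much smoothing.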

Electronic Nose Measurements
A portable and low-cost electronic nose (e-nose) was used to measure the volatile compounds found in the samples. As described by Gonzalez Viejo et al. [29], the electronic nose consists of an array of nine different gas sensors: (i) MQ3: alcohol, (ii) MQ4: methane (CH4), (iii) MQ7: carbon monoxide (CO), (iv) MQ8: hydrogen (H2), (v) MQ135, (vi) MQ136, (vii) MQ137, (viii) MQ138, and (ix) MG811: carbon dioxide (CO2). To measure the samples, the full amount of beer was poured into a clean 2 L jar, and the e-nose was placed on top to acquire the volatile compound readings; all measurements were carried out in triplicate. The outputs were then analyzed using a supervised automatic code written in Matlab® R2020b (Mathworks, Inc., Natick, MA, USA). This code automatically recognizes features of the curves (start and end of the stable signals) and divides the stable portion of each sensor curve into 10 subdivisions to calculate 10 mean values [30]. This is done to increase the variability of the data used to develop the ML models.
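The subdivision step can be illustrated with a short Python sketch. It assumes the stable portion of a sensor curve has already been isolated (the automatic detection of its start and end is not reproduced here); the function name `segment_means` and the example readings are hypothetical.

```python
def segment_means(stable_signal, n_segments=10):
    """Split a stable sensor trace into equal parts and average each part."""
    n = len(stable_signal)
    if n < n_segments:
        raise ValueError("need at least one reading per segment")
    means = []
    for k in range(n_segments):
        start = k * n // n_segments        # integer bounds keep segments
        stop = (k + 1) * n // n_segments   # contiguous and non-overlapping
        chunk = stable_signal[start:stop]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical stable readings (volts) from one sensor: 50 points
trace = [1.0 + 0.01 * i for i in range(50)]
features = segment_means(trace)
```

Applied to each of the nine sensors, this would give ten rows of nine values per measured curve.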

Alcohol and pH Measurements
Samples were analyzed for basic chemometrics (pH and alcohol) in triplicate. The pH was measured in 50 mL samples of each replicate using a pH-meter (QM-1670, DigiTech, Sandy, UT, USA); the device was calibrated using a buffer solution (pH 7). Furthermore, an Alcolyzer Wine M (accuracy: <0.1% v/v; Anton Paar GmbH, Graz, Austria) was used to measure the alcohol content in 60 mL from each replicate.

Statistical Analysis and Machine Learning Modelling
The e-nose data were analyzed through ANOVA to assess significant differences (p < 0.05) among the samples with a Tukey honest significant difference (HSD) post hoc test (α = 0.05) using XLSTAT 2020.3.1 (Addinsoft, New York, NY, USA). Furthermore, a code developed in Matlab® R2021a was used to conduct a correlation analysis and plot it as a matrix showing only the significant correlations (p < 0.05) between the e-nose sensor outputs and the different faults.
Six supervised classification machine learning (ML) models were developed using artificial neural networks (ANN). All models were constructed using a customized code developed by the Digital Agriculture Food and Wine group from the University of Melbourne (DAFW; UoM) in Matlab® R2021a, which automatically tests 17 different ANN training algorithms in a loop. The best models were selected based on accuracy and performance, with the Bayesian Regularization algorithm being the best for all models.
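A correlation analysis that keeps only significant pairs can be sketched as follows. The study's Matlab code is not available here, so the use of Pearson correlation (via `scipy.stats.pearsonr`), the function name `significant_correlations`, and the column names and values are all assumptions made for illustration.

```python
from scipy.stats import pearsonr


def significant_correlations(columns, alpha=0.05):
    """Return {(a, b): r} for column pairs whose Pearson p-value is below alpha."""
    names = list(columns)
    kept = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r, p = pearsonr(columns[a], columns[b])
            if p < alpha:  # discard non-significant pairs
                kept[(a, b)] = r
    return kept


# Made-up values for illustration only (not data from the study)
data = {
    "fault_level": [1, 2, 3, 4, 5, 6],
    "sensor_A": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "sensor_B": [5, 1, 4, 2, 6, 3],
}
result = significant_correlations(data)
```

Only the pairs kept in `result` would be drawn in the matrix plot; the others would be left blank.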
The first two models were developed using the NIR raw absorbance values across the entire spectrum (1596-2396 nm) (Model 1) and the 10 mean values from each e-nose sensor (Model 2) as inputs to classify the samples into (i) control, (ii) low concentration, and (iii) high concentration. Data were divided randomly, with 70% of the samples used for training and 30% for testing. The performance was assessed from the mean squared error (MSE), and a neuron trimming test was conducted to select the models with no under- or overfitting, with 10 neurons being the optimal number for both models (Figure 1a).
For Models 3 and 4, the NIR raw absorbance values across the entire spectrum (1596-2396 nm) were taken as inputs to predict the faults found in the sample for the low concentration (Model 3) and high concentration (Model 4). Data were divided using interleaved indices, which consists of cycling samples between the training (70%) and testing (30%) stages [31]. Performance was based on MSE; a neuron trimming test was conducted to select the models with no under- or overfitting, with 10 neurons being the optimal number for both models (Figure 1b).
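As a minimal sketch of the random data division and the MSE performance measure, assuming one-hot targets for the three classes: the helper names, the fixed seed, and the example network outputs are hypothetical and are not taken from the study's Matlab code.

```python
import random


def random_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and cut them into training and testing lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


def mean_squared_error(targets, outputs):
    """Average squared difference over every element of every row."""
    total, count = 0.0, 0
    for t_row, o_row in zip(targets, outputs):
        for t, o in zip(t_row, o_row):
            total += (t - o) ** 2
            count += 1
    return total / count


train_idx, test_idx = random_split(10)
# One-hot targets for (control, low, high) against hypothetical outputs
targets = [[1, 0, 0], [0, 1, 0]]
outputs = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]
mse = mean_squared_error(targets, outputs)
```

Computing this MSE separately on the training and testing indices gives the two values that are compared when checking for under- or overfitting.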
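Interleaved division can be illustrated with the sketch below, which spreads the testing samples evenly through the ordered dataset instead of drawing them at random. This is one possible interleaving scheme written for illustration; the Matlab routine used in the study may assign indices differently, and the function name `interleaved_split` is hypothetical.

```python
def interleaved_split(n_samples, test_percent=30):
    """Assign every index to training or testing in a fixed repeating pattern."""
    train, test = [], []
    for i in range(n_samples):
        # An index goes to testing whenever the running test quota
        # (test_percent of the samples seen so far) steps up by one.
        if ((i + 1) * test_percent) // 100 > (i * test_percent) // 100:
            test.append(i)
        else:
            train.append(i)
    return train, test


train_idx, test_idx = interleaved_split(180)
```

Because the pattern is deterministic, the same samples land in the testing stage on every run, which makes results reproducible without fixing a random seed.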
For the NIR models, the number of samples used was the number of beers with added faults plus control beers (n = 19), times the number of replicates (reps = 3; n = 57), multiplied by the number of measurements per replicate (measurements = 3; n = 171), giving a total of 180 samples considering the control as six replicates (+9).
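The sample totals can be checked with a few lines of arithmetic. The sketch below reproduces the 180 NIR rows and, using the 10 mean values per curve stated for the e-nose models, the corresponding 600 rows; the function name `dataset_size` is hypothetical.

```python
def dataset_size(n_classes, replicates, values_per_replicate,
                 extra_control_replicates):
    """Rows in a model's dataset, counting the additional control replicates."""
    base = n_classes * replicates * values_per_replicate
    extra = extra_control_replicates * values_per_replicate
    return base + extra


# 18 faults + control = 19 classes with 3 replicates each; the control has
# 6 replicates (3 per batch), i.e. 3 more than the 3 already counted.
nir_rows = dataset_size(19, 3, 3, 3)     # 3 NIR measurements per replicate
enose_rows = dataset_size(19, 3, 10, 3)  # 10 mean values per e-nose curve
```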
The other two models were developed using the e-nose outputs as inputs to predict the fault found in the sample for the low concentration (Model 5) and high concentration (Model 6). Data were divided randomly into training (70%) and testing (30%) stages. Similar to Models 1 and 2, the performance was based on MSE, and 10 neurons (Figure 1c) were used for both models, as they provided the best models with no under- or overfitting after conducting a neuron trimming test.
For the e-nose models, the number of samples used was the number of beer faults plus control (n = 19), times the number of replicates (reps = 3; n = 57), multiplied by the number of mean values obtained from the e-nose curves per beer sample (values = 10; n = 570), giving a total of 600 considering the control as six replicates (+30).

Results
Figure 2 shows the curves of the NIR raw values for the control and each fault tested at low and high concentrations. The overtones were similar for both low and high concentrations; however, the absorbance values differed. The overtones found for all samples were within 1900-2000 nm and >2250 nm. In Figure 3, which shows the absorbance values after the first derivative transformation, the overtones were enhanced within the 1850-1905 nm and 1950-2140 nm ranges. In both figures, the sample with hydrogen sulfide at low concentration had higher absorbance values than the one at high concentration.
Figure 6 shows significant differences (p < 0.05) between samples for pH and alcohol content for the low and high concentration treatments. At both low and high concentrations, the beer with eugenol had the highest pH (Low: 4.22; High: 4.33), while the sample with lactic acid presented the lowest pH (Low: 4.03; High: 3.90), which differed from the control (4.13). The sample with eugenol also had the highest alcohol content at both concentrations (Low: 4.73%; High: 4.77%), while the light-struck sample had the lowest alcohol content (Low and High: 4.71%), which was similar to the control (4.71%).
Table 3 shows that both Model 1 (NIR inputs) and Model 2 (e-nose inputs) had high overall accuracy (>95%) in classifying the beer samples into control, low, and high concentrations of faulty aromas. Neither model was under- or overfitted because their training MSE values were lower than those of the testing stage, which indicates they gave a high performance. Furthermore, their receiver operating characteristic (ROC) curves (Figure 7) show that the three classification categories in the two models had very high sensitivity (true positive rate; >0.94). The lowest sensitivity for Models 1 and 2 was for the low concentration (0.94 and 0.95, respectively).
Table 4 shows that Models 3 (low concentration) and 4 (high concentration), developed using the NIR absorbance values as inputs, had high overall accuracy (99% and 98%, respectively). Neither model presented any signs of under- or overfitting, as the training MSE (Models 3 and 4: MSE < 0.001) was lower than the testing MSE (Model 3: MSE = 0.003; Model 4: MSE = 0.005). Figure 8 depicts the ROC curves for Models 3 and 4, in which all of the classification categories had high sensitivity (>0.89). In Model 3, acetic acid and H2S had the lowest sensitivity (0.89), while in Model 4, the lowest were the earthy, light-struck, and H2S samples (0.89).
Table 5 shows that Models 5 (low concentration) and 6 (high concentration), developed using the e-nose outputs as inputs, had high overall accuracy (97% and 96%, respectively). In both Models 5 and 6, the training MSE (Models 5 and 6: MSE < 0.001) was lower than the testing MSE (Model 5: MSE = 0.009; Model 6: MSE = 0.011). Based on the ROC curves (Figure 9), all classification categories had high sensitivity (>0.87), with caprylic acid having the lowest sensitivity in Model 5 (0.90) and diacetyl and ferrous sulfate in Model 6 (0.87).
Table 5. Statistical results of the artificial neural network classification models developed using the electronic nose outputs as inputs. Abbreviations: MSE: mean squared error.
Columns: Stage; Samples; Accuracy; Error; Performance (MSE).
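The sensitivity values quoted from the ROC curves are true positive rates per class. A minimal sketch of how such rates follow from a confusion matrix is given below; the matrix is hypothetical and chosen only for illustration, and the function name `sensitivity_per_class` is an assumption.

```python
def sensitivity_per_class(confusion):
    """True positive rate per class; rows are true classes, columns predictions."""
    rates = []
    for k, row in enumerate(confusion):
        total = sum(row)  # all samples that truly belong to class k
        rates.append(row[k] / total if total else float("nan"))
    return rates


# Hypothetical 3-class confusion matrix (control, low, high), not study data
confusion = [
    [18, 0, 0],
    [1, 47, 2],
    [0, 1, 49],
]
rates = sensitivity_per_class(confusion)
```

In this made-up example, the low-concentration class has the lowest rate because three of its 50 samples are assigned to other classes.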

Near-Infrared Spectroscopy (NIR)
The NIR range used in this study (1596-2396 nm) includes the aromatic overtones. This NIR range has been used in previous studies to characterize and model different volatile aromatic compounds in beers [19,25,32] and wine [33,34]. The major raw NIR peaks shown for all samples were in 1900-2000 nm and >2300 nm (Figure 2), which correspond to overtones of compounds such as carboxylic acid, pOH, water, amides, alcohol, proteins, and carbohydrates [35]. After a first derivative transformation (Figure 3), several other peaks and valleys were enhanced, such as other water overtones, thiols, and starch, all of which are found in beer [35]. The major differences in the latter case were observed in three regions: (i) between 1600 and 1700 nm, where aromatic compounds and alkyls can be found; (ii) at 1890 nm, the major peak, which corresponds to carboxylic acid, which may be present in beer samples as compounds such as acetic and lactic acids and may be converted into esters, which are responsible for different aromas in beer [29,35]; and (iii) between 1950 and 2150 nm, where proteins, amides, alcohol, and sucrose are found [35,36]. Interestingly, one off-aroma addition (hydrogen sulfide) was significantly different from the rest at low concentrations (Figure 3a) but not at the high concentrations used (Figure 3b). This effect may be related to the initial interaction between H2S and other minerals in the beer, increasing the chemical fingerprint in the specific overtones, which decrease at higher H2S concentrations [37]. These patterns and differences between the chemical fingerprints of samples with different fault additions are assessed using machine learning for discrimination purposes and for potential identification and classification using ANN modeling techniques.

Low-Cost e-Nose and Beer Chemometrics
The low-cost e-nose used was preliminarily tested on different commercial beers for ML classification purposes and determination of aroma profiles compared to gas chromatography [29]. The same e-nose was successfully used to assess and quantify smoke taint compounds in wine [38]. The variation of the e-nose sensor responses after the addition of fault compounds was significant for most of the sensors (Figure 4), with differences between the variations at low concentrations (Figure 4a) and at high concentrations (Figure 4b) of faults, which helps to justify the classifications performed by the machine learning modeling techniques.
The latter is also supported by the general correlation matrix analysis (Figure 5) between all the fault compound additions and the different e-nose sensors. More negative and statistically significant correlations (p < 0.05) were found between acetaldehyde and most sensors, except for MQ7, along with a positive correlation with MG811. The negative correlations between acetaldehyde and sensors sensitive to alcohol may be associated with the fact that acetaldehyde is often produced from the oxidation of ethanol [8,39]. The same trends were shown for sensor MQ4 (CH4) and acetic acid; this negative correlation has been found in breweries, where CH4 production decreased acetic acid formation [40]. As expected, contamination was positively correlated with diacetyl, and butyric acid was formed when the latter two are mixed. Furthermore, indole was negatively correlated with MQ3, MQ137, MQ8, and MQ135. The negative correlation between indole and the sensors sensitive to alcohol may arise because indole is formed by coliform bacteria, which are eliminated in the presence of alcohol; therefore, the higher the alcohol content, the lower the indole production [8]. Furthermore, as expected, hydrogen sulfide was positively correlated with the MQ136 (H2S) sensor; however, the correlation, although positive, was weak because the sensitivity range of the sensor (1-100 mg L−1) [29] is higher than the concentrations used in the samples for this study (0.03 and 0.07 mg L−1). The rest of the fault compounds were positively correlated at different levels with most of the sensors.
There are fewer variations concerning basic chemometrics, such as pH and alcohol levels, which can be explained through the interactions of fault compounds (Figure 6), especially at high concentrations (Figure 6b), which were more significant for the case of pH compared to alcohol concentration. As expected, an increase in alcohol and a decrease in pH can be observed in samples with compounds that have an alcohol group and acidic faults, respectively.

Supervised Machine Learning Classification Models and Deployment
Models 1 and 2 were developed to determine whether the beers are a control (no faults), low, or high concentration of faults. This will indicate which model should be used to further predict the specific fault present in the sample: Models 3 (low concentration) or 4 (high concentration) for NIR and Models 5 (low concentration) or 6 (high concentration) for e-nose.
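This two-stage use of the models can be sketched as a simple routing function. The stand-in classifiers below are hypothetical placeholders for trained networks, and the names `detect_fault`, `toy_stage_one`, and `toy_fault_models` are assumptions made for illustration.

```python
def detect_fault(sample, stage_one, fault_models):
    """Route a sample through the concentration model, then a fault model.

    stage_one    : callable returning "control", "low" or "high"
    fault_models : dict mapping "low" / "high" to a callable naming the fault
    """
    level = stage_one(sample)
    if level == "control":
        return level, None  # no fault model is needed for a clean beer
    return level, fault_models[level](sample)


# Stand-in classifiers (hypothetical); trained networks would replace them
def toy_stage_one(sample):
    if sample["score"] < 0.1:
        return "control"
    return "low" if sample["score"] < 0.5 else "high"


toy_fault_models = {
    "low": lambda sample: "diacetyl",
    "high": lambda sample: "acetic acid",
}

level, fault = detect_fault({"score": 0.3}, toy_stage_one, toy_fault_models)
```

For the NIR pathway, `stage_one` would correspond to Model 1 and the fault models to Models 3 and 4; for the e-nose pathway, to Model 2 and Models 5 and 6.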
All six models based on NIR and e-nose resulted in high accuracies (>95% for NIR and e-nose, Models 1 and 2; >98% for NIR Models 3 and 4, and >96% for e-nose, Models 5 and 6). These accuracies and performances are consistent with those presented in previous studies using NIR and e-nose for beer to assess aroma compounds and for the classification of commercial beers [22,29].
In terms of practicality, even though the e-nose models have lower performance than the NIR models, the low cost and ease of integration of the e-nose make it more flexible to deploy in different brewing processes with automated data acquisition and interpretation through AI. The e-nose processed data can be readily available to brewers for decision-making using wireless data transmission through the Internet of Things (IoT) [41,42]. In contrast, the commercial NIR instrument used in this study cannot be integrated with AI, as it can only incorporate models based on partial least squares (PLS), which are limited to assessing regression levels of a single compound per model for manual and punctual measurements and also require a trained operator for instrument usage, data acquisition, handling, and interpretation. Furthermore, additional software is required for PLS modeling and integration with the NIR at a considerable cost.
Another advantage of the proposed e-nose and AI method compared to NIR is the cost of implementation and deployment using the aforementioned data transmission and cloud processing with AI, since it is based on the transmission of numeric data from the voltage readings of the sensors. The NIR instrument can be cost-prohibitive for craft brewing companies compared to the e-nose hardware, which costs 2.5% of the price of the NIR instrument.

Conclusions
The comparable accuracies of the ML models developed for NIR and e-nose make the latter method cost-effective, reliable, and easy to deploy for craft, medium, and large brewing companies. It also allows the implementation and deployment of this method with the option of replication to assess multiple batches simultaneously. Due to the portability of new versions of e-noses, which allow local data storage and analysis using local and inexpensive processing boards (e.g., Jetson® from NVIDIA, Arduino®, or Raspberry Pi®), these could be added to robots such as the RoboBEER for quality traits and consumer perception assessment using AI. Further studies are required to model fault assessment in different beer styles with more complex aroma profiles, such as lambic, and within different stages of the brewing process. The results presented here are from a preliminary study on the integration of low-cost sensor technology and AI, and the models developed can be enriched with further data, making the most of the learning capabilities of the ANN models considered.