Development of a Rapid Method to Assess Beer Foamability Based on Relative Protein Content Using RoboBEER and Machine Learning Modeling

Abstract: Foam-related parameters are associated with beer quality and depend, among other factors, on the protein content. This study aimed to develop a machine learning (ML) model to predict the pattern and presence of 54 proteins. Triplicates of 24 beer samples were analyzed through proteomics. Furthermore, samples were analyzed using the RoboBEER to evaluate 15 physical parameters (color, foam, and bubbles), and a portable near-infrared (NIR) device. Proteins were grouped according to their molecular weight (MW), and a matrix was developed to assess only the significant correlations (p < 0.05) with the physical parameters. Two ML models were developed using the NIR (Model 1) and RoboBEER (Model 2) data as inputs to predict the relative quantification of 54 proteins. Proteins in the 0–20 kDa group were negatively correlated with the maximum volume of foam (MaxVol; r = −0.57) and total lifetime of foam (TLTF; r = −0.58), while those within 20–40 kDa had a positive correlation with MaxVol (r = 0.47) and TLTF (r = 0.47). Model 1 was not as accurate (testing r = 0.68; overall r = 0.89) as Model 2 (testing r = 0.90; overall r = 0.93), which may serve as a reliable and affordable method to incorporate the relative quantification of important proteins to explain beer quality.


Introduction
Foam is an important attribute to determine beer quality as it is related to its chemical composition and directly affects the sensory descriptors, such as aroma release, color, and mouthfeel [1,2]. Proteins are an essential component for foamability and foam stability because they act as surfactants, which are able to reduce the interfacial tension and increase the viscosity and elasticity of the liquid [3,4]. This is due to the hydrophilic and hydrophobic properties of their structure. When proteins unfold at the bubble interface, the hydrophobic regions make contact with the air and the hydrophilic regions stay in the liquid phase; this promotes the development of a layer at the interface and increases the stability of foam [4-6]. Furthermore, proteins, along with polyphenols, are responsible for chill haze formation; this happens when the beer is cooled at temperatures <0 °C and is the result of protein aggregation and oxidized flavonoids [7,8].
Some of the traditional methods that have been used to assess the beer proteome include gel-free top-down analysis of intact proteins with high-resolution instruments based on Fourier transform ion cyclotron resonance, and bottom-up strategies, which identify the peptides after enzymatic digestion, usually performed using trypsin [9]. Other methods that have been used in beer are two-dimensional (2-D) gel electrophoresis coupled with mass spectrometry [10,11], high-performance liquid chromatography (HPLC) [12], enzyme-linked immunosorbent assay (ELISA) [9], matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) [13], and liquid chromatography-mass spectrometry (LC-MS) [14,15]. However, these techniques are often time-consuming and require expensive equipment, consumables for every test, a large laboratory space, and personnel with specialized training.
Emerging techniques derived from artificial intelligence, such as machine learning, robotics, and computer vision, have been used in the food and beverage industries to overcome most of the drawbacks of traditional methods to assess different components in food products and beverages. Some examples of these are the use of a robotic pourer along with computer vision to evaluate 15 physical parameters (color, foam, and bubbles) in beer using a 5-min video [1,16], and the use of low-cost electronic noses along with machine learning to assess aromas and alcohol content in brewages as an alternative method to gas chromatography-mass spectrometry [17,18], among others. However, there are no known publications proposing affordable and rapid methods to assess proteomics or related parameters using any of the emerging techniques mentioned.
This study aimed to assess the role of relative proteins and their molecular weight pattern for different beer samples and to construct a machine learning model to predict the relative pattern quantification of those proteins as well as their effect on foamability and beer quality. To achieve this, significant correlations (p < 0.05) between the identified proteins grouped by molecular weight and physical parameters (color, foam, and bubbles) measured using an automatic pourer (RoboBEER; The University of Melbourne, Melbourne, Vic, Australia) were assessed. Furthermore, all samples were measured using a near-infrared (NIR) portable device to obtain their chemical fingerprinting in the 1600-2400 nm range. Two machine learning models were constructed and compared to predict the relative quantification of 54 proteins identified in the beer samples (i) using the absorbance values of the chemical fingerprinting as inputs for Model 1, and (ii) using the physical parameters (color, foam, and bubbles) as inputs for Model 2.

Samples Description
A total of 24 beers of various types and from various countries (Table 1) were used in this study to ensure that a broad range of protein profiles was analyzed. Samples were also selected from three fermentation types, (i) spontaneous, (ii) bottom, and (iii) top, as these have different foaming characteristics and, therefore, a distinct chemical composition. Replicate bottles (N = 3) of each beer were analyzed. Samples were prepared as detailed by Kerr et al. [19]. To desalt proteins by precipitation, 50 µL of each sample was added to 1 mL of 1:1 methanol/acetone and incubated at −20 °C for 16 h. Precipitated proteins were centrifuged at room temperature at 18,000 rcf for 10 min, the supernatant was discarded, and the proteins were resuspended in 100 µL of 100 mM ammonium acetate, 10 mM dithiothreitol (DTT), and 1 µg trypsin (Proteomics grade, Sigma-Aldrich, St. Louis, MO, USA). Proteins were digested by incubation at 37 °C for 16 h. Trypsin was added in equal amounts to all samples to allow normalization of the relative protein abundance. SWATH-MS was implemented as described below, using triplicates to reduce retention time variation and improve data quality.

Mass Spectrometry
Peptides were desalted with C18 ZipTips (Millipore) and analyzed by liquid chromatography-electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) using a Prominence nanoLC system (Shimadzu Corporation, Kyoto, Japan) and TripleTof 5600 instrument with a Nanospray III interface (SCIEX, Mulgrave, Victoria, Australia) essentially as previously described by Xu et al. [20]. Approximately, 2 µg or 0.4 µg of desalted peptides, as estimated by the ZipTip binding capacity, were injected for data-dependent acquisition (DDA) or data-independent acquisition (DIA), respectively. LC parameters were identical for DDA and DIA, and LC-MS/MS was performed essentially as previously described by Zacchi and Schulz [21]. Peptides were separated on a VYDAC EVEREST reversed-phase C18 HPLC column (300 Å pore size, 5 µm particle size, 150 µm i.d. × 150 mm) at a flow rate of 1 µL/min with buffer A (1% acetonitrile and 0.1% formic acid) and buffer B (80% acetonitrile with 0.1% formic acid) using a gradient of 10-60% buffer B over 48 min, for a total run time of 70 min per sample. Gas and voltage settings were adjusted as required. For DDA analyses, an MS TOF scan from m/z of 350-1800 was performed for 0.5 s followed by DDA of MS/MS in high sensitivity mode with automated CE selection of the top 20 peptides from m/z of 100-1800 for 0.05 s per spectrum and dynamic exclusion of peptides for 5 s after 2 selections. Identical LC conditions were used for DIA SWATH, with an MS-TOF scan from an m/z of 350-1800 for 0.05 s followed by high-sensitivity DIA of MS/MS from m/z of 100-1800 with 26 m/z isolation windows with 1 m/z window overlap each for 0.1 s across an m/z range of 400-1250. The collision energy was automatically assigned by the Analyst software (SCIEX, Mulgrave, Victoria, Australia) based on the m/z window ranges.
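As a rough check on the acquisition settings above, the duration of one instrument cycle can be estimated from the quoted dwell times. The following is a minimal sketch in Python that assumes a cycle is one survey scan followed by all MS/MS scans and ignores any instrument overhead; the function name `cycle_time` is illustrative only.

```python
# Approximate acquisition cycle times from the dwell times quoted in the text.
# Assumption: instrument overhead (switching, settling) is ignored, so real
# cycles would be somewhat longer than these estimates.

def cycle_time(survey_s, n_msms, msms_s):
    """Return one cycle in seconds: survey scan plus all MS/MS scans."""
    return survey_s + n_msms * msms_s

# DDA: 0.5 s TOF survey scan, then top 20 precursors at 0.05 s per spectrum.
dda_cycle = cycle_time(survey_s=0.5, n_msms=20, msms_s=0.05)

# DIA SWATH: 0.05 s TOF survey scan, then 26 isolation windows at 0.1 s each.
dia_cycle = cycle_time(survey_s=0.05, n_msms=26, msms_s=0.1)

print(f"DDA cycle: {dda_cycle:.2f} s")
print(f"DIA cycle: {dia_cycle:.2f} s")
```

Under these assumptions, each chromatographic peak is sampled roughly every 1.5 s in DDA mode and every 2.65 s in DIA mode.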

Proteomics Data Analysis
Peptides and proteins were identified using ProteinPilot 5.1 (SCIEX, Mulgrave, Victoria, Australia), searching against all eukaryotic proteins in UniProtKB (downloaded 14 June 2018; 557485 total entries), with settings: sample type, identification; instrument, TripleTof 5600; species, none; ID focus, biological modifications; enzyme, trypsin; search effort, thorough ID. The results from ProteinPilot were used as an ion library to measure the abundance of peptides and proteins using PeakView 2.1 (SCIEX, Mulgrave, Victoria, Australia), with settings: shared peptides, allowed; peptide confidence threshold, 99%; false discovery rate, 1%; XIC extraction window, 6 min; XIC width, 75 ppm. PeakView output was reformatted with a Python script (Python Software Foundation, Wilmington, DE, USA) that also applied a peptide FDR cut-off of 1% to remove low-quality ion measurements for that peptide from each sample (https://github.com/bschulzlab/reformatMS). For protein-centric analyses, protein abundances were normalized to the sum of all protein intensities in a sample, as previously described [22].
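The protein-centric normalization described above (each protein intensity divided by the sum of all protein intensities in the same sample) can be illustrated with a short Python sketch. This is not the reformatMS script; the function name and the intensity values are hypothetical and serve only to show the arithmetic.

```python
def normalize_to_total(intensities):
    """Scale the protein intensities of one sample so that they sum to 1."""
    total = sum(intensities.values())
    if total == 0:
        raise ValueError("sample has no measurable protein intensity")
    return {protein: value / total for protein, value in intensities.items()}

# Hypothetical intensities for one sample (arbitrary units, not study data).
sample = {"protein_A": 1200.0, "protein_B": 600.0, "protein_C": 200.0}

relative = normalize_to_total(sample)
for protein, fraction in relative.items():
    print(f"{protein}: {fraction:.3f}")
```

Because every sample is scaled to the same total, the resulting values are relative abundances that can be compared between beers regardless of differences in total protein load.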

Physical Parameters (Color, Foam, and Bubbles)-RoboBEER
The beer samples were evaluated with an automatic pourer (RoboBEER) to obtain the physical parameters (color, foam, and bubbles). This robot automatically and consistently pours 80 ± 10 mL of the sample while being recorded for 5 min using a smartphone camera. Furthermore, it is integrated with an ethanol gas sensor (MQ3; Henan Hanwei Electronics Co., Ltd., Zhengzhou, China) and a carbon dioxide sensor (CO2; MG811; Henan Hanwei Electronics Co., Ltd., China) that record values in real time during the pouring and video recording, as well as an infrared temperature sensor MLX90614 (Melexis NV, Ieper, Belgium) to make sure all the beers are evaluated under similar conditions. The videos were analyzed with computer vision algorithms developed in Matlab® R2019b (Mathworks Inc., Natick, MA, USA) to obtain the maximum volume of foam (MaxVol), lifetime of foam (LTF), total lifetime of foam (TLTF), foam drainage (FDrain), color in both CIELab and RGB scales, and bubble size distribution grouped as small (SmBubb), medium (MedBubb), and large (LgBubb). A more detailed description of the technique may be found in the papers published by Gonzalez Viejo et al. [1,23-25].

Near-Infrared Analysis
A portable NIR device microPHAZIR™ (RX Analyzer; Thermo Fisher Scientific, Waltham, MA, USA) was used to evaluate the samples. This device can read the NIR spectra within the 1600-2400 nm range, recording a value every 7-9 nm. A Whatman® filter paper (Whatman plc., Maidstone, UK; qualitative grade 3, 7.0 cm) was submerged in each of the beer samples (20-23 °C) and measured with a white background placed on top to avoid signal noise from the environment. Then, a dry filter paper was read, and its values were subtracted from the sample results to eliminate the contribution of the cellulose present in the paper. Three readings from each bottle of each sample were recorded and averaged to reduce variability. This method was validated by Gonzalez Viejo et al. [6].
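The two processing steps described above, averaging the replicate readings and subtracting the dry filter paper reading, may be sketched as follows. This is a minimal Python illustration with hypothetical absorbance values at three wavelengths; it assumes a single blank spectrum is used for all readings, in which case the order of averaging and subtraction does not change the result.

```python
def corrected_spectrum(readings, blank):
    """Average replicate readings per wavelength, then subtract the blank.

    readings: list of spectra (one list of absorbances per reading)
    blank:    spectrum of the dry filter paper at the same wavelengths
    """
    n = len(readings)
    mean = [sum(column) / n for column in zip(*readings)]
    return [m - b for m, b in zip(mean, blank)]

# Hypothetical absorbances at three wavelengths (not study data).
readings = [
    [0.50, 0.80, 0.60],
    [0.52, 0.78, 0.62],
    [0.48, 0.82, 0.58],
]
blank = [0.10, 0.20, 0.15]

spectrum = corrected_spectrum(readings, blank)
print([round(value, 3) for value in spectrum])
```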

Statistical Analysis
An ANOVA was conducted for the protein values to assess significant differences (p < 0.05) between samples using XLSTAT software (Addinsoft Inc., New York, NY, USA). Furthermore, a correlation matrix was developed using an algorithm written in Matlab® R2019b to assess the significant correlations (p < 0.05) between the physical parameters (color, foam, and bubbles) and the identified proteins categorized into five groups based on their molecular weight (MW; 0-20 kDa, 20-40 kDa, 40-60 kDa, 60-80 kDa, and >80 kDa).
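The grouping and significance filtering described above can be sketched in Python as follows. This is not the authors' Matlab algorithm. Several points are assumptions made for illustration: group boundaries are treated as lower-inclusive (a 20 kDa protein falls in 20-40 kDa), the correlation is Pearson's r, and significance is tested with a two-tailed t-test using the tabulated critical value for 22 degrees of freedom (n = 24 samples).

```python
import math

MW_EDGES = [20, 40, 60, 80]  # kDa boundaries between the five groups
MW_LABELS = ["0-20", "20-40", "40-60", "60-80", ">80"]

def mw_group(mw_kda):
    """Return the molecular-weight group label for one protein."""
    for edge, label in zip(MW_EDGES, MW_LABELS):
        if mw_kda < edge:
            return label
    return MW_LABELS[-1]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, n, t_crit=2.074):
    """Two-tailed test of r at p < 0.05.

    t_crit = 2.074 is the tabulated Student t value for n - 2 = 22 degrees
    of freedom; it must be changed for any other sample size.
    """
    if abs(r) >= 1.0:
        return True
    t = abs(r) * math.sqrt((n - 2) / (1.0 - r * r))
    return t > t_crit

print(mw_group(12))              # MW quoted in the text for LTP1
print(mw_group(43))              # MW quoted in the text for proteins Z4 and Z7
print(is_significant(0.42, 24))  # a correlation of 0.42 with 24 samples
```

With 24 samples, this test corresponds to a threshold of roughly |r| > 0.40, so only correlations above that magnitude would be retained in the matrix.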
Two artificial neural network (ANN) regression models were constructed using a Matlab® R2019b code that tested 17 different training algorithms in a loop. The best models, selected based on performance, correlation coefficients, and the absence of signs of overfitting, were obtained using the Bayesian regularization algorithm. Model 1 used the absorbance values obtained with NIR within the 1600-2400 nm range as inputs, while Model 2 was constructed with the 15 physical parameters (color, foam, and bubbles); both models were used to predict the relative quantification of 54 proteins found in the beer samples (Table 2). For both models, samples were divided using a random data division (dividerand) algorithm, with 70% used for the training stage and 30% for testing, and performance was assessed based on the mean squared error (MSE). As shown in the Supplementary Material (Figure S1), the models consisted of a feedforward network with two layers, with a tan-sigmoid and a linear transfer function for the hidden and output layers, respectively. Furthermore, a trimming exercise was performed using 3, 5, and 10 neurons to assess the best performance, with 10 being the most accurate and showing no overfitting.
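The network structure and data division described above may be illustrated with a minimal Python sketch. It is not the Matlab code used in the study and does not reproduce Bayesian regularization training: the weights are random and untrained, and the helper names (`divide_rand`, `forward`, `mse`) and the use of 72 observations (24 beers in triplicate) are assumptions for illustration. The tan-sigmoid transfer function is mathematically equivalent to the hyperbolic tangent.

```python
import math
import random

def divide_rand(n_samples, train_ratio=0.7, seed=0):
    """Randomly split sample indices into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_ratio * n_samples)
    return indices[:n_train], indices[n_train:]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer feedforward pass: tanh hidden layer, linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]

def mse(targets, outputs):
    """Mean squared error between target and predicted values."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

# Dimensions matching Model 2: 15 inputs, 10 hidden neurons, 54 targets.
n_in, n_hidden, n_out = 15, 10, 54
rng = random.Random(1)
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b_out = [0.0] * n_out

train_idx, test_idx = divide_rand(72)  # assumed: 24 beers x 3 replicates
prediction = forward([0.0] * n_in, w_hidden, b_hidden, w_out, b_out)
print(len(train_idx), len(test_idx), len(prediction))
```

In a trained model, the MSE computed separately on the training and testing indices is what allows the two stages to be compared for signs of overfitting.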
The ANOVA results showed that there were significant differences (p < 0.05) between samples in 136 out of 150 proteins identified in the beers (data not shown). Regarding the NIR measurements, all beers presented the highest peak at 1940 nm (Figure S2), where the water overtone is found [28]. Even though all beers had a similar trend up to 2200 nm, the samples varied from 2200 to 2400 nm, the range in which most proteins and carbohydrates are detected [29]. These differences found in proteins with both the proteomics and NIR methods are due to the variations in the raw material, type of fermentation, and production process, which produce a different composition in the distinct beer styles. Figure 1 shows the correlations (p < 0.05) between the physical parameters (color, foam, and bubbles) and the 150 proteins grouped by MW. It can be observed that proteins with MW within 0-20 kDa were negatively correlated with MaxVol (r = −0.57), TLTF (r = −0.58), and CO2 (r = −0.52), and positively correlated with FDrain (r = 0.62). Proteins in the 20-40 kDa group had a positive correlation with "a" from the CIELab color scale (r = 0.49), MaxVol (r = 0.47), and TLTF (r = 0.47), and a negative correlation with FDrain (r = −0.51). Those proteins with MW between 40 and 60 kDa did not have any significant correlations with any of the parameters, while the 60-80 kDa group was negatively correlated with FDrain (r = −0.48) and positively correlated with SmBubb (r = 0.47), CO2 (r = 0.64), TLTF (r = 0.57), and MaxVol (r = 0.53). On the other hand, the proteins above 80 kDa had a positive correlation with "G" (r = 0.46) and "B" (r = 0.57) from the RGB color scale and "L" (r = 0.42), and a negative correlation with "b" (r = −0.54) from the CIELab scale.
It is well known that the proteins associated with foamability and foam stability are mainly LTP1 (MW 12 kDa), proteins Z4 (MW 43 kDa) and Z7 (MW 43 kDa), and hordeins (MW 30-35 kDa) [2,30-32]. However, their correlation with foaming parameters varies between studies: Lusk et al. [33] found that LTP1 increased foam stability, whereas Evans et al. [34,35] observed that LTP1 and protein Z7 did not present a correlation with foam stability. The latter was also observed in this study, as no correlation was found between the foam parameters and the proteins within 40-60 kDa. As previously mentioned, in the present study, proteins with low MW (0-20 kDa), in which LTP1 is categorized (Table S1), had a negative correlation with MaxVol and TLTF, but a positive correlation with FDrain. This may be owing to the capacity of LTP1 to bind lipids, which contributes to the differences in its influence on foam [30]. Furthermore, no positive effect of LTP1 on foamability for high-carbonated beers has been reported [35]. Hordeins are within the 20-40 kDa MW group (Table S1), which was positively correlated with MaxVol and TLTF; this supports the view that these proteins may promote foam formation and stability. Proteins with higher MW (60-80 kDa) presented similar results to the 20-40 kDa group, although there are no known published studies identifying this positive correlation; however, in the present study, this group was shown to contribute to smaller bubbles in the foam (SmBubb).
Table 3 depicts the results of the ANN models. Model 1 presented a high overall correlation coefficient (r = 0.89) to predict 54 proteins with the chemical fingerprinting measured using NIR (1600-2400 nm). However, the r value (r = 0.68) for the testing stage was moderate and much lower than the training stage. Additionally, the slope of the testing stage was low. In contrast, Model 2 had higher overall accuracy (r = 0.93) to predict the same 54 proteins but using the physical parameters (color, foam, and bubbles) measured using RoboBEER. This model presented similar results for the three stages (training, testing, and overall) with high slope values (b ≈ 0.90). Furthermore, the performance shows there are no signs of overfitting of the model as the MSE value of the training was lower than the testing stage. On the other hand, Figure 2 shows the overall models in which a higher dispersion of the data can be observed for Model 1 compared to Model 2.
Figure 2. Overall artificial neural network models to predict 54 proteins in beer (Table 2), showing the correlation coefficient (r) for (a) Model 1 using the absorbance values of near-infrared spectra (1600-2400 nm), and (b) Model 2 using the 15 physical parameters (color, foam, and bubbles) measured using RoboBEER (Figure 1).
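The three statistics used to compare the models (correlation coefficient r, slope b, and MSE) can be computed from observed and predicted values as in the following Python sketch. The slope is taken here as that of the least-squares line of predicted against observed values, which is an assumption about how b was obtained; the data are hypothetical.

```python
import math

def regression_stats(observed, predicted):
    """Return (r, slope, mse) for predicted versus observed values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx  # least-squares slope of predicted on observed
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return r, slope, mse

# Hypothetical normalized values (not study data).
observed = [0.0, 1.0, 2.0, 3.0]
predicted = [0.1, 0.9, 2.1, 2.9]

r, slope, mse = regression_stats(observed, predicted)
print(f"r = {r:.3f}, b = {slope:.2f}, MSE = {mse:.3f}")
```

A slope close to 1 together with a high r indicates predictions that follow the observed values across the whole scale, whereas a low slope with a moderate r, as reported for the testing stage of Model 1, indicates compressed predictions.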

Machine Learning Modeling
Although the chemical fingerprinting from NIR spectra within 1600-2400 nm presents several overtones related to the protein content, it was found in this study that these are not appropriate for the prediction of specific proteins in beer, as the model (Model 1; Table 3) did not show a high correlation coefficient for the testing stage. Furthermore, Model 1 presented underfitting of values along the entire scale, which makes it unsuitable for the prediction of proteins. In contrast, the physical parameters (color, foam, and bubbles) were able to predict the 54 proteins with high accuracy (Model 2; Figure 2). This may be due to the significant contribution of proteins to foamability and, according to Figure 1, to the color and lightness (L) of beer. Model 2 presented underfitting focused on the lowest values (normalized: −1, real value: 0); in this particular case, this is not an issue, as it is easy to detect given the good performance metrics of the general model and the known minimum values. Furthermore, it may be easily solved at model deployment with a simple rule that, after denormalization of the output values, sets any negative predicted protein value to 0.
Even though proteins within 40-60 kDa did not present significant correlations with the physical parameters (color, foam, and bubbles), some of these proteins were included in the model (Table 2) as they contributed to its high accuracy. This may be explained by the fact that an ANN is a non-linear modeling method that can find complex relationships between the inputs and targets for their prediction [6], in contrast with the matrix in Figure 1, which only shows linear correlations.
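The deployment rule mentioned above can be sketched in a few lines of Python. The sketch assumes that targets were scaled linearly to the [−1, 1] range (consistent with the normalized value of −1 corresponding to a real value of 0) and uses hypothetical output values and a hypothetical maximum abundance; the function names are illustrative only.

```python
def denormalize(y_norm, y_min, y_max):
    """Map a value from the [-1, 1] scale back to the [y_min, y_max] range."""
    return (y_norm + 1.0) / 2.0 * (y_max - y_min) + y_min

def clip_negative(value):
    """Replace a negative predicted abundance with 0."""
    return max(0.0, value)

# Hypothetical normalized outputs for one protein; -1.08 undershoots the
# known minimum and would denormalize to a negative abundance.
outputs = [-1.08, -1.0, 0.0, 1.0]
y_min, y_max = 0.0, 0.25  # hypothetical range of relative abundance

abundance = [clip_negative(denormalize(y, y_min, y_max)) for y in outputs]
print(abundance)
```

Because a relative abundance cannot be negative, the rule changes only predictions that are physically impossible and leaves all other outputs untouched.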
Although some studies have successfully developed predictive models for proteins using NIR data as inputs to partial least squares (PLS) regression, these only predict the total protein content; therefore, they are not able to provide more specific and multitarget results [36-39]. Higher accuracy and performance of ANN models built with calculated parameters rather than raw data (such as NIR spectra) have also been reported for other purposes, including the prediction of other beer quality parameters, such as sensory attributes [23] and type of fermentation, using the physical parameters (color, foam, and bubbles) [1]; the classification of grapevine leaves into cultivars based on morphometric and colorimetric parameters [40]; and the prediction of cocoa aromas from canopy architecture parameters of cacao trees obtained using remote sensing and computer vision algorithms [41].

Conclusions
The high accuracy obtained by Model 2 showed that the proposed method might be the first reliable, objective, affordable, and rapid technique to evaluate the influence of specific proteins on beer foamability and quality using an artificial intelligence approach with the aid of robotics and machine learning. Therefore, it may potentially be utilized to predict the influence of different proteins on beer quality traits within the production line of both large- and small-scale breweries. Furthermore, this method allowed a total of 69 physicochemical parameters [15 physical parameters (color, foam, and bubbles) and 54 proteins] to be obtained, which is beyond the number of analyses that breweries are able to conduct to assess beer quality in every batch; this would allow products of higher quality to be offered.
Supplementary Materials: The following are available online at http://www.mdpi.com/2306-5710/6/2/28/s1, Table S1: Identified proteins showing their molecular weight (MW) and in which beer samples they were found (Y = detected; X = not detected); Figure S1: Diagram of the two-layer feedforward models showing the inputs, number of neurons, and outputs/targets used to construct both models; Figure S2: Absorbance values of the near-infrared spectra of 24 different beer samples.