Development of Artificial Neural Network Models to Assess Beer Acceptability Based on Sensory Properties Using a Robotic Pourer: A Comparative Model Approach to Achieve an Artificial Intelligence System

Abstract: Artificial neural networks (ANN) have become popular for optimization and prediction of parameters in foods, beverages, agriculture and medicine. For brewing, they have been explored to develop rapid methods to assess product quality and acceptability. Different beers (N = 17) were analyzed in triplicate using a robotic pourer, RoboBEER (University of Melbourne, Melbourne, Australia), to assess 15 color and foam-related parameters using computer vision. Those samples were tested using sensory analysis for acceptability of carbonation mouthfeel, bitterness, flavor and overall liking with 30 consumers using a 9-point hedonic scale. ANN models were developed using 17 different training algorithms with 15 color and foam-related parameters as inputs and liking of four descriptors obtained from consumers as targets. Each algorithm was tested using five, seven and ten neurons and compared to select the best model based on correlation coefficients, slope and performance (mean squared error, MSE). The Bayesian Regularization algorithm with seven neurons presented the best correlation (R = 0.98) and highest performance (MSE = 0.03) with no overfitting. These models may be used as a cost-effective method for fast-screening of beers during processing to assess acceptability more efficiently. The use of RoboBEER, computer-vision algorithms and ANN will allow the implementation of an artificial intelligence system for the brewing industry to assess its effectiveness.


Introduction
Machine learning refers to computer-based systems that are able to learn and find patterns in data to predict specific outputs [1,2]. There are different types of machine learning, which fall into two main categories: (i) pattern recognition or classification and (ii) fitting or regression [3]. The first is mainly used for decision making, as it classifies samples into two or more categories; the most publicized applications can be found in medical diagnosis [4,5], in food and beverages to classify types of brewages [6-8] and level of liking of brewages [8,9], and in agriculture to identify grapevine cultivars [10] and to estimate plant water status [11], among others. Fitting or regression is used to predict specific values of certain variables such as chemical compounds [7,12], sensory descriptors [13], and microbial spoilage [14], among others.
There are different types of regression algorithms, which can be classified within categories such as linear regression, regression trees, support vector machines, Gaussian process, ensemble of trees

Color and Foam-Related Parameters
Color and foam-related parameters were obtained using a robotic pourer, RoboBEER (University of Melbourne, Melbourne, Australia), to ensure uniform pouring. RoboBEER works with two Lego® servo motors and has three sensors attached that work with Arduino® (Arduino, Ivrea, Italy): (i) temperature, (ii) alcohol and (iii) carbon dioxide (CO2) gas release. It is coupled with an iPhone 5S to record 5 min videos of the pouring (Figure 1). These videos were then analyzed with Matlab® R2018b (Mathworks Inc., Natick, MA, USA) using customized computer vision algorithms. The first algorithm worked in a semi-automatic way: the glass size was standardized and scaled by selecting the height and glass rim in the first frame of the video, and the foam height was then selected manually every 30 frames so that the algorithm could automatically calculate the foam and beer volume. These results were used to develop the foam volume versus time curve and to calculate the following parameters: (i) maximum volume of foam (MVol), (ii) total lifetime of foam (TLTF), (iii) lifetime of foam (LTF), and (iv) foam drainage (FDrain). Furthermore, a single frame of the video (highest in foam) was processed using other algorithms in Matlab® to assess color in two scales, CIELab [(v) L, (vi) a, (vii) b] and RGB [(viii) R, (ix) G, (x) B], as well as bubble size distribution divided into (xi) small (SmB), (xii) medium (MedB), and (xiii) large bubbles (LgB). Bubbles were detected in the middle section of the foam using the Hough transform and classified by size according to their diameter measured in pixels. Additionally, the parameters (xiv) alcohol (OH) and (xv) CO2 gas release were obtained from the sensors. More details about the robotic pourer and computer vision analysis can be found in the paper by Gonzalez Viejo et al. [6]. All data were analyzed using customized codes in Matlab® and a Titan Xp GPU (NVIDIA Corporation, Santa Clara, CA, USA).
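The size classification of detected bubbles can be illustrated with a minimal Python sketch. It assumes the bubble diameters (in pixels) have already been extracted by a circle detector; the cutoff values, the function names and the use of counts rather than proportions are hypothetical choices for illustration, since the thresholds used by the authors are not reported here.

```python
from collections import Counter

# Hypothetical diameter cutoffs in pixels; the values used in the study are not reported.
SMALL_MAX_PX = 10
MEDIUM_MAX_PX = 20


def classify_bubble(diameter_px):
    """Assign a bubble to a size class from its diameter in pixels."""
    if diameter_px <= SMALL_MAX_PX:
        return "SmB"
    if diameter_px <= MEDIUM_MAX_PX:
        return "MedB"
    return "LgB"


def bubble_size_distribution(diameters_px):
    """Count the bubbles that fall into each size class."""
    counts = Counter(classify_bubble(d) for d in diameters_px)
    return {name: counts.get(name, 0) for name in ("SmB", "MedB", "LgB")}


# Illustrative diameters, standing in for the output of a circle detector.
example_diameters = [4.0, 8.5, 12.0, 19.9, 25.0, 31.2]
distribution = bubble_size_distribution(example_diameters)
print(distribution)
```

With these made-up diameters and cutoffs, two bubbles fall into each class.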

Figure 1. Equipment used to assess beer physical measurements: (a) robotic pourer, RoboBEER, which was used to assess the color and foam-related parameters and (b) a frame of a video taken to analyze the beer using computer vision algorithms.

Sensory Session
A double-blind sensory session to assess beer acceptability was conducted with 30 consumers using a 9-point hedonic scale. According to a power analysis, this number of consumers is sufficient to compare samples in a sensory test (1-β > 0.99). The session was conducted in individual booths with uniform lighting located in the sensory laboratory of the Faculty of Veterinary and Agricultural Sciences of The University of Melbourne. Before the sensory session, participants were asked to sign a consent form in accordance with the ethics approval 1545786.2 by the Human Ethics Advisory Group (HEAG) of the Faculty of Veterinary and Agricultural Sciences at The University of Melbourne. The beer samples, at refrigeration temperature (4 °C), were semi-randomized in two blocks of eight and nine samples, and participants were provided with crackers and water to cleanse the palate and were allowed to rest between samples to avoid fatigue. The sensory attributes evaluated and used as targets for the model construction consisted of (i) carbonation mouthfeel (MCarb), (ii) bitter taste (TBitt), (iii) flavor, and (iv) overall liking (overall).



Machine Learning Modelling
Seventeen training algorithms (Table 2) were used to develop artificial neural network models using a customized Matlab® code capable of testing all the algorithms in a loop. The models were developed using as inputs the normalized values (from −1 to 1) of the 15 color and foam-related parameters measured with the RoboBEER: (i) MVol, (ii) TLTF, (iii) LTF, (iv) FDrain, (v) L, (vi) a, (vii) b, (viii) R, (ix) G, (x) B, (xi) SmB, (xii) MedB, (xiii) LgB, (xiv) OH and (xv) CO2, and the four sensory attributes as targets/outputs: (i) MCarb, (ii) TBitt, (iii) flavor, and (iv) overall. A neuron trimming exercise (5, 7 and 10 neurons) was performed for each algorithm. Ten was the largest number of neurons tested because the best practice is to use as few neurons as possible while still obtaining good models without overfitting; a larger number of neurons would most likely lead to overfitting. All models were developed using a random data division with 70% (n = 35) of samples used for training, 15% (n = 8) for validation using a mean squared error performance algorithm, and 15% (n = 8) for the testing stage with a default derivative function. The models were constructed based on a two-layer feedforward network with a tan-sigmoid function in the hidden layer and a linear transfer function in the output layer (Figure 2).
The statistical analysis to evaluate and compare the accuracy of the models developed consisted of the correlation coefficient (R), determination coefficient (R²), mean squared error (MSE) to assess performance and slope (b) for each stage, (i) training, (ii) validation, (iii) testing, and (iv) overall model, as well as the p-value for the overall model. For the three best models, the percentage of outliers using 95% confidence bounds was obtained.

Results
Table 3 shows the statistical data of the best and worst models developed from each group of training algorithms. For the backpropagation with Jacobian derivatives group, there was no worst model, as both algorithms within the group produced two of the best models. Tables S1-S3 in the Supplementary Materials show the statistical data of the models developed using the 17 training algorithms. Correlations from all models were significant (p < 0.0001). The algorithms with the lowest R and R² were gradient descent backpropagation with five and seven neurons (Table 3; Table S1), batch training with weight and bias learning rate with seven neurons (Table 3) and sequential order weight and bias with five neurons (Table S3). On the other hand, the models with the highest R and R² were those developed using seven neurons with both algorithms belonging to the backpropagation with Jacobian derivatives group (Levenberg-Marquardt, LM, and Bayesian regularization, BR) and with resilient backpropagation (RPROP), with R values consistently over 0.90 for all stages (Table 3). Furthermore, the slope of these three best models was close to unity (b ~ 1) for all stages, with RPROP having the lowest slope values, with b = 0.90 for the overall model (Table 3; Figure 3). In addition, the three models had low MSE values (≤0.06) for the three stages and overall model. Table 3 also shows the best model from the supervised weight and bias algorithms; however, this still had some signs of overfitting, as the validation and testing performances were not as close (MSE = 0.10 and 0.06, respectively) and the R values were lower than those of the three best models.
Table 3. Statistical results of the best and worst models developed using the algorithms from the three different groups. Numbers in bold represent the models with the highest correlation and determination coefficients from each group of algorithms.
Beverages 2019, 5, 33
Figure 3 shows the training, validation, testing and overall models of the three best algorithms developed using seven neurons. Model 1 (Figure 3a), which was developed with the Levenberg-Marquardt algorithm, had a training R = 0.96 and validation, testing and overall R = 0.95; furthermore, the overall model had 6.86% of outliers according to the 95% confidence bounds. Figure 3b shows Model 2, developed with the Bayesian regularization algorithm, with R = 0.99 for the training stage, R = 0.98 for testing and R = 0.98 for the overall model with 5.88% outliers; this algorithm does not use a validation stage. Figure 3c depicts Model 3, developed using the RPROP algorithm, which also had a high R = 0.95 for the training and validation stages, R = 0.93 for testing and an overall model with R = 0.95 and a low percentage of outliers (4.90%). In the overall models, some predicted values are >1 or <−1. This is because the targets were normalized based on the range of data obtained in the study (3-7), whereas the liking hedonic scale covers the 1-9 range; therefore, a value <−1 or >1 will still fit within the 1-9 scale when reversing the normalization.
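The remark about predictions outside the −1 to 1 interval can be checked with a short worked example. The Python sketch below assumes a standard min-max mapping onto [−1, 1] and an observed liking range of 3 to 7; the function names and the two example predictions are illustrative and are not taken from the authors' code.

```python
def normalize(value, lo, hi):
    """Map value from [lo, hi] onto [-1, 1] (min-max scaling)."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def denormalize(scaled, lo, hi):
    """Invert normalize(): map a scaled value back to the original units."""
    return (scaled + 1.0) * (hi - lo) / 2.0 + lo


# Assumed range of liking scores observed in the study.
OBSERVED_MIN, OBSERVED_MAX = 3.0, 7.0

# Illustrative predictions slightly outside the normalized interval.
for predicted in (-1.2, 1.2):
    liking = denormalize(predicted, OBSERVED_MIN, OBSERVED_MAX)
    print(f"normalized {predicted:+.1f} -> liking {liking:.1f}")
```

Under these assumptions, a normalized prediction of 1.2 maps back to a liking of 7.4 and −1.2 maps back to 2.6, both of which lie inside the 1-9 hedonic scale.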

Figure 3. Models showing the three stages (training, validation and testing) as well as the overall model of the three best algorithms found to assess liking of beer from morpho-colorimetric parameters of beer and beer foam: (a) Levenberg-Marquardt, (b) Bayesian Regularization and (c) Resilient Backpropagation, showing the correlation coefficient (R) and 95% confidence bounds. In all graphs, the x-axis represents the observed data and the y-axis the predicted or estimated values. N/A = not applicable.

Discussion
According to Beale et al. [24], an indicator of a good model with no overfitting is a validation correlation coefficient close to the value from the training stage, a condition met by the three best models found in this paper (Table 3 and Figure 3). The Bayesian regularization model (Model 2) does not have a validation stage; however, the R values of the other three stages are high and similar. A further indication of a model with no overfitting is that the training performance (MSE) is lower than that of the other stages and the gap between the validation and testing MSE is small [3,24]. This was also met by the best models found in this paper (Table 3).
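These rules of thumb can be written as a simple screening function. The Python sketch below is only an illustration: the function name and the two tolerances (r_tol and mse_gap_tol) are hypothetical, as the text does not state how close the values must be, and the function does not cover algorithms without a validation stage such as Bayesian regularization.

```python
def shows_no_overfitting(r_train, r_val, mse_train, mse_val, mse_test,
                         r_tol=0.05, mse_gap_tol=0.03):
    """Apply the three overfitting checks described in the text.

    r_tol and mse_gap_tol are hypothetical tolerances chosen for illustration.
    """
    # Validation R should be close to training R.
    r_close = abs(r_train - r_val) <= r_tol
    # Training MSE should be lower than validation and testing MSE.
    train_lowest = mse_train < mse_val and mse_train < mse_test
    # Validation and testing MSE should be close to each other.
    small_gap = abs(mse_val - mse_test) <= mse_gap_tol
    return r_close and train_lowest and small_gap


# Validation/testing MSE of 0.10 and 0.06 as reported for the supervised weight
# and bias model; the remaining arguments are made-up values.
flagged = shows_no_overfitting(r_train=0.90, r_val=0.88,
                               mse_train=0.05, mse_val=0.10, mse_test=0.06)

# Entirely made-up values for a well-behaved model.
accepted = shows_no_overfitting(r_train=0.96, r_val=0.95,
                                mse_train=0.03, mse_val=0.05, mse_test=0.05)
print(flagged, accepted)
```

With the chosen tolerances, the first case fails the check because of the gap between validation and testing MSE, while the second passes.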
The Levenberg-Marquardt algorithm (Model 1) is a backpropagation function that relies on second-order derivative information of a cost function, approximated from the Jacobian. The advantages of this algorithm are: (i) it is capable of giving a solution even when its starting point is far from the final minimum, (ii) its processing time is one of the lowest compared to other algorithms, and (iii) training stops when it reaches the maximum number of epochs, when the best performance value is achieved, or when the gradient value falls below its minimum [25]. However, some disadvantages include: (i) it may not always secure a global optimum for an unconstrained optimization problem and (ii) it may require higher memory usage [26].
On the other hand, the Bayesian regularization algorithm (Model 2) works on the same principles as Levenberg-Marquardt, but updates the weights and biases according to the optimization. The main advantages of this algorithm include: (i) lower memory usage, (ii) good generalization for noisy or small datasets, (iii) it avoids overfitting effectively and (iv) it does not require a validation stage [17,25,27].
The RPROP (Model 3) works by adapting the weight values according to the information of the local gradient, based only on the sign of the derivative. Its purpose is to avoid the negative effects of the small magnitude of partial derivatives, which often result in small or null changes in weights and biases. The training stops when it reaches the maximum number of epochs or time, or when the best performance has been reached [28,29]. Some of the advantages of RPROP are: (i) the performance is better than that of other techniques used for adaptation [30] and (ii) it has fast convergence and low memory usage [31].
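The sign-based update used by RPROP can be sketched in a few lines of Python. This is a simplified variant written for illustration, not the Matlab implementation used in the study; the function names, the step-size factors (eta_plus, eta_minus), the step limits and the toy cost function are all assumptions.

```python
def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_minimize(grad_fn, weights, epochs=100, step_init=0.1,
                   eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Minimize a function with a simplified sign-based RPROP update."""
    weights = list(weights)
    steps = [step_init] * len(weights)
    prev_grads = [0.0] * len(weights)
    for _ in range(epochs):
        grads = grad_fn(weights)
        for i, g in enumerate(grads):
            if g * prev_grads[i] > 0:
                # Same sign as before: enlarge the step for this weight.
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif g * prev_grads[i] < 0:
                # Sign change: the minimum was overshot, so shrink the step
                # and skip the update for this weight in this epoch.
                steps[i] = max(steps[i] * eta_minus, step_min)
                g = 0.0
            # Only the sign of the gradient is used, never its magnitude.
            weights[i] -= sign(g) * steps[i]
            prev_grads[i] = g
    return weights


def toy_gradient(w):
    """Gradient of 1000*(w0 - 3)^2 + 0.001*(w1 + 2)^2 (illustrative cost)."""
    return [2000.0 * (w[0] - 3.0), 0.002 * (w[1] + 2.0)]


solution = rprop_minimize(toy_gradient, [0.0, 0.0])
print(solution)
```

The toy cost has partial derivatives that differ by six orders of magnitude. Because the update depends only on their sign, both weights approach their minima (3 and −2) at a similar pace, which is the behavior the paragraph above describes.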
Based on the results from the three best models found to assess beer liking and acceptability by consumers, and considering the advantages and disadvantages of the algorithms, Model 2 is the most appropriate for the prediction of beer liking using beer color and foam-related parameters. This is based on its highest correlation coefficient (R = 0.98), best performance, good fit within the confidence bounds with a low number of outliers, overall slope b = 1 and, therefore, no signs of overfitting. Furthermore, the dataset used was small (N = 51), a situation for which Bayesian Regularization is appropriate.
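The selection criteria mentioned above (R, slope b and MSE) can be computed from paired observed and predicted values. The following Python sketch assumes that the slope is obtained by regressing the predicted values on the observed ones, matching the axes of Figure 3; the function name and the example data are illustrative.

```python
import math


def regression_metrics(observed, predicted):
    """Return R, R2, slope (predicted vs. observed) and MSE for paired values."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    mse = sum((x - y) ** 2 for x, y in zip(observed, predicted)) / n
    return {"R": r, "R2": r * r, "slope": slope, "MSE": mse}


# Made-up normalized targets and predictions that underestimate by 10%.
observed = [-1.0, -0.5, 0.0, 0.5, 1.0]
predicted = [0.9 * x for x in observed]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

In this example the correlation is perfect (R = 1) even though the slope is 0.9 and the MSE is 0.005, which shows why the slope and MSE are examined alongside R when comparing models.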
The implementation of the models presented in this paper would allow brewers to reduce time and costs when developing new products. The models may also be used for fast-screening of any new developments without the need to conduct large sensory tests with consumers, which require time for preparation, data gathering and analysis as well as financial resources for sampling and recruiting consumers. The model allows accurate prediction of the liking of carbonation mouthfeel, flavor, bitterness, and overall liking using the physical parameters related to color and foam. This is possible because consumers are able to judge beer quality and acceptability based only on the visual attributes, which give the first impression [8,9,32]. Furthermore, there is a relationship between the foam and color-related parameters and bitterness, as the iso-α-acids derived from hops are responsible for bitterness but also contribute to foamability and foam stability due to their tensio-active properties. In addition, hops contribute to the development of aromas and flavors in beer, and foam aids in the release of aromas and flavors when bubbles burst [8,13,33,34].
Since the models are based on an automated data gathering process using the RoboBEER and video analysis of pouring with computer vision algorithms, an artificial intelligence (AI) application may be implemented. This would offer the beer industry a completely automated process to predict the liking and acceptability of different beers by consumers.
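The prediction step of such an application could look like the following Python sketch of a two-layer feedforward pass with 15 inputs, seven hidden neurons with a tan-sigmoid (tanh) transfer function and four linear outputs. The weights and the input vector are random placeholders standing in for a trained model and a measured beer, and all names are hypothetical.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 15, 7, 4
TARGET_NAMES = ("MCarb", "TBitt", "flavor", "overall")


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Two-layer feedforward pass: tanh hidden layer, linear output layer."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


# Placeholder weights: random values standing in for a trained model.
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [rng.uniform(-1, 1) for _ in range(N_HIDDEN)]
w_out = [[rng.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUTPUTS)]
b_out = [rng.uniform(-1, 1) for _ in range(N_OUTPUTS)]

# One beer described by 15 normalized inputs in [-1, 1] (made-up values).
beer = [rng.uniform(-1, 1) for _ in range(N_INPUTS)]
prediction = dict(zip(TARGET_NAMES, forward(beer, w_hidden, b_hidden, w_out, b_out)))
print(prediction)
```

In a real system, the inputs would come from the RoboBEER measurements after normalization, and the four normalized outputs would be mapped back to the hedonic scale before being reported.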

Conclusions
The comparison of different artificial neural network algorithms aids in the selection of the best model, making sure that it shows no overfitting and has the best performance. However, it is also important to consider the advantages and disadvantages of the algorithms in accordance with the dataset details and intended application to make the best choice. The best algorithm for the specific model presented in this paper was Bayesian Regularization, with very high accuracy (R = 0.98). It would help breweries optimize the cost and time of assessing beer acceptability without the need to recruit consumers and run sensory sessions, providing results within minutes. This is especially important when a large number of prototypes must be evaluated during the development of new beer products. The use of the RoboBEER, computer vision algorithms and the ANN algorithms found in this research will allow the implementation of an AI system for the brewing industry to assess the effectiveness of beer making in terms of quality and acceptability to consumers.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2306-5710/5/2/33/s1, Table S1: Statistical results of the models developed using the backpropagation with Jacobian derivatives algorithm. Numbers in green and bold represent the models with the highest correlation and determination coefficients. Table S2: Statistical results of the models developed using the backpropagation with gradient derivative algorithms. Numbers in red and italics represent the models with the lowest correlation and determination coefficients, while those in green and bold represent the highest values. Table S3: Statistical results of the models developed using the supervised weight and bias algorithms. Numbers in red and italics represent the models with the lowest correlation and determination coefficients.