Machine Learning Methods Applied for Modeling the Process of Obtaining Bricks Using Silicon-Based Materials

Most of the time, industrial brick manufacturing facilities are designed and commissioned for a particular type of manufacturing mix and a particular type of burning process. Maintaining and improving productivity and product quality is a challenge for process engineers. Our paper uses machine learning methods to evaluate the impact of adding new auxiliary materials on the amount of exhaust emissions. Experimental determinations made under similar conditions enabled us to build a database containing information about 121 brick batches. Various models (artificial neural networks and regression algorithms) were designed to predict exhaust emission changes when auxiliary materials are introduced into the manufacturing mix. The best models were feed-forward neural networks with two hidden layers, having MSE < 0.01 and r2 > 0.82, and, among the regression models, kNN with error < 0.6. An optimization procedure built on the best models was also developed in order to determine the optimal values of the parameters that assure the minimum gas emissions. The Pareto front obtained in the multi-objective optimization, conducted with a grid search method, allows the user to choose the most convenient values of dry product mass, clay, ash and organic raw materials which minimize gas emissions with energy potential.


Introduction
Due to numerous external factors, such as the quality of the raw materials, the environmental protection requirements that need to be satisfied, and the variations of process parameters, many production processes are complex and extremely energy- and time-consuming. For these reasons, it is not always possible to identify the relation between product quality and process input variables; hence the researchers' effort to integrate artificial intelligence into production processes for data storage, learning, reasoning and decision making.
Neural networks may be used for monitoring, but also for predicting various parameters [1,2]. The existing literature on brick manufacture includes, for instance, artificial intelligence tools for automating the autoclaved aerated concrete (AAC) brick manufacturing process by creating prototype equipment. The raw material mixing process has been optimized so as to obtain bricks of appropriate hardness, more homogeneous than in the case of manual manufacture [3]. The relation between the composition of the materials which AAC bricks are made from and their mechanical properties was modeled by Zulkifli et al. using neural networks [4]. Other studies use various artificial intelligence tools to develop optimal models for brick-mortar structural composites with improved resistance to elastic deformation [5] or to predict the shear strength of brick masonry walls [6]. In the present study, the work proceeds in two steps. First, artificial neural networks and other regression methods [24] are used in order to find the most appropriate model that can provide a good approximation of the data. Secondly, after the modeling step, a multi-objective optimization is performed to find the most appropriate values for the inputs (i.e., dry product mass, clay, ash and organic raw materials) that lead to the minimum values of the outputs (i.e., CO, NO and CH4). Because the dimensionality of the problem is not too high, a grid search approach is applied for this multi-objective optimization. The obtained results are satisfactory and in good agreement with experimental practice.

Experimental Setting
The impact of adding auxiliary materials on the burning performance in an industrial brick burning facility was assessed by analyzing the exhaust gases in the furnace chimney. Burning performance is defined as the sum of the characteristics that aim at preserving the physical-mechanical properties of the bricks, reducing costs and keeping the previous technological flow unchanged, from the formation-extrusion area to the finished/burned product area. The combustion process took place in a tunnel oven by heating the dry products to high temperatures. All the technological features were kept constant: the same amount of product, the same combustion rate, the same combustion curve settings and the same exhaust gas sampling location. Only the type of the auxiliary raw material of an organic nature was changed: sawdust and sunflower seed husks. The manufacturing mix used 15% ash and various percentages of sunflower seed husks or sawdust (0 and 3.5%) for 121 brick production batches, and the impact on the noxious substances emitted from the furnace chimney was evaluated. The gases exhausted by the furnace chimney were analyzed using a Testo 350 flue gas analyzer (Testo, Titisee-Neustadt, Germany) equipped with metrologically calibrated detection and measurement cells specific to those gases (CO, NOx, CxHy). The analyzer's measuring resolution is 0.1 ppm for CO, 1 ppm for NO, 0.1 ppm for NO2 and 1 ppm for CxHy. For these parameters and for this type of equipment, other researchers reported measurement uncertainties of ±2.51% [25] and ±5% [26]. For the accuracy of the results, a series of determinations was performed on different days of operation, keeping the other parameters constant, for 15 min, with readings taken every minute. The results taken into account were the arithmetic mean of the 15 readings.
The same sampling location and length of the sampling probe were also maintained, so the same measurement conditions for the exhaust gases at the furnace chimney can be assumed. The calculated uncertainties were ±7.7% for CO, ±9.5% for NO and ±8.4% for CH4. The acceptable emission limits imposed by the local environmental agency are: NOx < 250 mg/m3; CO < 1500 mg/m3 and volatile organic compounds < 20 mg/m3.
Based on the experimental measurements performed on dry product mass, number of pieces/kiln car, total tons/day, amount of clay, amount of ash, amount of organic raw materials and values of pollutants measured in the chimney (CO, NO and CH4), a database containing information on 121 batches of bricks was developed.

Statistical Analysis, Modeling and Optimization
The statistical processing of the available experimental data was performed with the specialized SigmaPlot 11.00 software (Systat Software Inc., San Jose, CA, USA). It provided the average values, standard deviation, standard error of the mean, confidence interval of the average, amplitude, maximum value, minimum value, median, the 25% and 75% distribution ranges of the data, and the evaluation of the normality of the data distribution: skewness and kurtosis, the Kolmogorov-Smirnov test, the Shapiro-Wilk test, the sum of the data and the sum of squares, which is a measure of the deviation from the average value.
The NeuroSolutions specialized software (NeuroDimension Inc., Boston, MA, USA) was used to build the neural models as feed-forward networks. The experimental data provide information about 121 batches of bricks: 100 of them were used in the neural network training stage and 21 were kept for the testing stage. In order to establish the topology of the artificial neural networks (ANNs) with the best possible results, several networks of the type shown in Figure 1 were tested, with 4 inputs, one or two hidden layers with 4 to 80 hidden neurons and an output for predicting the amount of CO, NO and CH4 exhausted in the flue gas chimney. Other regression algorithms were also used for modeling.
Nearest neighbor (NN) and k-nearest neighbor (kNN) are instance-based algorithms [27], where the predicted value of a query instance is computed either as the value of the closest training point according to the Euclidean distance metric, or as a weighted average taking into account the k closest neighbors, where their weights are set as a function of the distance between the query instance and the corresponding training instance. For example, the inverse distance function is often used: wi = 1/di.
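As a rough illustration of this idea (a minimal NumPy sketch on toy one-dimensional data, not the implementation used in the study), inverse-distance-weighted kNN regression can be written as:

```python
import numpy as np

def knn_predict(X_train, y_train, query, k=6):
    """Inverse-distance-weighted kNN regression: w_i = 1/d_i."""
    d = np.linalg.norm(X_train - query, axis=1)  # Euclidean distances to all training points
    idx = np.argsort(d)[:k]                      # indices of the k closest neighbors
    w = 1.0 / np.maximum(d[idx], 1e-12)          # inverse-distance weights (guard against d = 0)
    return float(np.sum(w * y_train[idx]) / np.sum(w))

# Toy data: the prediction is pulled toward the nearest neighbor.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
pred = knn_predict(X, y, np.array([0.1]), k=2)
```

For the query 0.1 with k = 2, the two neighbors 0.0 and 1.0 receive weights 1/0.1 and 1/0.9, so the weighted average lands very close to the nearer target value.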
The K* algorithm [28] has a similar philosophy, but it uses entropy as a distance metric, justified by the idea that the distance between two instances can be defined as the complexity of transforming one instance into the other. It uses a global blend (gb) parameter, which can be considered a sphere of influence that implicitly specifies how many neighbors are significant. Support vector regression (SVR) is a method inspired by support vector machine classification [29], whose idea is to minimize the error by finding output values that lie within a given margin named an ε-tube. The objective is to maximize the number of points that can be placed inside this tube, i.e., within the margin. Different kernels can be applied to the data to transform them into a configuration easier to learn, e.g., polynomial of different degrees, radial basis function (RBF) or Pearson universal kernel (PUK). When, after such a transformation, the data still cannot be fit completely into the ε-tube, some errors are allowed, and the cost parameter C controls the strictness of the objective function optimized by the algorithm: a higher value of C will lead to a smaller margin with a lower error, while a lower value of C will allow a greater error on the training set but with a wider margin, which in turn may lead to better generalization capabilities. Random Forest [30] is an ensemble method where a forest is composed of a set of trees. The trees are constructed by recursively choosing a split on an attribute from a random subset of attributes. Also, each tree is built on a slightly different dataset using bagging, i.e., sampling from the initial dataset uniformly with replacement. In this way, the trees are sufficiently diverse to capture different perspectives of the training set. For a query instance, each tree computes an output value and the random forest ensemble calculates the average of these individual values.
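The roles of the SVR parameters (C, ε, kernel) and of the tree count in a random forest can be sketched with scikit-learn on synthetic stand-in data (the bounds, target function and parameter values here are illustrative, not the study's):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(121, 4))  # 121 "batches", 4 inputs, as in the database
y = X.sum(axis=1)               # hypothetical smooth target

# SVR: C trades off margin width against training error;
# epsilon is the half-width of the tube within which errors are ignored.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, y)

# Random forest: 1000 bagged trees; the prediction is the average over the trees.
rf = RandomForestRegressor(n_estimators=1000, random_state=0).fit(X, y)

query = np.array([[0.5, 0.5, 0.5, 0.5]])
svr_pred = svr.predict(query)[0]
rf_pred = rf.predict(query)[0]
```

Both models should predict a value near the true sum of 2.0 at the center of the input space; increasing C (for SVR) or the number of trees (for the forest) changes how tightly the training data are fit.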
Table 1 presents a statistical description of the data. We find lower values of the standard deviation for the dry product mass (Col 1) and the quantity of organic raw materials (Col 6), which indicates a lower spread of these experimental data. The amplitude, i.e., the difference between the maximum and the minimum value, which indicates the range over which the experimental data extend, has values between 8 and 963. The median, the statistical parameter that indicates the middle of the data series once it is sorted in ascending or descending order, has values close to the average, which shows a uniform distribution of the experimental data. The analysis of the experimental data distribution reveals a negative asymmetry for Col 1-6 and a positive one for Col 7-9. The flattening indices of the variation curve (kurtosis) have small values, which indicates a good distribution of the data, i.e., few values differ strongly from the average. The Kolmogorov-Smirnov normality test, which quantifies the degree of overlap between the cumulative distribution of the analyzed variable and the cumulative distribution of a variable following the Gaussian curve, indicates a normal distribution of the data in the case of the noxious substances measured in the chimney: Col 7 (CO, mg/m3) and Col 9 (CH4, mg/Nm3). This is also confirmed by the results of the Shapiro-Wilk normality test.
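The descriptive statistics and normality tests named above (the study used SigmaPlot; this sketch uses scipy.stats on a synthetic stand-in column) can be reproduced as follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=100.0, scale=5.0, size=121)  # stand-in for one data column

skewness = stats.skew(sample)        # asymmetry of the distribution
kurt = stats.kurtosis(sample)        # flattening index of the variation curve

# Kolmogorov-Smirnov test against a normal distribution fitted to the sample,
# and the Shapiro-Wilk test; large p-values are consistent with normality.
ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))
sw = stats.shapiro(sample)
print(f"skew={skewness:.3f} kurtosis={kurt:.3f} "
      f"KS p={ks.pvalue:.3f} Shapiro-Wilk p={sw.pvalue:.3f}")
```

Note that applying the KS test with parameters estimated from the same sample is an approximation (the Lilliefors correction is the rigorous variant); it is shown here only to illustrate the workflow.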

Statistical Processing of Experimental Data
In order to design neural models that link a series of selected parameters regarding the pollutants released in a brick factory to the raw materials used, the statistical processing of the available experimental data was performed first.
To see if there are statistically significant differences between the 9 data series, the Kruskal-Wallis test was applied, followed by the multiple pairwise comparison procedure (Dunn's method). According to the Q values from Dunn's test, presented in Table 2, there are no statistically significant differences (p > 0.05) between the total tons (Col 3) and the amount of CO (Col 7); between the total tons (Col 3) and the amount of clay (Col 4); between the quantity of organic raw materials (Col 6) and the dry product mass (Col 1); between the amount of clay (Col 4) and the amount of CH4 (Col 9); and between the amount of ashes (Col 5) and the amount of NO (Col 8). Following the statistical processing of the experimental data, and for practical reasons, the following data series were used for modeling with neural networks: dry product mass (Col 1), quantity of clay (Col 4), quantity of ashes (Col 5), organic raw materials (Col 6), and the values of the noxious substances measured in the chimney: CO, NO, and CH4 (Col 7, Col 8 and Col 9).
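The Kruskal-Wallis step can be sketched with scipy.stats on hypothetical stand-in columns (Dunn's post-hoc procedure is not in scipy itself; third-party packages such as scikit-posthocs provide it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical stand-ins for two of the nine data columns (121 batches each).
col_a = rng.normal(50.0, 5.0, 121)
col_b = rng.normal(50.0, 5.0, 121)

# Kruskal-Wallis H test: a non-parametric test for differences between groups.
h, p = stats.kruskal(col_a, col_b)
print(f"H = {h:.3f}, p = {p:.3f}")
if p > 0.05:
    print("no statistically significant difference between the columns")
```

With more than two columns, all nine series would be passed to stats.kruskal at once, and the pairwise comparisons would follow as a post-hoc step.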

Neural Network Modeling
The mean square error (MSE), correlation coefficient (r2) and percentage error Ep (%) were used as criteria for choosing the best topology. The network topology was encoded as (m:n:p), where m is the number of neurons in the input layer, n the number of neurons in the hidden layer, and p the number of neurons in the output layer; for networks with two hidden layers, the encoding (m:n1:n2:p) was used.
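The three selection criteria can be computed directly; as a sketch (the paper does not give an explicit formula for Ep, so the mean absolute percentage error is assumed here):

```python
import numpy as np

def model_criteria(y_true, y_pred):
    """MSE, r^2 and percentage error Ep (taken here as the mean absolute
    percentage error; the exact Ep formula is an assumption)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)                       # mean square error
    ss_res = np.sum((y_true - y_pred) ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)              # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                                  # coefficient of determination
    ep = 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))    # percentage error
    return mse, r2, ep

mse, r2, ep = model_criteria([1.0, 2.0, 4.0], [1.1, 2.0, 3.8])
```

On the toy triple above, the residuals are -0.1, 0 and 0.2, giving MSE = 0.05/3 and Ep = 5%.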
In order to avoid the overtraining of the chosen neural networks, the variation of the MSE error with the number of training epochs was analyzed and it was concluded that when the number of epochs exceeds 70,000, the performance stops improving. Therefore, the number of training epochs used was 70,000 for all neural models developed.
In order to predict the amount of CO exhausted in the flue gas chimney, neural models with the topologies and the MSE, r2 and Ep errors shown in Table 3 were used. According to Figure 2, almost all results are within a ±22% confidence interval.
Tables 4 and 5 show the neural models developed to predict the amount of NO and CH4, respectively, exhausted in the flue gas chimney. The best performance was achieved in the training stage using the ANN (4:40:20:1) model for NO and the ANN (4:60:30:1) model for CH4. Figures 3 and 4 compare the results obtained with these models with the experimental ones; in most cases, the predictions are also within the ±22% confidence interval, with the exception of three values for NO and two values for CH4.
The possibility to make predictions regarding the quantity of gases with energy potential (CH4) and the quantity of polluting gases (NOx and CO) in the flue gases allows decision makers to decide whether to invest in facility retrofitting or to change certain operating limits so as to keep process and product performance in the comfort zone, while improving costs.
Table 6 shows the regression results for each of the three outputs of the problem (CO, NO and CH4), applying different algorithms and combinations of parameter values. The results are presented in terms of the correlation coefficient (r), where a value closer to 1 designates a better data fit. For each variant of the algorithm, the results for the training set and for 10-fold cross-validation (CV) are displayed. However, we are interested in good generalization; therefore, the CV results are used to select the best models. The best values obtained for each of the three outputs are marked in bold. As one can see, there is no single best model for all three outputs. Still, k-nearest neighbor and random forest are the algorithms that stand out as the most promising for this problem. The second variant of kNN (k = 10) is better for CO, but much worse for NO than kNN (k = 6). Thus, kNN (k = 6, w = 1/d) and random forest (1000 trees) were selected for the second step, the optimization, together with a combination of models, a separate one for each output.
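This selection step, comparing models by the correlation coefficient of their 10-fold cross-validation predictions, can be sketched with scikit-learn (synthetic stand-in data; the study's exact software and data are not reproduced here):

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.uniform(size=(121, 4))  # 4 inputs, 121 batches
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(0.0, 0.02, 121)

models = {
    "kNN (k=6, w=1/d)": KNeighborsRegressor(n_neighbors=6, weights="distance"),
    "random forest (1000 trees)": RandomForestRegressor(n_estimators=1000,
                                                        random_state=0),
}
scores = {}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=10)  # out-of-fold predictions
    scores[name] = np.corrcoef(y, pred)[0, 1]     # correlation coefficient r
    print(f"{name}: r = {scores[name]:.3f}")
```

The model with the highest out-of-fold r would then be carried into the optimization step.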

Modeling with Regression Methods
Admittedly, the cross-validation results are not very good, but they can be considered satisfactory and useful. What is very important to emphasize about this approach is that the experiments were performed under industrial conditions, which means very large amounts of time, materials and energy per batch. Hence the limited number of experiments, which, together with the accuracy of the experimental determinations, influences the modeling results. Even under these conditions, the results obtained are of real use to industrial practice through the indications they provide regarding the composition of the materials used.

Process Optimization
In the second stage, the models found previously are used for optimization, i.e., finding the most appropriate values for the inputs (dry product mass, clay, ash and organic raw materials) that lead to the minimum values of the outputs (CO, NO and CH4). Since there is more than one output, this is a multi-objective optimization problem. Specialized algorithms exist for this type of problem; e.g., a widely used evolutionary algorithm is NSGA-II [31]. An evolutionary algorithm is very useful for exploring the problem space, especially when the dimensionality of the problem is high. However, in our case, the problem has only four attributes. Since a short execution time is not a requirement here, and we are more interested in the quality of the solution, we chose a grid search approach instead.
In this case, the problem space is explored along each dimension with a proportional step. Let xmin and xmax be the minimum and maximum values of input x. The number of steps s, which determines the resolution on the axis, is defined by the user; in our case, s = 50 for all inputs. The step size is therefore ssize = (xmax − xmin)/s. This leads to 50^4 = 6,250,000 points (a large, but tractable amount) for which the outputs are computed.
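The grid construction can be sketched as follows (the input bounds are hypothetical placeholders, and linspace is used so that both endpoints are included, which differs slightly from the step definition above):

```python
import numpy as np
from itertools import product

def grid_points(bounds, s=50):
    """Proportional grid: s points per input axis, from x_min to x_max.

    Returns a lazy iterator over s**len(bounds) input combinations, so the
    full 50^4 = 6,250,000-point grid never has to be held in memory at once."""
    axes = [np.linspace(x_min, x_max, s) for (x_min, x_max) in bounds]
    return product(*axes)

# Hypothetical bounds for the four inputs
# (dry product mass, clay, ash, organic raw materials).
bounds = [(10.0, 12.0), (70.0, 80.0), (10.0, 20.0), (0.0, 3.5)]

# Small demonstration with s = 5 (5^4 = 625 points); s = 50 gives 50^4 points.
n = sum(1 for _ in grid_points(bounds, s=5))
```

Each grid point would then be fed to the selected regression models to obtain the predicted (CO, NO, CH4) triplet.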
From these triplets (COi, NOi, CH4i), the non-dominated solutions are identified. A potential solution i dominates another potential solution j if i is better than or equal to j for all objectives, and strictly better than j for at least one objective. Because the Pareto front is a 3D surface, a 2D (x, y) plot was used, with the z axis represented by color.
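The dominance test above translates directly into a non-dominated filter; a minimal O(n^2) sketch (adequate for illustration, though faster algorithms exist for millions of points):

```python
import numpy as np

def non_dominated(points):
    """Indices of the non-dominated points, all objectives minimized.

    Point q dominates p if q <= p in every objective and q < p in at
    least one objective."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# Toy (CO, NO, CH4) triplets: the last point is dominated by the first.
front = non_dominated([(1.0, 2.0, 3.0), (2.0, 1.0, 3.0),
                       (3.0, 3.0, 1.0), (2.0, 3.0, 4.0)])
```

The surviving indices form the Pareto front from which the practitioner picks a trade-off.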
The results of the optimization, i.e., the Pareto front, using the kNN (k = 6, w = 1/d) models are presented in Figure 5. The Pareto front using the Random Forest (1000 trees) models is presented in Figure 6. Figure 7 displays the Pareto front obtained for three separate models, with kNN (k = 10, w = 1/d), kNN (k = 6, w = 1/d) and random forest (1000 trees) for the outputs CO, NO and CH4, respectively. These are the models which yielded the best individual results, as shown in Table 6. The last two fronts are much more diverse than the first. Still, all three fronts contain non-dominated solutions, and the actual combination of reaction conditions should be chosen by the practitioner according to the estimated relative importance of the outputs. The results reveal that no single dominant solution exists, so we can say that all the considered input parameters (dry product mass, clay, ash and organic raw materials) significantly influence the values of the output parameters (CO, NO and CH4). For the prediction of CO and CH4, the gas emissions with energy potential discharged through the furnace chimney, the best correlation coefficients were obtained with kNN (k = 10) and Random Forest (1000 trees): 0.6355 and 0.7384, respectively. These values are slightly lower than those reported by other authors who used regression algorithms, namely Random Forest regression, to assess the impact of adding solid organic waste to the raw material used to obtain burnt bricks (on a laboratory scale) on their functional properties [7]. We consider that the use of the models developed in our study to predict the amount of CO and CH4 can provide very useful information for process engineers, because they use experimental data obtained in an industrial plant under real working conditions, in which up to 90,000 bricks a day are produced.
The elaborated models make it possible to evaluate the impact of modifying the composition of the raw material used for brick manufacture on the quantity of noxious substances evacuated through the furnace chimney. In order to comply with the limits imposed by legal regulations for environmental protection, the number of test batches with different raw material mixtures can be reduced, which leads to maintaining and improving product performance on the same plant, respecting the same combustion curves, under advantageous economic conditions.

Conclusions
This study approaches a real problem of industrial practice, i.e., trying to evaluate the impact of adding auxiliary materials (sawdust and sunflower seed husks) on the burning performance in an industrial brick burning facility. The research was conducted experimentally and by simulation.
Based on the experimental measurements performed on dry product mass, number of pieces/kiln car, total tons/day, amount of clay, amount of ash, amount of organic raw materials and values of pollutants measured in the chimney (CO, NO and CH4), a database containing information on 121 batches of bricks was developed.
The simulation methods applied here include artificial neural networks and regression algorithms (nearest neighbor, k-nearest neighbor, support vector regression, random forest) for modeling and a grid search for multi-objective optimization.
The best models were feed-forward neural networks with two hidden layers, having MSE < 0.01 and r2 > 0.82 and, as regression models, kNN with error < 0.6.
These models were included in the optimization procedure to determine the working conditions (i.e., dry product mass, clay, ash and organic raw materials) that lead to the minimum quantities of gas emissions (i.e., CO, NO and CH4). For the prediction of CO and CH4, which are gas emissions with energy potential, discharged to the furnace, the best correlation coefficients were obtained with kNN (k = 10) and random forest (1000 trees), i.e., 0.6355 and 0.7384.
The results obtained by simulation are useful for industrial practice, replacing expensive experiments that consume significant resources of time, energy and materials.
In addition, the paper presents a new working methodology, a simulation study based on tools provided by artificial intelligence, which can be applied under different conditions and on different data sets.