The Application of a Hybrid Model Using Mathematical Optimization and Intelligent Algorithms for Improving the Talc Pellet Manufacturing Process

Moisture is one of the most important factors affecting the talc pellet process. In this study, a hybrid model (HM) is introduced that combines intelligent algorithms, namely the self-organizing map (SOM) and the adaptive neuro-fuzzy inference system (ANFIS), with metaheuristic optimization, namely the genetic algorithm (GA) and particle swarm optimization (PSO); the resulting models are called HM-GA and HM-PSO. The main purpose is to predict the moisture in the talc pellet process, a real-world application problem related to symmetry. In the combination process, SOM classifies the suitable input data. GA and PSO, as the training algorithms of ANFIS, are compared in terms of prediction skill. Five factors (talc powder, water, temperature, feed speed, and air flow) from 52 experimental cases designed by central composite design (CCD) form the training data set. Three different measures evaluate the capacity of moisture prediction. The comparison shows that HM-PSO provides the smallest difference between the training and test datasets under the condition of the moisture being less than 5%. As a result, the HM-PSO model achieves the best result in predicting the moisture for the talc pellet process, with R = 0.9539, RMSE = 1.0693, and AAD = 0.393.


Introduction
Talc is a mineral occurring naturally in the form of crystalline hydrated magnesium silicate, with the chemical formula Mg3Si4O10(OH)2. Talc has low abrasion, high thermal conductivity and stability, low electrical conductivity, and high oil and grease adsorption [1]. Due to its unique surface chemistry, lamellar crystal habit, and properties, talc is widely applied commercially and industrially, for example in cosmetics, pharmaceuticals, paints, polymers, and ceramics. Furthermore, methods that evaluate the suitable properties of talc can contribute to industry by improving the efficiency of production planning.
In the talc forming process, many factors, especially moisture, play an important role. The more accurate the moisture forecast, the better the quality of the talc pellet. Over the last decade, artificial neural networks (ANNs) have become a popular technique for data prediction due to their high accuracy. Loveday et al. [2] applied an ANN to decrease the time for palm oil production; the developed model provided reliable results and high efficiency. The adaptive neuro-fuzzy inference system (ANFIS) is a kind of ANN applied in various fields, for example, economic order quality, water level prediction, medicine, and markets [3]. ANFIS is an algorithm that combines the advantages of both ANNs and fuzzy inference systems (FIS). It has the ability to capture [...]
From Figure 1, the talc pellet forming process consists of five steps. First, talc is ground by a Raymond mill. Second, the ground talc is conveyed into a mixing tank. Third, the mixed talc is compacted with mechanical force by a double roller, and the mixture is pressed through a 5 mm sieve to form material for sintering. Fourth, the talc is sintered with LPG using a spiral drying conveyor. Finally, the talc pellet is produced and sent to the hopper and drying tube for further use.
In talc pellet moisture measurement, the talc obtained under the testing conditions of the 52 experimental designs is measured according to the ASTM D2216-98 standard [25]. The moisture of the processed talc can be calculated as:
$$M_C = \frac{W_2 - W_3}{W_3 - W_1} \times 100 \quad (1)$$
where $M_C$ is the talc pellet moisture (%), $W_1$ is the weight of the empty talc container (g), $W_2$ is the weight of talc before drying (g), and $W_3$ is the weight after drying (g).
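As a worked illustration of the moisture calculation, the following minimal sketch assumes the dry-basis water-content convention of ASTM D2216, with the weights before and after drying taken as gross weights that include the container. The function name and the sample weights are hypothetical and are not measurements from this study.

```python
def moisture_percent(w1, w2, w3):
    """Dry-basis moisture (%) from container weight w1, gross weight
    before drying w2, and gross weight after drying w3 (all in g)."""
    water = w2 - w3        # mass of water driven off by drying
    dry_talc = w3 - w1     # mass of dry talc without the container
    if dry_talc <= 0:
        raise ValueError("dry weight must exceed the container weight")
    return 100.0 * water / dry_talc


# Hypothetical sample: 20 g container, 125 g before drying, 120 g after.
print(moisture_percent(20.0, 125.0, 120.0))  # 5.0
```

With these illustrative weights, 5 g of water is lost from 100 g of dry talc, which corresponds to the 5% moisture threshold mentioned in the abstract.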

Self-Organizing Map
The basic concept of SOM is the transformation of complex, high-dimensional input data into a simple, low-dimensional discrete output [26]. The SOM, an unsupervised learning algorithm, comprises three essential phases: competition, cooperation, and adaptation. Before training, the initial values of the learning rate, the radius of the neighborhood, the number of iterations, the number of patterns, and the SOM array size are required [27].
At the start of learning, the weight vector, $w_i$, is generated from random numbers, and the input vector, x, is randomly distributed and corresponds to the column index. The set of weight vectors is formed as $w_i = [w_{ij}]$, $i = 1, 2, \ldots, k_x$, $j = 1, 2, \ldots, k_y$, where $k_x$ is the number of rows and $k_y$ is the number of columns. The three phases of the SOM algorithm are described below.
In competition, the Euclidean distance between the input vector and the weight vector of a given neuron, $w_c$, is computed by Equation (2). The neuron whose weight vector is most similar to the input is the winner neuron, the best matching unit (BMU), which is found by Equation (3).
In cooperation, the neighborhood function used in this study is the Gaussian function, computed by Equation (4). The parameter $\eta^c_{ij}$ represents the radius of the neighborhood between nodes $w_c$ and $w_{ij}$. The two-dimensional vectors $R_c$ and $R_{ij}$ contain the indices of $w_c$ and $w_{ij}$ [28].
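The competition and cooperation phases can be sketched as follows. This is a minimal illustration using the textbook SOM forms, a nearest-neuron search for the BMU and a Gaussian neighborhood exp(−‖R_c − R_ij‖² / (2η²)); the exact forms used in this study may differ. The function names and the 2 × 2 sample weights are hypothetical.

```python
import math


def best_matching_unit(x, weights):
    """Competition: return the grid index (i, j) of the neuron whose
    weight vector has the smallest Euclidean distance to the input x."""
    best, best_dist = None, float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            dist = math.dist(x, w)
            if dist < best_dist:
                best, best_dist = (i, j), dist
    return best


def gaussian_neighborhood(r_c, r_ij, eta):
    """Cooperation: Gaussian neighborhood value between the grid
    position of the winner, r_c, and that of another neuron, r_ij."""
    squared = sum((a - b) ** 2 for a, b in zip(r_c, r_ij))
    return math.exp(-squared / (2.0 * eta ** 2))


# Hypothetical 2 x 2 map with two-dimensional weight vectors.
weights = [[(0.0, 0.0), (0.0, 1.0)],
           [(1.0, 0.0), (1.0, 1.0)]]
winner = best_matching_unit((0.9, 0.2), weights)
print(winner)                                      # (1, 0)
print(gaussian_neighborhood(winner, (0, 1), 1.0))  # exp(-1), about 0.368
```

The neighborhood value equals 1 at the winner itself and decays with grid distance, so neurons far from the BMU are barely affected.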
In adaptation, the weight vector is adjusted after the winning neuron is obtained, to increase its similarity with the input vector. The rule for updating the weight vector is given by Equation (5), in which $h^c_{ij}(t)$ is a neighborhood function and t is the order number of the current iteration. The learning rate function, $\alpha(t)$, is defined by Equation (6), in which T is the total number of iterations [28]. However, this is subject to the condition $\eta^c_{ij} \le \max(\alpha \max(k_x, k_y), 1)$ for all cases of analysis.
The percent of occurrence, or frequency, of each pattern is the number of occurrences divided by the total number of samples. The probability that a sample maps to any given pattern is 1/n, where n is the number of patterns. The significance of the frequency can be determined by calculating the 95% confidence interval around the expected probability of 1/n. Assuming that the process is binomial, the 95% confidence limits are calculated by:
$$p \pm 1.96 \sqrt{\frac{p(1 - p)}{N}} \quad (7)$$
where p is the probability that any sample maps to any pattern and N is the number of input vectors used to train the map [27].
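The adaptation step can be illustrated with the textbook SOM update, in which each weight vector moves toward the input by a fraction given by the learning rate times the neighborhood value. This is a sketch under that assumption; the learning rate schedule is left as an input because its exact form is not reproduced here, and the function name is hypothetical.

```python
def update_weight(w, x, alpha, h):
    """Adaptation: move the weight vector w toward the input x by the
    fraction alpha * h (learning rate times neighborhood value)."""
    return [wk + alpha * h * (xk - wk) for wk, xk in zip(w, x)]


# The winner (h = 1) moves halfway toward the input when alpha = 0.5;
# a neuron outside the neighborhood (h = 0) stays where it is.
print(update_weight([0.0, 0.0], [1.0, 2.0], 0.5, 1.0))  # [0.5, 1.0]
print(update_weight([0.0, 0.0], [1.0, 2.0], 0.5, 0.0))  # [0.0, 0.0]
```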

Adaptive Neuro-Fuzzy Inference System
The adaptive neuro-fuzzy inference system (ANFIS) was first introduced in 1993 by Jang [29]. It is a method that integrates ANNs and fuzzy inference systems (FIS). A set of fuzzy if-then rules with appropriate membership functions is constructed to determine the relationship between the input and output variables. There are two inference systems in fuzzy logic, of which the Takagi-Sugeno-Kang inference system is usually applied [13]. Figure 2 shows the structure of ANFIS.
Symmetry 2020, 12, 1602
Figure 2. An architecture of ANFIS [12].
From Figure 2, there are five layers in ANFIS. The fuzzy if-then rules are considered to explain the rule of each layer as follows:
• Layer 1: Adjust every node by using Equation (9), where $c_{ij}$ is the midpoint value and $s_{ij}$ is the standard deviation value of the input variable $x_j$.
• Layer 2: Calculate each node by multiplying the fuzzy values; the output is calculated by Equation (10).
• Layer 3: Sum the fuzzy value of every node to one value by Equation (11).
• Layer 4: Normalize the fuzzy value of every node by Equation (12), where $f_i$ are the consequent parameters from the Takagi-Sugeno-Kang pattern.
• Layer 5: Sum all outputs from layer four to obtain the final output by Equation (13).
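The five layers can be illustrated with a forward pass through a generic first-order Takagi-Sugeno-Kang ANFIS. This is a minimal sketch, assuming Gaussian membership functions with a midpoint c and standard deviation s, product firing strengths, and linear rule consequents; the grouping of operations into layers may differ slightly from the description above, and all names and parameter values are hypothetical.

```python
import math


def gaussian_mf(x, c, s):
    """Gaussian membership value with midpoint c and standard deviation s."""
    return math.exp(-((x - c) ** 2) / (2.0 * s ** 2))


def anfis_forward(x, centers, sigmas, consequents):
    """Forward pass. centers[i][j] and sigmas[i][j] are the membership
    parameters of rule i for input j; consequents[i] holds the linear
    coefficients of rule i followed by its constant term."""
    # Fuzzify each input and multiply the membership values per rule.
    firing = []
    for c_row, s_row in zip(centers, sigmas):
        strength = 1.0
        for xj, c, s in zip(x, c_row, s_row):
            strength *= gaussian_mf(xj, c, s)
        firing.append(strength)
    # Sum the firing strengths and normalize each one by the total.
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Evaluate the linear consequent f_i of each rule.
    outputs = [sum(p * xj for p, xj in zip(coeffs, x)) + coeffs[-1]
               for coeffs in consequents]
    # Final output: sum of the normalized, weighted rule outputs.
    return sum(wn * f for wn, f in zip(normalized, outputs))


# Two hypothetical rules on one input; x = 1 lies midway between the
# two midpoints, so both rules fire equally.
y = anfis_forward([1.0], [[0.0], [2.0]], [[1.0], [1.0]],
                  [[1.0, 0.0], [0.0, 4.0]])
print(y)  # 2.5
```

In the hybrid model, the membership and consequent parameters are the quantities that GA or PSO would tune; here they are fixed by hand for illustration.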

The ANFIS Training Algorithm
Genetic Algorithm
The genetic algorithm (GA) is one of the most effective algorithms in metaheuristic optimization, first presented by [30] and completed in 1989 by [31]. GA imitates the process of natural selection and genetics to find the optimal formula for prediction. The basic procedures of GA are as follows.
• Chromosome encoding: Design the chromosomes that represent solutions of the system, using any encoding method suited to the problem being solved.
• Population initialization: Initialize the prototype population at the beginning of GA. The first population group is created randomly to match the defined population size.
• The fitness function: Define the score of each possible solution. The fitness of every chromosome determines its contribution to creating the next-generation chromosomes.
• Selection: Apply the genetic operator that transfers the worthy members into the next generation. The best chromosomes among the whole population are selected according to the natural selection concept, in which good origins produce good species.
• Crossover: The first offspring chromosome is formed by copying the father's genes up to a random position and the mother's genes behind that position. The second offspring chromosome is formed by the same process with the positions of the father and mother switched.
• Termination condition: Terminate the procedure when the condition is satisfied.
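The procedures above can be sketched as a small minimization loop. This is an illustrative sketch only: the chromosome is a list of real numbers, selection keeps the fitter half of the population, crossover is single-point, and termination is a fixed number of generations. These choices, the function names, and the sphere fitness function are assumptions for illustration, not the configuration used in this study.

```python
import random


def single_point_crossover(father, mother, cut):
    """Swap the tails of two parent chromosomes at position cut."""
    child1 = father[:cut] + mother[cut:]
    child2 = mother[:cut] + father[cut:]
    return child1, child2


def select_fittest(population, fitness, k):
    """Selection: keep the k chromosomes with the lowest fitness value."""
    return sorted(population, key=fitness)[:k]


def evolve(population, fitness, generations, rng):
    """Run selection and crossover for a fixed number of generations
    and return the best chromosome found (minimization)."""
    size = len(population)
    for _ in range(generations):
        parents = select_fittest(population, fitness, max(2, size // 2))
        children = list(parents)  # parents survive unchanged
        while len(children) < size:
            father, mother = rng.sample(parents, 2)
            cut = rng.randint(1, len(father) - 1)
            children.extend(single_point_crossover(father, mother, cut))
        population = children[:size]
    return min(population, key=fitness)


def sphere(chromosome):
    """Hypothetical fitness: sum of squares, minimal at the origin."""
    return sum(gene * gene for gene in chromosome)


rng = random.Random(0)
population = [[rng.uniform(-5.0, 5.0) for _ in range(4)] for _ in range(10)]
best = evolve(population, sphere, 20, rng)
print(sphere(best))
```

Because the selected parents are carried over unchanged, the best fitness in the population can never get worse from one generation to the next.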

Particle Swarm Optimization
Particle swarm optimization (PSO), introduced by Kennedy and Eberhart [32] for solving non-linear optimization problems, is based on the foraging behavior of a bird flock searching for the optimal solution area. Each bird in the flock is represented by a particle. For each particle, the fitness value implies the distance between the particle and the food source, the food source having the best fitness value; in each interval, the fitness value of the particle is found by Equation (14), in which the defined fitness function is evaluated at the particle $x_i$. Accordingly, PSO begins by randomizing a set of particle positions and then optimizes by adjusting the parameters in each decision cycle. In every process interval t, each particle keeps its own best position, $P_{best,i}$, the swarm keeps the best position of all particles, $G_{best}$, and the movement speed is adjusted by using $P_{best,i}$ and $G_{best}$. $P_{best,i}$ at the next time step, t + 1, where t ∈ [0, ..., N], is given by Equation (15), and $G_{best}$ at time step t is calculated by Equation (16) [33]. Here, $P_{best,i}$ is the best position that the individual particle i has visited since the first time step, and $G_{best}$ is the best position discovered by any of the particles in the entire swarm. Each individual particle i ∈ [1, ..., n], where n > 1, is evaluated at its position $x_i$ in the search space. The new velocity is calculated as:
$$v_{ij}^{t+1} = \omega v_{ij}^{t} + c_1 r_{1j}^{t} \left(P_{best,i}^{t} - x_{ij}^{t}\right) + c_2 r_{2j}^{t} \left(G_{best} - x_{ij}^{t}\right) \quad (17)$$
where $v_{ij}^{t}$ is the velocity of particle i in dimension j at time t, ω is the inertia weight, $x_{ij}^{t}$ is the position, $P_{best,i}^{t}$ is the best position of the particle, $G_{best}$ is the best position of the whole particle system, $c_1$ and $c_2$ are the constant accelerations in searching, and $r_{1j}^{t}$ and $r_{2j}^{t}$ are random numbers between 0 and 1 at time t.
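A minimal PSO loop can be sketched as follows. The velocity update follows the inertia-weight form described above; the position update (new position equals old position plus new velocity), the default parameter values, the search bound, and the sphere fitness function are standard choices assumed for illustration and are not taken from this study.

```python
import random


def pso_minimize(fitness, dim, n_particles, iterations, rng,
                 omega=0.7, c1=1.5, c2=1.5, bound=5.0):
    """Minimize fitness with a basic particle swarm; returns the best
    position found and its fitness value."""
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]               # best position per particle
    p_best_fit = [fitness(p) for p in pos]
    g_idx = min(range(n_particles), key=lambda i: p_best_fit[i])
    g_best, g_best_fit = p_best[g_idx][:], p_best_fit[g_idx]
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term plus pulls toward the personal and global bests.
                vel[i][j] = (omega * vel[i][j]
                             + c1 * r1 * (p_best[i][j] - pos[i][j])
                             + c2 * r2 * (g_best[j] - pos[i][j]))
                pos[i][j] += vel[i][j]         # assumed position update
            f = fitness(pos[i])
            if f < p_best_fit[i]:
                p_best[i], p_best_fit[i] = pos[i][:], f
                if f < g_best_fit:
                    g_best, g_best_fit = pos[i][:], f
    return g_best, g_best_fit


def sphere(position):
    """Hypothetical fitness: sum of squares, minimal at the origin."""
    return sum(v * v for v in position)


best_position, best_value = pso_minimize(sphere, 3, 20, 100, random.Random(1))
print(best_value)
```

When PSO is used to train ANFIS, each particle position would encode one candidate set of ANFIS parameters, and the fitness would be a prediction error such as RMSE.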

Performance Evaluation
In order to evaluate the performance of the model generated by ANFIS, three measures are applied: absolute average deviation (AAD), root mean square error (RMSE), and correlation coefficient (R). R (Equation (18)) measures how close the predicted value is to the experimental value; the closer R is to 1, the better the prediction. RMSE (Equation (19)) and AAD (Equation (20)) are employed to investigate the accuracy of the model predictions [34]. In these equations, Q is the target value, E is the prediction value, and N is the total number of input data.
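The three measures can be computed as in the following sketch. RMSE and R use their standard definitions (R is taken as the Pearson correlation coefficient). The AAD function is an assumption: it is implemented here as the mean absolute deviation between target and predicted values, and the definition used in this study may differ, for example by being expressed relative to the target. The function names and sample data are hypothetical.

```python
import math


def rmse(targets, preds):
    """Root mean square error between target and predicted values."""
    n = len(targets)
    return math.sqrt(sum((q - e) ** 2 for q, e in zip(targets, preds)) / n)


def correlation(targets, preds):
    """Pearson correlation coefficient R between targets and predictions."""
    n = len(targets)
    mean_q, mean_e = sum(targets) / n, sum(preds) / n
    cov = sum((q - mean_q) * (e - mean_e) for q, e in zip(targets, preds))
    var_q = sum((q - mean_q) ** 2 for q in targets)
    var_e = sum((e - mean_e) ** 2 for e in preds)
    return cov / math.sqrt(var_q * var_e)


def aad(targets, preds):
    """Assumed form of AAD: mean absolute deviation between the target
    and predicted values."""
    n = len(targets)
    return sum(abs(q - e) for q, e in zip(targets, preds)) / n


# Hypothetical moisture values (%), not data from this study.
target = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(target, predicted), aad(target, predicted),
      correlation(target, predicted))
```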

The Proposed Model
In this section, the proposed model is introduced. It is a novel technique based on a combination of SOM and ANFIS. The experimental design is described in Section 3.1. The hybrid model is introduced in Section 3.2, and the experiment setting is determined in Section 3.3.

The Experimental Design
In order to design the experiment, the central composite design (CCD) of the response surface methodology (RSM) is applied [35]. Five factors influence the talc pellet forming process in predicting the proper moisture: talc powder, water, temperature, feed speed, and air flow. According to the CCD, the maximum and minimum values of each variable are adjusted, as shown in Table 1. The design consists of 52 experimental cases, as shown in Table 2. According to Table 1, the scale value for α rotatability relative to ±1.0 was first tested at 2.378 in the real experiment, but the forming process failed. The scaling value was therefore changed to 2.00, 1.682, and 1.414, together with the corresponding negative values of −2.00, −1.682, and −1.414 [36]. The experimental results found that α = 1.682 can be applied to the real forming process.
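The size of the design and the α values mentioned above can be checked with a short sketch. It assumes a full two-level factorial in the five factors (32 runs) plus two axial runs per factor (10 runs), which leaves 10 center-point runs to reach the 52 cases; the split between axial and center runs is an assumption, as are the function names. The rotatable axial distance is taken as the fourth root of the number of factorial runs.

```python
from itertools import product


def rotatable_alpha(k):
    """Axial distance of a rotatable CCD with a full 2**k factorial part."""
    return (2 ** k) ** 0.25


def central_composite(k, alpha, n_center):
    """Coded design points: factorial corners, axial points, center runs."""
    factorial = [list(point) for point in product((-1.0, 1.0), repeat=k)]
    axial = []
    for j in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[j] = sign * alpha
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center


design = central_composite(5, 1.682, 10)   # 10 center runs is an assumption
print(len(design))                         # 52
print(round(rotatable_alpha(5), 3))        # 2.378
print(round(rotatable_alpha(3), 3))        # 1.682
```

Under this reading, 2.378 is the rotatable value for five factors, while 2.00, 1.682, and 1.414 coincide with the rotatable values for four, three, and two factors.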

A Hybrid Model
A hybrid model (HM) is introduced based on the combination of SOM and ANFIS trained by GA and PSO. There are two main processes, SOM and ANFIS, with two training algorithms, GA and PSO. First, the SOM algorithm is applied to classify the appropriate input data before they are fed into ANFIS. Second, the selected input data are computed by ANFIS, in which GA and PSO are employed as training algorithms. The HM trained by GA and the HM trained by PSO are called HM-GA and HM-PSO, respectively. The schematic of the proposed model is shown in Figure 3.

The Experimental Setting
No theoretical criteria exist for setting the appropriate parameters of SOM, ANFIS, GA, and PSO [37]. Table 3 shows the parameter setting of SOM and ANFIS. According to previous studies [13][14][15]38], all parameters of GA and PSO are determined, as shown in Table 4. Two parameters of GA, the crossover percentage and the mutation percentage, are varied to find the optimal value. The crossover percentage is investigated from 0.6 to 0.9 with a step of 0.1. The mutation percentage is varied in the range (0, 1), with a step of 0.2. The proportion between the training and test datasets is 70% and 30%, respectively. In neural networks, the amount of training and testing data depends on many different aspects of the experiment; hence, there is no minimum or maximum sample size. Generally, a 70% and 30% split for training and testing samples, respectively, can ensure better generalization and accuracy of the models [39].
Figure 3. The schematic HM for moisture prediction in the talc forming process.
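The 70%/30% split and the crossover grid can be sketched as follows. The sketch assumes a random shuffle before splitting and rounds the training size to the nearest integer; the actual assignment of cases to the training and test sets in this study is not specified here, and the function name is hypothetical.

```python
import random


def train_test_split(cases, train_fraction, rng):
    """Shuffle the cases and split them into training and test subsets."""
    shuffled = cases[:]
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 47 case indices, the number of cases retained after SOM classification.
train, test = train_test_split(list(range(47)), 0.7, random.Random(0))
print(len(train), len(test))  # 33 14

# Crossover percentages from 0.6 to 0.9 with a step of 0.1.
crossover_grid = [round(0.6 + 0.1 * i, 1) for i in range(4)]
print(crossover_grid)  # [0.6, 0.7, 0.8, 0.9]
```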


Result and Discussion
In this section, the optimal map size of SOM is found in Section 4.1. Meanwhile, the optimal parameters of HM-GA and HM-PSO are searched in Sections 4.2 and 4.3, respectively. The comparison of the results and a discussion between HM-GA and HM-PSO is interpreted in Section 4.4.

The Results of the SOM Algorithm
In order to predict talc pellet moisture using HM, there are two main algorithms: SOM and ANFIS. First, the SOM algorithm is applied to classify the significant input data. The map size is a major parameter that affects the computational time and determines the number of clusters. The suitable size depends on the user requirements [40]. In this study, three map sizes, 2 × 2, 3 × 3, and 4 × 4, are investigated. The experiment shows that the 2 × 2 map, with four clusters, is the optimal map size: if the map size is greater than 2 × 2, some clusters have no members. Table 5 shows the frequency of the input data mapped to each cluster. From Table 5 it can be seen that there are two main patterns. The probability of occurrence for each pattern of a 2 × 2 map is 1/4, or 25%. According to Equation (7), the confidence interval ranges from 13.23% to 36.77%. From Table 5, nodes (1,1) and (2,2) are the only two nodes with frequency values above the upper confidence limit. After classifying the input data by SOM, 47 experimental cases, 25 from node (1,1) and 22 from node (2,2), are collected to use as input data for the HM model.
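The confidence interval quoted above can be reproduced with a short calculation, assuming the usual normal approximation to the binomial with a multiplier of 1.96 for the 95% level. The function name is hypothetical; the inputs (p = 1/4, N = 52, and the node counts 25 and 22) are taken from the text.

```python
import math


def binomial_limits(p, n_samples, z=1.96):
    """95% confidence limits around the expected probability p."""
    half_width = z * math.sqrt(p * (1.0 - p) / n_samples)
    return p - half_width, p + half_width


low, high = binomial_limits(1 / 4, 52)
print(f"{100 * low:.2f}% to {100 * high:.2f}%")  # 13.23% to 36.77%

# Frequencies of the two main nodes among the 52 cases.
for node, count in {"(1,1)": 25, "(2,2)": 22}.items():
    print(node, round(100 * count / 52, 2), count / 52 > high)
```

Both nodes have frequencies (about 48% and 42%) above the upper limit, which is why their 47 cases are treated as the significant patterns.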

The Optimal Parameter of HM-GA
To obtain the optimized structure of the HM-GA model, the influencing parameters, namely population size, crossover percentage, and mutation percentage, are varied to find the suitable structure, as shown in Figure 4a-c. The root mean square error is used to assess each parameter setting.
In Figure 4a, the population size is varied between 50 and 500 and presented in 10 different colors. The red line gives the smallest RMSE value and reaches stability after 5000 iterations. Thus, the proper population size is 350. With a population size of 350, the crossover percentage is varied between 0.6 and 0.9, as shown in Figure 4b in four different colors. The red line clearly shows the smallest RMSE value; hence, a crossover percentage of 0.7 is the appropriate case for this study. Likewise, with a population size of 350 and a crossover percentage of 0.7, the mutation percentage is considered in the range between 0 and 1, as shown in Figure 4c in nine different colors. At a mutation percentage of 0.3, the red line shows the smallest RMSE value. In summary, for the HM-GA model, a population size of 350, a crossover percentage of 0.7, and a mutation percentage of 0.3 lead to the best predictive network for moisture prediction in the talc pellet forming process.

The Optimal Parameter of HM-PSO
The optimal parameter can increase the reliability of the model. For HM-PSO, the important parameters of PSO were assessed using a trial and error process. The different values of population sizes and inertia weight are investigated.
As shown in Figure 5a, 10 different population sizes, varied from 50 to 500, are presented over the iterations. In Figure 5b, the inertia weight, changed from 0.2 to 1 in steps of 0.2, is examined over 5000 iterations. From these figures, it can be concluded that, for the HM-PSO model, a population size of 450 and an inertia weight of 1.0 lead to the best predictive network.

The Comparison of Results from HM-GA and HM-PSO
In order to predict the moisture in the talc pellet process, SOM is applied to select the appropriate input data, and the data are then fed into ANFIS. GA and PSO are chosen as training methods. As mentioned in Section 4.2, the optimal population size, crossover percentage, and mutation percentage of HM-GA and ANFIS-GA are 350, 0.7, and 0.3, respectively. As in Section 4.3, the optimal population size and inertia weight of HM-PSO and ANFIS-PSO are 450 and 1.0, respectively. To compare the performance of the proposed HM-GA and HM-PSO models, the ANFIS model without clustering, trained by GA and PSO, is also examined; these models are called ANFIS-GA and ANFIS-PSO. Table 6 compares the performance of the four models: HM-GA, HM-PSO, ANFIS-GA, and ANFIS-PSO.
From Table 6, in the training process, ANFIS-GA has the highest R of 0.9784 and the lowest RMSE and AAD of 0.7203 and 0.314, respectively. In the test process, HM-PSO has the highest R of 0.9192, the lowest RMSE of 0.9785, and the lowest AAD of 0.376. The relationship between the target and predicted moisture is a strong positive association. AAD measures the average absolute deviation between the predicted and target values. As can be seen in Table 6, the HM-PSO model shows the smallest difference in AAD between the training and test data, whereas the other models show larger differences. All three indicators of HM-PSO show a smaller difference between the training and test data than those of the other models. Additionally, the convergence speed of HM-PSO is faster, and its predicted values conform more closely to the measured values than those of the other models. It can be concluded that HM-PSO gives the most reliable results in predicting moisture in the talc forming process. Figure 6 shows the difference between the target and output values of the training and testing datasets.
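The train-test differences for HM-PSO can be worked out from the values quoted in the text. The sketch assumes that the RMSE and AAD values reported in the abstract (1.0693 and 0.393) are the training-set values, as the quoted R of 0.9539 is; the dictionary layout and function name are hypothetical.

```python
# HM-PSO values quoted in the text; training values for RMSE and AAD are
# assumed to be the ones reported in the abstract.
hm_pso = {
    "R":    {"train": 0.9539, "test": 0.9192},
    "RMSE": {"train": 1.0693, "test": 0.9785},
    "AAD":  {"train": 0.393,  "test": 0.376},
}


def train_test_gap(scores):
    """Absolute difference between training and test value per measure."""
    return {name: abs(v["train"] - v["test"]) for name, v in scores.items()}


for name, gap in train_test_gap(hm_pso).items():
    print(name, round(gap, 4))
```

Under this assumption, the gaps are about 0.035 for R, 0.091 for RMSE, and 0.017 for AAD.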
Figure 6 displays the variation between the predicted moisture and the target moisture. The red line represents the target moisture and the blue line the predicted moisture. In the training process, the two models with SOM follow the target more closely than the two models without SOM, although all models provide a high correlation coefficient. In the testing process, many experimental cases of ANFIS-GA and ANFIS-PSO clearly show large differences between the target and predicted moisture values. By using SOM, this difference can be reduced, as in Figure 6a,b. The correlation between the target and predicted moisture values is shown in Figure 7.
Figure 7 shows the relationship between the target and predicted moisture of the four models. The predicted moisture of HM-PSO, Figure 7b, lies on a relatively straight line for both the training and test data. The R value obtained from HM-PSO is close to 1: R = 0.9539 for the training data and R = 0.9192 for the test data. HM-PSO is therefore a representative model for moisture prediction in the talc forming process.


Conclusions
The hybrid model, based on a combination of SOM and ANFIS, is introduced as the proposed model for moisture prediction in the talc forming process in Uttaradit, Thailand. The GA and PSO algorithms are selected as the training algorithms of ANFIS. Five important factors affecting moisture in the talc pellet forming process (talc powder, water, temperature, feed speed, and air flow) were recognized, and appropriate data were collected. In order to verify the proposed model, HM-GA, HM-PSO, ANFIS-GA, and ANFIS-PSO are compared. As a result, the HM-PSO model gives a high correlation coefficient for both training and test data, with R = 0.9539 and R = 0.9192, respectively. Furthermore, the RMSE values of HM-PSO for the training and test data are similar, differing by only about 0.09, whereas the other models show a rather large difference in RMSE between the training and test data. Therefore, HM-PSO performs more reliably than the other algorithms. Because this is a real-world problem occurring in Uttaradit, Thailand, this method has not previously been applied to the talc pellet process; the results, therefore, cannot be compared with earlier research. The HM has some limitations regarding the optimal parameters: they are only suitable for this study, and for other real-world problems it is necessary to identify the optimal parameters of HM again. In this study, managing the raw data by using SOM to identify similar groups of data proved very helpful for obtaining the most significant information to feed into ANFIS. This can reduce the computation time during the training process. The method with clustering improves the prediction skill compared to the method without clustering. For further study, more clustering methods, such as k-means, and more ANFIS training algorithms, such as bee colony or ant colony optimization, should be applied and compared with this study.