Hybrid Serving of DOE and RNN-Based Methods to Optimize and Simulate a Copper Flotation Circuit

: Prediction of metallurgical responses during the ﬂotation process is extremely vital to increase the process efﬁciency using a proper modeling approach. In this study, two new variants of the recurrent neural network (RNN) method were used to predict the copper ore ﬂotation indices, i.e., grade and recovery within different operating conditions. The model input parameters including pulp pH and solid content as well as frother and collector dosages were ﬁrst analysed and then optimized using a two-step factorial approach. The statistical analysis showed a reliable correlation between operating parameters and copper grade and recovery with coefﬁcients of 99.86% and 94.50%, respectively. The main effect plots indicated that pulp pH and solid content positively affect copper grade while increasing the frother and collector dosages negatively inﬂuenced the quality of the ﬁnal concentrate. Despite the same effect from pulp pH, reverse effects from other variables were observed for copper recovery. Process optimization revealed that maximum copper recovery of 44.39% with a grade of 11.48% could be achieved under the optimal condition as pulp pH of 10, solid content of 20%, and frother and collector concentrations of 25 g/t and 9.9 g/t, respectively. Then, the predictive efﬁciency of long short-term memory (LSTM) and gated recurrent unit (GRU) networks with proper structure were evaluated using mean square error ( MSE ), root mean square error ( RMSE ), mean absolute percentage error ( MAPE ), and correlation coefﬁcient ( R 2 ). The simulation results showed that the LSTM network with higher R 2 of 0.963 and 0.934 for copper grade and recovery, respectively, was more effective than the GRU algorithm with the corresponding values of 0.956 and 0.919, respectively. The results show that the LSTM model could be useful in predicting the copper ﬂotation behaviour in response to changes in the operating parameters. recovery. The results demonstrate that LSTM and GRU are useful deep networks for predicting time series and sequential data.


Introduction
Nowadays, simulation and modelling science are widely used as a reliable, fast, and low-cost solution to predict the behaviour of mineral processing units. Using simulations and modelling before or instead of experimental studies involving a large number of laboratory tests can significantly reduce executive and manpower costs. Furthermore, the flexibility of such approaches makes it possible to change or modify various stages of the process design without the need for a significant change in the initial redesign [1,2]. So far, several methods have been developed for modelling and simulation of mineral processing units, of which empirical and semi-empirical types and statistical models are the most common.
Despite the widespread use of these models, especially in simulation software development, each of the above approaches is associated with limitations that make their application in real operational environments challenging. For example, empirical and semi-empirical parametric models are developed based on a series of data sets from a specific process or multiple processes. Therefore, their use in new processes with different operating conditions leads to inefficiency and reduced accuracy in some cases. However, most models have tried to manage this limitation to some extent by considering the calibration coefficients [3,4]. The development of statistical models is easier than empirical models. However, the most important challenge of statistical models, in addition to being limited to the initial data of model development, is to include parameters that usually lack physical meaning. For example, variables that are the product of more than two major effects (the primary variable) and are abundantly observed in the design of experiment (DOE)-based models and cannot be interpreted in the model [5,6]. Besides, although mineral processing processes have relatively simple mechanical and physical aspects, they have very complex mechanisms, some of which still remain unknown. Metallurgical and operational factors as well as their interactions are major factors responsible for such complexities. As such, modelling and simulation of these processes have always posed a controversial topic.
In recent years, the use of expert system methods (ESM) such as artificial neural networks, and genetic algorithms have been proposed to simulate complicated separation techniques [7][8][9]. Intelligent algorithms process experimental data and transmit underlying knowledge in order to create a network structure. It is possible to apply ESM to implement intricate functions in a wide range of fields such as process automation, control and monitoring, medical diagnosis, and image analysis. Nowadays, we can solve difficult problems for humans and usual computers using efficiently trained intelligent algorithms [10][11][12]. Forecasting with ESMs has been one of the main uses of the algorithms, which have also shown excellent results. The good performance of ESMs has made them popular in different scientific fields, including mining and mineral processing. For instance, Jorjani et al. [13] used artificial neural networks to simulate the process of leaching rare earth elements from apatite concentrate on an industrial scale and showed that a reasonably accurate model could be developed using the improved ANN algorithm. In a later study, Milivojevic et al. [14] simulated the nickel ore leaching process to demonstrate that expert systems are more reliable than linear regression-based statistical models. In another research, Hoseinian et al. [15] used a hybrid neural-genetic algorithm to simulate copper recovery during a column leaching process of a copper ore sample on a pilot scale, and found that using an appropriate algorithm could yield reliable prediction results. In a recent study, using hybrid artificial neural networks and particle swarm optimization (PSO), Sobouti et al. [16] simulated lead recovery during the leaching of lead concentrate. The noteworthy aspect of the study was the wide range of operating parameters used in the simulation, such as temperature, liquid/solid ratio, stirring speed, fluoroboric acid concentration, and leaching time. They showed that an effective simulation of the process could be achieved using an optimized ANN-PSO algorithm. Artificial neural networks have also been successfully used to simulate mineral processing operations reported by Vyas et al. [17]. In their study, ANN was used to simulate and predict the spent catalyst bioleaching process with acceptable accuracy, and simulation results were presented in both numerical and graphical forms. According to Ghobadi et al. [18], the copper flotation circuit was modelled and optimized simultaneously using a genetic algorithm. They found that in comparison to conventional mathematical methods, an oriented genetic algorithm reduces the calculation time by 1/60 for a two-stage flotation system and can provide higher optimization accuracy. In another study, Gholami and Khoshdast [19] found that with a limited number of operating data, multiple metallurgical responses of the bioflotation process of coal can be accurately simulated using the ANN method. Their investigation into different algorithms for developing the ANN model demonstrated that the simulation accuracy is greatly influenced by the choice of the network algorithm. Recently, Gholami et al. [20] coupled historical data and deep learning techniques to predict the flotation behaviour of a copper mine in response to mineralogical and operational variables. They showed that mixed statistical/intelligent methods can be a promising approach to accurately simulate the flotation process with an accuracy of more than 95%.
The effective application of the recurrent neural network (RNN) method has also been reported by a few researchers. For example, radial basis function neural network, recurrent neural network, and multivariate nonlinear regression have been used to predict the metallurgical performance of the flotation column [21]. It was shown that the recurrent neural network forecasted the metallurgical performance of the flotation column better than the radial basis function neural network and multivariate nonlinear regression models. The same conclusions have also been recently reported by some other researchers [22,23] to estimate the mineral grade in the field of flotation. Inapakurthi et al. [24] proposed a method using RNN for simulation of industrial grinding circuit in the lead-zinc ore beneficiation process. They also demonstrated that the RNN model can successfully control a GC while tracking its set point without violating any constraints.
The main conclusion stemming from the above studies is that intelligent modelling methods along with limited amount of data of operational/process parameters can be used to simulate mineral processing operations successfully. Considering the results reported in the above studies, the simulation has a high degree of accuracy in most cases, which is highly desirable from an application perspective [25]. Hence, in this study, a midsize copper mineral processing plant was targeted to intelligent simulation using long short-term memory (LSTM) and gated recurrent unit (GRU) networks as two well-known RNN methods. Among intelligent modelling methods for time series and sequential data predictions, RNNs are the commonly used ones. LSTM and GRU have a similar overall structure, but LSTM is more complex. Therefore, in addition to accurate prediction of output variables, a comparison of these two methods is also considered in this paper. The main parameters on the efficiency of the flotation process including pulp pH, solid content, and concentrations of frother and collector were first optimized by the DOE method and then, were considered as inputs in the model. Moreover, an accuracy analysis was performed to investigate the reliability of copper grade and recovery prediction.

A Brief Description of the Processing Plant
Studies were performed in the Takht Gonbad copper processing plant (Sirjan, Iran). With a practical capacity of 230 t/h, this plant beneficiates a copper sulfidic ore with an average copper grade of 0.45% to a concentrate with a grade of 22 ± 2% through several stages of roughing and cleaning. The copper grade in the final tailings of this plant is about 0.1%. According to Figure 1, the crushed ore is first broken down by four ball mills with capacities of 50 to 80 t/h to reduce the particle sizes to 80% finer than 100 µm. Comminuted particles are fed to the hydrocyclone unit, classifying them into two fine and coarse parts. The hydrocyclone overflow, consisting of particles 80% finer than 75 µm, is directed to a conditioning tank with a capacity of 50 m 3 to prepare the feed for the flotation circuit. At this stage, flotation reagents including sodium isopropyl xanthate (Z11) and sodium dithiophosphate (DTU) as copper mineral collectors, methyl isobutyl carbinol (MIBC) and a polypropylene glycol with molecular weight of 395.61 g/mol (A65) as frothers, lime as pH regulator, and NaHS as pyrite depressant (according to the mineralogical composition of the input feed) are added to the pulp. The concentration of reagents is adjusted according to the metallurgical conditions of the plant. The range of changes in the amount of consumption of these reagents is presented in Table 1. Prepared pulp with a solid content of about 25% (w/w) is first introduced into the rougher line with five 50 m 3 tank cells. The concentrate of this step is directed to an eight-cell row, including four cleaner and four recleaner cells to increase the copper grade. The final concentrate of this stage leaves the circuit as the final concentrate after several stages of recirculation in the cleaning circuit. The tailing is also returned to the beginning of the rougher circuit. Rougher tailing transfers to the scavenging circuit consisting of two 6-cell rows. The tailing of this unit is considered the final tailings of the plant, but the concentrate of the first two cells from both lines is returned to the cleaning circuit, and the concentrate of the last four cells is fed to the rougher circuit.

Ore Sample and Reagents Used
A representative sample was taken from the hydrocyclone overflow as the feed of the flotation circuit ( Figure 1) to determine the physical and mineralogical characteristics of the studied ore. Samples were collected using an automatic scoop sampler with an adjustable container size. Light microscopy (Axio Imager 2 Pol, Zeiss, Germany) was used to determine the mineralogical composition of the ore. The particle size distribution of the sample was measured using the standard dry sieve analysis method. All the applied reagents, including MIBC and A65 frothers, Z11 and DTU collectors, and NaHS depressant were sourced from the company's warehouse.

Ore Sample and Reagents Used
A representative sample was taken from the hydrocyclone overflow as the feed of the flotation circuit ( Figure 1) to determine the physical and mineralogical characteristics of the studied ore. Samples were collected using an automatic scoop sampler with an adjustable container size. Light microscopy (Axio Imager 2 Pol, Zeiss, Jena, Germany) was used to determine the mineralogical composition of the ore. The particle size distribution of the sample was measured using the standard dry sieve analysis method. All the applied reagents, including MIBC and A65 frothers, Z11 and DTU collectors, and NaHS depressant were sourced from the company's warehouse.

Screening of Operational Variables
Due to the variety of the operating parameters, first, the effectiveness of each operational variable was evaluated using a fractional factorial experimental design utilizing Design-Expert software (Demo version 7.0.0, from Stat-Ease Inc., Minneapolis, MN, USA). The levels of each variable were selected based on monitoring the process over six months (autumn and winter 2021). Table 1 lists the variables and their levels considered in the screening design. The experimental design used for screening studies and their practical results are presented in Table 2. These studies evaluated the grade and recovery of copper in the rougher stage concentrate as the process responses.

Statistical Optimization Studies
Influential operating variables for optimization studies were selected according to the results of the screening studies as listed in Table 3. Then, a full factorial design was used to investigate the real impact of each parameter, their interactions, as well as process optimization. The optimization design with practical results is presented in Table 4. To evaluate the nonlinear effects of each parameter, the centre point was used with six replications. Moreover, each main experiment was replicated twice to achieve more reliable results and eliminate bias in the effects. Alike the screening tests, copper grade, and recovery at the rougher stage were considered as the process responses.

Flotation Experiments and Calculations
All the experiments were carried out in a standard D-12 Denver ® flotation machine equipped with a 4 L cell. To perform each test in the screening ( Table 2) and optimization (Table 4) experimental designs, after setting the operating conditions according to the run number in each design, the pulp mixture was agitated at impeller speed of 1000 rpm for 5 min in the flotation cell to ensure that all ore particles were well suspended. After conditioning, water was added to the cell to a specified level after conditioning. The pulp level was maintained constant during each test by constantly adding water as required. At the end of each experiment, the collected concentrates and tailings were weighed and dried in an oven at 60 • C for 24 h. Samples were then sent to the analysis laboratory to determine their copper content. The efficiency of the flotation process was evaluated in terms of final recovery and the grade of Cu using the following equation [26,27]: where R (%) is recovery, F (kg) and C (kg) are the total mass of feed and concentrate, respectively, f (%) and c (%) are elemental grades (%) of feed and concentrate, respectively.

(A) Recurrent neural network:
Since the long short-term memory (LSTM) and gated recurrent unit (GRU) networks are improved variants of the recurrent neural network (RNN) and are used in state-of-the-art deep learning applications, the structure of RNN was discussed first. The RNN is a type of feedforward neural network that maintains internal memory and is able to remember information throughout time. This property makes it proper for processing time series and sequential data [28]. The network structure and circuit diagram of RNN are shown in Figure 2.
determine their copper content. The efficiency of the flotation process was evaluated in terms of final recovery and the grade of Cu using the following Equation [26,27]: where R (%) is recovery, F (kg) and C (kg) are the total mass of feed and concentrate, respectively, f (%) and c (%) are elemental grades (%) of feed and concentrate, respectively.

(A) Recurrent neural network:
Since the long short-term memory (LSTM) and gated recurrent unit (GRU) networks are improved variants of the recurrent neural network (RNN) and are used in state-of-the-art deep learning applications, the structure of RNN was discussed first. The RNN is a type of feedforward neural network that maintains internal memory and is able to remember information throughout time. This property makes it proper for processing time series and sequential data [28]. The network structure and circuit diagram of RNN are shown in Figure 2. In the circuit diagram, RNN takes X0 from the input sequence and then delivers h0 as the output, which together with X1 are the inputs of the next step. Similarly, h1 along with X2 are the inputs to the next step, and so on. In such a manner, RNN constantly remembers the information during the training. The current state formula is as follows: In the circuit diagram, RNN takes X 0 from the input sequence and then delivers h 0 as the output, which together with X 1 are the inputs of the next step. Similarly, h 1 along with X 2 are the inputs to the next step, and so on. In such a manner, RNN constantly remembers the information during the training. The current state formula is as follows: where x t and h t are the input and output sequence of a RNN unit, respectively. Equation (3) is also used to apply tanh as the activation function, which helps to regulate the values that flow through the network. Finally, y t is the network output [29]: where W and h represent the weight and hidden vectors, respectively. W hh is the weight in the previous hidden state, W hx is the weight in the current input state, and W hy represents the weight in the output state. tanh is an activation function that implements nonlinearity. Although RNN is theoretically designed to predict time series, it is difficult to predict long time series in practice because of the length of information, which can cause a vanishing gradient [30]. LSTM and GRU are the updated versions of RNN to overcome this challenge.

(B) Long short-term memory network (LSTM):
LSTM is a special kind of RNN network that was introduced first by Hochreiter and Schmidhuber [31] and is able to tackle the problem of vanishing gradient and long length input processing of RNN. The LSTM network has internal mechanisms called gates. These gates (including input gates, forget gates, and output gates) control the data flow and also specify what data is important in the sequence and should be retained and what data should be removed. In this way, the network passes important information along the sequence chain to obtain the final output [32]. A structure diagram of LSTM is shown in Figure 3.
where W and h represent the weight and hidden vectors, respectively. Whh is the weight in the previous hidden state, Whx is the weight in the current input state, and Why represents the weight in the output state. tanh is an activation function that implements nonlinearity. Although RNN is theoretically designed to predict time series, it is difficult to predict long time series in practice because of the length of information, which can cause a vanishing gradient [30]. LSTM and GRU are the updated versions of RNN to overcome this challenge.

(B) Long short-term memory network (LSTM):
LSTM is a special kind of RNN network that was introduced first by Hochreiter and Schmidhuber [31] and is able to tackle the problem of vanishing gradient and long length input processing of RNN. The LSTM network has internal mechanisms called gates. These gates (including input gates, forget gates, and output gates) control the data flow and also specify what data is important in the sequence and should be retained and what data should be removed. In this way, the network passes important information along the sequence chain to obtain the final output [32]. A structure diagram of LSTM is shown in Figure 3. In Figure 3, xt, ht, and Ct are the input, the output, and the cell state at time t, respectively. gf controls the information flow from the previous time step and is called the forget gate. This gate determines whether or not the information from the previous step is used. The update gate, which is indicated with gu, is responsible for controlling the new information flow. This gate determines whether new information should be used at the current time step. go is the output gate and specifies how much of the previous information of time steps (previous and current) is transferred to the next time step. W, b, and σ are the weight of the model, the bias of the model and the activation function, respectively. The following equations represent the calculation process: In Figure 3, x t , h t, and C t are the input, the output, and the cell state at time t, respectively. g f controls the information flow from the previous time step and is called the forget gate. This gate determines whether or not the information from the previous step is used. The update gate, which is indicated with g u , is responsible for controlling the new information flow. This gate determines whether new information should be used at the current time step. g o is the output gate and specifies how much of the previous information of time steps (previous and current) is transferred to the next time step. W, b, and σ are the weight of the model, the bias of the model and the activation function, respectively. The following equations represent the calculation process: (C) Gated recurrent unit network (GRU): GRU was introduced by Chung et al. [33] to address the shortcomings of the traditional recurrent neural network and also to reduce the overload of LSTM architecture. The GRU has only two gates, including update and reset gates, compared with the LSTM. These gates are basically two vectors that are used to decide whether the information is transmitted to the output or not [34]. A structure diagram of GRU is shown in Figure 4 and relevant equations can be found below: (C) Gated recurrent unit network (GRU): GRU was introduced by Chung et al. [33] to address the shortcomings of the traditional recurrent neural network and also to reduce the overload of LSTM architecture. The GRU has only two gates, including update and reset gates, compared with the LSTM. These gates are basically two vectors that are used to decide whether the information is transmitted to the output or not [34]. A structure diagram of GRU is shown in Figure 4 and relevant equations can be found below: The reset gate specifies how much of the previous information is not needed (forgotten) and how much of the previous step information is used in the current step. The update gate specifies whether to use the previous state or current input (or a combination of both) at a time step. xt is the input at time t, and σ is the activation function and ht and ht−1 are the output at time t and t − 1, respectively. The calculation process is expressed by the following equations: where gr and gu are the outputs of reset gate and update gate at time t.

Modelling Process
RNNs are considered the state-of-the-art algorithm for sequential data. LSTM and GRU are RNN-based algorithms that were developed with new designs to address the weaknesses of traditional RNNs. In this study, four inputs were included as pH, solid The reset gate specifies how much of the previous information is not needed (forgotten) and how much of the previous step information is used in the current step. The update gate specifies whether to use the previous state or current input (or a combination of both) at a time step. x t is the input at time t, and σ is the activation function and h t and h t−1 are the output at time t and t − 1, respectively. The calculation process is expressed by the following equations:Ĉ where g r and g u are the outputs of reset gate and update gate at time t.

Modelling Process
RNNs are considered the state-of-the-art algorithm for sequential data. LSTM and GRU are RNN-based algorithms that were developed with new designs to address the weaknesses of traditional RNNs. In this study, four inputs were included as pH, solid content (%, w/w), A65 concentration (g/t), and Z11 concentration (g/t); and the outputs were Cu grade (%) and its recovery (%). To analyse the methods and estimate outputs, four models were developed using the inputs. Table 4 shows the inputs and outputs as the operation factors and responses, respectively. Although controlling the flow of information is the same in both LSTM and GRU, LSTM wraps the hidden state into a memory unit, and GRU just passes the full hidden content without any control directly to the next cell. The models' parameters, including number of hidden layers, number of epochs, etc., were carefully examined and selected based on the complexity of the data and trial and error procedure. Indeed, as a result of increasing the number of parameters, overfitting occurred. An overfitted model has high accuracy on training data but low accuracy on test data. For the number of epochs, increasing this parameter had no significant effect on the accuracy of the model; it only increased the time it took to develop the model. The optimal amounts of parameters were selected according to the mentioned points. In addition, the GRU and LSTM algorithms are similar in general structure, and the most important difference is that there are fewer parameters in GRU, which accounts for its faster training time. One of the purposes of using these two algorithms in this project, in addition to effectively predicting the outputs, is to examine the results of these two models in order to select a more appropriate algorithm. LSTM has a significant advantage over the GRU algorithm in accordance with its more complex structure. Figure 5 shows the flowchart of the modelling procedure to estimate the Cu grade and recovery using RNNs.
occurred. An overfitted model has high accuracy on training data but low accuracy on test data. For the number of epochs, increasing this parameter had no significant effect on the accuracy of the model; it only increased the time it took to develop the model. The optimal amounts of parameters were selected according to the mentioned points. In addition, the GRU and LSTM algorithms are similar in general structure, and the most important difference is that there are fewer parameters in GRU, which accounts for its faster training time. One of the purposes of using these two algorithms in this project, in addition to effectively predicting the outputs, is to examine the results of these two models in order to select a more appropriate algorithm. LSTM has a significant advantage over the GRU algorithm in accordance with its more complex structure. Figure 5 shows the flowchart of the modelling procedure to estimate the Cu grade and recovery using RNNs. Random forest (RF) and ANN with the Levenberg-Marquardt optimization algorithm (ANN-LMA) as the two other popular prediction methods were also implemented for comparison:

•
Random forest is a powerful learning method for classification and regression problems by constructing a multitude of decision trees at training time. This non-parametric method uses ensemble learning to avoid overfitting. To find the optimum number of trees, different numbers were tested. Based on the results, 26 and 22 trees Random forest (RF) and ANN with the Levenberg-Marquardt optimization algorithm (ANN-LMA) as the two other popular prediction methods were also implemented for comparison:

•
Random forest is a powerful learning method for classification and regression problems by constructing a multitude of decision trees at training time. This non-parametric method uses ensemble learning to avoid overfitting. To find the optimum number of trees, different numbers were tested. Based on the results, 26 and 22 trees were found to have the most accurate prediction for Cu grade and Cu recovery, respectively, and increasing the number of trees did not have significant impact on the accuracy of results. The optimum depth of the tree was also found to be 4 for both Cu grade and recovery models, and increasing the maximum depth of trees resulted in overfitting; • LMA is known as the preferred method for minimization in nonlinear least squares problems. LMA interpolates between the Gauss-Newton algorithm and the gradient descent method. Compared with Gauss-Newton, Levenberg-Marquardt is more robust and, in many cases, it will find a solution even if it starts very far from the final minimum. ANN-LMA was found to be superior in Ref. [19] against other metaheuristic algorithms. The proper structure for ANN-LMA was found to be 5-12-4-1 for the Cu recovery model and 5-10-5-1 for the Cu grade model based on trial and error.
The coding and modelling process was implemented using MATLAB software (Math-Works R2021b v9.11, MathWorks, Inc., Natick, MA, USA). Before modelling, normalization was applied to improve the networks training phase using Equation (15) [12]: where X n and X i are normalized and actual values, respectively. X min and X max are the minimum and maximum values of each subset (inputs-outputs). Besides the modelling process, the Spearman correlation analysis, to find out the relationship between inputs and outputs, and sensitivity analysis to calculate the effectiveness of each input data on the outputs were also applied; the outcomes are presented in the Section 3.

Results of Ore Characterization
The mineralogical results showed that chalcopyrite and chalcocite are the predominant copper-bearing minerals in the ore. The most important gangue minerals were pyrite and clay minerals. The particle size distribution of the studied ore sample is shown in Figure 6. The feed to the flotation circuit contains 80% by weight of particles finer than 78 µm.
problems. LMA interpolates between the Gauss-Newton algorithm and the gradient descent method. Compared with Gauss-Newton, Levenberg-Marquardt is more robust and, in many cases, it will find a solution even if it starts very far from the final minimum. ANN-LMA was found to be superior in Ref. [19] against other metaheuristic algorithms. The proper structure for ANN-LMA was found to be 5-12-4-1 for the Cu recovery model and 5-10-5-1 for the Cu grade model based on trial and error.
The coding and modelling process was implemented using MATLAB software (MathWorks R2021b v9.11, USA). Before modelling, normalization was applied to improve the networks training phase using Equation (15) [12]: where Xn and Xi are normalized and actual values, respectively. Xmin and Xmax are the minimum and maximum values of each subset (inputs-outputs). Besides the modelling process, the Spearman correlation analysis, to find out the relationship between inputs and outputs, and sensitivity analysis to calculate the effectiveness of each input data on the outputs were also applied; the outcomes are presented in the Section 3.

Results of Ore Characterization
The mineralogical results showed that chalcopyrite and chalcocite are the predominant copper-bearing minerals in the ore. The most important gangue minerals were pyrite and clay minerals. The particle size distribution of the studied ore sample is shown in Figure 6. The feed to the flotation circuit contains 80% by weight of particles finer than 78 μm.

Statistical Analysis of Screening DOE
To determine the most important parameters affecting metallurgical responses, Pareto charts were drawn based on the results of variance analysis. Pareto charts are quick tools used to assess the significance level and the type of impact (increasing or decreasing) of the parameters under study. In these charts, effects with a value greater than t-value are identified as the significant factors. The type of effect is also shown by the software Figure 6. Particle size distribution of the copper ore fed to the flotation circuit.

Statistical Analysis of Screening DOE
To determine the most important parameters affecting metallurgical responses, Pareto charts were drawn based on the results of variance analysis. Pareto charts are quick tools used to assess the significance level and the type of impact (increasing or decreasing) of the parameters under study. In these charts, effects with a value greater than t-value are identified as the significant factors. The type of effect is also shown by the software and based on the relevant statistical calculations with different colours. According to Pareto charts of grade and copper recovery in Figure 7, it can be seen that the most significant operating parameters affecting the grade of concentrate are solid percentage (B) > pH (A) > and concentration of A65 frother (C). Generally, the copper grade decreases significantly with the increasing solid content mainly due to transporting gangue minerals (i.e., pyrite, silicates and clay minerals) into the concentrate through the false flotation process [35]. This result is in line with the outcomes presented by previous studies [20,36]. Copper grade is also significantly affected by pH and A65 concentration. Although the effects of DTU (E) and MIBC (C) concentrations are relatively high, these effects are not statistically significant. select the most effective operational variables for detailed studies. In addition, according to the type of effect, the amount of levels considered in the optimization plan was also modified. For example, according to Figure 7, the effect of solid content on both responses is negative; therefore, the levels of this parameter in the screening design (20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30) were reduced in the optimization design i.e., 10-20, and similarly, the amount of levels was increased for the frother concentration and decreased for the collector concentration. The pH levels were not changed due to the operational limitations.

Statistical Analysis of Optimization DOE
The first step in analysing the impact of operational variables on the process responses is developing a parametric model that can accurately predict the desired response in the operating space, i.e., within the low to high levels intended for the variables [37]. In the second step, after developing the initial model by the software, abnormal data were identified by examining the model parameters and the model was optimized by the user to achieve the best fitting results. The result of these measures for the data obtained in the flotation experiments was the development of nonlinear models for all process responses as below: Although the solid content (B) and the concentration of Z11 (F) are the most effective factors affecting copper recovery, their effects are both negative. It should be noted that since the interaction effects of the parameters in factorial designs cannot be analysed due to bias, providing any physical interpretation for the main effects observed in these diagrams cannot be reliable. For this reason, the results of this design have only been used to select the most effective operational variables for detailed studies. In addition, according to the type of effect, the amount of levels considered in the optimization plan was also modified. For example, according to Figure 7, the effect of solid content on both responses is negative; therefore, the levels of this parameter in the screening design (20)(21)(22)(23)(24)(25)(26)(27)(28)(29)(30) were reduced in the optimization design i.e., 10-20, and similarly, the amount of levels was increased for the frother concentration and decreased for the collector concentration. The pH levels were not changed due to the operational limitations.

Statistical Analysis of Optimization DOE
The first step in analysing the impact of operational variables on the process responses is developing a parametric model that can accurately predict the desired response in the operating space, i.e., within the low to high levels intended for the variables [37]. In the second step, after developing the initial model by the software, abnormal data were identified by examining the model parameters and the model was optimized by the user to achieve the best fitting results. The result of these measures for the data obtained in the flotation experiments was the development of nonlinear models for all process responses as below: Cu Recovery (%) = 45.91 + 0.
where the factors are described in a coded form. The validation parameters for the developed models are listed in Table 5. According to Fisher's F-test and marginal probability value (p model < 0.0001), which are shown in Table 5, all of the suggested prediction models are significant. For assessing the significance of a predictive model, the residuals normal probability plot is an effective tool [38]. According to the normal probability plots shown in Figure 8, all responses were relatively uniform, confirming the assumptions of normality and the independent nature of residuals during the statistical analyses. In addition, the high values of the adjusted correlation coefficients also indicate the significance of the prediction models. The Pred R 2 values were reasonably high, indicating that the model is able to explain variability in the prediction of new observations with adequate accuracy, which is in reasonable agreement with the Adj R 2 values [39]. Another statistical measure, called adeq precision, shows the signal-to-noise ratio, and any value over 4 is considered desirable [6]. In this investigation, the ratios were 134.99 and 20.59 for copper grade and recovery, respectively. These values show an adequate signal so that the models can be used to navigate the design space and predict appropriately.
where the factors are described in a coded form. The validation parameters for the developed models are listed in Table 5. According to Fisher's F-test and marginal probability value (p model < 0.0001), which are shown in Table 5, all of the suggested prediction models are significant. For assessing the significance of a predictive model, the residuals normal probability plot is an effective tool [38]. According to the normal probability plots shown in Figure 8, all responses were relatively uniform, confirming the assumptions of normality and the independent nature of residuals during the statistical analyses. In addition, the high values of the adjusted correlation coefficients also indicate the significance of the prediction models. The Pred R 2 values were reasonably high, indicating that the model is able to explain variability in the prediction of new observations with adequate accuracy, which is in reasonable agreement with the Adj R 2 values [39]. Another statistical measure, called adeq precision, shows the signal-to-noise ratio, and any value over 4 is considered desirable [6]. In this investigation, the ratios were 134.99 and 20.59 for copper grade and recovery, respectively. These values show an adequate signal so that the models can be used to navigate the design space and predict appropriately.  Model Equations (16) and (17) were used to assess the significance of operating variables on process responses. Tables 6 and 7 show the ANOVA results within a confidence interval of 95%. As shown in Tables 6 and 7, the effects of all operational variables on  (16) and (17) were used to assess the significance of operating variables on process responses. Tables 6 and 7 show the ANOVA results within a confidence interval of 95%. As shown in Tables 6 and 7, the effects of all operational variables on process responses are statistically meaningful due to p-values less than 0.05. As seen in the analysis of variance tables, no interaction was considered in the analyses. Moreover, except in some cases for the copper recovery analysis, the dual interaction effects are significant. The statistically meaningless interactions for copper recovery are those between pH and solid content and Z11 concentration as well as the interaction between solid content and Z11 concentration.

Interpretation of the Main Effects
The main effect plots can be used to analyse the effects of the operating variables on the process response. These plots are an effective tool for assessing the influence of each variable on the target response. The response variation is shown in these plots for the level of variables used in the experimental design. Design Expert software uses, by default, the developed prediction model (Equations (16) and (17) Increasing the pH, in addition to improving the grade of copper, has also increased the recovery (Figure 10). As the pH increases, the efficiency of the xanthate collectors (Z11 in this study) improves, resulting in increased particles floatability. For this reason, particle recovery is also expected to improve. The negative effect of solid content on the recovery can be evaluated according to the interaction of this factor with the frother concentration, which is presented in Section 3.3.2. Given that the main effect plot for each factor is plotted while other factors are kept constant at their mid-level, the individual effect of the solid content may lead to the misleading conclusion that increasing the solid content in any condition causes reduction, leading to recovery. According to Figure 9, the copper grade increases with the increasing pH. Obviously, as the pH increases, the pyrite depression rate also increases, and as a result, the copper grade in the concentrate improves. The effect of increasing the solid content on copper grade is also positive and increasing. This effect can be attributed to the improved particle distribution within the flotation cells, thereby increasing the probability of copper-containing particles colliding with the bubbles and transporting them to the froth and concentrate. However, as shown in Figure 10, copper recovery decreased with the increasing solid content, which can be ascribed to the interaction between the solid content and the frother concentration, as will be discussed later. Increasing the concentration of both frother and collector reduces the grade of copper in the concentrate. With an increasing frother concentration, due to increased bubble stability and increasing the rate of bubbly regime in the pulp zone, the entrainment rate will most likely increase due to the swarm phenomenon, and finally, the copper grade decreases as a consequence of the gangue particles transfer to the concentrate [40,41]. The effect of increasing the collector concentration is also due to the increase of hydrophobicity level and floatability of gangue minerals and, as a result, their transfer to the concentrate. Therefore, the copper grade in concentrate is expected to decrease. These phenomena, namely the increase in entrainment rate and the improvement of gangue floatability, will increase the solid transfer rate into the concentrate and, consequently, increase the recovery; this prediction is clearly seen in Figure 10.

Evaluation of the Interactive Effects
The surface plots of the response of a process against other independent variables can provide valuable insights into not only the individual effect of operating factors, but the potential interaction between them as well [42,43]. Thus, the surface plots for the studied flotation experiments were presented. As shown in Figures 11 and 12, the surface response plots illustrate the nonlinear effects of most interactions among four variables. As mentioned earlier, the interaction between solid content and frother concentration ( Figure  12) shows that maximum copper recovery can be obtained at a low level of frother concentration, whereas individual effects yielded contrary results. This conclusion can be directly attributed to the significant interaction between those variables. Due to this effect, it seems that the balance between the amount of particles in the system and the number of stable bubbles plays a significant role in improving the process efficiency. The same Increasing the pH, in addition to improving the grade of copper, has also increased the recovery (Figure 10). As the pH increases, the efficiency of the xanthate collectors (Z11 in this study) improves, resulting in increased particles floatability. For this reason, particle recovery is also expected to improve. The negative effect of solid content on the recovery can be evaluated according to the interaction of this factor with the frother concentration, which is presented in Section 3.3.2. Given that the main effect plot for each factor is plotted while other factors are kept constant at their mid-level, the individual effect of the solid content may lead to the misleading conclusion that increasing the solid content in any condition causes reduction, leading to recovery.

Evaluation of the Interactive Effects
The surface plots of the response of a process against other independent variables can provide valuable insights into not only the individual effect of operating factors, but the potential interaction between them as well [42,43]. Thus, the surface plots for the studied flotation experiments were presented. As shown in Figures 11 and 12, the surface response plots illustrate the nonlinear effects of most interactions among four variables. As mentioned earlier, the interaction between solid content and frother concentration ( Figure 12) shows that maximum copper recovery can be obtained at a low level of frother concentration, whereas individual effects yielded contrary results. This conclusion can be directly attributed to the significant interaction between those variables. Due to this effect, it seems that the balance between the amount of particles in the system and the number of stable bubbles plays a significant role in improving the process efficiency. The same behaviours are observed for copper grade, as it may be concluded that the highest copper grade can be achieved at a high level for pH and solid content and a low level for reagents dosages when referring to the main effect plots. However, Figure 11 clearly shows different results. Similar results are observed in the case of copper recovery when evaluating the interactions given in Figure 12. grade can be achieved at a high level for pH and solid content and a low level for reagents dosages when referring to the main effect plots. However, Figure 11 clearly shows different results. Similar results are observed in the case of copper recovery when evaluating the interactions given in Figure 12.

Process Optimization and Verification Studies
Maximum Cu grade and recovery were separately defined as target responses to find suitable operating conditions for each case. The operating conditions and the predicted results suggested by the software are given in Table 8. To assess the accuracy of the predictions in Table 8, the plant was permitted to operate for 10 days under the suggested operating conditions and the metallurgical responses were monitored by regular sampling from each operating shift. Finally, an average value was reported as the final practical result, as given in Table 8. Compared to the grade values, the higher differences between the practical recovery results and predicted values are due to the lower accuracy of the prediction model developed by the software (Table 5).

Correlation Coefficient Analysis
Correlation coefficient is an indicator that is used to measure the dependence or relationship between two variables. The correlation coefficient between each variable and the outputs (Cu grade and Cu recovery) confirms the existence of correlation. In this study, the Spearman correlation coefficient was used, which is defined as Equation (18) [44]: where ρ is Spearman's rank correlation coefficient, d i is the difference between the two ranks of each variable, and n is the number of samples. The correlation coefficient between each variable and the outputs is presented in Figure 13 and Table 9. The coefficients indicate that the variables had a good correlation and could be used to estimate Cu grade and Cu recovery.

Sensitivity Analysis
In order to determine the effect of each input on the amount of output, a sensitivity analysis was also applied. In this paper, the input parameters were pH, solid content, A65 concentration, and Z11 concentration, and output parameters were Cu recovery and Cu grade. To calculate the sensitivity analysis, the following equation was used [46]: where xi and xj are the input and output datasets, respectively. The effect of each input parameter on the outputs is shown in Figure 14. It can be seen in the figure that the solid content influenced both Cu recovery and grade the most. The results also confirm the correlation results to a great extent. Using the results of different tests, the variables that had a greater impact on the outputs were selected, which ultimately led to simulation models with greater accuracy and less error.  According to Table 9 and for this study, pH and A65 concentration had a good positive correlation with Cu grade; on the contrary, the solid content had a strong negative effect on Cu grade. It means that should the pH increase, the Cu grade decreases, as was also reported in Ref. [45]. For the Cu recovery, Z11 concentration and solid content had high negative correlation with this output, and pH and A65 concentration had positive correlation, although their correlation coefficient is lower than 0.5. It is important to note that in the correlation coefficient methods, the closer the coefficient is to 1 and −1, indicates a direct or inverse relationship between the two variables. However, a cause-and-effect relationship is not necessarily present, and the conclusion cannot be drawn on this basis.

Sensitivity Analysis
In order to determine the effect of each input on the amount of output, a sensitivity analysis was also applied. In this paper, the input parameters were pH, solid content, A65 concentration, and Z11 concentration, and output parameters were Cu recovery and Cu grade. To calculate the sensitivity analysis, the following equation was used [46]: where x i and x j are the input and output datasets, respectively. The effect of each input parameter on the outputs is shown in Figure 14.
grade. To calculate the sensitivity analysis, the following equation was used [46]: where xi and xj are the input and output datasets, respectively. The effect of each input parameter on the outputs is shown in Figure 14. It can be seen in the figure that the solid content influenced both Cu recovery and grade the most. The results also confirm the correlation results to a great extent. Using the results of different tests, the variables that had a greater impact on the outputs were selected, which ultimately led to simulation models with greater accuracy and less error. It can be seen in the figure that the solid content influenced both Cu recovery and grade the most. The results also confirm the correlation results to a great extent. Using the results of different tests, the variables that had a greater impact on the outputs were selected, which ultimately led to simulation models with greater accuracy and less error.

Model Prediction Analysis
The dataset was divided into three categories, including training (65%), validating (15%), and testing (20%). The validation dataset helps to tune hyper-parameters during the training of the models to prevent the models from over-fitting in the testing phase. The test performance of the 'RNNs' models is shown in Figures 15 and 16. The dataset was divided into three categories, including training (65%), validating (15%), and testing (20%). The validation dataset helps to tune hyper-parameters during the training of the models to prevent the models from over-fitting in the testing phase. The test performance of the 'RNNs' models is shown in Figures 15 and 16.
Mean square error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), and R 2 were computed using the following equations in order to evaluate the performance of each model [43,44]: where is the actual data, is the estimated data, is the mean of actual data, ̅ is the mean of the estimated data, and N is the number of sample sets.  The performance of the LSTM, GRU, RF, and ANN-LMA models in predicting Cu grade and its recovery is shown in Tables 10 and 11. Based on the results, the performance of the RNNs is better than the other models; however, in the estimation of Cu recovery, the RF and ANN-LMA results are so close to the GRU. The RNN results are also close to each other (LSTM provides better accuracy). Although RF has a better result than ANN-LMA due to its tree structure in Cu grade estimation, when there are many calculations, a large number of trees can make the algorithm too slow and ineffective for real-time predictions. Both GRU and LSTM could validly estimate Cu grade and recovery. The results demonstrate that LSTM and GRU are useful deep networks for predicting time series and sequential data. Although the accuracy of the RNN models is close, the number of parameters and time efficiency of the algorithms should be considered in practical applications [47][48]. Mean square error (MSE), root mean square error (RMSE), mean absolute percentage error (MAPE), and R 2 were computed using the following equations in order to evaluate the performance of each model [43,44]: where y is the actual data,ŷ is the estimated data, a is the mean of actual data, e is the mean of the estimated data, and N is the number of sample sets. The performance of the LSTM, GRU, RF, and ANN-LMA models in predicting Cu grade and its recovery is shown in Tables 10 and 11. Based on the results, the performance of the RNNs is better than the other models; however, in the estimation of Cu recovery, the RF and ANN-LMA results are so close to the GRU. The RNN results are also close to each other (LSTM provides better accuracy). Although RF has a better result than ANN-LMA due to its tree structure in Cu grade estimation, when there are many calculations, a large number of trees can make the algorithm too slow and ineffective for real-time predictions. Both GRU and LSTM could validly estimate Cu grade and recovery. The results demonstrate that LSTM and GRU are useful deep networks for predicting time series and sequential data. Although the accuracy of the RNN models is close, the number of parameters and time efficiency of the algorithms should be considered in practical applications [47,48]. The study showed that GRU performed faster with less CPU usage because it has fewer parameters for training. LSTM is more sophisticated and provides more parameters such as the number of weight matrices, number of bias vectors, learning rates, etc. This can be considered as a penalty for such models, as described in detail by Hassanzadeh et al. [49]. A model's learning rate controls how quickly it can adapt to a new problem and is a crucial parameter for efficient training [29]. Larger learning rates result in rapid changes and require fewer training epochs; however, smaller learning rates allow the model to learn a more optimal set of weights, but may take significantly longer to train. The structure of GRU is more straightforward. It has one gate less than LSTM, which reduces matrix multiplication, and it can save time. However, through empirical research and as reported by Cahuantzi et al. [50] and Yang et al. [51], the advantage of GRU is only relevant for small datasets and when you have little memory. In other scenarios, when dealing with more extensive sequences, LSTM is preferred. Generally, it can be concluded that if the dataset is small, the GRU algorithm is preferable, whereas LSTM is favourable and more accurate for larger datasets. Furthermore, parameter optimization for the purpose of studying the influence of different setting parameters on these networks could be investigated in future research. To find insights on the structure of the estimation models, the importance and impact of each input variable on the outputs should be specified. The model estimation error was calculated when input values were randomly shuffled in order to assess the features importance. It is expected that permutation will increase the model's estimation error if the model relies on an input value for estimation. Due to the superior performance of LSTM models for Cu grade and recovery, this process was applied to LSTM models and the results are shown in Figure 17. Features importance results are well aligned with results obtained through the Spearman correlation and sensitivity analysis. Besides, LSTM's success in developing models and extracting meaningful characteristics from training data can also be confirmed by the results of feature importance and statistical analysis (Section 3.2). of LSTM models for Cu grade and recovery, this process was applied to LSTM models and the results are shown in Figure 17. Features importance results are well aligned with results obtained through the Spearman correlation and sensitivity analysis. Besides, LSTM's success in developing models and extracting meaningful characteristics from training data can also be confirmed by the results of feature importance and statistical analysis (Section 3.2).

Conclusions
The metallurgical response of a copper processing plant was predicted using two efficient variants of the recurrent neural network (RNN) method based on the effective operating parameters, including pulp pH and solid content as well as the concentrations of the frother and collector. For this purpose, the process was first evaluated using a twostep screening/optimization DOE based on factorial designs. Statistical results based on analysis of variance showed that there is a reliable correlation between the metallurgical responses, i.e., copper grade and recovery in concentrate, and operating variables.

Conclusions
The metallurgical response of a copper processing plant was predicted using two efficient variants of the recurrent neural network (RNN) method based on the effective operating parameters, including pulp pH and solid content as well as the concentrations of the frother and collector. For this purpose, the process was first evaluated using a two-step screening/optimization DOE based on factorial designs. Statistical results based on analysis of variance showed that there is a reliable correlation between the metallurgical responses, i.e., copper grade and recovery in concentrate, and operating variables. ANOVA results indicated that all operating variables significantly affect the metallurgical responses, such that the copper grade increased by increasing the pulp pH and solid content and decreased as the dosage of frother and collector were increased. Contrary results were observed with respect to the copper recovery, with the exception that, like grade, copper recovery increased with the increase of pulp pH due to interaction effect with other factors.
Afterwards, the optimization process was performed based on the identified significant factors and by applying particular adjustment in terms of the technical aspects of the process. The effect of the studied variables was also interpreted based on the main and interaction effects to obtain a meaningful vision for the simulation step. The correlation coefficient, mean, root mean square, and mean absolute percentage errors as well as variance account for values for the training and testing datasets for the copper grade and recovery using the long short-term memory (LSTM), and gated recurrent unit (GRU) networks were compared. The results showed that the LSTM algorithm was more efficient than the GRU network and can be applied to predict the metallurgical responses during the flotation process.