Soft Sensors in the Primary Aluminum Production Process Based on Neural Networks Using Clustering Methods

Primary aluminum production is an uninterrupted and complex process that must operate in a closed loop, which limits the possibilities for experiments to improve production. It is therefore important to be able to simulate this process computationally without acting directly on the plant, since direct intervention could be dangerous, expensive, and time-consuming. This paper addresses the problem by combining real data, the artificial neural network technique, and clustering methods to create soft sensors that estimate the temperature, the aluminum fluoride percentage in the electrolytic bath, and the metal level of aluminum reduction cells (pots). An innovative strategy is used to split the entire dataset by section and lifespan of pots, with automatic clustering for the soft sensors. The soft sensors created by this methodology have a small mean squared estimation error and high generalization power. The results demonstrate the effectiveness and feasibility of the proposed approach to soft sensors in the aluminum industry, which may improve process control and save resources.


Introduction
Although aluminum (Al) is one of the most abundant elements in nature, it is extremely difficult to extract, and extraction is not possible without a chemical reaction. Al is always bound to some other chemical element in the form of salts or oxides, which makes separation necessary. In the 1880s, the young students Charles Hall and Paul Héroult used electrolysis to separate Al from the oxygen in alumina (Al2O3) grains dissolved in salt fluxes such as cryolite (Na3AlF6). This is the Hall-Héroult process [1,2], by which primary aluminum industries can obtain Al of up to 99.9% purity. Basically, the process separates alumina into aluminum and oxygen, but it also requires other elements, such as flux salts, gases, and chemical additives, to maintain process stability, which makes the process more complex [1,3].

•
An ANN for each pot, which might be too complex and difficult to apply, since it is necessary to tune hundreds of ANNs.

•
One ANN for a certain cluster of pots, which present similar behaviors.
This paper describes the process of designing soft sensors using the third methodology, which could offer the best trade-off between complexity and quality of results. Engineering expertise is useful for determining the key process variables to include, and the ANN technique enables the indirect estimation of variables in the modeling of electrolytic bath furnaces, using real data from an Al smelter plant. This paper's major contributions are as follows: clustering data by pot section; considering three different phases of pots, based on a lifespan division; and comparing and proposing neural network estimators as soft sensors to replace manual measurements with automatic ones. The results show that this is possible, since the models generate estimates with small errors. It is important to highlight that the ANN models created are dynamic, because delayed inputs were used to estimate the current outputs. Briefly, the flowchart of the proposed method is presented in Figure 1.
The rest of this work is organized as follows. Section 2 describes the primary Al production process and the layout of the Al smelter studied in this paper. Section 3 addresses in detail the design of the ANN-based estimation models. Results and discussions are presented in Section 4. Finally, Section 5 provides the conclusions.

Brief Description of the Primary Aluminum Production Process
Softness, lightness, high thermal conductivity, and high recyclability are important properties of Al. A wide variety of products are derived from this metal, which has helped it to become the most frequently consumed nonferrous metal around the world [64]. The primary Al production process is complex, due to the handling of variables from multiple disciplines, such as electrical, chemical, and physical [65].
The raw material of Al is alumina. Direct Al extraction from alumina requires a temperature over 2000 °C [66]. The machinery needed to maintain such a high temperature is expensive, as is the energy it consumes. Since the late nineteenth century, the Hall-Héroult process has been used as an alternative for producing Al, as it consumes less energy and requires a lower temperature (about 960 °C) [1][2][3]. To lower the operating temperature, cryolite is used as the electrolytic bath, and several chemical components are added together with alumina [67].
This process is widely known as Al smelting, and it uses electrolytic cells, also called pots or reduction pots [68]. A pot (Figure 2) consists of a steel shell with a lining of fireclay brick for heat insulation, which, in turn, is lined with carbon bricks to hold the molten electrolyte. Steel bars carry the electric current through the insulating bricks into the carbon cathode floor of the pot. Carbon anode blocks are hooked onto steel rods and immersed in the electrolyte. Alumina molecules are dissolved by the heat and decomposed into Al and oxygen (O) by the electric current that flows through the electrolyte [69]. In modern smelters, process-control computers connected to remote sensors ensure optimal operation of the electrolysis pots [70]. Electrolysis furnaces are organized within reduction rooms; a standard Al smelter uses around four reduction rooms and between 900 and 1200 pots in total, depending on the smelter.
According to the stoichiometric relation in Equation (1), alumina is consumed in the production process together with the solid carbon of the anodes. Theoretically, 1.00 kg of Al consumes 1.89 kg of Al2O3 and 0.33 kg of carbon (C), and this carbon produces 1.22 kg of carbon dioxide (CO2). In practice, typical values are 1.93 kg of Al2O3 and between 0.40 and 0.45 kg of C per 1.00 kg of Al, with an emission of about 1.50 kg of CO2 [69].

$$2\,\mathrm{Al_2O_3}\,(\text{dissolved}) + 3\,\mathrm{C}\,(\text{solid}) \rightarrow 4\,\mathrm{Al}\,(\text{liquid}) + 3\,\mathrm{CO_2}\,(\text{gas}) \tag{1}$$
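The theoretical mass ratios quoted for Equation (1) follow directly from molar masses. The short Python sketch below reproduces them; the rounded standard atomic weights are an assumption of the sketch, not values taken from this paper.

```python
# Rounded standard atomic weights in g/mol (assumed values for this sketch).
M_AL = 26.982
M_O = 15.999
M_C = 12.011

M_AL2O3 = 2 * M_AL + 3 * M_O  # alumina, Al2O3
M_CO2 = M_C + 2 * M_O         # carbon dioxide, CO2


def mass_per_kg_al() -> dict:
    """Theoretical kg of each species per 1 kg of Al for
    2 Al2O3 + 3 C -> 4 Al + 3 CO2 (Equation (1))."""
    al_mass = 4 * M_AL  # mass of Al produced by one reaction unit
    return {
        "Al2O3": 2 * M_AL2O3 / al_mass,  # alumina consumed
        "C": 3 * M_C / al_mass,          # anode carbon consumed
        "CO2": 3 * M_CO2 / al_mass,      # CO2 emitted
    }


if __name__ == "__main__":
    for species, ratio in mass_per_kg_al().items():
        print(f"{species}: {ratio:.2f} kg per kg of Al")
```

Running it prints 1.89 kg of Al2O3, 0.33 kg of C, and 1.22 kg of CO2 per kg of Al, matching the theoretical values given in the text.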
Several sensors monitor the entire process continuously, acquiring data from the entire plant. Data are stored and organized in databases, which have become a valuable asset of the plants, as they keep the historical information on each production pot. This data collection supports the building of automatic decision-making systems and guides for engineers [71][72][73][74]. Many control systems display the acquired data in real time for permanent monitoring of the process. Plant control systems for Al smelting have two modes of operation [74,75]:

•
Automatic control: Data are collected and processed by computers and/or microcontrollers, which then drive a control action on the plant without direct human intervention. Examples: control of the electrical resistance of the pot through the anode-cathode distance (ACD), using pulse width modulation (PWM) to drive the lifting/lowering of anodes; and control of the amount of alumina added to the electrolytic bath through mathematical models.

•
Manual control: Data are collected through plant-floor sensors or measured manually by process operators, but the control action is calculated by the process engineers, based on mathematical models and their expertise. Examples: thermocouple measurement of the pot temperature (Figure 3), aluminum fluoride (AlF3) percentage in the bath (laboratory result), metal level of the pot, replacement of anodes, and Al tapping from the pot.

The experiments conducted in this paper were derived from a real Brazilian Al smelter, whose real data were used to generate the results. The pots are arranged in four reductions, each of which has two rooms, and each room has 120 pots, resulting in 960 pots. Figure 4 shows the overall layout of this factory.
Electrically, the Al reduction pots are connected in series. This connection allows the direct current (approximately 180 kA) to be the same in all pots. Each room has two electrical lines, each line is composed of two sections, and each section contains 30 pots, resulting in 32 different sections for the entire smelter. Figure 5 outlines the arrangement of the sections for reduction I and the first room. The same organization is present in all rooms of the smelter studied, and this arrangement of pots was used empirically to define the clusters: each cluster is one section.
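The layout arithmetic and the section-based clustering can be made explicit with a small Python sketch. The counts come from the text; the pot numbering used by `section_of` (pots indexed 0 to 959 consecutively, section by section) is a hypothetical assumption, since the plant's real numbering scheme is not given.

```python
# Smelter layout as described in the text.
REDUCTIONS = 4
ROOMS_PER_REDUCTION = 2
LINES_PER_ROOM = 2
SECTIONS_PER_LINE = 2
POTS_PER_SECTION = 30

POTS_PER_ROOM = LINES_PER_ROOM * SECTIONS_PER_LINE * POTS_PER_SECTION
TOTAL_SECTIONS = REDUCTIONS * ROOMS_PER_REDUCTION * LINES_PER_ROOM * SECTIONS_PER_LINE
TOTAL_POTS = TOTAL_SECTIONS * POTS_PER_SECTION


def section_of(pot_index: int) -> int:
    """Return the section (0..31) of a pot.

    Hypothetical numbering: pots are assumed to be indexed 0..959
    consecutively, section by section.
    """
    if not 0 <= pot_index < TOTAL_POTS:
        raise ValueError(f"pot index out of range: {pot_index}")
    return pot_index // POTS_PER_SECTION


def cluster_by_section(pot_indices):
    """Group pot indices into one cluster per section."""
    clusters = {}
    for pot in pot_indices:
        clusters.setdefault(section_of(pot), []).append(pot)
    return clusters


if __name__ == "__main__":
    print(POTS_PER_ROOM, TOTAL_POTS, TOTAL_SECTIONS)  # 120 960 32
```

Clustering all 960 pots this way yields 32 clusters of 30 pots each, one per section.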

Design of Estimation Models
The full database contains hundreds of thousands of samples and hundreds of process features (variables) from 2006 to 2016. The following subsection describes the preprocessing steps applied to the original database to generate the datasets used in this work.

Data Extraction, Imputation, and Split
Data extraction considered the entire life of each pot, that is, a lifespan from 1 to 1500 days, taking into account an average of five years of operation. Table 1 shows all variables available in the database. Feature selection used the Pearson correlation coefficient (R) between each input and output to rank the variables by degree of importance. Some variables have a large number of null values, so they were discarded. R is calculated as:

$$R = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}} \tag{2}$$

where $n$ is the sample size, $x_i$ and $y_i$ are the individual sample points indexed with $i$, and $\bar{x}$ and $\bar{y}$ are the sample averages.
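The correlation-based ranking can be sketched with NumPy as below. This is an illustration, not the authors' code: the function names are hypothetical, NaN stands for a null value, and the 50% null-fraction cut-off is an assumed threshold (the text only says that variables with many nulls were discarded).

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient R, as in Equation (2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2))))


def rank_features(X, y, names, max_null_fraction=0.5):
    """Rank input columns of X by |R| with the output y.

    Columns whose null (NaN) fraction exceeds max_null_fraction, a
    hypothetical threshold, are discarded; the remaining ones are
    correlated on the rows where both values are present.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = []
    for j, name in enumerate(names):
        col = X[:, j]
        nulls = np.isnan(col)
        if nulls.mean() > max_null_fraction:
            continue  # too many null values: drop the variable
        rows = ~nulls & ~np.isnan(y)
        scores.append((name, pearson_r(col[rows], y[rows])))
    # Strong negative correlations rank as high as strong positive ones.
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Ranking by |R| rather than R keeps strongly anti-correlated inputs near the top, which matters for estimation even though their sign is negative.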

Table 2 lists the most important inputs associated with the output variables selected to create the estimation models. First, the inputs were determined through a Pearson correlation study (Equation (2)). After that, process engineers validated the feature selection for the model. All input variables are delayed by one step, because the neural models emulate a first-order dynamic system that uses delayed inputs to estimate the current output. The final selected dataset had about 1,728,000 samples, eleven inputs, and three outputs.
Some variables, such as temperature, aluminum fluoride percentage, and metal level, are collected manually by physical sensors or through laboratory analysis, which results in different sampling frequencies.
Other variables, for instance real resistance and raw voltage, are collected online via sensors without human intervention. Most variables are sampled daily; however, manually collected variables have other sampling frequencies. As a result, null values appear between measurements when variables with different sampling frequencies are combined. Missing data were imputed by linear interpolation between the previous and subsequent measurements, according to the sampling of each variable. According to the process engineers, linear interpolation fits well because the chemical process is slow, and this approach had been validated before. Figure 6 shows an imputation example for bath temperature. Once properly trained, the soft sensors described in this work can also estimate missing data.
Process engineers also agree that pots show three different types of behavior according to their lifespan: days 1-100 are considered the "starting point"; days 101-1200 the "stationary regime"; and days 1201-1500 the "shutdown point". This lifespan division is the second method used to cluster the entire dataset (the first is clustering by section, explained before). These ranges may vary from pot to pot, but they hold on average. Figure 7 summarizes the behaviors and the amount of data for each lifespan division.
The different behaviors can also be verified by statistically analyzing the dataset of each group. Figure 8 shows histograms of each input variable for each group. The ALF3A variable has zero values at the starting point, because it is not observed in this phase, so this variable may be discarded when models for this phase are created. At the starting point, the PNA2O variable has more samples below 0.4, whereas in the stationary regime and at the shutdown point its samples are concentrated above 0.4. The behavior of the input variables in the stationary regime and at the shutdown point is similar.
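The linear imputation, the lifespan division, and the one-step input delay described above can be sketched in Python as follows. This is a minimal NumPy illustration, not the authors' MATLAB code; the function names are hypothetical, and holding the nearest measured value before the first or after the last measurement is an assumption (it is simply what `np.interp` does at the edges).

```python
import numpy as np


def interpolate_missing(days, values):
    """Fill NaN gaps by linear interpolation between the previous and the
    next measurement. `days` must be in ascending order. Outside the first
    and last measurement, np.interp holds the nearest measured value."""
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    known = ~np.isnan(values)
    return np.interp(days, days[known], values[known])


def lifespan_phase(age_days: int) -> str:
    """Map pot age in days to the three lifespan divisions used in the text."""
    if not 1 <= age_days <= 1500:
        raise ValueError(f"age outside the 1-1500 day range: {age_days}")
    if age_days <= 100:
        return "starting point"
    if age_days <= 1200:
        return "stationary regime"
    return "shutdown point"


def one_step_lag(X, y):
    """Pair the inputs of day t-1 with the output of day t."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return X[:-1], y[1:]
```

For example, a temperature series measured as 960 °C on day 1 and 963 °C on day 4 is filled with 961 °C and 962 °C on days 2 and 3.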
Analyzing the histograms of the output variables for each behavior (Figure 9), the TMP variable has a wider range of values at the starting and shutdown points than in the stationary regime, which supports the instability hypothesis. The NME variable also behaves differently: at the starting point, its samples accumulate at 24 cm, whereas in the stationary and shutdown phases they accumulate at 25 cm. At the starting point, the ALF variable has more samples below 10; in the other two phases, its samples are concentrated above 10.
Besides the histograms, Figure 10 shows the difference in TMP across the three phases. At the starting point, the mean is 970.5 °C, because the pot must be reheated; in the stationary regime, the mean decreases to 963.7 °C, the standard mean of the plant; and at the shutdown point, it decreases further to 958.8 °C, since the pot is being cooled before it is turned off. TMP was chosen for this analysis because it is one of the most monitored process variables. The following subsection shows the steps performed on the original database to generate the resulting models.

Strategy for Modeling
Data clustered by section and by lifespan division were used to build models that estimate TMP, ALF, and NME using the ANN technique. Each ANN model has only one of the three outputs, and two different training algorithms were used to create them: Levenberg-Marquardt (LM) and backpropagation (BP). In addition, three strategies were used for each technique:

1.
Use 70% of the data from each cluster to train, 15% to validate, and 15% to test the models.

2.
Use the data from all pots of one entire section to train the models, except for one pot of that section, which is used to test the model. This was applied to section clustering and lifespan division.

3.
Standardize the dataset using the z-score method.
The z-score produces a standardized dataset with mean equal to 0 and standard deviation equal to 1, and it is expressed by:

$$z = \frac{x - \mu}{\sigma}$$

where $x$ is the value to be standardized, $\mu$ is the mean of the variable, and $\sigma$ is its standard deviation. Table 3 shows the division of the complete dataset for the modeling process, for each lifespan division and for the two learning algorithms. For each lifespan division and learning algorithm, there is one model per output (three outputs) for each of the 32 pot sections plus one per output for the whole dataset, resulting initially in 594 different models.

Table 3.
Lifespan division | Training algorithm | Number of models
Starting point | ANN-LM | 32 sections × 3 outputs = 96; all dataset × 3 outputs = 3
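The standardization and the two data-splitting strategies can be sketched with NumPy as below. The function names are hypothetical; using the population standard deviation (`ddof=0`) is an assumption, since the text does not state which estimator was used, and the split is a plain random permutation rather than the authors' MATLAB routine.

```python
import numpy as np


def zscore(X):
    """Standardize each column: z = (x - mu) / sigma.
    Uses the population standard deviation (ddof=0), an assumption."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma


def random_split(n_samples, rng, fractions=(0.70, 0.15, 0.15)):
    """Strategy 1: random 70/15/15 split of sample indices."""
    idx = rng.permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def leave_one_pot_out(pot_ids, test_pot):
    """Strategy 2: train on every pot of a section except test_pot,
    and test on test_pot. Returns (train_indices, test_indices)."""
    pot_ids = np.asarray(pot_ids)
    return np.flatnonzero(pot_ids != test_pot), np.flatnonzero(pot_ids == test_pot)
```

Returning `mu` and `sigma` from `zscore` makes it possible to undo the standardization, which is needed when estimates are compared with non-standardized target values.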

Each model was trained ten times, because the initial weights of the neural network and the division into training and validation data are random, according to a Gaussian probability density function. In total, 5760 neural networks were created from clustered data, of which 2880 use the LM algorithm and 2880 use the BP algorithm. The pseudocode (Algorithm 1) summarizes the entire modeling process.
The mean squared error (MSE) and R between the target and estimated values were used as quality metrics of the models. MSE is defined as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $n$ is the number of samples, and $y_i$ and $\hat{y}_i$ are the target value and the value estimated by the model, respectively.
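The model counts quoted above can be checked by enumerating the configurations, and the MSE metric takes one line. The sketch below is not Algorithm 1 itself; the constant and function names are hypothetical, and a cluster is either a section index or the whole dataset.

```python
from itertools import product

import numpy as np

PHASES = ("starting point", "stationary regime", "shutdown point")
ALGORITHMS = ("LM", "BP")
OUTPUTS = ("TMP", "ALF", "NME")
N_SECTIONS = 32
N_RUNS = 10  # each model is trained ten times


def model_configurations():
    """One configuration per (phase, algorithm, cluster, output).
    A cluster is a section index 0..31 or "all" for the whole dataset."""
    clusters = list(range(N_SECTIONS)) + ["all"]
    return list(product(PHASES, ALGORITHMS, clusters, OUTPUTS))


def mse(y, y_hat) -> float:
    """Mean squared error between targets y and estimates y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))


if __name__ == "__main__":
    configs = model_configurations()
    clustered = [c for c in configs if c[2] != "all"]
    print(len(configs))                                   # 594 models
    print(len(clustered) * N_RUNS)                        # 5760 networks
    print(sum(c[1] == "LM" for c in clustered) * N_RUNS)  # 2880 with LM
```

The enumeration makes explicit that the 5760 networks exclude the whole-dataset models: 3 phases × 2 algorithms × 32 sections × 3 outputs × 10 runs.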

Parameter Learning for ANN Models
The number of neurons in the hidden layer and the transfer functions of the hidden and output layers were defined empirically. Trials with 2, 4, 8, 16, 32, 64, and 128 neurons in the hidden layer, alternating the transfer function, produced only a small variation (0.5%) in training, validation, and testing MSE. Therefore, simpler models were generated, with the parameters given in Table 4. Table 4. Artificial neural network (ANN) model details.

Parameter | Value | Justification
Number of hidden layers | 1 | Empirical attempts
Number of neurons in the hidden layer | 2 | Empirical attempts
Transfer function in the hidden layer | Symmetric sigmoid | Empirical attempts
Transfer function in the output layer | Linear | Empirical attempts
Learning algorithm | LM | To build models faster, because this algorithm uses an approximation of Newton's method in which the matrix of second-order derivatives is approximated from the matrix of first-order derivatives (Jacobian matrix). On the other hand, it uses more memory to calculate the optimal weights [76,77].
Learning algorithm | BP | To create models based on the most traditional learning algorithm, gradient descent. It is slower than LM, but it uses less memory [78,79].
The models were generated using MATLAB® version R2018a (The MathWorks Inc., Natick, MA, USA) on a computer equipped with an Intel® Core™ i7-3537U processor (CPU 2.00 GHz), 8 GB of RAM, and an SSD (solid-state disk).
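For readers without MATLAB, the LM update that distinguishes the two training algorithms in Table 4 can be illustrated with a small NumPy sketch of the same architecture (one hidden layer of two symmetric-sigmoid neurons and a linear output). This is not MATLAB's `trainlm`: the forward-difference Jacobian, the initial weight scale, and the damping schedule (divide μ by 10 after an accepted step, multiply it by 10 after a rejected one) are assumptions chosen for brevity, and all names are hypothetical.

```python
import numpy as np


def tansig(n):
    """Symmetric sigmoid 2 / (1 + exp(-2n)) - 1, numerically equal to tanh(n)."""
    return np.tanh(n)


def forward(w, X, n_hidden=2):
    """One hidden layer (tansig) and a linear output, as in Table 4."""
    n_in = X.shape[1]
    W1 = w[: n_hidden * n_in].reshape(n_hidden, n_in)
    b1 = w[n_hidden * n_in : n_hidden * n_in + n_hidden]
    W2 = w[n_hidden * n_in + n_hidden : n_hidden * n_in + 2 * n_hidden]
    b2 = w[-1]
    return tansig(X @ W1.T + b1) @ W2 + b2


def residual_jacobian(w, X, y, n_hidden, eps=1e-6):
    """Residuals e = y_hat - y and their Jacobian by forward differences."""
    e = forward(w, X, n_hidden) - y
    J = np.empty((e.size, w.size))
    for k in range(w.size):
        w_k = w.copy()
        w_k[k] += eps
        J[:, k] = (forward(w_k, X, n_hidden) - y - e) / eps
    return J, e


def train_lm(X, y, n_hidden=2, epochs=50, mu=1e-2, seed=0):
    """Minimal LM loop; returns the weights and the MSE after each accepted step."""
    rng = np.random.default_rng(seed)
    n_weights = n_hidden * X.shape[1] + 2 * n_hidden + 1
    w = rng.normal(scale=0.5, size=n_weights)

    def loss(weights):
        return float(np.mean((forward(weights, X, n_hidden) - y) ** 2))

    history = [loss(w)]
    for _ in range(epochs):
        J, e = residual_jacobian(w, X, y, n_hidden)
        g = J.T @ e  # gradient of 0.5 * sum(e**2)
        A = J.T @ J  # Gauss-Newton approximation of the second-order matrix
        while mu < 1e10:
            step = np.linalg.solve(A + mu * np.eye(w.size), -g)
            if loss(w + step) < history[-1]:
                w = w + step
                mu = max(mu * 0.1, 1e-8)  # accepted: behave more like Gauss-Newton
                history.append(loss(w))
                break
            mu *= 10.0  # rejected: behave more like small-step gradient descent
        else:
            break  # no improving step found; stop training
    return w, history
```

The damping term μ is what lets LM switch between Gauss-Newton steps (fast near a minimum) and short gradient-descent steps (safe far from it), which is consistent with the faster convergence reported for LM; the price is solving a linear system over all weights at every step, which is why it needs more memory than BP.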

Results and Discussion
This section presents and discusses the experimental results. Figure 11 shows the time spent on each set of experiments by lifespan division and training algorithm. Since there were 32 sections, three outputs, and ten experiments, each point represents the training of 960 models. All experiments took over two and a half hours in total, and the LM algorithm was almost twice as fast as BP. Figure 12 illustrates the evolution of training, validation, and testing during the creation of the neural networks for the TMP output, considering starting-point data. LM converges faster and is more accurate than BP, and the same behavior was identified for the other outputs and lifespan divisions.
Because the reduction pot always operates under closed-loop control, the available data are closed-loop data; in other words, the soft sensors estimate the variables in a closed loop. Thus, the estimates show bias deviations and inherent errors in the frequency domain [72][73][74][75][76]. Since the reduction pot cannot operate in open loop, these errors are inherent in the estimates obtained, but the estimates are still sufficiently useful for control [73,76]. For the same reason, the data may also be affected by changes in the controller transfer function.
Figure 13 shows the MSE and R values of the 2880 models covering all pots in the starting, stationary, and shutdown phases, trained with ANN-LM, for the three output variables and normalized data. Most models present low MSE and high R values (the blue line is the average), which indicates that the described modeling strategy works properly.
Figure 14 shows the MSE and R values of the other 2880 models, built for the same pots and under the same conditions but with the ANN-BP training algorithm. On average, the MSE and R values were larger and more variable than those of ANN-LM, and the results within each section show high variance.
Figure 15 shows the MSE and R values of the models created from all data, for ANN-LM and ANN-BP. On average, these models show higher MSE and lower R than the section-based models.
Comparative graphs between target and estimated values were generated after the estimation models were created and the best ones were selected.
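The model counts quoted in this section follow from the experimental grid. The short sketch below reproduces the arithmetic under one assumption: each of the ten experiments is treated as one training run per section and output. The constant and function names are illustrative, not taken from the authors' code.

```python
from itertools import product

SECTIONS = range(1, 33)                          # 32 pot sections
OUTPUTS = ("TMP", "ALF", "NME")                  # soft-sensor targets
EXPERIMENTS = range(1, 11)                       # ten experiments (assumed: one run each)
PHASES = ("starting", "stationary", "shutdown")  # lifespan divisions
ALGORITHMS = ("LM", "BP")                        # training algorithms


def grid(phases, algorithms):
    """Enumerate one training run per (phase, algorithm, section, output, experiment)."""
    return list(product(phases, algorithms, SECTIONS, OUTPUTS, EXPERIMENTS))


per_point = len(grid(("starting",), ("LM",)))    # one point of Figure 11
per_algorithm = len(grid(PHASES, ("LM",)))       # Figure 13 (Figure 14 for BP)
clustered_total = len(grid(PHASES, ALGORITHMS))  # Figures 13 and 14 together

print(per_point, per_algorithm, clustered_total)  # 960 2880 5760
```

The grid yields 5760 section-based models for both algorithms. The 5940 models reported in the Conclusions also include other models whose breakdown is not given in this section, so they are not reproduced here.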
Sensors 2019, 19, 5255 24 of 31
Since there were 32 models for each of the three lifespan divisions, plus the models based on all data, for three outputs (TMP, ALF, and NME) and two ANN learning algorithms, a single pot (pot 5) was selected to visualize the agreement between target and estimated values. Figure 16 displays the comparisons for the ANN-LM-based models using non-standardized data. The models based on lifespan division (red line) estimate the process dynamics very well for all output variables, whereas the models based on all data (green line) did not learn to estimate the values, especially for the ALF output. The respective MSE and R values are shown next to the graphs.
Figure 17 shows the comparisons for the ANN-BP-based models. The estimated values also follow the target values, but for most variables the accuracy is lower than that of the ANN-LM-based models. The models based on all data did not learn with the neural network parameters cited above.
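Figures 13 to 17 and Table 6 report MSE and R. The sketch below assumes R is the Pearson correlation coefficient between target and estimated values (its usual meaning in MATLAB regression plots, although this section does not define it) and uses min-max scaling to [−1, 1] as a stand-in for the unspecified normalization. It also shows why MSE computed on normalized data (Figures 13 to 15) cannot be compared directly with MSE in engineering units (Figure 16), whereas R is unaffected by the rescaling.

```python
import numpy as np


def mse(target, estimate):
    """Mean squared error between target and estimated values."""
    t = np.asarray(target, dtype=float)
    e = np.asarray(estimate, dtype=float)
    return float(np.mean((t - e) ** 2))


def r_value(target, estimate):
    """Pearson correlation coefficient R between target and estimated values."""
    t = np.asarray(target, dtype=float)
    e = np.asarray(estimate, dtype=float)
    return float(np.corrcoef(t, e)[0, 1])


def minmax_scale(values, reference):
    """Map values to [-1, 1] using the min and max of `reference` (assumed normalization)."""
    v = np.asarray(values, dtype=float)
    ref = np.asarray(reference, dtype=float)
    lo, hi = ref.min(), ref.max()
    return 2.0 * (v - lo) / (hi - lo) - 1.0


if __name__ == "__main__":
    target = np.array([1.0, 2.0, 3.0, 4.0])  # toy values, not plant data
    estimate = np.array([1.0, 2.0, 3.0, 5.0])
    print(mse(target, estimate), r_value(target, estimate))
    # Scale both series with the target's range: MSE changes, R does not.
    t_n, e_n = minmax_scale(target, target), minmax_scale(estimate, target)
    print(mse(t_n, e_n), r_value(t_n, e_n))
```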
Table 6 lists the MSE and R values of the target-versus-estimate comparisons plotted in Figures 16 and 17, for the ANN-LM- and ANN-BP-based models trained on clustered data and on all data. These values demonstrate the advantage of the proposed method. Note that the data used for these comparisons were not used in the neural network creation process.
The results were also evaluated by analyzing the residual plots in all phases for the best cluster-based model. Figure 18 shows that most TMP residuals lie between −5 °C and 5 °C, most ALF residuals between −1% and 1%, and most NME residuals between −0.5 cm and 0.5 cm. Process engineers consider these error ranges perfectly acceptable. The red lines show the standard deviation (std) ranges.
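The residual analysis of Figure 18 can be sketched as follows. The acceptance bands are the ranges quoted above (±5 °C for TMP, ±1% for ALF, ±0.5 cm for NME). Defining residual = target − estimate and reading the red std lines as mean ± k·std with k = 1 are assumptions, since this section does not define them; `residual_summary` is a hypothetical helper.

```python
import numpy as np

# Ranges quoted for Figure 18 (TMP in degrees C, ALF in %, NME in cm).
ACCEPTANCE_BANDS = {"TMP": 5.0, "ALF": 1.0, "NME": 0.5}


def residual_summary(target, estimate, band, k=1.0):
    """Mean, sample std, mean +/- k*std range, and share of residuals within +/-band."""
    res = np.asarray(target, dtype=float) - np.asarray(estimate, dtype=float)
    mean = float(res.mean())
    std = float(res.std(ddof=1))
    return {
        "mean": mean,
        "std": std,
        "std_range": (mean - k * std, mean + k * std),  # assumed meaning of the red lines
        "fraction_in_band": float(np.mean(np.abs(res) <= band)),
    }
```

For example, `residual_summary(tmp_target, tmp_estimate, ACCEPTANCE_BANDS["TMP"])`, with hypothetical arrays holding held-out TMP targets and estimates, would report the fraction of residuals inside the ±5 °C band together with the std range.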

Conclusions
This work presented the results of an innovative approach to creating soft sensors that estimate the TMP, ALF, and NME variables of primary Al production. After testing different neural network topologies and two training algorithms, for a total of 5940 trained and tested models, the best model for each output variable was selected; these models showed high generalization power and errors small enough to be fully tolerated by process engineers. In all cases, models based on section clustering and lifespan division produced more accurate estimates than models that do not use clustering. LM produced more accurate neural networks than the BP algorithm and was also faster at training them.
The TMP, ALF, and NME variables are the most important for controlling the proper functioning of the pots. Clustering the dataset by lifespan and section produced models specialized in the behavior of each cluster of pots, reducing errors and increasing the precision of the soft sensors. ANNs were chosen because they can generate models with high generalization power and can learn the nonlinearity of the process from experimental plant data.
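The section-and-lifespan clustering amounts to keeping one specialized estimator per (section, lifespan phase, output) and selecting it at estimation time. The sketch below illustrates that dispatch; the class and function names are hypothetical, and the phase boundaries (`starting_until`, `shutdown_from`, in days of pot age) are placeholder values, since this section does not give the actual lifespan division. Any trained model, such as an exported ANN, can be registered as a plain callable.

```python
from typing import Callable, Dict, Sequence, Tuple

# A trained soft sensor is treated as any callable mapping input features to an estimate.
Estimator = Callable[[Sequence[float]], float]


def lifespan_phase(age_days: float, starting_until: float = 100.0,
                   shutdown_from: float = 1800.0) -> str:
    """Assign a pot to a lifespan phase from its age (placeholder boundaries)."""
    if age_days < starting_until:
        return "starting"
    if age_days >= shutdown_from:
        return "shutdown"
    return "stationary"


class SoftSensorBank:
    """Hypothetical registry holding one estimator per (section, phase, output)."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[int, str, str], Estimator] = {}

    def register(self, section: int, phase: str, output: str, model: Estimator) -> None:
        self._models[(section, phase, output)] = model

    def estimate(self, section: int, age_days: float, output: str,
                 features: Sequence[float]) -> float:
        # Route the request to the model specialized in this pot's section and phase.
        key = (section, lifespan_phase(age_days), output)
        if key not in self._models:
            raise KeyError(f"no soft sensor trained for {key}")
        return self._models[key](features)


if __name__ == "__main__":
    bank = SoftSensorBank()
    # Toy stand-in model for section 3, stationary phase, TMP output.
    bank.register(3, "stationary", "TMP", lambda f: sum(f))
    print(bank.estimate(3, 500, "TMP", [1.0, 2.0]))  # prints 3.0
```

Raising an error for a missing cluster, rather than silently falling back to an all-data model, reflects the results above, where the all-data models estimated poorly.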
The models were developed in MATLAB®. As future work, a computer system will be created to integrate the soft sensors with data acquired in real time, allowing engineers to estimate the behavior of the pots virtually instead of relying on manual or laboratory measurements. These soft sensors are also planned to be used to control the pots.
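For the planned real-time integration, a dynamic model with delayed inputs needs the most recent samples in a fixed order. A minimal sketch, assuming the regressor is the current input sample followed by `n_lags` delayed samples (the number of delays and the input variables are not specified in this section), could buffer the stream as follows. `OnlineSoftSensor` is a hypothetical name, and `model` stands for any trained estimator.

```python
from collections import deque
from typing import Callable, List, Optional, Sequence


class OnlineSoftSensor:
    """Hypothetical wrapper that feeds a delayed-input model from a data stream."""

    def __init__(self, model: Callable[[List[float]], float], n_lags: int) -> None:
        if n_lags < 0:
            raise ValueError("n_lags must be non-negative")
        self.model = model
        # Holds the current sample plus n_lags delayed samples; older ones drop out.
        self.history: deque = deque(maxlen=n_lags + 1)

    def update(self, sample: Sequence[float]) -> Optional[float]:
        """Add the newest input sample; return an estimate once enough history exists."""
        self.history.append([float(v) for v in sample])
        if len(self.history) < self.history.maxlen:
            return None  # regressor incomplete, e.g. right after start-up
        # Flatten from the current sample back to the oldest delayed sample.
        regressor = [v for s in reversed(self.history) for v in s]
        return self.model(regressor)


if __name__ == "__main__":
    sensor = OnlineSoftSensor(model=lambda r: sum(r) / len(r), n_lags=2)
    for sample in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
        print(sensor.update(sample))
```

Returning `None` until the buffer is full avoids estimating from an incomplete regressor, for example right after a data-acquisition restart.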