Industrial Process Control Using DPCA and Hierarchical Pareto Optimization

Abstract: The control of large-scale industrial systems involves several criteria, such as ensuring high productivity, low production costs, and the lowest possible environmental impact. These criteria must be established for all subsystems of the large-scale system. This study is devoted to the development of a hierarchical control system that meets several of these criteria and allows for the separate optimization of each subsystem. Multicriteria optimization is based on the processing of data characterizing production processes, which makes it possible to organize multidimensional statistical process control. Using neural networks to model the technological processes of subsystems and the method of dynamic principal component analysis (DPCA) to reduce the dimensionality of control problems allows us to find more efficient solutions. Using the example of a two-level hierarchy, we show a variant of connecting two subsystems through shared parameters.


Introduction
Hierarchical optimization is a methodology for managing complex large-scale production systems that consist of individual subsystems. Large-scale systems have many goals: high productivity, low production costs, and, possibly, a lower environmental impact. A large-scale high-level system can be divided into subsystems belonging to lower levels of the hierarchy. Each subsystem of a lower level forms a control and transmits information about its output parameters to the next level, which takes the received data into account to form its own control actions. Moreover, each subsystem can be optimized independently, in accordance with the developed optimization methodology, taking into account the exchange of information between subsystems of different levels.
In hierarchically organized large-scale systems, lower-level subsystems solve problems whose results are used by higher-level subsystems. Each lower-level subsystem whose operation is optimized affects the optimization results of the subsystem at the next level.
The result of multicriteria optimization (MCO) is the selection of an equilibrium solution from a set of optimal solutions. A solution is called Pareto optimal if, when moving from this solution to any other in the feasible solution space, any improvement in the value of one target criterion leads to a deterioration in at least one of the remaining target criteria.
Multicriteria optimization methods can be divided into four classes: direct methods based on the preferences of the decision maker (DM); a priori methods, which obtain the optimal solution closest to the preferences of the decision maker; a posteriori methods, which obtain a set of Pareto optimal solutions and leave the choice among them to the decision maker; and interactive methods, which provide multiple Pareto optimal solutions and invite the decision maker to iteratively select the optimal solution, providing recommendations at each iteration step.
Classical a priori methods include the weighted sum method, the epsilon constraint method, and the weighted metrics method. These methods have some disadvantages, in particular the heterogeneity of optimal solutions and the inability to obtain solutions for non-convex sets in the space of objective functions.
Multi-objective optimization methods, such as Particle Swarm Optimization (PSO), are based on modeling the social behavior of a group of subjects who iteratively try to improve their position. For example, the Multi-Objective Gray Wolf Optimizer (MOGWO) method is based on modeling the social leadership and hunting techniques of gray wolves.
Evolutionary algorithms (EAs) are used to solve complex MCO problems due to their ability to quickly find solutions in a multi-parameter space. They provide effective sampling in the criterion space by modeling the behavior of the system in modes that cannot be realized in practice, and they allow many solutions to be obtained simultaneously. Depending on the definition of elitism, two types of multi-objective evolutionary algorithms are distinguished. The non-dominated sorting genetic algorithm (NSGA) searches for non-dominated solutions while preserving population diversity. The NSGA-II algorithm performs an additional sorting of the non-dominated solutions.
NSGA-II performs the following operations: initializing an initial population based on the model, the input data ranges, the criteria, and the constraints; sorting the initialized population by non-dominance; dividing the solutions into fronts of different ranks and determining the crowding distance of solutions along each front. Individuals are then selected from the population based on rank and crowding distance; the N best individuals according to the non-dominance criterion are combined into a new population, to which recombination and mutation procedures are applied. These actions continue until the specified number of generations has been computed.
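The two core procedures above, non-dominated sorting and crowding-distance estimation, can be sketched in a short, self-contained form. This is a minimal illustration of the NSGA-II bookkeeping (objectives are maximized, matching the problem statement below), not the implementation used in this study; the function names are ours.

```python
def fast_nondominated_sort(F):
    """Sort objective vectors F (tuples, all maximized) into fronts of
    equal dominance depth, as in the first phase of NSGA-II."""
    n = len(F)
    dominates = lambda a, b: all(x >= y for x, y in zip(a, b)) and a != b
    S = [[] for _ in range(n)]   # indices of solutions dominated by i
    c = [0] * n                  # number of solutions dominating i
    for i in range(n):
        for j in range(n):
            if dominates(F[i], F[j]):
                S[i].append(j)
            elif dominates(F[j], F[i]):
                c[i] += 1
    fronts, current = [], [i for i in range(n) if c[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:
            for j in S[i]:
                c[j] -= 1
                if c[j] == 0:
                    nxt.append(j)
        current = nxt
    return fronts

def crowding_distance(F, front):
    """Crowding distance of each solution in one front (larger = less crowded);
    boundary solutions get infinite distance so they are always kept."""
    d = {i: 0.0 for i in front}
    for k in range(len(F[0])):
        order = sorted(front, key=lambda i: F[i][k])
        lo, hi = F[order[0]][k], F[order[-1]][k]
        d[order[0]] = d[order[-1]] = float("inf")
        if hi > lo:
            for prev, mid, nxt in zip(order, order[1:], order[2:]):
                d[mid] += (F[nxt][k] - F[prev][k]) / (hi - lo)
    return d
```

Selection then keeps whole fronts in rank order, breaking ties within the last admitted front by descending crowding distance.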
In our study, we use the evolutionary algorithm NSGA-II to construct the Pareto frontier of a technological process, because it can be combined with a recurrent multilayer perceptron (MLP) to take into account the dynamics of processes and to eliminate rare, isolated operating modes.
A number of publications are devoted to solving the MCO problem using machine learning methods. The authors of [1] use a deep MLP and an EA to implement dynamic MCO. Article [2] presents the results of machine learning for the development, control, diagnostics, and prediction of cyber-physical systems. The authors of article [3] use hybrid modeling, a comparison of multicriteria learning with and without supervision, and an evolutionary algorithm.
The article [4] demonstrates the use of MCO in systems with the visualization of solutions using growing hierarchical self-organizing maps (HSOM), which are applicable even if the dimension of the criterion space is more than three.
When using hierarchical MCO, we assume that each subsystem can be optimized separately and independently. The solutions obtained by optimizing a lower-level subsystem are transferred to the next level, which uses them to find the optimal solution for its own level or for the entire system. The advantage of the hierarchical optimization method is its ability to simplify a complex system through modularity and to reduce the dimension of each individual subsystem. Pareto boundaries must be calculated for all subsystems that have their own goals. Optimal solutions for subsystems are found by calculating the Pareto frontier for each subsystem and selecting the desired point on its boundary. Then, the resulting optimal parameter values are transmitted back to the upper level. Such parameters connect the optimal values of subsystems of neighboring levels. Having received the new parameter values, the upper level generates a Pareto optimal solution [5].
In large-scale manufacturing systems, separating subsystems does not always achieve significant dimensionality reductions, because each subsystem is characterized by a large number of correlated parameters and dynamic behavior. Therefore, the task of reducing the dimensionality and correlation of subsystems remains relevant [6,7].
The goal of our research is to optimize process control, thereby achieving a Stackelberg equilibrium between the desire to increase equipment performance and the desire to reduce the costs (increase the efficiency) of production [8,9]. As an example, we consider a two-level technological process for producing superheated steam for a metallurgical enterprise.

Multivariate Statistical Process Control (MSPC)
The statistical control of production processes, which is mainly used in the energy and chemical industries, is based on mathematical methods of modeling and the statistical analysis of large data sets. MSPC is a transition to a higher level of processing of the accumulated data characterizing production processes, including modeling, clustering, multifactor monitoring, and process control, aiming to constantly improve the performance of departments and the enterprise as a whole.
Currently, in the energy sector, the number of measured parameters used to control just one steam boiler exceeds sixty. The parameters have correlation dependencies. As calculations have shown, cross-correlation coefficients range from 0.01 to 0.99. Therefore, to reduce the dimensionality of the modeling and control problem, it is advisable to use the principal components of the parameters obtained from the original data. The dependencies between the parameters may be dynamic and are not always linear, so it is advisable to consider the use of dynamic principal component analysis (DPCA).
The curse of dimensionality makes it difficult to model complex dynamical systems. Reducing the dimension of the model allows one to obtain more stable results; for example, the authors of [10], when describing the dynamics of the turbulence of a velocity field, use the method of principal components of the field parameters, presented on a fine grid, in a multidimensional space. The principal component method allowed a significant reduction in the model dimension to be obtained.
The authors of [11] use the principal component method to model systems in the chemical industry.
Nonlinear principal component analysis (NLPCA) is a nonlinear generalization of standard PCA using the principal curve technique, which involves moving from straight lines to curves. To implement the NLPCA method, the authors of [12,13] developed an auto-associative neural network and simulated the dynamics of continuous chemical reactors using linear and nonlinear principal component methods. The nonlinear principal components are determined by a feedforward neural network with one hidden layer. The authors analyze the resulting models from the perspective of their application to process control.

Pareto Optimization of Industrial Processes
Pareto optimization methodology refers to a posteriori methods, in which a decision is made after finding a set of feasible solutions to the multicriteria optimization problem [14]. Mathematically, if the goal is to maximize the vector objective function F(x), the problem is

max F(x) = (f1(x), …, fk(x)), x ∈ X, (1)

where F(x): X → R^k is the objective vector function and k ≥ 2. A solution x* ∈ X* is Pareto optimal if ∄ x ∈ X : x ≻ x*, where x ≻ x* means that x dominates x*. When controlling technological processes in production, the objective function is difficult to set analytically, since it depends on operating modes characterized by many interrelated parameters that change over time. Therefore, modern production systems implement multidimensional statistical process control [15].
The control object is represented as a nonlinear dynamic system in state space, having a vector of control parameters u, a state vector x, and a vector of output parameters y. The optimal control vector u is obtained from the following conditions [16,17]:
- Maximization of the criteria vector J(u);
- Constraints in the form of inequalities: g(x, u, t) ≤ 0;
- Constraints in the form of equalities: h(x, u, t) = 0;
- Accounting for boundaries: u_min ≤ u ≤ u_max.
The input of the dynamic system is the vector of control parameters u(t), which influences the object at time t and leads to a change in the state vector x(t + 1) at the next time, t + 1. At the output of the system, the result of the measurement is observed in the form of a vector y(t). The model is represented by a nonlinear process equation and a linear measurement equation:

x(t + 1) = A x(t) + B u(t) + f(x(t), u(t)), (2)
y(t) = C x(t), (3)

where A and B are matrices characterizing the control object, f is a vector function characterizing the nonlinearity of the object, and C is a matrix characterizing the measuring system.
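As a sketch of the assumed model form, Equations (2) and (3) can be rolled out for a given control sequence. The matrices and the nonlinearity f below are illustrative placeholders, not quantities identified from plant data:

```python
import numpy as np

def simulate(A, B, C, f, x0, U):
    """Roll out x(t+1) = A x(t) + B u(t) + f(x(t), u(t)), y(t) = C x(t)
    for a sequence of control vectors U, returning the measurements y(t)."""
    x, Y = x0, []
    for u in U:
        x = A @ x + B @ u + f(x, u)   # nonlinear process equation (2)
        Y.append(C @ x)               # linear measurement equation (3)
    return np.array(Y)

# toy example: stable linear part, zero nonlinearity, constant control
A, B, C = 0.5 * np.eye(2), np.eye(2), np.eye(2)
Y = simulate(A, B, C, lambda x, u: np.zeros(2), np.zeros(2), [np.ones(2)] * 3)
```

With a nonzero f, the same loop structure is what a recurrent neural network approximates one step at a time.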
A nonlinear state-space dynamic system can be modeled by a recurrent artificial neural network (ANN), which has a hidden layer with nonlinear activation functions and a linear output layer. In more complex cases, the number of hidden layers can be increased. A neural network with one hidden layer is a model of the system in state space characterized by Equations (2) and (3). Typically, a recurrent multilayer perceptron with one or more hidden layers is more efficient; its structure is shown in Figure 1.
Increasing the number of object parameters that can be measured allows for the use of multivariate statistical process control (MSPC).To avoid the "curse of dimensionality" when modeling production processes, it is advisable to use principal component analysis (PCA) for linear and nonlinear dependencies between parameters.
Linear PCA. The initial data are presented in the form of a table containing measurement results (matrix X) recorded over a given time interval. The number of columns m in matrix X is equal to the number of parameters; the number of rows n depends on the number of parameter measurements during the observation time, with n > m. When performing PCA, the columns of the matrix are centered.
The covariance matrix, obtained by the formula C = XᵀX/(n − 1), n ≫ 1, is a symmetric m × m matrix; therefore, it can be diagonalized as C = V Λ Vᵀ, where V is a matrix whose columns contain the eigenvectors and Λ is a diagonal matrix containing the eigenvalues λᵢ, in descending order, on the main diagonal. Projections of the data onto the eigenvector axes are called principal components. To reduce the data dimensionality from m to k < m, the first k columns of V are selected, together with the k × k upper-left part of the matrix Λ. The product X V_k yields an n × k matrix containing the principal components S. PCA is preferably performed via singular value decomposition (SVD) due to its greater numerical stability. The result of the SVD of the matrix X is the product X = U Σ Vᵀ, where U is a unitary matrix whose columns contain the left singular vectors, Σ is a diagonal matrix containing the singular values σᵢ, and V is a matrix whose columns are called the right singular vectors. Since the covariance matrix has the expression C = V Σᵀ Σ Vᵀ/(n − 1), the eigenvalues of the covariance matrix are related to the singular values by λᵢ = σᵢ²/(n − 1), and the principal components are determined by Formula (4):

S = X V = U Σ, (4)

using the numerical method of singular value decomposition.
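The SVD route and the eigenvalue relation λᵢ = σᵢ²/(n − 1) can be checked numerically with a short sketch; the synthetic rank-2 data set here is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 500, 6, 2          # n observations, m parameters, keep k components
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, m))  # rank-2 latent structure
X -= X.mean(axis=0)          # center the columns, as PCA requires

# SVD route: X = U S Vt; the principal components are X V = U S
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = U[:, :k] * S[:k]    # n x k matrix of principal components

# eigenvalues of C = X^T X / (n - 1) equal s_i^2 / (n - 1)
C = X.T @ X / (n - 1)
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]
assert np.allclose(eigvals, S**2 / (n - 1))
```

Because the synthetic data have rank 2, the trailing eigenvalues are numerically zero, so keeping k = 2 components loses essentially no variance.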
MSPC systems used for control have groups of correlated parameters and therefore contain redundancy. Redundancy can be reduced using principal component analysis (PCA). Because PCA orders the principal components by explained variance, components that have an insignificant influence on the optimization criteria can be excluded.

Linear Dynamic PCA
The classical principal components method does not consider the dynamic properties of manufacturing processes. When the multicriteria optimization of complex nonlinear industrial processes is carried out, it is necessary to consider their dynamic behavior. Successive values of a parameter have a strong correlation, which reflects data redundancy and makes it difficult to process data quickly. Therefore, to eliminate correlation dependencies and reduce the dimension of the control problem, it is advisable to use the method of dynamic principal component analysis (DPCA). DPCA takes into account the fact that controlled processes are characterized by an autocorrelation function. Such processes can be described by an autoregressive model, a moving average model, or a mixed autoregressive-moving average model.
The DPCA method uses an augmented source matrix X_a, which is the matrix X extended with time-shifted copies of all the variables in X. PCA is then performed on this augmented matrix.
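Building the augmented matrix is a simple stacking operation; a minimal sketch follows (the function name lag_augment is ours):

```python
import numpy as np

def lag_augment(X, l):
    """Form the DPCA extended matrix X_a = [X(t), X(t-1), ..., X(t-l)]:
    each row stacks the current observation with its l predecessors."""
    n, _ = X.shape
    return np.hstack([X[l - j : n - j] for j in range(l + 1)])

X = np.arange(10.0).reshape(5, 2)   # 5 observations of 2 parameters
Xa = lag_augment(X, 2)              # each row now spans 3 time steps
```

Standard PCA (via SVD) applied to Xa then extracts the dynamic principal components.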
DPCA assumes that the data follow an implicit vector autoregressive model. Known approaches include the KSG-95 method given in [18] and the RR-13 method proposed by Rato and Reis [19], which use the same lag for all variables; the lag value determines the order of the autoregressive process.
The method somewhat complicates the process of performing optimal multivariate statistical control, since it considers the autocorrelation of the observed processes. More adequate process models that consider autocorrelation are autoregressive, moving average, or mixed autoregressive-moving average models. The use of these models allows good results to be obtained [20].
We used a method that adds the same number of lags l for each j-th parameter. As an example, article [21] shows that for two parameters x1, x2, which are characterized by second-order autoregression, the principal components are calculated using current and previous data. Thus, DPCA leads to the expansion of the parameter matrix X through the addition of delayed data, the number of which is determined by the order l of the autoregressive processes: X_a = [X(t), X(t − 1), …, X(t − l)]. Singular value decomposition is performed on the extended covariance matrix obtained for the matrix X_a.
Nonlinear PCA: Nonlinear dependencies between parameters lead to the need to use nonlinear PCA; one variant is PCA with a kernel. This method uses a mapping φ(x) into a feature space. The kernel k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩ represents the inner product of the mapped points in feature space. The mapped parameters φ(x_i), i = 1, …, n, are assumed to have zero mean. The covariance matrix in feature space is C = (1/n) Σ_i φ(x_i) φ(x_i)ᵀ. The projection onto a feature-space eigenvector has the form v = Σ_i α_i φ(x_i), where α_i are the expansion coefficients of the eigenvector. As in linear PCA, eigenvalues are used to rank the eigenvectors based on how much of the variation in the data is captured by each principal component; this is how kernel PCA reduces the dimensionality of the data.
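A minimal kernel PCA sketch using an RBF kernel is shown below; double-centering the kernel matrix enforces the zero-mean assumption in feature space. The function and parameter names are ours, and the kernel choice is an assumption for illustration:

```python
import numpy as np

def kernel_pca(X, k, gamma=1.0):
    """Project data onto the top-k kernel principal components (RBF kernel)."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                       # center: zero mean in feature space
    vals, vecs = np.linalg.eigh(Kc)      # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]     # keep the k largest
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0.0))  # scaled projections

rng = np.random.default_rng(1)
Z = kernel_pca(rng.normal(size=(50, 3)), k=2)
```

As in the linear case, the eigenvalues rank the components, so trailing components can be dropped to reduce dimensionality.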
The basis for nonlinear dimensionality reduction in nonlinear PCA is provided by principal curves and manifolds. This idea has been explored by many authors [22,23].

Multivariate Statistical Model
To construct the Pareto front from data using classical methods, it is necessary to have a sufficient representation of the data in the entire criteria space. If the data are insufficient, the front may have discontinuities. The solution can be improved by methods that use data generation, allowing a more uniform filling of the criteria plane to be achieved. Evolutionary algorithms are well suited for filling areas close to the front with data, which makes it possible to obtain many solutions to a problem simultaneously due to their hidden parallelism. For nonlinear systems, a recurrent neural network is used to generate data in the strategy space and map them to the criteria space.
The NSGA-II genetic algorithm is based on two procedures: a procedure for quickly sorting a set of solutions into fronts of solutions that do not dominate each other and have the same level of dominance, and a procedure for estimating the distance characterizing the density (crowding) of solutions belonging to each front. However, a Pareto front built purely from historical data may not contain the most profitable operating modes for the equipment. Therefore, it is advisable to build the boundary based on a process model. A nonlinear, rather complex, dynamically changing model was obtained using a neural network.
Our experience of using the NSGA-II genetic algorithm to build a Pareto front for our case showed that the front had irregularities and moved too far from the data, which led to overly strict recommendations for the control of production processes that could demand operating modes that are not realizable in practice.
Most of the reviewed multicriteria control methods are built on the basis of analytical equations arising from the physical processes that determine the dependencies between system parameters and the quality criteria of their functioning. Analytical dependencies do not always take into account subtler features of production processes, for example, the aging of equipment, the appearance of scale in boilers, changes in fuel oil humidity, and weather conditions. Therefore, our approach is based on models of production processes that are updated using big data continuously accumulated in the data lake, which makes it possible to create control actions that are adequate to the current situation in production, considering all factors.
The neural network is pre-trained on historical data of the technological process and retrained when production processes are reconfigured (after the running-in, repair, or replacement of equipment). The application of physically informed neural networks for modeling chemical processes is shown in article [24]. The results of neural network modeling are used to generate the population of data used by the genetic algorithm. The use of DPCA makes it possible to simplify the implementation of the genetic algorithm by excluding from consideration the equations that describe the relationships between parameters. The Pareto frontier represents a set of criterion values that satisfy the following conditions:


- Maximize the vector of criteria J;
- Consider the restrictions imposed on the principal components of the production process parameters: s_min ≤ s ≤ s_max.
The measurement results obtained during production contain anomalous values and data losses caused by strong production noise. Failures result in non-numeric values. The evolutionary algorithm is very sensitive to anomalous measurement results. If there are noise, failures, and losses in the measurement results, the Pareto frontier will be constructed incorrectly. Therefore, it is necessary to perform filtering first. Single anomalous measurement results are suppressed using median filters [25][26][27], while packets of strongly correlated anomalous measurement results are detected using finite-difference filters, where L is the expected length of a packet of highly correlated data, Δ²x(t) is the second-order finite difference calculated over the length L of the packet, and x̂(t) is the predicted value replacing the packet.
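A sliding-window median filter of the kind used for single outliers can be sketched as follows. The window width w is a tuning choice; this is an illustration, not the filter configuration used in the plant:

```python
import numpy as np

def median_filter(x, w=5):
    """Sliding-window median: a single anomalous sample cannot shift the
    median of an odd-length window, so isolated spikes are suppressed."""
    half = w // 2
    xp = np.pad(x, half, mode="edge")          # replicate edge values
    return np.array([np.median(xp[i : i + w]) for i in range(len(x))])

signal = np.array([1.0, 1.0, 1.0, 50.0, 1.0, 1.0, 1.0])  # one spike
cleaned = median_filter(signal, w=5)
```

A packet of w or more consecutive anomalies would survive this filter, which is why correlated packets require the separate finite-difference detection step described above.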
Algorithm 1 for constructing the Pareto front performs the following steps:
Algorithm 1: Construction of the Pareto front
1. Read the historical data X;
2. Perform median filtering;
3. Remove single anomalous measurements and packets of highly correlated anomalous measurements;
4. Form the set of criteria;
5. Form the extended matrix of control parameters;
6. Calculate the DPCA parameters X';
7. Construct a neural network model ANN of the controlled system;
8. Initialize a random population P(0) by generating data using ANN(X');
9. Assign a rank to each solution using the depth-of-dominance method and assess diversity using the average distance characterizing the proximity to neighboring solutions; repeat the evolutionary steps until the stopping condition is reached.

Determination of Optimal Control Actions
Article [28] is devoted to the problem of the multicriteria optimization of a control system using PCA. The authors of [29,30] emphasize that multi-objective optimized control methods can improve the efficiency of industrial heating and power generation processes. In publication [31], the authors show the importance of developing methods for managing production processes in Industry 4.0.
Figure 2a schematically shows the mapping of the space of dynamic principal components [s1, s2, s3] of the measured parameters of the controlled object into the criteria space [J1, J2]. The ANN neural network is used to obtain this mapping. Figure 2a illustrates the fact that the system has some non-optimal modes, changing which would simultaneously improve both target criteria J1 and J2. After obtaining the Pareto front, which characterizes the set of optimal operating modes of the object, and selecting the preferred mode [J1*, J2*] as a hyperparameter, it is possible to obtain the optimal values of the dynamic principal components of the parameters [s1*, s2*, s3*]. The IANN neural network is used to produce this inverse mapping. Figure 2b illustrates the mapping from the criterion space to the space of dynamic principal components of the parameters. The application of the multicriteria optimization method is considered in [32]. Publication [33] is devoted to the issue of system stability, which is important for control theory and was solved using Razumikhin's method.
Let us consider the technological process of preparing fuel oil at a power plant, which includes a receiving and draining device, the main tanks for storing a constant supply of fuel oil, a fuel oil pump, a pipeline system for fuel oil and steam, a group of fuel oil heaters and filters.To pump fuel oil, fill it and drain it from containers, the temperature of the fuel oil must be at least 60-70 °C.The preparation of fuel oil before combustion consists of removing mechanical impurities, increasing the pressure of fuel oil, and heating it, which are necessary to reduce energy losses in the transport of fuel oil to the boilers of the power plant and its fine atomization in the nozzles of burner devices.The temperature of the fuel oil in the tanks is maintained at 60-80 °C at any time of the year due to circulation heating by returning part (up to 50%) of the fuel oil heated in external heaters to the tank.
A complete set of data characterizing the preparation of fuel oil contains 31 parameters, including the pressure of fuel oil, water and steam, the consumption of water, steam and additives, the temperature of fuel oil, water and steam, and fuel oil humidity.The criteria for the quality of the preparation system's operation are the temperature and consumption of the prepared fuel oil.The historical raw data contain more than 15,000 measurements of 31 parameters.
The use of a large number of model parameters makes it possible to achieve high accuracy when assessing the criteria. However, the solution of the inverse problem, which consists of determining a large number of input parameters based on the values of the criteria, becomes unstable. Even the use of a stabilizer did not provide satisfactory results. To achieve a compromise between accuracy in approximating the criterion values from experimental data and accuracy in determining the optimal control actions to transfer the object to a Pareto front mode, the most significant principal components were selected.
The correlation matrix of the principal components of the control parameters was analyzed. After removing components with a total energy not exceeding 3% of the total energy of the input parameters, the correlation matrix of the principal components was correspondingly reduced. The number of principal components determines the size of the ANN input layer. Further reducing the size of the input layer led to a significant increase in the error in calculating the criteria. To verify this statement, the criteria were modeled using an ANN with five principal components. The simulation results are shown in Figure 3. The average relative error in simulating fuel oil flow and temperature was 0.03 and 0.004, respectively. Considering the presence of noise when performing measurements in industrial environments, this simulation result is satisfactory. Figure 5 shows a polar diagram of the principal components of the control parameters, the values of which make it possible to determine the optimal mode of the fuel oil preparation system corresponding to the selected point on the Pareto frontier shown in Figure 4a. Based on the dynamic principal components, it is possible to determine recommended sequences for changing the values of the control parameters when necessary to achieve the optimal value of the criteria. Figure 6 shows the recommended physical values of the control parameters recovered from the DPCA. The parameter values shown in Figures 5 and 6 are normalized so that the maximum parameter value is equal to one. The control parameters listed in Figure 6 characterize the operation of a fuel oil pumping station (FOPS), a fuel oil supply warehouse (FOSW), and a thermal power plant (TPP).
For a system at the top level of the hierarchy, the required value on the criteria plane is specified as a hyperparameter.

Hierarchical Optimization
As an example of a multicriteria optimization problem, consider a boiler equipment control system that produces superheated steam for production shops and for heating needs. A simplified block diagram is presented in Figure 7. The control object is the steam boiler equipment, which consists of the boiler itself (1) and the fuel oil preparation (2), water preparation (3), and air preparation (4) subsystems. As a result of the production process, superheated steam is generated and supplied to the production workshops for consumers (5).
The solution of the problem was divided into several hierarchically organized stages, which are shown in simplified form in Figure 8, where three dots indicate blocks that perform similar functions (calculating the Pareto front for air preparation, the operating mode for water preparation, and the control parameters for air preparation). At the top level of the hierarchy, general goals were set for the boiler equipment control system: efficiency and system performance. Specific target values for steam boiler efficiency and steam boiler productivity were selected on the Pareto frontier. Then, in accordance with the developed algorithm, using a pre-trained IANN for the boiler system, all control parameters were determined that could help to achieve the specified target operating mode of the boiler equipment as the top-level system.

Discussion
With an increase in the number of criteria, the optimization of production control processes makes it possible to create and maintain sustainability not only in production itself but also in the production ecosystem. Multicriteria optimization allows us to achieve better production results while saving resources and reducing the harmful impact on nature and humans. Such production ecosystems include metallurgical production complexes that use natural and energy resources and strictly control the impact of production on the environment [26]. The parameters of all production processes are documented, and their current and historical data should be used to optimize production. For example, for just one fuel oil treatment system, over the course of a year, there are more than 30 million process records for 32 parameters that have autocorrelation and cross-correlation relationships. The steam boiler system has more than 60 million records for 62 parameters. Currently, production process control has several goals, including increasing productivity, increasing efficiency, and reducing harmful emissions. Control optimization is multicriteria and is based on the analysis of multivariate correlated statistical processes.
Multicriteria optimization is performed using an evolutionary algorithm instead of a simpler search for non-dominated solutions, which makes it possible to include in the analysis all permissible operating modes of the production equipment, including modes that were not realized during the period of measurement data accumulation. The algorithm uses a neural network model of the controlled system, which accounts for all permissible operating modes of the controlled systems and the many restrictions on the input parameters. The periodic updating of the data and retraining of the model maintains its relevance.
The presence of strong correlation dependencies between parameters makes it difficult to formulate restrictions on parameter values when implementing the NSGA-II algorithm, leads to information redundancy, and complicates the problem of finding control actions that correspond to the Pareto boundary. Therefore, the use of dynamic principal components of the control parameters avoids unnecessary complexity. For example, for the fuel oil preparation system, instead of 29 control parameters, it proved possible to use 6 of their principal components.
The reverse transition to the control parameters allows us to provide recommendations on how to adjust the physical parameters to achieve the Pareto optimal regime. In this case, it is necessary to monitor the stability of the system [33].
The method we proposed was applied to optimize the operation of boiler equipment at a metallurgical enterprise.As a result, with the same productivity of boiler equipment, we were able to increase the average monthly efficiency value from 86.5% to 88.5%.

Figure 2 .
Figure 2. Data mapping: (a)-mapping from data space to criteria space, (b)-mapping from criteria space to control parameter space.

Figure 3 .
Figure 3. Modeling system performance criteria using ANN, (a) simulation of the fuel oil consumption of a steam boiler based on data obtained over 250 h, (b) simulation of steam boiler fuel oil temperature based on data obtained over 250 h.

Figure 4 .
Figure 4. Modeling of control actions using (a) the Pareto boundary; (b) the first principal component of the control parameters; (c) the second principal component of the control parameters.

Figure 7 .
Figure 7. Simplified block diagram for steam production workshops and heating.

Figure 8 .
Figure 8. Two-level hierarchical optimization scheme for a steam boiler.