processes

: In this paper, a multi-objective collaborative optimization (MOCO) strategy is proposed for making decisions on a distillation column group. Firstly, based on data preprocessing, the operating modes of the tower group are determined by use of the fuzzy C-means clustering method. Secondly, based on the proposed concept of a collaborative variable, the discrete state-space model of the main towers are constructed by the subspace identiﬁcation method. Then, a MOCO optimization model is designed for the ethylene plant. Finally, NSGA-III is used to solve the optimization problem. An analysis of a Pareto-optimal frontier and population is carried out. To illustrate the superiority of the proposed strategy, the results are compared with historical data and the appealing operation area is ﬁnally obtained.


Introduction
In the real world, most optimization problems have multiple conflicting objectives.Single objective optimization (SOO) which gives a single solution has difficulty weighing multiple objectives.Unlike SOO, multi-objective optimization (MOO) gives a set of optimal solutions, each of which corresponds to different values of the objective functions, which, in turn, can form a Pareto frontier to provide rational guidance to decision-makers.Therefore, MOO has many applications in renewable energy resources production optimization [1,2], energy saving optimization [3,4], and emission reduction optimization [5][6][7].In trade-off MOO methods, priori methods, interactive methods, Pareto-dominated methods and new dominance methods are available [8].
In recent years, there has been a wealth of MOO research on single towers and tower groups.Zhou et al. [9] optimized the decision variables of a methanol-to-propylene plant under different operating conditions based on a MOO framework and achieved a balance between total heating, cooling costs and product purity.For Fischer-Tropsch synthesis, Zhang et al. [10] transformed the design of a reactive distillation column into a MOO problem and determined the column structure based on the Pareto-optimal frontier.In [11], MOO was used to weigh the total annual cost and energy of a cyclohexane-isopropanolwater extractive distillation process.In [12], by considering three conflicting objectives, the total annual cost, the CO 2 emissions and the molar purity of a biodiesel product, an algal biodiesel reactive distillation unit was optimized.Gu et al. [13] optimized heat-integrated pressure-swing distillation based on MOO to reduce the total annual cost and CO 2 emissions, while saving energy.For similar purposes, MOO was also used for heteroazeotropic distillation processes [14], dividing wall columns [15,16].It is worth noting that ethylene is one of the most produced chemical products in the world-the ethylene industry is also the core of the petrochemical industry, but little research has been undertaken in this area.Shen et al. [17] targeted the chilling train and demethanization system in ethylene manufacturing according to the Pareto-optimal frontier, and, by maximizing the exergy efficiency and minimizing the operational cost, the optimal operation was selected.Multi-objective adaptive surrogate model-assisted optimization was proposed by [18] et al. with energetic, economic and environmental MOO performed for a practical ethylene separation process.More research should be devoted to the ethylene industry.
MOO studies of single towers and tower groups are essentially based on Aspen [9,[11][12][13][14][15][16][17][18].To build an Aspen model of a tower or tower group, it is necessary to have in-depth knowledge of the physical parameters and it is difficult to obtain satisfactory Aspen models in the absence of sufficient parameters.The question arises of how to conduct a MOO study of a tower group in the absence of the necessary physical parameters or without access to Aspen software.This is a problem.In addition, the optimization of Aspen-based MOO studies is separated from the simulation of the process; interaction between software is required.Before population iteration, the data in Aspen needs to be updated to the optimization platform, either manually or with the help of plug-ins, which is inconvenient and inefficient.
Under the above circumstances, it is important to study the MOO problem based on historical data for the manipulated variables and control variables, but this also raises the following questions: (1) How to handle contaminated data for modeling; (2) What kind of model to build in order to optimize; (3) How to design a MOO problem for a distillation column group.
In order to address these problems, a multi-objective collaborative optimization (MOCO) strategy based on system identification is proposed for an ethylene tower group.For the preprocessing of historical data, the Hampel method [19,20] is used to eliminate missing values and outliers.Gaussian filtering [21] is used to smooth the noise present in the data.Considering the direct or indirect connection between towers, a collaborative variable is proposed and applied.Fuzzy C-means clustering [22][23][24], a soft clustering method, is used to distinguish the operating modes of the ethylene plant.Selecting the data near the main operating mode, collaborative-variable-based linear discrete state-space models of the target towers are constructed using a subspace identification method [25].A MOCO problem with the objectives of minimizing the impurity content in the ethylene product, the impurity content in the propylene product, and the operating cost is designed to optimize the top reflux flow rates and steam flow rates of bottom reboilers.NSGA-III [26] is used to solve the optimization problem and the effectiveness of MOCO is analyzed.
The remainder of this paper is organized as follows: Section 2 describes the ethylene plant under study.Section 3 describes a data-preprocessing scheme, determines the operating modes, and constructs state-space models of the target towers.The MOCO problem is described in Section 4. Section 5 provides a MOCO case study and discussion based on historical data.The conclusions are presented in Section 6.

Description of Ethylene Plant
The ethylene plant studied in this paper has a target annual production of 1 million tons of ethylene and 500,000 tons of propylene.Without considering the product systems of ethylene and propylene, the tower group for the distillation process consists of a deethanizer (T 1 ), a C2 hydrogenation reactor (R 1 ), a green oil tower (T 2 ), an ethylene distillation column (T 3 ), a low pressure depropanizer (T 4 ), a high pressure depropanizer (T 5 ), a C3 hydrogenation reactor (R 2 ), a propylene distillation column A (T 6 ), a propylene distillation column B (T 7 ), a debutanizer (T 8 ) and several dryers (DR) and heat exchangers (EX).
The tower group is shown in Figure 1.The main control variables of T 1 are the bottom C2 concentration (ethylene and ethane) and the top C3 concentration, which are measured by online analyzers on the bottom and top lines, respectively.The feed to T 1 is the demethanizer bottom product in the upstream tower group.Its top product goes to R 1 , where acetylene is selectively hydrogenated to produce ethylene and ethane.The bottom product of R 1 enters T 3 after passing through T 2 and a dryer.The main controlled variable of T 3 is the impurity content in the top ethylene product; the ethylene extracted from the tower side line is pumped out partially to the cryogenic tank area.The T 4 bottom product is fed to T 8 and the top product is used as another feed to T 5 .The T 5 top product is fed to R 2 after a dryer and an arsenic-protected bed, and is eventually provided as a feed to T 7 .R 2 selectively hydrogenates the methylacetylene (MA) and propadiene (PD) from the T 5 top product to produce propylene and propane.The propylene distillation column is a two-tower system, with the top product of T 6 fed to T 7 and the bottom product used for recirculation.The main controlled variable of T 7 is the content of impurities in the top product, the bottom product is fed to T 6 and the product extracted from the tower side line is used for polymerization.

Model Construction
Data preprocessing is required before modeling with historical data, otherwise, it is difficult to obtain accurate models.On the premise of ensuring the stable operation of the tower group, the operators need to complete the target according to the plan.Since the actual operators change shifts by turns, the operation of the tower group will fluctuate, so it is necessary to identify the operating modes of the tower group.In addition, there is coupling between towers, which makes it difficult to build a complex model of a tower group.Establishing a linear model of each tower near a certain operating point is a good choice.

Data Preprocessing
The Hampel method [19,20] was used to process the missing values and outliers in one year's data for the manipulated and controlled variables for T 1 , T 3 , T 5 , T 6 , T 7 .Given the data {x 1 , x 2 , . . . ,x n }, and a sliding window of length k, the i-th median and standard-deviation estimate are shown in ( 1), (2), respectively.
If a sample x i satisfies (3) at a given threshold n σ , then the sample is considered as an outlier and m i is used to replace it.
Since the controlled variables are all impurity contents measured by online analyzers, there is a certain amount of noise in the data.Therefore, the data of the controlled variables are smoothed using the Gaussian-weighted moving average [21] shown in (4).
where xi is the filtered value, p is the window size, and w j is the weighting factor of the Gaussian distribution.

Determination of Operating Modes
Since the towers show certain linear characteristics near the stable operating point, it is necessary to determine the operating modes of the tower group in order to facilitate the construction of the linear model of each tower.Considering that the target products of the ethylene plant are ethylene and propylene, the preprocessed data for the impurity content of the T 3 top product (D1), the impurity content on the 73rd tray of T 6 (D2), and the impurity content of the T 7 top product (D3) are selected and analyzed by a fuzzy C-means clustering method [22][23][24].Using the membership function (MF), the degree of belongingness of a sample to a cluster can be quantified.
Figure 2 shows the performance of clustering when the number of clusters is set to two.The reason for setting the number of clusters to two is that, during the one-year production process, the operation of a tower group may be affected by people or equipment and deviate from the main operating mode, that is, the ideal operating conditions of the tower group.Two clusters correspond to two operating modes; the data with an MF value greater than 0.8 are presented in Figure 3.The number of samples with an MF value above 0.8 under operating mode 1 is 490.In comparison, the number of data under operating mode 2 is larger, which is 2634.Therefore, we regard operating mode 2 as the main operating mode; the 2634 data are used to construct the subsequent tower models.

Subspace Identification
The linear discrete state-space model of each tower is constructed using the method of subspace identification [25].Consider the following state-space model where x ∈ R n is the estimated state, u ∈ R m is the input, y ∈ R p is the output.η and ν are residuals.The system parameter of the regression model is n+m) , and its least squares estimation form is shown in (6).
Equation ( 7) presents the covariance matrices of the residuals.
The stable Kalman gain K can be obtained by solving an algebraic Riccati equation.
Finally, an innovation model ( 9) is obtained.
x(t + 1) where x is the estimate of the state vector, and ê is the estimate of the output prediction error.

Collaborative-Variable-Based State-Space Model
The towers considered for optimization are T 1 , T 3 , T 5 , T 7 .T 6 is the upper tower of the propylene final product.The content of its propylene product is associated with T 7 , so only T 7 is considered in the two-tower system propylene distillation column.
There often exists a direct (connection through pipelines) or indirect (connection without pipelines) connection between towers and this situation needs to be taken into account when building a single tower model.Therefore, we propose the concept of a collaborative variable-a variable that reflects the direct or indirect connection between two towers, which is a controlled variable of one tower and a "manipulated variable" of another tower.For example, the bottom product of T 1 is directly used as a feed to T 5 , the bottom C2 content of T 1 can be considered as a direct collaborative variable to T 5 , the top product of T 1 is hydrogenated and dried as a feed to T 3 , and the top C3 content of T 1 can be considered as an indirect collaborative variable to T 3 .
Variables in the state-space models are given in Table 1.For T 1 , the manipulated variables are the top reflux flow rate u 1 T 1 and the bottom reboiler steam flow rate u 2 T 1 .The controlled variables are the bottom C2 content y 1 T 1 and the top C3 content y 2 T 1 .For T 3 , the manipulated variables are the top reflux flow rate u 1 T 3 and the bottom reboiler steam flow rate u 2 T 3 .The controlled variable is the impurity content y T 3 of the lateral line extraction product.For T 5 , the manipulated variables are the top reflux flow rate u 1 T 5 and the bottom reboiler steam flow rate u 2 T 5 .The controlled variable is the impurity content y T 5 of the top product.For T 7 , the manipulated variables are the top reflux flow rate u 1 T 7 and the bottom reboiler steam flow rate u 2 T 7 .The controlled variable is the impurity content y T 7 of the top product.Considering the connections between towers, the bottom C2 content of T 1 can be considered as a direct collaborative variable c 1 T 1 to T 5 , the top C3 content of T 1 can be considered as an indirect collaborative variable c 2 T 1 to T 3 , and the impurity content y T 5 can be considered as an indirect collaborative variable c T 5 to T 7 .
The n4sid function in MATLAB was used to construct the multi-input and singleoutput (MISO) model of the above towers, and the model with optimal normalized root mean squared error (NRMSE) was selected from the 2nd-10th-order model, respectively.The formulation of NRMSE is shown in (10).
The details of each MISO state-space model identified are shown in Table 2.It can be seen that the identified model has high accuracy and the adopted fuzzy C-means clustering method distinguishes the operating modes well.

Construction of MOCO Problem
In order to facilitate giving the form of the optimization problem, considering one-step prediction, the expression of ( 9) is adjusted to (11).
where T s denotes the discretization time, which in our case is 0.5 h.x(0) denotes the initial state, x(T s ) denotes the next state, and y(T s ) denotes the predicted output of the next step.
The operating cost of the tower group considered is obtained by weighting the top reflux flow rate as well as the bottom reboiler steam flow rate of each tower.Since the data obtained is the content of impurities in the product, minimizing the top impurity content of T3 and T7 is equivalent to maximizing the ethylene and propylene content.The form of the optimization objective is shown in (12), which is a three objective optimization problem, i.e., minimizing the content of impurities in the propylene product f 1 , the content of impurities in the ethylene product f 2 and the total operating cost f 3 .min where u j T i denotes the jth manipulated variable of the ith tower, and I, J are the corresponding index sets in the form shown in (13).The form of each sub-objective in (12) is shown in (14).
i ∈ I = {1, 3, 5, 7}, j ∈ J = {1, 2} R in ( 14) is the weight, c is the collaborative variable, and all collaborative variables are shown as The constraints are divided into manipulated variable constraints (16) and content constraints (17). ) where ⊗ is the Kronecker product, and F and G are parameter vectors.The cnstraints in (17) are the bottom C2 and top C3 content constraint of T 1 .
Combining the above equations, the MOCO problem takes the form shown in (18).min ,i∈I,j∈J s.t.Index sets defined by (13) Sub-objectives defined by ( 14) Collaborative variables defined by (15) Manipulated variable constraints defined by ( 16) Content constraints defined by (17) (18)

Case Study and Discussions
This section describes a MOCO case study for an ethylene plant based on historical data of a chemical plant.The data-sampling interval is 0.5 h, and the data for one year are collected, with 17,523 samples in total.The Hampel and smoothdata function in MATLAB are used for data preprocessing.k in Hampel is set to 20 and the standard deviations are set to 2. A Gaussian method is applied in smoothdata and the window size p is set to 20.A total of 7420 data points are left after data preprocessing, and 2634 data points are used for system identification.The parameters and parameter vectors in the constraints are shown in Table 3 and the coefficient matrices are given in Appendix A.

Parameter
Parameter Platemo, an evolutionary MOO platform developed by Tian et al. [27], is used to design and solve the MOCO problem.The algorithm chosen is NSGA-III [26], which uses adaptive updating of multiple reference points to maintain the population diversity.Compared to MOEA/D [28] and NSGA-II [29], it can give satisfactory results on more 2-15 objective problems.
The number of populations is 105 and 100 iterations are performed using NSGA-III.The final populations all meet the constraint requirements; the obtained Pareto frontiers are shown in Figure 4.
After 100 iterations, the final Pareto frontier approximation is presented as a plane which is shown in Figure 4.In order to increase the purity of the propylene product, it is necessary to increase the reflux flow rates and the reboiler steam flow rates of towers, which leads to an increase in cost.Therefore, the impurity content in the propylene product is negatively related to the total operating cost.However, the impurity content in the ethylene product does not show the above phenomenon, which is unreasonable.A MOO of the T 1 and T 3 is carried out to minimize the operating cost of both towers and the impurity content in the ethylene product; the Pareto frontier after 100 iterations is shown in Figure 5. Analysis of this figure shows that the impurity content in ethylene is negatively related to the cost.The possible reasons for the above anomaly are that the cost of T 5 and T 7 represents a greater proportion of the total operating cost and the purity of propylene product has more room for improvement.Therefore, in our case, the impurity content in the ethylene product has no significant relationship with the total operating cost of the tower group.As for the relationship between propylene and ethylene content, since there is no connection between T 3 and the two propylene distillation columns, these two objectives appear to be relatively independent.Corresponding to Figure 4, the optimal values of the decision variables after 100 iterations are shown in Figure 6.The subplots in each row of the figure correspond to the relationship between the same decision variable and different objectives, and the subplots in each column represent the relationships between the same objective and different decision variables.Analyzing the above three figures from the perspective of decision variables, it can be seen that the top reflux flow rate of T 3 and the bottom reboiler steam flow rate of T 7 are close to the lower boundary.For the objective of total operating cost f 3 , the weight of the T 3 and T 7 reflux flow rates are the largest and the magnitude of these two variables is also large.Thus, the smaller reflux flow rates of T 3 and T 7 imply lower cost.In our case, for the purpose of weighing the objectives and meeting the constraints of T 1 , NSGA-III tends to form a small T 3 reflux flow rate and to reduce the T 7 reflux flow rate.
Since the bottom reboiler steam flow rate magnitude of T 7 is the largest of all the decision variables in the solution set, this variable is also near the lower boundary.In the case of the three objective trade-offs, the solutions for the reflux flow rate of T 1 and T 5 , the reboiler steam flow rates of T 1 and T 5 are mostly close to the lower boundary, with a small number of solutions distributed in other regions.The solutions for the T 3 bottom reboiler steam flow rate, as well as the T 7 reflux flow rate, are the most uniformly distributed of all the manipulated variables.Analyzing Figure 6 from the perspective of the objective, an interesting phenomenon is that, although no connection exists between T 3 and T 7 , there is a complementary relationship between the T 3 bottom reboiler steam flow rate and the T 7 reflux flow rate, i.e., under the same objective, the solutions for one variable are irregularly related to the objective, and the other shows a linear relationship with the objective.The possible reason for the above phenomenon is that the NSGA-III algorithm selects the above two manipulated variables directly related to the objective to maintain the diversity of the population.Based on the small variation in the other decision variables, the algorithm selects the variable with the larger influence on the target of the two variables to increase the range of target variation.Since both the constraints and the model are linear, the remaining one variable shows a certain linear relationship with the objective.
To verify the effectiveness of the proposed method, we further explored the sample data.Figure 7 shows the scatterplot corresponding to the sample data.It can be clearly seen that the total operating costs corresponding to the sample data are all above 1.5 × 10 6 $/h.A sample data point with low cost and moderate impurity content is selected as the most preferred point.Figure 8 gives a comparison of this preference point with the Pareto frontier of the MOCO.The part in the dashed box is the appealing operation area because the total operating cost under this area is below 1.5 × 10 6 $/h and the content of impurities in both ethylene and propylene products is lower than the preference point.The above analysis shows that the proposed MOCO strategy for the ethylene plant can provide rational support for decision-makers to obtain higher product purity at lower cost.

Conculsions
A multi-objective collaborative optimization (MOCO) strategy is proposed for a distillation column group.Based on the preprocessing of historical data, the data for the impurity content of the ethylene and propylene products and the propylene content on the 73rd tray, are selected to determine the operating modes of the tower group by the fuzzy C-means clustering method.Data with a membership function value greater than 0.8 under the main operating mode are selected for subsequent modeling.The state-space models of towers are constructed based on a subspace identification method and collaborative variables.A MOCO case study is performed using MATLAB and Platemo,.The superiority of the proposed method in decision-making is verified by the Pareto-optimal frontier, population analysis and comparison with sample data.

2 Figure 3 .
Figure 3.Samples with MF value above 0.8 under two operating modes.

Figure 6 .
Figure 6.The solution set of the considered decision variables.(a-x) Optimal values of the considered decision variables corresponding to the Pareto-optimal front.

Figure 7 .
Figure 7. Distribution of objective values corresponding to data for modeling.

Figure 8 .
Figure 8.The appealing operation area in the optimal frontier under the guidance of a preference point.

Table 1 .
Variables in models.

Table 2 .
Parameters of models.
e: Variance of white noise.

Table 3 .
Parameters in constraints.