Abstract
Deep learning can approximate complex functions by learning deep nonlinear network structures and characterizing distributed representations of input data, demonstrating a powerful ability to learn the essential features of a data set from a small number of samples. The long short-term memory network (LSTM) is a widely studied deep learning neural network that can effectively extract the dependency relationships in time series data. However, the LSTM model relies heavily on empirical settings for its network parameters, and manual parameter settings cause problems such as low model accuracy and weak generalization ability. Optimizing LSTM with swarm intelligence algorithms (SIA-LSTM) can effectively solve these problems. Group behavior has complex behavioral patterns, which gives swarm intelligence algorithms strong information exchange capabilities. The particle swarm optimization (PSO) algorithm and the cuckoo search (CS) algorithm are two excellent swarm intelligence optimization algorithms. The PSO algorithm is simple, converges quickly, places few requirements on the function to be optimized, and is easy to implement. The CS algorithm, which simulates the parasitic reproduction behavior of cuckoos during their breeding period, shares these advantages. In this paper, the SIA-LSTM model is constructed: some hyperparameters of the LSTM are optimized by the PSO and CS algorithms, which have a wide search range and fast convergence speed, and the optimal parameter set of the LSTM is found. The SIA-LSTM model achieves high prediction accuracy: in the prediction of the main control variables in the catalytic cracking process, its predictive performance is greatly improved.
1. Introduction
In China, the catalytic cracking process has been vigorously developed in recent years [1]. The catalytic cracking unit as the core production unit for gasoline and diesel in refineries is an important source of revenue for refineries [2]. The informatization of catalytic cracking processes has been greatly developed with the assistance of distributed control systems (DCS) [3]. Using big data processing technology to mine and analyze the accumulated large amount of catalytic cracking operation data and predict the main control variables can improve the stable and safe operation of catalytic cracking units [4,5]. In recent years, research on the effective prediction of process parameters based on deep networks has received widespread attention [6]. LSTM can effectively extract the dependency relationship between time series data [7].
Li et al. [8] proposed a multi-parallel cyclic reservoir with jumps (MP-CRJ), trained as an echo state network (ESN) on data at different time scales, which reduces the spatial complexity brought by parallel ESNs and relatively improves the dynamic diversity of the reservoir. Applied to actual plant data from methanol production, it improved prediction accuracy while maintaining prediction speed. Based on principal component analysis (PCA) and the backpropagation (BP) neural network, Liu proposed the PCA-BP model for predicting railway passenger volume. The linear correlation between influencing factors was eliminated through principal component analysis, and the PCA-BP neural network improved the accuracy of railway passenger volume prediction [9]. Tao et al. [10] used the TreNet model for multi-modal fusion to predict future trends, obtaining current time series features through convolutional neural network (CNN) learning and time series trend features through LSTM learning. This method has stronger feature extraction and learning ability and can learn local and global contextual features for predicting time series trends. Chen and Yu [11] proposed a prediction method combining a sparse autoencoder and a BP neural network, which extracted nonlinear features through the sparse autoencoder and used them as the input of a supervised BP neural network to predict index returns. Sun and Wei [12] adopted a hybrid CNN-LSTM structure to predict the duration and the positive or negative trend of futures prices by using the original data series and local trend series. Deng et al. [13] constructed a deep autoencoder from restricted Boltzmann machines to encode high-dimensional data into a low-dimensional space, and then established a BP-neural-network regression model between the low-dimensional coding sequence and the stock price.
Compared with using the original high-dimensional data, the model reduces computational cost and improves prediction accuracy by using the encoded data. Guo et al. [14] combined a fully connected network, Bi-LSTM, and CNN to extract the interdependence between sequence information and predict train running times on complex data sets.
There has also been much research progress in deep network prediction for the catalytic cracking process. Yang et al. [15] developed a special neural network structure to handle input variables at different time scales. Taking into account the collection characteristics of the various variables and the time continuity of large process manufacturing units, a gasoline yield prediction model of the catalytic cracking unit was constructed. The new model performs better than the traditional LSTM network, contributing to the intelligent upgrade of the FCC process. Tian et al. [16] proposed a data-driven and knowledge-based fusion method (DL-SDG), which uses an LSTM with an attention mechanism and a convolution layer to predict the future trend of key variables for the prediction and early warning of abnormal conditions in the catalytic cracking process. Yang et al. [17] developed a new FCC hybrid prediction framework by integrating a data-driven deep neural network with a lumped dynamics model with physical significance. The new hybrid model showed good prediction performance on all evaluation criteria, realizing rapid prediction and optimization of FCC and other complex reaction processes.
In nature, many organisms have limited individual survival capabilities, so they rely on specific group behaviors to solve problems that individuals cannot solve alone. Group behavior has complex behavioral patterns, which give groups strong information exchange capabilities [18]. In recent years, many scholars have sought better research results by introducing the natural phenomenon of animal group behavior into computer science [19]. With continuous development, a number of swarm intelligence algorithms have emerged, such as the particle swarm optimization algorithm, the cuckoo search algorithm, and the ant colony algorithm. Liu et al. [20] proposed an LSTM neural network optimization method based on adaptive hybrid mutation particle swarm optimization (AHMPSO) and an attention mechanism (AM), which improved the prediction accuracy of chemical oxygen demand, a key characteristic in a wastewater treatment plant. The root mean square error (RMSE) decreased by 7.803–19.499%, the mean absolute error (MAE) decreased by 9.669–27.551%, the mean absolute percentage error (MAPE) decreased by 8.993–25.996%, and the determination coefficient (R2) increased by 3.313–11.229%. Xiao et al. [21] proposed a fault diagnosis method based on a backpropagation neural network optimized by the cuckoo search algorithm to improve the accuracy of mechanical rolling bearing fault diagnosis; the fault recognition accuracy of the optimized network is 96.25%. In this paper, the PSO algorithm and the CS algorithm are used to optimize the LSTM model. Compared with the results in the previous literature, the optimized PSO-LSTM and CS-LSTM models show high predictive performance in the prediction of the main control variables in the catalytic cracking process: R2 increases significantly, while MAPE, RMSE, and MAE decrease significantly.
In this paper, missing value processing, outlier processing, noise processing, and Z-Scores standardization are carried out on the data of control parameters and related variables to remove the impact of different dimensions on the data. Feature selection is then performed on the variables using Spearman's rank correlation coefficient analysis and R-type clustering analysis, effectively reducing the data dimensionality. The LSTM prediction model contains many network parameters, such as the kernel function, learning rate, and number of neurons, which affect its prediction accuracy. Because these parameters depend too heavily on empirical settings, the prediction accuracy is unstable; the training time of the LSTM model also changes with different parameter settings. Aiming at problems such as low accuracy and weak generalization ability caused by manual parameter settings, this paper uses the PSO and CS algorithms, which have a wide search range and fast convergence speed, to optimize some hyperparameters of the LSTM, constructing the PSO-LSTM model and the CS-LSTM model, respectively, and finding the optimal parameter set for the LSTM, so as to better improve the prediction accuracy of the model. The organizational structure of the paper is as follows. Section 2 introduces the framework of the SIA-LSTM method in detail. Section 3 shows the excellent performance of the LSTM model optimized by the swarm intelligence algorithms in industrial applications. Section 4 summarizes this paper.
2. Research Method
The SIA-LSTM method is composed of three parts: (1) data preprocessing; (2) feature selection; (3) optimization of LSTM using swarm intelligence algorithms. The framework of SIA-LSTM method is shown in Figure 1.
Figure 1.
The framework of SIA-LSTM method.
In the data processing part, the missing value processing, outlier processing, noise processing, and Z-Scores standardization processing are carried out for the data of control parameters and related variables to remove the impact of different dimensions on the data.
In the feature selection part, Spearman’s rank correlation coefficient analysis and R-type clustering analysis are performed on the preprocessed variables. By calculating Spearman’s correlation coefficient and R-type clustering analysis results, feature selection is performed on the variables, successfully achieving the goal of dimensionality reduction.
In the LSTM optimization part, the PSO and CS algorithms, which have a wide search range and fast convergence speed, are used to optimize some hyperparameters of the LSTM. The PSO-LSTM and CS-LSTM models are constructed to find the optimal parameter set for the LSTM, so as to better improve the prediction accuracy of the model.
Given the missing values, outliers, and noise in the data set, and the effect of different dimensions on the analysis results, this paper conducts data preprocessing and Z-Scores standardization on the data set. Excessive variables waste computational resources and hinder the accuracy of prediction models, so feature selection is required. Many network parameters of the LSTM prediction model affect its prediction accuracy; however, these parameters depend too heavily on empirical settings, resulting in unstable prediction accuracy. This problem is effectively solved by optimizing the LSTM with the PSO and CS algorithms.
2.1. Data Preprocessing
2.1.1. Missing Value Processing
Missing values are undesirable but unavoidable for statisticians and data collectors. Some statistical analysis methods simply delete observation records containing missing values. When there are few missing values, this causes no significant problems. However, when the number of missing values is large, a large amount of information is lost and wrong conclusions may be drawn, so a more systematic missing value analysis is needed.
Missing values have a certain regularity, and their missing patterns can be roughly divided into three types: missing completely at random, missing at random, and missing not at random. A large number of variables are involved in data mining; when only one variable value is missing, the whole record should not be deleted. By describing and diagnosing the missing values, the severity of the missingness is evaluated, and the correlation matrix, mean, and covariance matrix are estimated in the presence of missing values. The expectation maximization method or a regression algorithm is then used to replace the missing values [22].
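For illustration, the regression-substitution strategy above can be sketched as follows; the function name and the toy data are hypothetical, not taken from the unit data, and a single predictor variable is assumed:

```python
def impute_by_regression(x, y):
    """Replace missing entries (None) in y using a least-squares fit of y on x.

    Pairs where y is observed are used to estimate the line y = a*x + b;
    each missing y is then replaced by its fitted value. This is a minimal
    stand-in for the regression-based replacement described in the text.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    a = sxy / sxx                      # slope of the fitted line
    b = mean_y - a * mean_x            # intercept
    return [yi if yi is not None else a * xi + b for xi, yi in zip(x, y)]
```

The expectation-maximization alternative iterates this idea, re-estimating the means and covariances after each round of imputation.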
2.1.2. Outlier Processing
Outliers are observed values that deviate from the overall sample population in the data set. Their presence reduces the normality of the data and the fitting ability of the model, so they need to be identified and processed during data mining. The main outlier detection methods include cluster analysis, the three-sigma rule, simple statistics, the boxplot method, and the t-test.
Cluster analysis groups similar data observations into the same cluster. It mainly includes distance-based and density-based methods. Density-based analysis can discover clusters of any shape. Its main idea is that data points in a region whose point density exceeds a preset threshold are assigned to the nearby cluster, while observations in regions below the threshold are regarded as outliers. Outliers can be found by using this method [23].
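The density idea above can be sketched as a simple neighborhood count (a simplified, one-dimensional DBSCAN-style rule; the thresholds `eps` and `min_pts` are illustrative assumptions):

```python
def density_outliers(points, eps=1.5, min_pts=2):
    """Flag a point as an outlier when fewer than `min_pts` other points
    lie within distance `eps` of it, i.e. when its local density falls
    below the preset threshold described in the text."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points)
                        if i != j and abs(p - q) <= eps)
        if neighbors < min_pts:
            outliers.append(p)
    return outliers
```

In practice the same count is taken over multi-dimensional distances, and the dense points are further grouped into clusters.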
2.1.3. Noise Processing
Noise data refer to the deviation or error of a measured value relative to the actual value when measuring a variable. Such data affect the accuracy and effectiveness of subsequent analysis. Noise points can be identified using data smoothing techniques, regression analysis, cluster analysis, and binning methods.
Data smoothing is used to eliminate noise in signals. Kalman filtering is one of the most widely used filtering methods [24]. It can estimate the state of a dynamic system from a series of data with measurement noise and predict the current state of the system from its previous state, as shown in Equations (1) and (2).
yL(l|l − 1) = EL × yL(l − 1|l − 1) + AL × vL(l)   (1)
HL(l|l − 1) = EL × HL(l − 1|l − 1) × EL^T + PL   (2)
Among them, yL(l|l − 1) is the predicted value of the current state of the system; yL(l − 1|l − 1) is the optimal value in the previous state of the system; HL(l|l − 1) is the covariance corresponding to state yL(l|l − 1); HL(l − 1|l − 1) is the covariance corresponding to yL(l − 1|l − 1); wL(l) is the process noise, and its covariance is PL; vL(l) is the control variable of the current state; AL is the control input matrix; EL is the state transition matrix.
Through the current state composed of the system-predicted value and the measured value CL(l), the optimal estimate of the current state of the system TL(l|l) is calculated, as shown in Equation (3).
TL(l|l) = yL(l|l − 1) + JL(l) × (CL(l) − DL × yL(l|l − 1))   (3)
Among them, CL(l) is the measured value of the system, and CL(l) = DL × yL(l) + uL(l); DL is the observation matrix; uL is the observation noise, and its covariance is SL; JL(l) is the Kalman gain, as shown in Equation (4).
JL(l) = HL(l|l − 1) × DL^T / (DL × HL(l|l − 1) × DL^T + SL)   (4)
The covariance of the optimal estimated value under the updated state l is shown in Equation (5).
HL(l|l) = (IL − JL(l) × DL) × HL(l|l − 1)   (5)
In the case of a single model and a single measurement, IL = 1. When the system moves to state l + 1, the updated covariance HL(l|l) takes the place of HL(l − 1|l − 1) in Equation (2) [25].
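A one-dimensional sketch of this filter, written directly from Equations (1)–(5), is given below; the scalar parameter values (transition E, observation D, noise covariances P and S, initial covariance) are illustrative assumptions, not settings from the unit:

```python
def kalman_smooth(measurements, E=1.0, A=0.0, D=1.0, P=1e-4, S=0.1):
    """Scalar Kalman filter for data smoothing.

    E: state transition; A: control input (the control v is taken as 0 here);
    D: observation matrix; P: process-noise covariance; S: observation-noise
    covariance. Returns the filtered (smoothed) sequence.
    """
    y, H = measurements[0], 1.0               # initial estimate and covariance
    smoothed = [y]
    for C in measurements[1:]:
        y_pred = E * y + A * 0.0              # Eq. (1): state prediction
        H_pred = E * H * E + P                # Eq. (2): covariance prediction
        J = H_pred * D / (D * H_pred * D + S)  # Eq. (4): Kalman gain
        y = y_pred + J * (C - D * y_pred)      # Eq. (3): optimal estimate
        H = (1.0 - J * D) * H_pred             # Eq. (5): covariance update
        smoothed.append(y)
    return smoothed
```

With a constant signal the filter reproduces it exactly, while an isolated jump is pulled back toward the prediction, which is the smoothing effect used here on the process data.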
2.1.4. Standardized Processing
Differences in order of magnitude can cause the larger-magnitude variables to dominate and slow iterative convergence. Therefore, it is necessary to preprocess the data to eliminate the effects of different dimensions. Standardization converts the original data into a small specific interval, such as 0–1 or −1 to 1, through a mathematical transformation, eliminating differences in the properties, orders of magnitude, and dimensions of the variables and making it easy to comprehensively analyze and compare indicators of different units or orders of magnitude. Z-Scores standardization is used to eliminate the influence of dimensions, as shown in Equation (6) [26].
x*ij = (xij − x̄j)/Sj   (6)
where x*ij is the standardized value, xij is the original value, Sj is the standard deviation of the data, and x̄j is the mean of the data.
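A minimal sketch of Equation (6) follows (using the population standard deviation; the sample form divides by n − 1 instead):

```python
def z_score(values):
    """Z-Scores standardization: subtract the column mean and divide by the
    standard deviation, giving zero-mean, unit-variance data (Equation (6))."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]
```

After this transformation every variable has mean zero and standard deviation one, as stated in Section 3.2.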
2.2. Feature Selection
According to the experience of experts, a feature selection method based on correlation analysis and R-type clustering analysis is proposed, which can effectively solve the problem of low correlation of some closely related variables in industrial production.
2.2.1. Spearman’s Rank Correlation Coefficient Analysis
Correlation analysis explores the degree and direction of the correlation between variables with a specific similar relationship, and studies whether such a relationship exists between variables. It mainly focuses on the degree of closeness between two variables and is a statistical method for studying random variables. Correlation analysis examines the relationships and properties between variables and is fundamental work before data mining.
In feature selection methods, the Pearson or Spearman rank correlation coefficient is often used to quantify the dependence and monotonicity between two sets of data. The Pearson correlation coefficient is shown in Equation (7) [27].
r = Σ(xi − x̄)(yi − ȳ) / [Σ(xi − x̄)² × Σ(yi − ȳ)²]^(1/2)   (7)
where xi and yi are the variable values, x̄ and ȳ are their means, and n is the sample size.
The value of the Pearson correlation coefficient largely depends on the size and degree distribution of the network. For large-scale networks in particular, it converges to zero, which seriously hinders the quantitative comparison of different networks. Spearman's rank correlation coefficient is a non-parametric measure used when the two variables are not normally distributed. It is defined as the Pearson correlation coefficient between two rank variables, based on the rank difference of each data point. For a data sample with sample size n, xi and yi are sorted from small to large and converted into rgxi and rgyi, where rgxi and rgyi are the ranks of xi and yi. If tied ranks occur after sorting, the Pearson correlation coefficient is calculated according to Equation (7); otherwise, r is calculated according to Equation (8). Its value ranges from −1 to 1.
r = 1 − 6 × Σdi² / [n × (n² − 1)]   (8)
where di is the rank difference between xi and yi.
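The rank-difference form of Equation (8) can be sketched as follows (a minimal illustration; ties are assigned average ranks, in which case the Pearson form on the ranks should be preferred, as noted above):

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[j]]:
            j += 1
        avg = (i + j) / 2 + 1              # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation via Equation (8):
    r = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A strictly increasing relationship gives r = 1 and a strictly decreasing one gives r = −1, regardless of whether the relationship is linear.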
2.2.2. R-Type Clustering Analysis
R-type system clustering analysis uses the average inter-group distance method to cluster variables with high similarity, reducing variable dimensions and making data analysis easier [28]. The flowchart of the R-type clustering method is shown in Figure 2.
Figure 2.
The flow chart of R-type clustering method.
2.3. LSTM Model
The main characteristic of the LSTM neural network is the use of a gate mechanism to address the inability of the recurrent neural network (RNN) to transmit historical information over long horizons, alleviating gradient vanishing and exploding during long-sequence training and enabling long-term time series learning [29]. It consists of an input gate, an output gate, and a forgetting gate, which selectively learn input information and store valuable information in memory cells [30]. LSTM is widely used in fields such as autonomous driving, intelligent robots, and speech recognition. The structure of LSTM is shown in Figure 3.
Figure 3.
Structure diagram of LSTM neural network.
In Figure 3, ct is the cell state update value at time t; ht−1 and ht respectively represent the output at time t − 1 and time t; Ct−1 and Ct respectively represent the cell state at time t − 1 and time t; Xt is the input at time t; ot, it, and ft represent the output gate, input gate, and forgetting gate, respectively; ot, it, and ft are all matrices with element values in the range [0, 1]; tanh and σ are activation functions. Equations (9)–(15) show the information transmission process of the LSTM neural network.
ft = σ(wf × [ht−1, Xt] + bf)   (9)
it = σ(wi × [ht−1, Xt] + bi)   (10)
ct = tanh(wc × [ht−1, Xt] + bc)   (11)
Ct = ft ⊙ Ct−1 + it ⊙ ct   (12)
ot = σ(wo × [ht−1, Xt] + bo)   (13)
ht = ot ⊙ tanh(Ct)   (14)
σ(x) = 1/(1 + e^(−x))   (15)
Among them, wo, wi, and wf respectively represent the weight matrices of the three gates, and wc is the weight matrix of the candidate state; bo, bi, bf, and bc respectively represent the corresponding bias terms; ⊙ denotes element-wise multiplication; σ is the sigmoid function, an S-type function that maps real numbers into the [0, 1] interval. The value range of the tanh function is (−1, 1) [31].
Based on the above LSTM structure diagram and the derived equations, the input of the LSTM neural network at time t is determined by the input Xt at the current time and the output ht−1 at time t − 1. The cell state update value ct at time t is selected by the input gate it, which chooses useful information to update the cell state Ct. The forgetting gate ft controls the cell state at time t − 1 and filters the stored information. The joint effect of the input gate it and the forgetting gate ft is to forget worthless information and retain useful information, yielding the cell state Ct at time t. The output gate ot controls the cell state Ct at time t: the tanh function first maps the value of the cell state Ct into the (−1, 1) interval, and then the output gate ot produces the output ht at time t.
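One LSTM time step can be sketched in scalar form as follows; scalar weights and biases are illustrative stand-ins for the weight matrices wf, wi, wc, wo and biases of a real layer:

```python
import math

def sigmoid(z):
    """S-type function mapping real numbers into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, C_prev, w, b):
    """One scalar LSTM step. `w` maps each of the keys 'f', 'i', 'c', 'o'
    to an (input-weight, recurrent-weight) pair, and `b` maps them to
    biases, for the forgetting gate, input gate, candidate state, and
    output gate, respectively."""
    f_t = sigmoid(w['f'][0] * x_t + w['f'][1] * h_prev + b['f'])    # forgetting gate
    i_t = sigmoid(w['i'][0] * x_t + w['i'][1] * h_prev + b['i'])    # input gate
    c_t = math.tanh(w['c'][0] * x_t + w['c'][1] * h_prev + b['c'])  # candidate state
    C_t = f_t * C_prev + i_t * c_t                                  # cell state update
    o_t = sigmoid(w['o'][0] * x_t + w['o'][1] * h_prev + b['o'])    # output gate
    h_t = o_t * math.tanh(C_t)                                      # output
    return h_t, C_t
```

Chaining `lstm_step` over a sequence, carrying (h, C) forward, reproduces the long-term memory behavior described above; production models use matrix-valued weights over hidden-state vectors.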
2.4. Optimization of LSTM Prediction Model by Particle Swarm Optimization
In the PSO algorithm, every bird in the population is a possible solution to the problem to be optimized and is called a "particle". All particles are evaluated using fitness values determined by the function to be optimized [32]. PSO first initializes the state of the particles, obtaining a set of random parameter information. The particles then continuously update their positions using the local optimal solution (Pbest) and the global optimal solution (Gbest) during their motion, and the optimal parameter set is obtained after iteration. Pbest and Gbest respectively represent the best position found by the individual particle and by the whole population during the iteration. In the iteration process, the velocity and position are updated by Equations (16) and (17).
v(k + 1) = gc × v(k) + e1 × f1 × (Pbest − x(k)) + e2 × f2 × (Gbest − x(k))   (16)
x(k + 1) = x(k) + v(k + 1)   (17)
Among them, x(k) is the current position vector of the particle; Pbest is the best position found by the individual particle; v(k) is the velocity vector of the particle's motion; f1 and f2 are random numbers in (0, 1); e1 and e2 are the acceleration factors of the particle; gc is the inertia factor [33].
The LSTM prediction model contains many network parameters, such as the kernel function, learning rate, and number of neurons, which affect the prediction accuracy of the LSTM model. Because these parameters depend too heavily on empirical settings, the prediction accuracy is unstable. At the same time, the training time of the LSTM model changes with different parameter settings [34]. The problem of manual parameter selection can be solved by using the particle swarm optimization algorithm to search some network parameters of the LSTM model [35]. Aiming at problems such as low accuracy and weak generalization ability caused by manual parameter settings, this paper uses the PSO algorithm, which has a wide search range and fast convergence speed, to optimize some hyperparameters of the LSTM, builds the PSO-LSTM model, and finds the optimal parameter set for the LSTM, so as to better improve the model's prediction accuracy. Figure 4 shows the flow chart of the PSO-LSTM model.
Figure 4.
The flow chart of PSO-LSTM.
The specific steps of PSO-LSTM prediction model construction are as follows:
Step 1: Data preprocessing. The variable data are collected and the training set and testing set of the model are obtained through data preprocessing.
Step 2: Initialize parameters. Initialize the parameters of PSO, such as learning factor, number of iterations and particles, and so on. At the same time, the optimization range of LSTM parameters such as learning rate, time step, batch processing capacity, and number of hidden layer units is set.
Step 3: Evaluate particles. The mean of the MAE and RMSE between the real and predicted values of the test set in the LSTM model is taken as the particle fitness value. The obtained value is compared with the current Pbest and Gbest to update the optimal Pbest and Gbest.
Step 4: Update the position and velocity of the particles.
Step 5: Determine the termination condition. If the number of iterations reaches the preset value, stop searching and output the optimal parameter group. If not, return to Step 3 for iterative optimization.
After initializing the velocities and positions of the particles, the PSO algorithm iteratively updates their positions to obtain the optimal solution of the model. The PSO algorithm has good robustness and fast convergence for multi-dimensional function optimization and multi-objective optimization, and is suitable for optimizing some hyperparameters in neural networks, providing a theoretical basis for the model combination. The PSO-LSTM prediction model achieves the best prediction effect by continuously iterating and searching for the optimal parameter values within the given parameter range.
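The steps above can be sketched as follows; the quadratic toy fitness below is an illustrative stand-in for the expensive evaluation in Step 3 (training an LSTM and averaging its test-set MAE and RMSE), and all parameter values are assumptions, not the paper's settings:

```python
import random

def pso(fitness, bounds, n_particles=20, n_iter=60,
        gc=0.7, e1=1.5, e2=1.5, seed=0):
    """Minimal PSO minimizer over the box `bounds`, using the velocity and
    position updates of Equations (16) and (17): gc is the inertia factor,
    e1/e2 the acceleration factors, and f1/f2 fresh uniform randoms."""
    rng = random.Random(seed)
    dim = len(bounds)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]                       # per-particle best positions
    pbest_f = [fitness(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]         # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                f1, f2 = rng.random(), rng.random()
                vs[i][d] = (gc * vs[i][d]
                            + e1 * f1 * (pbest[i][d] - xs[i][d])
                            + e2 * f2 * (gbest[d] - xs[i][d]))       # Eq. (16)
                xs[i][d] = min(max(xs[i][d] + vs[i][d], bounds[d][0]),
                               bounds[d][1])                          # Eq. (17)
            f = fitness(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return gbest, gbest_f
```

In the PSO-LSTM model each coordinate of a particle corresponds to one hyperparameter (learning rate, time step, batch size, number of hidden units), and `bounds` encodes the search ranges set in Step 2.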
2.5. Optimization of LSTM Prediction Model by Cuckoo Algorithm
The cuckoo search algorithm simulates the parasitic reproduction behavior of cuckoos during their breeding period. Each nest in which a cuckoo lays an egg by brood parasitism represents a candidate solution to the optimization problem, which is updated through the cuckoos' flight: the nest-searching process represents the update operation on the solutions. Cuckoos use a combination of unique Lévy flights and random walks to find new nests [36].
The CS algorithm is based on the following rules:
Rule 1: Each cuckoo lays only one egg at a time, and a nest is randomly selected for hatching;
Rule 2: From randomly selected bird nests, the nest with the best quality eggs is reserved for the next generation;
Rule 3: The number of nests is fixed, and the probability that a nest's original owner discovers the alien egg is Pa, Pa ∈ [0, 1]. When the original owner discovers the egg, it either throws the egg away or abandons the nest and builds a new nest elsewhere [37].
Aiming at problems such as the low accuracy and weak generalization ability of the LSTM model caused by manual parameter settings, this section adopts the CS algorithm, which has good global and local search ability and fast convergence speed, to optimize some hyperparameters of the LSTM, build the CS-LSTM model, and find the optimal parameter set for the LSTM, so as to better improve the prediction accuracy of the model. Figure 5 shows the flow chart of the CS-LSTM model.
Figure 5.
The flow chart of CS-LSTM.
The specific steps of CS-LSTM prediction model construction are as follows:
Step 1: Data preprocessing. Collect variable data and obtain the training and testing sets of the model through data preprocessing.
Step 2: Initialize parameters. Initialize the parameters of CS, such as learning factor, iteration number, and so on. At the same time, the optimization range of LSTM parameters such as learning rate, time step, batch processing capacity, and number of hidden layer units is set.
Step 3: Set the dimension D of the solution of the actual problem, the maximum number of iterations L, the cuckoo population size N, and the probability Pa that a cuckoo's egg is discovered. Randomly generate n cuckoo nest locations xi (i = 1, 2, …, n), calculate their fitness values, and determine and save the current optimal nest location Gbest.
Step 4: Update the current nest positions, calculate and compare the fitness values of the child and parent nests, and carry the nest positions with better fitness values, xi (i = 1, 2, …, n), into the next generation.
Step 5: Compare the value of the optimal fitness function of the previous generation and update the position Gbest of the optimal solution of this generation. Judge whether the algorithm meets the convergence condition. If it meets the convergence condition, output the optimal position Gbest; otherwise, jump back to Step 4 and repeat the optimization iteration until L iterations are completed [38].
Step 6: Determine the termination condition. If it is met, stop the search and output the optimal parameter set; if not, return to Step 4 for iterative optimization.
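Rules 1–3 and the steps above can be sketched as follows; the Lévy-flight step scale, nest count, and toy fitness are illustrative assumptions, and in the CS-LSTM model the fitness would be the LSTM's test error for a candidate hyperparameter set:

```python
import math
import random

def levy_step(rng, beta=1.5):
    """One Lévy-distributed step length via Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u, v = rng.gauss(0, sigma), rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(fitness, bounds, n_nests=15, n_iter=100, pa=0.25, seed=0):
    """Minimal cuckoo search minimizer over the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(x[d], bounds[d][0]), bounds[d][1]) for d in range(dim)]

    nests = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_nests)]
    fit = [fitness(x) for x in nests]
    for _ in range(n_iter):
        best = nests[min(range(n_nests), key=lambda i: fit[i])]
        # Rule 1: each cuckoo lays one egg at a Levy-flight position
        for i in range(n_nests):
            step = 0.01 * levy_step(rng)
            cand = clip([nests[i][d] + step * (nests[i][d] - best[d])
                         for d in range(dim)])
            f = fitness(cand)
            if f < fit[i]:               # Rule 2: keep the better egg
                nests[i], fit[i] = cand, f
        # Rule 3: a fraction pa of nests is discovered and rebuilt elsewhere
        best_i = min(range(n_nests), key=lambda i: fit[i])
        for i in range(n_nests):
            if i != best_i and rng.random() < pa:
                nests[i] = [rng.uniform(lo, hi) for lo, hi in bounds]
                fit[i] = fitness(nests[i])
    best_i = min(range(n_nests), key=lambda i: fit[i])
    return nests[best_i], fit[best_i]
```

Keeping the best nest out of the abandonment step preserves Gbest across generations, matching Step 5 above.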
2.6. Predictive Performance Evaluation Index
This paper mainly selects four indicators as the model evaluation indicators, namely, determination coefficient, mean absolute error, root mean square error, and mean absolute percentage error, so as to evaluate the prediction effect of the model more comprehensively and accurately. The calculation formula is as follows:
(1) Determination coefficient
R2 is the coefficient of determination, which measures how well the predicted values fit the real values: R2 = 1 − Σ(ri − pi)²/Σ(ri − r̄)², where r̄ is the mean of the actual values. R2 ranges from 0 to 1, and the closer it is to 1, the better the model fit.
(2) Mean absolute error
The mean absolute error is the mean of the absolute errors between the predicted and observed values, obtained by directly averaging the absolute residuals. It is a linear score in which all individual differences are weighted equally in the mean. The calculation formula is shown in Equation (18).
MAE = (1/n) × Σ|ri − pi|   (18)
(3) Root mean square error
The root mean square error is the square root of the mean of the squared differences between the estimated values and the true values: the squared deviations of each data point from the true value are summed, averaged, and then square-rooted. The smaller this index, the higher the accuracy. The calculation formula is shown in Equation (19).
RMSE = [(1/n) × Σ(ri − pi)²]^(1/2)   (19)
(4) Mean absolute percentage error
The mean absolute percentage error is a percentage value, namely the expected value of the relative error, i.e., the absolute error as a percentage of the true value. In model prediction, the smaller the mean absolute percentage error, the better the accuracy of the model. The calculation formula is shown in Equation (20).
MAPE = (100%/n) × Σ|ri − pi|/ri   (20)
Among them, n is the number of samples; ri is the actual value; pi is the predicted value.
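The four indices can be computed together as follows (a direct transcription of the formulas above; the function name is illustrative):

```python
def evaluate(actual, predicted):
    """Compute R2 and the error indices of Equations (18)-(20):
    actual values r_i, predicted values p_i, sample size n."""
    n = len(actual)
    mean_r = sum(actual) / n
    sse = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    sst = sum((r - mean_r) ** 2 for r in actual)
    return {
        'R2': 1 - sse / sst,
        'MAE': sum(abs(r - p) for r, p in zip(actual, predicted)) / n,   # Eq. (18)
        'RMSE': (sse / n) ** 0.5,                                        # Eq. (19)
        'MAPE': 100.0 / n * sum(abs((r - p) / r)
                                for r, p in zip(actual, predicted)),     # Eq. (20)
    }
```

Note that MAPE is undefined when an actual value ri is zero, which does not arise for the temperature and liquid-level variables predicted here.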
3. Industrial Applications
3.1. Process Description
The case in this paper comes from a catalytic cracking unit of a petrochemical enterprise. Over two years, 347,520 observations are collected, one every 3 min; all variables in the unit are collected at the same times. Figure 6 shows the process of the catalytic cracking unit [26]. It mainly consists of the R-1 reactor, R-2 regenerator, B-1 raw material buffer tank, C-1 fractionation column, C-2 diesel stripper, C-3 absorption column, C-4 desorption column, C-5 stabilization column, and C-6 reabsorption column. S-1 is the buffer tank. B-2 is the oil and gas separator at the top of the fractionation column. B-3 is the top return tank of the absorption column. B-4 is the stabilizer top return tank. P-1, P-2, …, P-8 are pumps. E-1, E-2, …, E-6 are heating devices. The output includes OUTPUT1 light diesel, OUTPUT2 liquefied gas, OUTPUT3 stable gas, OUTPUT4 dry gas, and OUTPUT5 rich absorption oil.
Figure 6.
The flow chart of the catalytic cracking unit.
The training set and test set of the LSTM model, PSO-LSTM model, and CS-LSTM model are constructed by the main control variables. The test set is predicted by learning the complex relationships between the variables of the training set and their long-term sequence memory. In the prediction of the main control variable, the variable after feature selection is selected as the input variable, and the main control variable is the prediction variable. In this paper, representative cases of reaction temperature (TIC1001) prediction and liquid level at the bottom of the fractionator (LIC2001) prediction are selected for an explanation.
As one of the main control parameters of the catalytic cracking process, the reaction temperature has a significant impact on the depth of the reaction and hence on the quality and yield of the products. The liquid level at the bottom of the fractionating column is an important parameter for the operation of the whole column: changes in the bottom liquid level reflect the state of the material and heat balances of the column and therefore strongly affect its operation.
3.2. Data Preprocessing of Main Control Variables
When a few variables in the data set have missing values, the missing values are imputed by the expectation-maximization method. Kalman filtering is used to smooth the data, and Z-score standardization is applied to all variables to eliminate the effect of different dimensionality; the converted data have zero mean and unit standard deviation.
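The standardization step can be sketched as follows (illustrative only; production code would operate on the full multivariate data set):

```python
import statistics

def z_score(series):
    """Z-score standardization: subtract the mean and divide by the
    (sample) standard deviation, giving zero mean and unit spread."""
    mu = statistics.mean(series)
    sigma = statistics.stdev(series)
    return [(x - mu) / sigma for x in series]
```

After this transformation, variables measured in different units (temperatures, flows, levels) are directly comparable for modeling.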
3.3. Feature Selection of TIC1001 and LIC2001
There are many variables in the catalytic cracking process, and an excessive number of variables makes data analysis more difficult, so only variables related to TIC1001 are considered. After communicating with field experts and comprehensively considering mechanism analysis, process principles, and practical working experience, 20 variables related to the reaction temperature are selected [16,39]. The Spearman rank correlation values between TIC1001 and its related variables are shown in Table 1, and the R-type clustering results between TIC1001 and its related variables are shown in Figure 7.
Table 1.
Main variables related to TIC1001.
Figure 7.
R-type clustering results of main variables related to TIC1001.
The Spearman rank correlation coefficients in Table 1 and the R-type clustering results in Figure 7 intuitively reflect the relationship between the reaction temperature and its related variables. The vertical axis in Figure 7 shows the order in which the variables are clustered, and the horizontal axis shows the distance at which variables merge; cutting the dendrogram at different distances divides the variables into different numbers of groups. Based on the Spearman rank correlation coefficients, the R-type clustering results, and expert experience, 12 variables are obtained through feature selection, as shown in Table 2. Through feature selection, the dimensionality of the data is effectively reduced.
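The Spearman rank correlation used for this screening can be sketched as follows (our simplified version, assuming no tied values; libraries such as SciPy handle ties properly):

```python
def spearman(x, y):
    """Spearman rank correlation via the classical formula
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), assuming no ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A monotonically increasing relationship yields +1 and a monotonically decreasing one yields -1, regardless of whether the relationship is linear, which is why rank correlation suits process variables with nonlinear couplings.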
Table 2.
Main variables related to TIC1001 after feature selection.
After communicating with field experts, and comprehensively considering mechanism analysis, process principles, and practical working experience, 19 variables related to the liquid level at the bottom of the fractionator are selected. The Spearman rank correlation values between LIC2001 and its related variables are shown in Table 3, and the R-type clustering results between LIC2001 and its related variables are shown in Figure 8.
Table 3.
Main variables related to LIC2001.
Figure 8.
R-type clustering results of main variables related to LIC2001.
The Spearman rank correlation coefficients in Table 3 and the R-type clustering results in Figure 8 intuitively reflect the relationship between the liquid level at the bottom of the fractionator and its related variables. The vertical axis in Figure 8 shows the order in which the variables are clustered, and the horizontal axis shows the distance at which variables merge; cutting the dendrogram at different distances divides the variables into different numbers of groups. Based on the Spearman rank correlation coefficients, the R-type clustering results, and expert experience, 12 variables are obtained through feature selection, as shown in Table 4.
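R-type clustering groups variables rather than samples. A much-simplified single-linkage sketch over a variable-distance matrix (for example, 1 - |correlation|) illustrates the idea; the `threshold` argument plays the role of the cut distance in Figure 8 (this is our illustration, not the exact procedure used in the paper):

```python
def single_linkage(dist, threshold):
    """Agglomerative single-linkage grouping: repeatedly merge any two
    clusters whose closest pair of members lies below the cut threshold."""
    n = len(dist)
    clusters = [{i} for i in range(n)]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if min(dist[i][j] for i in clusters[a] for j in clusters[b]) < threshold:
                    clusters[a] |= clusters[b]
                    del clusters[b]
                    merged = True
                    break
            if merged:
                break
    return clusters
```

Variables that end up in the same group carry largely redundant information, so one representative per group can be kept, which is the dimensionality reduction performed here.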
Table 4.
Main variables related to LIC2001 after feature selection.
3.4. Prediction Cases of TIC1001 and LIC2001
In the prediction of the main control variable reaction temperature, the 11 variables retained after feature selection are selected as input variables, as shown in Table 2. A total of 2400 consecutive time-series samples are used: 2000 for the training set and 400 for the testing set. The prediction results of the TIC1001 training and testing sets are shown in Figure 9 and Figure 10, respectively. TIC1001 is controlled in the normal range of 500~530 °C. The real values and model-predicted values of TIC1001 for part of the testing set are given in Table 5. The evaluation of the prediction results under different indicators of TIC1001 is shown in Table 6.
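The construction of supervised samples from a continuous series, as needed for the 2000/400 split above, can be sketched as follows (illustrative; the actual lookback length and the multivariate inputs follow the paper's setup, and `make_windows` is our name):

```python
def make_windows(series, lookback):
    """Turn a time series into (input window, next value) pairs so that a
    sequence model such as an LSTM can be trained to predict one step ahead."""
    X, y = [], []
    for t in range(len(series) - lookback):
        X.append(series[t:t + lookback])   # the lookback most recent values
        y.append(series[t + lookback])     # the value to predict
    return X, y
```

Splitting the resulting pairs chronologically (the first 2000 for training, the last 400 for testing) preserves the temporal order, which matters for time-series evaluation.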
Figure 9.
TIC1001 predicted results of training set.
Figure 10.
TIC1001 predicted results of testing set.
Table 5.
The real value and model predicted value of TIC1001.
Table 6.
Evaluation of prediction results for different indicators of TIC1001.
As shown in Figure 9, Figure 10, and Table 6, both the PSO-LSTM and CS-LSTM models achieve good prediction results for TIC1001 on the training and testing sets. The predicted curve of the PSO-LSTM model is close to the true values, with an R2 of 0.9867, an MAE of 0.0081, an RMSE of 0.0104, and a MAPE of 0.0016%. Compared with the LSTM model, the R2 of the PSO-LSTM model increased by 3.17%, while the MAE, RMSE, and MAPE decreased by 44.90%, 44.09%, and 42.86%, respectively. The predicted curve of the CS-LSTM model is also close to the true values, with an R2 of 0.9836, an MAE of 0.0087, an RMSE of 0.0110, and a MAPE of 0.0017%; compared with the LSTM model, its R2 increased by 2.95%, while the MAE, RMSE, and MAPE decreased by 40.82%, 40.86%, and 39.29%, respectively. Comparing the different models, the PSO-LSTM model gives the relatively better prediction results with smaller errors.
In the prediction of the main control variable liquid level at the bottom of the fractionator, the 11 variables retained after feature selection are selected as input variables, as shown in Table 4. A total of 2400 consecutive time-series samples are used: 2000 for the training set and 400 for the testing set. The prediction results of the LIC2001 training and testing sets are shown in Figure 11 and Figure 12, respectively. LIC2001 is controlled in the normal range of 40~80%. The real values and model-predicted values of LIC2001 for part of the testing set are given in Table 7. The evaluation of the prediction results under different indicators of LIC2001 is shown in Table 8.
Figure 11.
LIC2001 predicted results of training set.
Figure 12.
LIC2001 predicted results of testing set.
Table 7.
The real value and model predicted value of LIC2001.
Table 8.
Evaluation of prediction results for different indicators of LIC2001.
As shown in Figure 11, Figure 12, and Table 8, both the PSO-LSTM and CS-LSTM models achieve good prediction results for LIC2001 on the training and testing sets. The predicted curve of the PSO-LSTM model is close to the true values, with an R2 of 0.9945, an MAE of 0.1386, an RMSE of 0.1220, and a MAPE of 0.2491%. Compared with the LSTM model, the R2 of the PSO-LSTM model increased by 3.20%, while the MAE, RMSE, and MAPE decreased by 70.72%, 78.16%, and 70.23%, respectively. The predicted curve of the CS-LSTM model is also close to the true values, with an R2 of 0.9887, an MAE of 0.2533, an RMSE of 0.3115, and a MAPE of 0.4523%; compared with the LSTM model, its R2 increased by 2.60%, while the MAE, RMSE, and MAPE decreased by 46.48%, 44.23%, and 45.95%, respectively. Comparing the different models, the PSO-LSTM model gives the better prediction results with smaller errors.
In comparison with related research, the multiple-level LSTM (ML-LSTM) was used to predict the yield of FCC gasoline [15]; the RMSE and R2 of the ML-LSTM model were 0.5851 and 0.4384, respectively, and the R2 and RMSE obtained in this paper are better. In the AHMPSO-AM-LSTM model for predicting the chemical oxygen demand (COD) of the effluent in a wastewater treatment plant, the RMSE decreased by 7.803–19.499%, the MAE by 9.669–27.551%, and the MAPE by 8.993–25.996%, while R2 increased by 3.313–11.229% [20]. In contrast, the RMSE, MAE, and MAPE of the models in this paper are reduced by larger margins.
4. Conclusions
In this paper, missing value processing, outlier processing, noise processing, and Z-score standardization are carried out on the data of the control parameters and their related variables to remove the influence of different dimensions. Twelve variables related to TIC1001 and 12 variables related to LIC2001 are obtained through feature selection. A novel SIA-LSTM method is proposed: the PSO-LSTM and CS-LSTM models are constructed by using the PSO and CS algorithms, which have a wide search range and fast convergence speed, to optimize some of the LSTM hyperparameters. Compared with the prediction results of the LSTM model, the PSO-LSTM and CS-LSTM models show excellent prediction performance on the R2, RMSE, MAE, and MAPE indexes. In the prediction of the reaction temperature, the R2 of the PSO-LSTM model increased by 3.17%, while the MAE, RMSE, and MAPE decreased by 44.90%, 44.09%, and 42.86%, respectively; the R2 of the CS-LSTM model increased by 2.95%, while the MAE, RMSE, and MAPE decreased by 40.82%, 40.86%, and 39.29%, respectively. In the prediction of the liquid level at the bottom of the fractionator, the R2 of the PSO-LSTM model increased by 3.20%, while the MAE, RMSE, and MAPE decreased by 70.72%, 78.16%, and 70.23%, respectively; the R2 of the CS-LSTM model increased by 2.60%, while the MAE, RMSE, and MAPE decreased by 46.48%, 44.23%, and 45.95%, respectively. Both the PSO-LSTM and CS-LSTM models proposed in this paper achieve high accuracy and feasibility in predicting the main control variables of the catalytic cracking process.
Potential future research directions include further optimization and improvement of the prediction algorithms to achieve rapid and effective identification and prediction of the main control states in industrial production, and combining the predictive models with early warning models to reduce the occurrence of safety accidents.
Author Contributions
J.H.: Conceptualization, methodology, software, formal analysis, writing—original draft preparation, and validation. W.T.: writing—review and editing, supervision, project administration, and funding acquisition. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Cui, Z.; Tian, W.; Wang, X.; Fan, C.; Guo, Q.; Xu, H. Safety integrity level analysis of fluid catalytic cracking fractionating system based on dynamic simulation. J. Taiwan Inst. Chem. Eng. 2019, 104, 16–26. [Google Scholar] [CrossRef]
- Huang, M.; Zheng, Y.; Li, S. Distributed economic model predictive control for an industrial fluid catalytic cracking unit ensuring safe operation. Control Eng. Pract. 2022, 126, 105263. [Google Scholar] [CrossRef]
- Xiong, S.; Zhou, L.; Dai, Y.; Ji, X. Attention-based long short-term memory fully convolutional network for chemical process fault diagnosis. Chin. J. Chem. Eng. 2023, 56, 1–14. [Google Scholar] [CrossRef]
- Agudelo, C.; Anglada, F.; Cucarella, E.; Moreno, E. Integration of techniques for early fault detection and diagnosis for improving process safety: Application to a Fluid Catalytic Cracking refinery process. J. Loss Prev. Process Ind. 2013, 26, 660–665. [Google Scholar] [CrossRef]
- Taira, G.R.; Park, S.W.; Zanin, A.C. Fault detection in a fluid catalytic cracking process using bayesian recurrent neural network. IFAC-PapersOnLine 2022, 55–57, 715–720. [Google Scholar] [CrossRef]
- Shah, P.; Choi, H.K.; Kwon, J.S.I. Achieving optimal paper properties: A layered multiscale kMC and LSTM-ANN-based control approach for kraft pulping. Processes 2023, 11, 809. [Google Scholar] [CrossRef]
- Kim, H.J.; Baek, S.W. Application of wearable gloves for assisted learning of sign language using artificial neural networks. Processes 2023, 11, 1065. [Google Scholar] [CrossRef]
- Li, H.; Zhao, W.; Shi, Y.; Li, J. MP-CRJ: Multi-parallel cycle reservoirs with jumps for industrial multivariate time series predictions. Trans. Inst. Meas. Control 2022, 44, 2093–2105. [Google Scholar] [CrossRef]
- Liu, L. Research of railway passenger volume forecast model based on PCA-BP neural network. Compr. Transp. 2016, 8, 43–47. [Google Scholar]
- Tao, L.; Tian, G.; Aberer, K. Hybrid neural networks for learning the trend in time series. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 2273–2279. [Google Scholar]
- Chen, Y.; Yu, W. A forecasting model of financial market index based on sparse autoencoder. J. Appl. Stat. Manag. 2021, 40, 93–104. [Google Scholar]
- Sun, H.; Wei, X. Research on stock index futures forecast based on trend learning and hybrid neural network. China J. Econom. 2021, 1, 921–934. [Google Scholar]
- Deng, X.; Wan, L.; Huang, N. Stock Prediction Research Based on DAE-BP Neural Network. Comput. Eng. Appl. 2019, 55, 126–132. [Google Scholar]
- Guo, J.; Wang, W.; Tang, Y.; Zhang, Y.; Zhuge, H. A CNN-Bi_LSTM parallel network approach for train travel time prediction. Knowl.-Based Syst. 2022, 256, 109796. [Google Scholar] [CrossRef]
- Yang, F.; Sang, Y.; Lv, J.; Cao, J. Prediction of gasoline yield in fluid catalytic cracking based on multiple level LSTM. Chem. Eng. Res. Des. 2022, 185, 119–129. [Google Scholar] [CrossRef]
- Tian, W.; Wang, S.; Sun, S.; Li, C.; Lin, Y. Intelligent prediction and early warning of abnormal conditions for fluid catalytic cracking. Chem. Eng. Res. Des. 2022, 181, 304–320. [Google Scholar] [CrossRef]
- Yang, F.; Dai, C.; Tang, J.; Xuan, J.; Cao, J. A hybrid deep learning and mechanistic kinetics model for the prediction of fluid catalytic cracking performance. Chem. Eng. Res. Des. 2020, 155, 202–210. [Google Scholar] [CrossRef]
- Wu, C.; Wang, C.; Kim, J.W. Welding sequence optimization to reduce welding distortion based on coupled artificial neural network and swarm intelligence algorithm. Eng. Appl. Artif. Intell. 2022, 114, 1105142. [Google Scholar] [CrossRef]
- Hu, S.; Yin, H. Research on the optimum synchronous network search data extraction based on swarm intelligence algorithm. Future Gener. Comput. Syst. 2021, 125, 151–155. [Google Scholar] [CrossRef]
- Liu, X.; Shi, Q.; Liu, Z.; Yuan, J. Using LSTM Neural Network Based on Improved PSO and Attention Mechanism for Predicting the Effluent COD in a Wastewater Treatment Plant. IEEE Access 2021, 9, 146082–146096. [Google Scholar] [CrossRef]
- Xiao, M.; Liao, Y.; Bartos, P.; Filip, M.; Geng, G.; Jiang, Z. Fault diagnosis of rolling bearing based on back propagation neural network optimized by cuckoo search algorithm. Multimed. Tools Appl. 2022, 81, 1567–1587. [Google Scholar] [CrossRef]
- Luo, L.; Bao, S.; Peng, X. Robust monitoring of industrial processes using process data with outliers and missing values. Chemom. Intell. Lab. Syst. 2019, 192, 103827. [Google Scholar] [CrossRef]
- Wang, B.; Mao, Z. Outlier detection based on Gaussian process with application to industrial processes. Appl. Soft Comput. 2019, 76, 505–516. [Google Scholar] [CrossRef]
- Yang, F.; Yang, J. Multiband-structured Kalman filter. IET Signal Process. 2018, 12, 722–728. [Google Scholar] [CrossRef]
- Garcia, R.V.; Pardal, P.C.P.M.; Kuga, H.K.; Zanardi, M.C. Nonlinear filtering for sequential spacecraft attitude estimation with real data: Cubature Kalman Filter, Unscented Kalman Filter and Extended Kalman Filter. Adv. Space Res. 2019, 63, 78320–78344. [Google Scholar] [CrossRef]
- Hong, J.; Qu, J.; Tian, W.D.; Cui, Z.; Liu, Z.J.; Lin, Y.; Li, C.K. Identification of unknown abnormal conditions in catalytic cracking process based on two-step clustering analysis and signed directed graph. Processes 2021, 9, 2055. [Google Scholar] [CrossRef]
- Liu, N.; Wang, J.; Sun, S.L.; Li, C.K.; Tian, W.D. Optimized principal component analysis and multi-state Bayesian network integrated method for chemical process monitoring and variable state prediction. Chem. Eng. J. 2022, 430, 132617. [Google Scholar] [CrossRef]
- Xue, W. Statistical Analysis and SPSS Application; China Renmin University Press: Beijing, China, 2017; pp. 269–272. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Shu, X.; Zhang, L.; Sun, Y.; Tang, J. Host–parasite: Graph LSTM-in-LSTM for group activity recognition. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 663–674. [Google Scholar] [CrossRef]
- Zhang, Z.; Lv, Z.; Gan, C.; Zhu, Q. Human action recognition using convolutional LSTM and fully-connected LSTM with different attentions. Neurocomputing 2020, 410, 304–316. [Google Scholar] [CrossRef]
- Ang, K.M.; Lim, W.H.; Isa, N.; Tiang, S.; Wong, C. A constrained multi-swarm particle swarm optimization without velocity for constrained optimization problems. Expert Syst. Appl. 2020, 140, 112882. [Google Scholar] [CrossRef]
- Kassoul, K.; Zufferey, N.; Cheikhrouhou, N.; Belhaouari, S.B. Exponential Particle Swarm Optimization for Global Optimization. IEEE Access 2022, 10, 78320–78344. [Google Scholar] [CrossRef]
- Suebsombut, P.; Sekhari, A.; Sureephong, P.; Bouras, A. Field data forecasting using LSTM and Bi-LSTM approaches. Appl. Sci. 2021, 11, 11820. [Google Scholar] [CrossRef]
- Tsai, H.-C. Unified particle swarm delivers high efficiency to particle swarm optimization. Appl. Soft Comput. 2017, 55, 371–383. [Google Scholar] [CrossRef]
- Tang, H. Cuckoo search algorithm with different distribution strategy. Int. J. Bio-Inspired Comput. 2019, 13, 234–241. [Google Scholar] [CrossRef]
- Gao, S.; Gao, Y.; Zhang, Y.; Xu, L. Multi-Strategy Adaptive Cuckoo Search Algorithm. IEEE Access 2019, 7, 137642–137655. [Google Scholar] [CrossRef]
- Rajabioun, R. Cuckoo Optimization Algorithm. Appl. Soft Comput. 2011, 11, 5508–5518. [Google Scholar] [CrossRef]
- Wang, S.; Tian, W.; Li, C.; Cui, Z.; Liu, B. Mechanism-based deep learning for tray efficiency soft-sensing in distillation process. Reliab. Eng. Syst. Saf. 2023, 231, 109012. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).