Machine Learning-Based Prediction of Controlled Variables of APC Systems Using Time-Series Data in the Petrochemical Industry

Abstract: For decades, the chemical industry has faced challenges including energy conservation, environmental protection, quality improvement, and increasing production efficiency. To address these problems, various methods are being studied, such as fault diagnosis for the efficient use of facilities and medium-term forecasting with small data, and many systems are being applied to improve production efficiency. The problem considered in this study is predicting time-series Controlled Variables (CV) with machine learning, which is necessary to utilize an Advanced Process Control (APC) system in a petrochemical plant. In an APC system, the most important aspects are the prediction of the controlled variables and how the predicted values of the controlled variables should be modified to fall within the user's desired range. In this study, we focused on predicting the controlled variables. Specifically, we utilized various machine learning techniques to predict future controlled variables based on past controlled variables, Manipulated Variables (MV), and Disturbance Variables (DV). By treating the time delay as a parameter and adjusting its value, the relationship between past and future data can be analyzed and forecasting performance improved. Currently, APC systems are controlled through mathematical modeling; in this study, the time-series data of controlled, manipulated, and disturbance variables are instead predicted with machine learning models to compare performance and measure accuracy. The shift from mathematical prediction models to data-based machine learning predictions is becoming important. The R-Squared (R2) and Mean Absolute Percentage Error (MAPE) results of this study demonstrate the feasibility of introducing an APC system using machine learning models in petrochemical plants.


Introduction
Petrochemical plants process raw materials related to petroleum to produce various chemicals. These plants utilize a variety of processes, including petroleum refining, chemical reactions, separation, and purification, to produce a range of consumer products, including plastics, fibers, polymers, artificial rubber, and lubricants. Petrochemical plants are recognized as a globally important industry, contributing to energy, economies, and industrial development.
In general, plants can be divided into intermittent and continuous processes. Most petrochemical plants utilize continuous processes, where many substances are introduced continuously at the same time, and the quality of the product varies depending on the influence of each substance [1,2]. Many continuous petrochemical processes are controlled using Proportional-Integral-Derivative (PID) control, which is a feedback control technique that does not consider multiple variables [3]. Since this approach does not consider multiple variables, there has been a trend of switching to Advanced Process Control (APC), a feedforward control approach with excellent performance [4]. APC systems are advanced control technologies used in process and manufacturing systems to control the complex behavior of processes by utilizing advanced mathematical modeling and optimization algorithms. This is due to the inclusion of Model Predictive Control (MPC), a technology that optimizes control behavior by considering various constraints based on future prediction models. APC, including MPC technology, enables more accurate control by modeling the relationships among the Controlled Variables (CV) to be controlled, the Disturbance Variables (DV) that cannot be controlled, and the Manipulated Variables (MV) that can be controlled [5]. In addition, real-time monitoring, feedback control, and constraint management functions are used to improve accuracy. The application of such systems and control methods aims to address the issues faced not only by the petrochemical industry but by the chemical industry as a whole, including energy conservation, environmental protection, quality improvement, and increasing production efficiency [6,7].
The benefits of switching to an APC system, such as increased production and reduced costs, have been documented in several papers. In the oil industry, APC systems have been utilized to increase crude oil production by 10% [8], and in the petroleum refining industry, APC systems have been applied to improve planning system performance [9,10]. In the petrochemical industry, from which the data for this paper were obtained, APC systems have also been applied to cost-effectively minimize glycol losses in natural gas dehydration plants [11]. APC systems are also used to control the indoor environment in buildings. In a survey of 80 cases of APC systems applied to heating, ventilation, and air conditioning (HVAC), the largest group of papers (17) reported energy savings of 10.1–15.0% from applying APC and MPC compared to conventional control systems, followed by 13 papers reporting savings of 20.1–25.0% [12][13][14].
The APC system introduced earlier has the disadvantage of not being able to respond to changes in factory equipment or conditions because the data is created with only the factory conditions at the time the model was created. Moreover, it does not respond to the decline in performance due to aging facilities and does not consider the temperature difference between summer and winter. Therefore, it is necessary to study the data of the current factory conditions to predict the controlled variables of the APC and manipulate them by entering the optimal values of the controlled variables. In this study, we investigated the controllability by learning the time-series data with machine learning and improving the control accuracy.
Prediction of time-series data through machine learning is already being actively studied in various fields. For example, time-series data prediction is used to predict climate variables such as rainfall [15], wind speed [16], and other phenomena [17]. In the financial field, stock price prediction is also being actively studied [18,19]. The petrochemical industry is also active in this area, with researchers using artificial neural networks to predict the production of petrochemical products [20] and to predict the temperature of distillation columns [21][22][23][24].
The major contributions of this paper can be summarized as follows. First, we solved the problem of predicting controlled variables in APC systems with machine learning and confirmed its feasibility. Second, the performance of feature engineering for predicting controlled variables was determined. Feature engineering is the process of extracting new features from existing data that are easier for a prediction model to learn; it is a method utilized to improve prediction performance. Finally, a comparison of multiple machine learning models was carried out. A variety of machine learning models were applied, including Random Forest, Neural Network, k-Nearest Neighbor, Support Vector Regression, and XGBoost. The R-Squared and MAPE metrics were applied to find the model with the best predictions.
The rest of the paper is organized as follows. Section 2 describes PID and APC systems for process control in more detail and introduces the machine learning algorithms used in this study. Section 3 presents an overview of this study and describes the time-series data from a petrochemical plant and the feature engineering for it. We also present the modeling and hyper-parameter tuning process of the machine learning models used in this study. In Section 4, we present and evaluate the performance of the machine learning models based on the datasets and evaluation metrics used in the experiments. Conclusions are given in Section 5, along with plans for future work.

PID System
Classical controllers range from simple On/Off control to the most commonly used techniques: P, PI, and PID control. PID control, which stands for Proportional-Integral-Derivative control, is a common control algorithm widely used in automatic control systems. PID control measures the current state of the system and calculates control inputs based on it to maintain or regulate the system at a desired target state. P control is proportional control, meaning that it generates control inputs proportional to the error between the current state and the target state. With P control alone, a steady-state error remains, so the exact target state may not be reached. PI control adds integral control to P control. The integral control calculates the accumulated error and adjusts the control input accordingly; because the integral action can cause instability in the system, it must be kept in check. PID control adds derivative control to the functions of P and PI control. A block diagram for PID is shown in Figure 1. Derivative control tracks the rate of change of the error and adjusts the control input. This allows system instability to be anticipated and the control input to be adjusted appropriately, improving the response time and reducing oscillations [4,6].
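As a minimal illustration of the P, I, and D terms described above, the sketch below simulates a discrete PID loop on a simple first-order process. The gains, time step, and plant model are illustrative assumptions, not values from the paper.

```python
# Minimal discrete PID controller sketch. The integral term accumulates the
# error (removing steady-state offset), and the derivative term reacts to the
# error's rate of change (damping oscillations).

def make_pid(kp, ki, kd, dt):
    """Return a stateful PID update function: error -> control input."""
    state = {"integral": 0.0, "prev_error": None}

    def update(error):
        state["integral"] += error * dt                      # I: accumulated error
        derivative = 0.0 if state["prev_error"] is None else (
            (error - state["prev_error"]) / dt)              # D: rate of change
        state["prev_error"] = error
        return kp * error + ki * state["integral"] + kd * derivative

    return update

def simulate(setpoint=1.0, steps=200, dt=0.1):
    """Assumed first-order plant y' = -y + u, driven toward the setpoint."""
    pid = make_pid(kp=2.0, ki=1.0, kd=0.1, dt=dt)
    y = 0.0
    for _ in range(steps):
        u = pid(setpoint - y)
        y += dt * (-y + u)   # Euler integration of the plant
    return y

print(round(simulate(), 3))  # settles near the setpoint 1.0
```

Thanks to the integral action, the simulated output converges to the setpoint with no steady-state error, which P control alone could not achieve.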

APC System
Advanced Process Control (APC) is a type of control algorithm that uses a system behavior model to predict the future sequence of control inputs and selects the optimal control input at the current time based on it. It is a set of control techniques used in the field of process control [25,26]. APC aims to improve the performance and maintain the stability of the process by utilizing various control algorithms, modeling techniques, and optimization techniques. APC includes model predictive control (MPC), which uses mathematical models to predict the dynamic characteristics of a process, optimization techniques [27,28], multivariable control, which manipulates multiple variables simultaneously, real-time monitoring, feedback control, and constraint management control [10,29].
Among the functions of APC, model predictive control (MPC) refers to a class of algorithms that calculate a series of manipulated variable adjustments to optimize the plant's behavior. A block diagram of APC is shown in Figure 2 below. Industrial MPC controllers typically evaluate the future CV behavior over a finite future time interval called the prediction horizon [7]. APCs are widely used in industry to automate and optimize processes, providing benefits such as energy efficiency, increased productivity, and improved quality.

Random Forest
Random Forest (RF) is a representative supervised learning model that combines multiple decision trees in a bagging fashion to overcome the limitations of individual decision trees [30]. RF is a machine learning algorithm that uses multiple decision trees for training and can be applied to classification and regression problems [31]. For classification problems, it outputs the majority vote of the decision trees, and for regression problems, it outputs the average of the predicted values of the decision trees [32]. In addition to simple regression problems, RF is used to predict time-series data, with examples in engineering, environmental and geophysical sciences, and finance [33]. RF belongs to the bagging type, which increases the diversity of the trees by growing them from different subsets of the training data generated by randomly resampling the existing data set [29]. Because bagging uses independent random vectors with the same distribution as the input samples to generate the subsets, some data may be used for training only once, while other data may be used multiple times. Greater stability is therefore achieved, because the model reacts more robustly to changes in the data, which increases prediction accuracy.
When growing each tree, RF uses the best feature/split point within a randomly selected subset of the input features rather than the entire feature set. This may reduce the strength of any single tree, but it also reduces the correlation between trees, which reduces the generalization error. RF can compute an unbiased estimate of the generalization error without using an external test data set. RF also provides an assessment of the relative importance of the different features [31]. This aspect is useful for multi-source studies where the data dimensionality is very high: knowing how each feature affects the predictive model allows the best features to be selected.

Support Vector Regression
Support Vector Regression (SVR) solves regression problems by mapping data points into a higher-dimensional space, where a linear model can be fitted. SVR works by building a regression model that maximizes the margin between the given data points and a line or curve. SVR uses a loss function that predicts continuous values and minimizes the residual error, whereas Support Vector Machines (SVMs) use a loss function that maximizes the margin to predict discrete class labels and build classification models [34,35].

Neural Network
A Neural Network (NN), also known as an Artificial Neural Network (ANN), is an information processing model that resembles the functioning of the biological nervous system of the human brain. NNs are structured to operate in the same way that an efficiently functioning human brain performs tasks, with a structure that mimics neurons and the synaptic relationships between them. Similar to the process of human learning, NNs learn by adjusting the relationships between the nodes in the layers that function as neurons. NN layers are independent of each other, and any given layer can have an arbitrary number of nodes. In addition, each layer can include a bias node, which is equivalent to an offset in a linear regression. The main function of a bias node is to provide the nodes with a trainable constant value in addition to the normal input that the network nodes receive [36]. With this mode of operation, NNs have been extensively applied to real-world problems in business, education, economics, and everyday applications. They are effective algorithms for identifying trends and patterns in data, which is what we are trying to identify in this study.

K-Nearest Neighbor
K-Nearest Neighbor (k-NN) is a non-parametric method used for classification and regression and is classified as a lazy learner because it performs no explicit training phase and simply stores the training instances. Given the explanatory variables of a new instance, k-NN finds the k training instances closest to the new instance under some distance metric and returns the majority class (for classification) or the average target value (for regression). The basic rationale for using k-NN for time-series prediction is that, since time-series data often contain repetitive patterns, we can find previous patterns similar to the current time-series structure and use the subsequent values to predict future behavior [37]. Since our goal in this study is to identify patterns in time-series data to predict future patterns, we used k-NN for this purpose.
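The pattern-matching idea described above can be sketched in a few lines: find the k past windows most similar to the most recent window and average the values that followed them. The window size, k, and the toy series are illustrative assumptions.

```python
# k-NN time-series forecasting sketch: match the current window against every
# past window (Euclidean distance) and average the successors of the k closest.

def knn_forecast(series, window=3, k=2):
    """Predict the next value of `series` from its k most similar past windows."""
    query = series[-window:]                      # current pattern
    candidates = []
    for i in range(len(series) - window):         # every past window with a successor
        past = series[i:i + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, query)) ** 0.5
        candidates.append((dist, series[i + window]))
    candidates.sort(key=lambda c: c[0])
    neighbors = candidates[:k]
    return sum(nxt for _, nxt in neighbors) / len(neighbors)

# A repeating pattern: the window (1, 2, 3) was always followed by 4.
series = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3]
print(knn_forecast(series))  # -> 4.0
```

Because the series repeats, both nearest neighbors of the final window end in 4, so the forecast is their average, 4.0.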

XGBoost
The XGBoost (XGB) algorithm is based on the Gradient Boosting Framework, which builds tree ensemble models to make predictions. XGB trains each tree sequentially using gradient descent on the training dataset and updates the model in a way that minimizes the residual error. This allows XGB to improve prediction performance and avoid overfitting. It also provides a variety of features to improve the generalization performance of the model and enhance its speed and memory efficiency [38].
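The sequential residual-fitting idea behind gradient boosting can be sketched with regression stumps. This is a simplified illustration of the principle only; XGBoost itself adds regularization, second-order gradients, and many engineering optimizations.

```python
# Gradient boosting sketch for squared error: each stump fits the residuals of
# the ensemble so far, and its prediction is added scaled by a learning rate.

def fit_stump(x, residuals):
    """Best single-split regression stump: (threshold, left mean, right mean)."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(x, y, rounds=50, lr=0.3):
    """Return in-sample predictions after `rounds` of residual fitting."""
    pred = [sum(y) / len(y)] * len(y)          # start from the mean
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        pred = [p + lr * (lv if xi <= t else rv) for p, xi in zip(pred, x)]
    return pred

x = [1, 2, 3, 4]
y = [1.0, 1.0, 3.0, 3.0]
print([round(p, 2) for p in boost(x, y)])  # approaches [1.0, 1.0, 3.0, 3.0]
```

Each round shrinks the residuals by a constant factor here, so the ensemble converges to the targets; the learning rate trades convergence speed against overfitting.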

Benefits of Machine Learning Algorithms
Machine learning can be used as a powerful tool for continuous quality improvement in large-scale, complex processes such as semiconductor manufacturing. Since manufacturing often involves dealing with high-dimensional data, machine learning algorithms have the advantage of being applicable to such high-dimensional problems. Another advantage of machine learning technology is the ease of use of algorithmic applications due to the variety of programs available. They can be easily applied to many processes, even in petrochemical plants. Furthermore, the classification performance can be increased by adjusting the parameters. The main advantage of machine learning algorithms is that they can uncover new, previously unknown knowledge and identify relationships in a data set. This new information can be used to support engineers' decisions or improve the system [39].


Overall Architecture
In the petrochemical industry, a machine learning-based controlled variable prediction technique for APC systems using time-series data involves six steps, as shown in Figure 3 [40]. In the first step, we partition the data into two sequences. The data include the controlled variables that the petrochemical plant wants to control with the APC system, the uncontrollable disturbance variables, and the controllable manipulated variables. One sequence covers the period before the prediction horizon and contains the controlled, manipulated, and disturbance variables for machine learning purposes. The other sequence covers the period after the prediction horizon and is used to evaluate the quality of the fitted model; it includes only the controlled variables to be predicted. In the second step, feature engineering is set up on the time-series data. Feature engineering uses the mean, standard deviation, skewness, kurtosis, maximum, minimum, and change in mean to generate useful features or variables from the raw data. The features generated through the feature engineering process provide the model with new information that was not present in the original raw data, enabling more accurate predictions. Additionally, a feature selection process extracts from the generated features the information necessary for model training. In the third step, we define the models, transform the data, and train the machine learning models. To use the different regression models RF, NN, k-NN, SVR, and XGB, we define a model grid and set up a hyper-parameter grid for each machine learning model.
After that, we transform the data according to the given lag and prediction horizon and use data from previous time steps to generate the statistical characteristics of each variable. For each variable, we select the important features and train the model based on the selected features. The fourth step measures the accuracy by comparing the predicted values to the evaluation sequence. This step uses the R-Squared (R2) and Mean Absolute Percentage Error (MAPE) metrics to compare the hyper-parameter combinations. The fifth step is to predict future periods of the time series. The predictions should be monitored against actual values, which indicates when the model needs to be updated with new data or re-parameterized because the distribution of newer data is distinct from that of older data. The sixth step is to update the model with new data. In petrochemical plants, the turnaround period is when production is stopped and all facilities in the plant are inspected for maintenance, equipment improvement, noise control, safety checks, and so on. During the turnaround period, the performance of the plant may improve, and the time-series predictions of the APC controlled variables may no longer be correct. Therefore, it is necessary to retrain with new data and find new hyper-parameter combinations.
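The two evaluation metrics used in the fourth step can be written out explicitly. The sketch below is a plain-Python illustration with made-up input values.

```python
# R-squared measures the share of variance explained by the predictions;
# MAPE measures the mean relative error as a percentage.

def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def mape(actual, predicted):
    # Assumes no actual value is zero (each term divides by the actual value).
    return 100 / len(actual) * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted))

actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(round(r_squared(actual, predicted), 3))  # -> 0.99
print(round(mape(actual, predicted), 2))       # -> 5.0
```

Higher R2 (closer to 1) and lower MAPE both indicate a better hyper-parameter combination.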

Time-Series Data and Feature Engineering Method
Time-series data is sequential data derived from observations of N variables over a period of time. Examples include stock prices, precipitation, and traffic data. Time-series forecasting predicts the future behavior of a target variable based on past observations. It is used in a variety of fields and has many applications in finance, transportation, engineering, health, and weather, among others [41,42]. Feature engineering is the process of extracting new features from existing data that are easier for a prediction model to learn. It is an important step in machine learning and data analytics in particular, as it is the process of generating useful features or variables from raw data. Feature engineering involves various tasks such as data cleaning, scaling, outlier handling, and transforming and combining different features.
In this paper, feature engineering is central to the time-series modeling, both to accurately express the characteristics of the time series and to analyze the basis on which a prediction was made. Therefore, the following features are generated for each variable, and the top k (k = 10, 20, 30, . . . ) features are selected for modeling based on the R2 score of a simple regression model. Table 1 shows the features we considered. Removing rows with missing values is necessary to ensure the validity of the data: rows with missing values cannot be used for training, so removing them refines the data. By calculating the moving average and standard deviation, trends and volatility in the data can be incorporated into the model. Skewness and kurtosis can also be calculated to reflect the characteristics of the distribution. By calculating the minimum and maximum values, the range of the data can be incorporated into the model, which supports scaling and normalization. Finally, by calculating the mean difference, time-series characteristics can be incorporated into the model. These feature engineering tasks can improve forecasting performance.
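The rolling features listed in Table 1 can be sketched over a sliding window as below. This is a pure-Python illustration; the window length and the toy series are assumptions.

```python
# Compute mean, standard deviation, skewness, kurtosis, min, max, and change
# in mean over a sliding window, as in the feature engineering step above.
import statistics

def window_features(window, prev_window=None):
    m = statistics.mean(window)
    s = statistics.pstdev(window)
    n = len(window)
    # Population skewness and (non-excess) kurtosis; 0 if the window is constant.
    skew = sum((x - m) ** 3 for x in window) / (n * s ** 3) if s else 0.0
    kurt = sum((x - m) ** 4 for x in window) / (n * s ** 4) if s else 0.0
    feats = {"mean": m, "std": s, "skew": skew, "kurt": kurt,
             "min": min(window), "max": max(window)}
    if prev_window is not None:                 # change in mean vs previous window
        feats["mean_diff"] = m - statistics.mean(prev_window)
    return feats

def rolling_features(series, window=4):
    rows = []
    for i in range(window, len(series) + 1):
        cur = series[i - window:i]
        prev = series[i - window - 1:i - 1] if i > window else None
        rows.append(window_features(cur, prev))
    return rows

rows = rolling_features([1.0, 2.0, 3.0, 4.0, 6.0], window=4)
print(rows[0]["mean"], rows[1]["mean_diff"])  # -> 2.5 1.25
```

Each row of features describes one window; stacking the rows yields the engineered feature matrix from which the top-k features are then selected.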

Hyper-Parameter Grid
In this study, we address the problem of predicting the desired controlled variable based on one or more uncontrollable disturbance variables and controllable manipulated variables using various machine learning models, as shown in Equation (1):

ŷ(t + τ2) = f( y(t), . . . , y(t − τ1), x(t), . . . , x(t − τ1), z(t), . . . , z(t − τ1) )    (1)

In Equation (1), ŷ(t + τ2) is the predicted value of the controlled variable at time t + τ2 (t = 0, 1, 2, · · ·), where y(t), x(t), and z(t) are the controlled, external disturbance, and manipulated variables at time t, respectively. As described above, there can be more than one external disturbance variable and manipulated variable, so both x(t) and z(t) can be vectors. In addition, τ1 and τ2 are user-set parameters, where a larger τ1 utilizes values from the more distant past and a larger τ2 predicts values further into the future. Here, f denotes the model that predicts the controlled variable from the independent variables (the past controlled, disturbance, and manipulated variables), i.e., f in Equation (1) is the machine learning model. Table 2 shows the machine learning models and the hyper-parameter grid. Hyper-parameters are configuration variables that control and tune the behavior of a model. Many popular machine learning algorithms take a significant amount of time to train on data, and these same algorithms must be configured before training. Most machine learning implementations expose a set of configuration variables that can be set by the user and that affect how training is performed. Often, there is no single configuration that is optimal for every problem domain, so the best configuration depends on the specific application. These configuration variables are called hyper-parameters [43], such as the maximum depth of the RF model or the number of neighbors in k-NN. Since this work involves forecasting with machine learning in petrochemical plants, we do not attempt to optimize the hyper-parameters analytically. Instead, the procedure tries all user-definable hyper-parameter combinations, trains a model for each combination, evaluates it, and selects the optimal combination.
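As an illustrative sketch (variable names and values are assumptions, not from the paper), the time-delay transformation of the controlled (y), disturbance (x), and manipulated (z) series into a supervised dataset can be written as:

```python
# Build (features, targets) pairs: each row stacks y, x, and z over the past
# tau1 steps, and the target is the controlled variable tau2 steps ahead.

def make_lagged_dataset(y, x, z, tau1, tau2):
    """Return (features, targets) for predicting y[t + tau2] from lags t-tau1..t."""
    features, targets = [], []
    for t in range(tau1, len(y) - tau2):
        row = y[t - tau1:t + 1] + x[t - tau1:t + 1] + z[t - tau1:t + 1]
        features.append(row)
        targets.append(y[t + tau2])
    return features, targets

y = [10, 11, 12, 13, 14, 15]   # controlled variable (toy values)
x = [0, 1, 0, 1, 0, 1]         # disturbance variable
z = [5, 5, 6, 6, 7, 7]         # manipulated variable
X, Y = make_lagged_dataset(y, x, z, tau1=1, tau2=2)
print(X[0], Y[0])  # -> [10, 11, 0, 1, 5, 5] 13
```

Increasing tau1 widens each feature row with older values, while increasing tau2 pushes the target further into the future, matching the roles of the two user-set parameters.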
Trying all the combinations is time-consuming and computationally expensive. We record all of the combinations and explore the hyper-parameter grid with the evaluation metrics MAPE and R2. In the hyper-parameter grid in Table 2, hyper-parameters marked with curly braces mean that each value in the braces is compared; for example, the neural network compares four hyper-parameter combinations of hidden structures. The size of the tuple representing a hidden structure is the number of hidden layers, and the elements of the tuple are the numbers of nodes in those hidden layers. A hidden layer, used in artificial neural networks, is an intermediate layer between the input and output of the model; it transforms the input data nonlinearly and passes it on to the next layer. For example, (10, 10, 10) denotes three hidden layers, each with 10 nodes: the input data pass through the 10 nodes of the first hidden layer, then the 10 nodes of the second, and then the 10 nodes of the last hidden layer before reaching the output layer. The number of nodes in each layer controls the complexity and expressiveness of the model. Hidden layers with more nodes allow the model to learn more complex functions, but risk overfitting, so it is important to choose the right number of nodes. The kernel in SVR refers to the function used to map the data into a higher-dimensional space, while the penalty factor represents the level of regularization to avoid misfitting. In k-NN, k represents the number of neighboring points considered for regression. For RF, the decision tree max depth indicates the maximum depth of the decision trees, and the percentage of features when branching represents the fraction of features considered at each split.
In XGBoost, the learning rate determines how much influence the previous tree has.
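A minimal sketch of the exhaustive sweep described above: every grid combination is trained, scored with both MAPE and R², and collected into a leaderboard. The grid values here are illustrative, not the ones in Table 2, and the synthetic data stands in for plant data:

```python
from itertools import product

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=0)
# Chronological split: never shuffle past and future rows together.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False, test_size=0.3)

grid = {"max_depth": [5, 10], "max_features": [0.5, 1.0]}
results = []
for depth, frac in product(grid["max_depth"], grid["max_features"]):
    model = RandomForestRegressor(max_depth=depth, max_features=frac,
                                  n_estimators=50, random_state=0)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results.append({"max_depth": depth, "max_features": frac,
                    "R2": r2_score(y_te, pred),
                    "MAPE": mean_absolute_percentage_error(y_te, pred)})

# The leaderboard in the paper is sorted by R^2 in descending order.
best = max(results, key=lambda r: r["R2"])
```

Every combination is retained in `results`, so the full leaderboard can be inspected rather than only the winner.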

Discussion
By evaluating the models over various combinations of user-definable hyper-parameters with the MAPE and R² metrics and selecting the best-performing model, this study shows a more efficient approach than learning optimal hyper-parameters through an additional optimization process. In addition, instead of using a single machine learning model, we use five machine learning models and compare the hyper-parameters of each to show the applicability of machine learning models in the field of APC control in petrochemical plants.
However, a limitation of this study is the lack of systematic justification for the particular machine learning models chosen. Although each model has documented advantages for time-series data, the grounds for adopting these specific models are thin. In addition, the user-definable hyper-parameters were set in consideration of training time and general (empirical) practice, but there is no evidence that the combinations used in this study are the best possible ones, so a different hyper-parameter grid might show better performance.
Currently, there is a great deal of research on time-series prediction in industries such as manufacturing, energy, and finance. This study focuses on predicting future controlled variables in the petrochemical industry by learning from past controlled variables, disturbance variables, and manipulated variables. The results should not only present predictive performance but also be applicable to real APC systems. As mentioned earlier, APC systems use mathematical models to predict the dynamic characteristics of the process; combining machine learning prediction models with these mathematical models could lead to improved control performance.

Experiment Environment
The hardware platform used, to ensure reproducibility of the experiments, is shown in Table 3. We used Google Colab as our development environment; it provides a free Jupyter Notebook environment in the cloud with no installation required, and it allowed us to process the large amount of data from Plant P.

Experiment #1 (Plant P)
The data were collected from a real petrochemical process; we refer to the Experiment #1 data as Plant P. The data used in Experiment #1 consist of one controlled variable, y, four disturbance variables, x, and one manipulated variable, z, with 371,528 rows. Graphs visualizing the variability of each independent variable along with the controlled variable are shown in Figure 4, where the red lines represent the controlled variable and the blue lines represent the independent variables. Figure 4 shows the time-series plots of the controlled and manipulated variables, and Figures 5-8 show the time-series plots of the controlled variable and the disturbance variables. As these plots show, the scales of the controlled, manipulated, and disturbance variables differ significantly, so all variables were min-max normalized so that the minimum value is 0 and the maximum value is 1. This preprocessing adjusts the scale without changing the distribution, so the relative relationships between variables are preserved and different variables or features are easy to compare and analyze. We used the first 300,000 rows as training data and the last 71,528 rows as evaluation data. Finally, τ1 is set to 30 and τ2 to 5.
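The normalization and chronological split described above can be sketched as follows. The series and column names are illustrative, and the paper does not state whether the min-max ranges were fit on the full series or the training portion only; fitting on the training split, as here, avoids leaking information from the evaluation period:

```python
import numpy as np
import pandas as pd

# Illustrative series standing in for a plant's columns (names hypothetical).
df = pd.DataFrame({"CV": np.linspace(50.0, 150.0, 1000),
                   "MV": np.linspace(-3.0, 7.0, 1000)})

# Chronological split: first rows for training, last rows for evaluation.
train, test = df.iloc[:800], df.iloc[800:]

# Min-max normalization to [0, 1], with ranges taken from training data.
lo, hi = train.min(), train.max()
train_n = (train - lo) / (hi - lo)
test_n = (test - lo) / (hi - lo)   # may exceed 1 if the test range is wider
```

Because the transform is affine per column, the shape of each distribution and the relative relationships between variables are preserved, as the text notes.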

Experiment #2 (Plant N)
The data were collected from a real petrochemical process, and the Experiment #2 data are referred to as Plant N. The data used in this chapter consist of one controlled variable, y, one disturbance variable, x, and one manipulated variable, z, with 1441 rows. Figures 9 and 10 visualize the variability of the independent variables along with the controlled variable, where the red lines are the controlled variable and the blue lines are the independent variables. Figure 9 shows the time-series plots of the controlled and manipulated variables, and Figure 10 shows the time-series plots of the controlled and disturbance variables. The experiment was conducted in the same way as Experiment #1, with the first 1000 rows as training data and the last 441 rows as evaluation data, with τ1 set to 10 and τ2 set to 5.


Experiment #3 (Plant S)
The data were collected from a real petrochemical process, and the Experiment #3 data are referred to as Plant S. The data used in this chapter consist of one controlled variable, y, one disturbance variable, x, and one manipulated variable, z, with 10,081 rows. The variability of each independent variable, along with the controlled variable, is visualized as follows: the red lines are the controlled variable, and the blue lines are the independent variables. Figure 11 shows the time-series plots of the controlled and manipulated variables, and Figure 12 shows the time-series plots of the controlled and disturbance variables. As shown in Figures 11 and 12, the scales of the controlled, manipulated, and disturbance variables differ significantly, so min-max normalization was performed so that the minimum value is 0 and the maximum value is 1. This preprocessing adjusts the scale without changing the distribution, making it easy to compare and analyze variables while maintaining their relative relationships. We used the first 7000 rows as training data and the last 3081 rows as evaluation data, with τ1 set to 10 and τ2 set to 5.

Mean Absolute Percentage Error (MAPE)
The mean absolute percentage error is a metric used to evaluate the accuracy of a prediction model or forecast; it measures the average percentage deviation between the predicted and actual values and is commonly used in applications including time-series analysis, economics, and operations research [44]. In Equation (2):
• n represents the number of observations or data points.
• e_j is the actual value of the variable to be predicted.
• A_j is that variable's predicted value.

MAPE is expressed as a percentage, as shown in Equation (2), and represents the average size of the error between the predicted and actual values. A low MAPE indicates a more accurate prediction, while a high MAPE indicates a large difference between the predicted and actual values. MAPE is undefined when an actual value is zero, and because each error is divided by the actual value, errors on small actual values are weighted more heavily. It is therefore recommended to combine MAPE with other evaluation metrics to evaluate the performance of a prediction model comprehensively; this paper uses R² alongside it.
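A direct implementation of Equation (2) as described above, a minimal sketch (the function name is ours, not the paper's):

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error (Equation (2)), in percent.
    Undefined when any actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is 0")
    return np.mean(np.abs((actual - predicted) / actual)) * 100.0

# Per-point percentage errors here are 10%, 5%, and 0%.
error = mape([100, 200, 400], [110, 190, 400])
```

Note how the same absolute error of 10 contributes 10% at actual value 100 but only 5% at 200, illustrating the weighting behavior discussed above.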

R-Squared
In regression analysis, the coefficient of determination measures the explanatory power of the independent variables with respect to the variability of the dependent variable. In other words, it is the proportion of the variability in the dependent variable that is explained by the model; a higher value indicates that the model better explains the variability of the dependent variable [45].
In Equation (3), y_i is the actual value, ŷ_i is the predicted value, and ȳ is the average of the actual values. R² typically has a value between 0 and 1 and can be interpreted as follows: an R² of 1 means the model perfectly explains the variability of the dependent variable, an R² of 0 means the model explains none of it, and values in between indicate that the model explains part of the variability, with higher values indicating greater explanatory power. It is important to consider the number of independent variables, because R² tends to increase automatically as independent variables are added.
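The standard form of Equation (3), R² = 1 − SS_res/SS_tot, can be sketched as follows (a minimal implementation, with our own function name):

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination (Equation (3)):
    R^2 = 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)       # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

perfect = r_squared([1, 2, 3], [1, 2, 3])    # perfect fit
baseline = r_squared([1, 2, 3], [2, 2, 2])   # no better than predicting the mean
```

The two extreme cases match the interpretation above: a perfect fit gives 1, and always predicting the mean gives 0.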

Experiment #1
A partial leaderboard of the hyper-parameter tuning results, sorted by R² score in descending order, is shown in Table 4. Table 4 shows that RF performs well based on the R² score and effectively predicts the controlled variable, in that the R² scores of the first- through tenth-ranked models are above 0.999 and the MAPE values are below 0.245%. In addition, although preprocessing generated about 180 feature variables, the number of feature variables (k) selected by the top 10 models tended to be in the range of about 10 to 30. The selected features, in order, are as follows.
From the above, we can see that the features related to the controlled variables helped to predict the controlled variables.
To determine the impact of each feature on the APC prediction of the controlled variable, we measured the gain of the 'Feature Importance' attribute provided by the RF model; in Figure 13, the importance values of the features sum to 1.
Figure 13. Plant P feature importance plot.
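Importance values like those plotted in Figure 13 can be read from a fitted RF model roughly as follows. The data and feature names below are illustrative, not Plant P's actual features:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
# Hypothetical feature names echoing the ones discussed in the text.
X = pd.DataFrame(X, columns=["CV_min", "CV_mean", "DV03", "DV01", "MV"])

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances: non-negative values that sum to 1,
# matching the normalization used in the figure.
importance = pd.Series(rf.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
```

Sorting descending gives the ranking plotted in the figure; `importance.plot.bar()` would reproduce a chart of the same shape.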
In Figure 13, the variables that had the most impact on CV were CV_min and CV_mean; these values were largely responsible for predicting the CV time series at the next time point. Aside from the controlled variable itself, the variable with the greatest impact on the prediction of CV was DV03, which suggests that the previous value of DV03 influenced the current value of CV with a time lag. Figure 14 shows the average R² for each model.
In Figure 14, XGB, RF, and NN perform well, with R² scores close to 1.0. Considering the R² scores, all the models are quite effective at predicting Plant P; the Plant P data contain many independent variables, and the long time series provides enough training data to explain the controlled variable. Figure 15 depicts the results of predicting the controlled variable with the best-performing model.
In Figure 15, we can see that the predicted values are not significantly different from the actual values. Plant P has the most data of the three plants, so its model is the best trained, with an R² score of 0.99949; this is close to 1 and means the model explains the variability of the dependent variable almost perfectly. The MAPE score is 0.228, meaning the average error between the predicted and actual values is small and the predictions are accurate. The largest number of disturbance variables is presumably the reason Plant P is predicted best.

Experiment #2
A partial leaderboard of the hyper-parameter tuning results for Plant N, sorted by R² score in descending order, is shown in Table 5. Table 5 shows that RF outperforms the other models based on its R² score, although it is not significantly different from the R² score of XGB, which occupies ranks 2 through 9; the single best model is RF, but the highest overall (average) performance belongs to XGB. The top 10 models in Table 5 all have an R² score of 0.997 or higher and a MAPE of 0.195% or less, indicating that they effectively predict the controlled variable. To determine the impact of each feature on the APC prediction of the controlled variable, we measured the gain of the 'Feature Importance' attribute provided by the RF model; in Figure 16, the importance values of the features sum to 1. In Figure 16, CV_min and CV_mean are the variables with the most influence on CV, and their values are used to predict the CV time series at the next time point. Apart from the variables related to CV, DV has the largest impact; the previous value of DV appears to affect the current value of CV with a time lag. Figure 17 shows the average R² score per model.
Figure 17. Average R² score per model in Plant N.
In Figure 17, the RF and XGB models have noticeably higher R² scores, and consistent with the previous results, the XGB model has the highest R² score on average. Figure 18 plots the results of predicting the controlled variable with the best-performing model. In Figure 18, we can see that Plant N has high prediction accuracy, just like Plant P: the RF model, selected as the best model, has an R² score of 0.9972 and a MAPE of 0.1941%, meaning it explains almost all of the variability of the dependent variable.
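The per-model averages plotted in Figures 14, 17, and 20 can be computed directly from the leaderboard records. In this sketch the first RF row reuses the scores reported for Plant N's best model; the remaining rows are purely illustrative:

```python
import pandas as pd

# Hypothetical leaderboard: one row per hyper-parameter combination.
leaderboard = pd.DataFrame([
    {"model": "RF",  "R2": 0.9972, "MAPE": 0.1941},  # reported best model
    {"model": "RF",  "R2": 0.9950, "MAPE": 0.2100},
    {"model": "XGB", "R2": 0.9969, "MAPE": 0.1950},
    {"model": "XGB", "R2": 0.9965, "MAPE": 0.1980},
])

# Average R^2 per model, sorted best-first; these values would be the
# bar heights in a figure like Figure 17.
avg_r2 = (leaderboard.groupby("model")["R2"].mean()
          .sort_values(ascending=False))
```

Note how, with these illustrative rows, the single best model (RF) differs from the model with the best average (XGB), mirroring the distinction drawn in the text.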

Experiment #3
Some of the hyper-parameter tuning results, sorted by R² score in descending order, are shown in Table 6. Table 6 shows that the NN, SVR, and XGB models perform well based on their R² scores; as in Experiment #1, the NN model performs best, and the XGB model is stable regardless of its parameters.
When model performance is ranked by R², all the models in Table 6 have an R² score of 0.959 or higher and a MAPE of 9.76% or less, indicating that they effectively predict the controlled variable. The selected features, in order, are as follows.
From the above, we can see that the features related to the CV helped predict the CV. To determine the impact of each feature on the APC prediction of CV, we measured the gain of the 'Feature Importance' attribute provided by the RF model; in Figure 19, the importance values of the features sum to 1.
Figure 19. Plant S feature importance plot.
In Figure 19, the variables that had the most impact on CV were CV_mean and CV_max, and these values were largely responsible for predicting the CV time series at the next point in time. Figure 20 shows the average R² score per model. In Figure 20, XGB has the highest R² score, followed by the RF model and then the NN model. Figure 21 shows the prediction of the controlled variable based on the best-performing model. In Figure 21, we can see that Plant S has a larger MAPE value than Plant P and Plant N, so the predicted data do not follow the actual data in some bins, but the overall trend is followed well. The model is especially significant in that it predicts the large fluctuations in the 2500-3000 range with high accuracy. The MAPE values of the top 10 models are between 6.0% and 9.0%, which is high compared to the other plants, but the explanatory power of the models can be judged sufficiently significant given the large fluctuations of the controlled variable in the second half of the test data and the high R² scores of the top models.

Conclusions
This study demonstrates the feasibility of using machine learning models to predict controlled variables for the adoption of APC systems in the petrochemical industry. In an APC system, predicting the controlled variables and deciding how their values should be modified to stay within the user's desired range are of utmost importance. This study focused mainly on the prediction of controlled variables, utilizing various machine learning techniques to predict future controlled variables based on past controlled variables, manipulated variables, and disturbance variables. By using the time delay as a parameter and adjusting its value, we were able to analyze the relationship between past and future data and improve prediction performance. Currently, APC systems are controlled through mathematical modeling, and research is trending toward predicting the time-series data of controlled, manipulated, and disturbance variables with machine learning models to compare performance and measure accuracy. It is becoming increasingly important to move from mathematical prediction models to data-driven machine learning predictions. The R-squared (R²) and Mean Absolute Percentage Error (MAPE) results of this study demonstrate the feasibility of introducing machine learning models into APC systems in petrochemical plants.
This paper presents two directions for future research. First, to further improve controlled-variable prediction for APC, additional machine learning models should be evaluated; it should be investigated whether other ensemble methods, such as LightGBM, or other machine learning algorithms can further improve prediction performance. Second, research should be conducted to determine optimal controlled-variable values. In an APC system, the most important aspect is finding the optimal MV for CV prediction and CV control, and we propose determining the optimal MV through machine learning. Additionally, once this research is complete, a linear dynamic model should be developed and compared with real data to enhance its completeness. To this end, this paper is a first step in exploring the possibility of predicting and controlling the controlled variables of APC systems in petrochemical plants through machine learning, opening new possibilities for improving the efficiency and performance of automatic control in the petrochemical industry.

Data Availability Statement:
The data used to support the findings of this study will be provided by the corresponding author upon request (jpjeong@skku.edu).