Temporal Feature Selection for Multi-Step Ahead Reheater Temperature Prediction

Abstract: Accurately predicting the reheater steam temperature over both short and medium time periods is crucial for the efficiency and safety of operations. With regard to the diverse temporal effects of influential factors, the accurate identification of delay orders allows effective temperature predictions for the reheater system. In this paper, a deep neural network (DNN) and a genetic algorithm (GA)-based optimal multi-step temporal feature selection model for reheater temperature is proposed. In the proposed model, DNN is used to establish a steam temperature predictor for future time steps, and GA is used to find the optimal delay orders, while fully considering the balance between modeling accuracy and computational complexity. The experimental results for two ultra-super-critical 1000 MW power plants show that the optimal delay orders calculated using this method achieve high forecasting accuracy and low computational overhead. Moreover, it is argued that the similarities of the two reheater experiments reflect the common physical properties of different reheaters, so the proposed algorithms could be generalized to guide temporal feature selection for other reheaters.


Introduction
Steam reheating plays an important role in power plants. It can increase thermal efficiency by 2%, reduce steam humidity, and improve the safety of the final-stage blades [1,2]. However, due to the complexity of the many influential factors, it is difficult to maintain the reheat steam temperature within a certain range [3]. For instance, the reheater steam temperature of the two ultra-super-critical 1000 MW units investigated in this paper may fluctuate between 565 °C and 610 °C, while the normal reheater outlet steam temperature is 603 °C with tolerable fluctuation within the range of 503 to 608 °C [4] (the specific threshold may vary with the type of reheater). A temperature that is too high will damage the metal material, while a temperature that is too low will reduce the thermal cycle efficiency [5]. Therefore, finding the features that affect the modeling target and analyzing the extent of their influence are crucial for the system's safety and efficiency.
A reheater system is a typical nonlinear hysteresis thermal system, which is highly coupled, complex, and impacted by many factors [6,7]. The selection of the most related features from a large variety of sensors is important for the realization of effective control [8]. Traditional feature selections are normally developed on the basis of mass balance, energy balance, and dynamic principles, which rely greatly on human expertise and normally require a long modeling time [9][10][11]. Recently, researchers have increasingly adopted the data-driven methodology that extracts features directly from huge amounts of accumulated process data [12][13][14]. Li et al. [15] analyzed operation parameters presents experiments and discussions. Discussions and possible directions for future work are provided in the final section.

Description of Reheater System
A reheater is a set of tubes located in a boiler, the main purpose of which is to avoid excess moisture in the steam at the end of expansion and thereby protect the turbine. The exhaust steam from the high-pressure turbine passes through these heated tubes to collect more energy before driving the intermediate- and then low-pressure turbines. The conceptual structure of the reheater unit is shown in Figure 1. After the high-pressure turbine, the exhaust pressure and temperature at the inlet of the reheater are about 35-37 kg/cm² and 345-355 °C, respectively. A reheater is designed in the shape of a serpentine tube in order to increase the heated area. The hot smoke generated by the combustion of coal transfers heat to the reheater, so the temperature of the steam in the reheater rises. The steam temperature at the outlet of the reheater is kept around 603 °C. Reheater steam with high-temperature and high-pressure characteristics is collected into the high-temperature reheat steam container. A similar process is performed again in the low-pressure cylinder.
Table 1 lists the influential features of our modeling target, which is the outlet steam temperature of the reheater. Many features affect the reheat steam temperature, such as the inlet steam temperature, inlet gas temperature, and smoke baffle opening. These variables also have different inertias toward the reheater outlet steam temperature. Therefore, both the variables and their hysteresis times should be considered in the prediction model. Here, the previous values of the steam outlet temperature are also used in the modeling process, and the multi-step steam temperatures are used as the outputs of the model. In order to simplify our discussion, the major factors are referred to by the notations shown in Table 1.

Problem Statement
One of the major control concerns of a reheater is the stability of steam_o. From the reheater's point of view, some features are uncontrollable, e.g., smoke temperature and pressure. These features might influence the reheater wall temperature and then change the outlet steam temperature. Steam_o is non-linear and has a large inertia. When operating conditions change, it may deviate from the expected range. The normal operation changes the smoke flow toward the reheater by adjusting the opening degree of the smoke baffle. This operation exhibits a long delay before it affects the temperature. Another method is to spray desuperheating water into the reheater steam. This method promptly lowers the steam temperature, but also reduces the boiler's efficiency. Considering the economic benefits, the first method is normally used. The second method is employed only in an emergency, such as when the steam temperature is too high or the working condition is changing.
Similar to the control variables mentioned above, other features also affect the steam temperature with different inertias. One major concern is the difficulty of accurately determining the impact inertia of different features, which depends strongly on the physical laws governing the reheater as well as on its operational conditions, e.g., combustion stability. One natural choice is to use long delay orders to compose the model inputs. However, indiscriminately long delay orders make the feature dimension very high and introduce considerable overheads for both storage and computation. Thus, it is important to select the most cost-effective delay order for each feature while keeping the system model accurate enough.

Multi-Step Prediction
In order to predict the temperature trend of steam_o, a nonlinear autoregressive exogenous model is presented. Differing from other approaches, the proposed model predicts values not for a single given time, but for a set of future moments. Since the reheater system displays different hysteresis characteristics toward different features, modeling steam_o with both short and long hysteresis parameters is important. A multi-step steam_o prediction model, which generates a series of predictions for the next n + 1 time steps, is given in Equation (1).
where t is the current time, t + n is the n-th future moment, x_k is the k-th independent variable, y is the dependent variable, τ_k is the time delay order corresponding to x_k, and τ_y is the time delay order of the dependent variable y.
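The input and output layout implied by these definitions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_multistep_samples` and its arguments are hypothetical, and it assumes that each input contributes its values at t - 1 down to t - τ_k (whether the original Equation (1) also includes the value at time t is not stated here).

```python
def build_multistep_samples(inputs, target, delays, target_delay, n_ahead):
    """Build (features, targets) pairs for multi-step-ahead prediction.

    inputs       : dict name -> list of floats, each as long as `target`
    target       : list of floats, the dependent variable y
    delays       : dict name -> delay order tau_k (0 means the input is unused)
    target_delay : delay order tau_y for the past values of y
    n_ahead      : n, so each sample predicts y(t), ..., y(t + n)
    """
    max_lag = max(list(delays.values()) + [target_delay])
    samples = []
    for t in range(max_lag, len(target) - n_ahead):
        row = []
        for name in sorted(delays):
            # Assumed layout: x_k(t - 1), ..., x_k(t - tau_k)
            row.extend(inputs[name][t - lag] for lag in range(1, delays[name] + 1))
        # Past outputs: y(t - 1), ..., y(t - tau_y)
        row.extend(target[t - lag] for lag in range(1, target_delay + 1))
        # Targets: y(t), ..., y(t + n), i.e. n + 1 values
        future = [target[t + step] for step in range(n_ahead + 1)]
        samples.append((row, future))
    return samples
```

The feature vector length equals the sum of all delay orders, which is why the total number of delay orders directly drives the model's input dimension.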

Optimization Function
The goal is to improve the forecast performance for the next n + 1 time steps by selecting the most appropriate delay orders. However, a larger total number of delay orders raises the computational complexity, even though it tends to lower the prediction error. Thus, the optimization goal is to strike a balance between computational complexity and modeling accuracy. Accordingly, the objective function minimizes the total number of delay orders in order to minimize the computational complexity. At the same time, enough delay orders must be retained, within a certain range, to keep the prediction error low enough. Let ε be the maximum acceptable prediction error for the modeling target; the accuracy goal is then expressed as a constraint, i.e., the prediction error must be smaller than or equal to ε. Thus, a constrained optimization problem is formulated as Equation (2).
where K is the number of input variables, m is the number of test samples, n indexes the n-th future moment, τ_k is the delay order of x_k, and τ_y is the delay order of the dependent variable. J is the total number of delay orders. e is the mean absolute error (MAE) over all m samples and n + 1 prediction steps. e_l is the error generated by the l-th iteration. C is the maximum delay order. ε is the upper limit of the MAE. Ŷ = [ŷ(t), ŷ(t + 1), ..., ŷ(t + n)]^T is the predicted value vector and Y = [y(t), y(t + 1), ..., y(t + n)]^T is the actual value vector; both Ŷ and Y have m samples.
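As a rough illustration of how one candidate could be scored against this formulation, the sketch below computes the total delay order J and the MAE e, then checks the two constraints. The function names are hypothetical, and the default bound of 15 is taken from the 4-bit encoding described later in the paper.

```python
def mean_absolute_error(predicted, actual):
    """MAE over m samples and n + 1 prediction steps (rows of equal length)."""
    total = 0.0
    count = 0
    for pred_row, true_row in zip(predicted, actual):
        for p, y in zip(pred_row, true_row):
            total += abs(p - y)
            count += 1
    return total / count


def evaluate_candidate(delay_orders, predicted, actual, epsilon, max_order=15):
    """Return (J, e, feasible) for one candidate set of delay orders."""
    within_bounds = all(0 <= tau <= max_order for tau in delay_orders)
    total_orders = sum(delay_orders)                 # objective J
    error = mean_absolute_error(predicted, actual)   # constraint value e
    return total_orders, error, within_bounds and error <= epsilon
```

Among feasible candidates, the one with the smallest J would be preferred under this reading of the problem.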

Delay Order Selection
In order to accurately select the temporal features, two parts are designed: the DNN-based prediction model and the GA-based optimal feature selection algorithm. First, the GA generates individuals representing different delay order combinations, which determine the inputs to the DNN. Then, the DNN outputs the multi-step predictions, which are evaluated on the test sets. The evaluated values are employed as the fitness values used in the GA.

Delay Order Optimization
Delay order optimization is performed by the GA. The schema of the GA is shown in Figure 2. The algorithm starts from an initial population of 20 individuals, each of which has 28 genes. These randomly generated genes are divided into seven sections. Each section represents an input parameter and consists of 4 bits, which encode a delay order in the range from 0 to 15. Then, the individuals are evaluated by the fitness function, which returns two fitness values (the MAE and the total number of orders). Different fitness values are assigned different fitness scores. The smaller the MAE value, the higher the fitness score. When the MAE values are very close (the difference is below a certain threshold), the smaller the total number of delay orders, the higher the fitness score. The fitness score determines the probability of being selected as a parent, which follows roulette wheel selection, as shown in Equation (3).
p_i = f_i / (f_1 + f_2 + ... + f_N)    (3)
where N is the number of individuals in the population, f_i is the fitness of individual i in the population, and p_i is the probability of individual i being selected. Once the parents are selected, they are mated randomly with a certain probability (p_c) to generate new individuals. If the parents are not mated, they enter the new population unchanged. Then, each individual in the new population is mutated with a certain probability (p_m). Mutation flips genes at random (0 changes to 1, or 1 to 0). The new individuals are evaluated, selected, mated, and mutated until the specified number of cycles is reached. At the end of the cycle, the GA obtains the best individuals [26,27].
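The GA steps described above can be sketched in a few functions. This is an illustrative sketch under several assumptions that the text does not confirm: bits are read most-significant first, near-tied MAE values are grouped by bucketing with the threshold, scores are rank-based, crossover is single-point, and mutation is applied bit by bit. All function names are hypothetical.

```python
import random

BITS_PER_FEATURE = 4
NUM_FEATURES = 7


def decode(chromosome):
    """Turn 28 bits into 7 delay orders, each in the range 0..15."""
    orders = []
    for k in range(NUM_FEATURES):
        bits = chromosome[k * BITS_PER_FEATURE:(k + 1) * BITS_PER_FEATURE]
        orders.append(int("".join(str(b) for b in bits), 2))
    return orders


def fitness_scores(results, mae_threshold):
    """Rank-based scores from (mae, total_orders) pairs.

    Lower MAE ranks higher; MAE values in the same threshold-wide bucket
    are treated as ties and ordered by the smaller total delay order.
    """
    order = sorted(
        range(len(results)),
        key=lambda i: (int(results[i][0] / mae_threshold), results[i][1]),
    )
    scores = [0] * len(results)
    for rank, i in enumerate(order):
        scores[i] = len(results) - rank  # best individual gets N, worst gets 1
    return scores


def selection_probabilities(scores):
    """Roulette wheel: p_i = f_i / sum_j f_j."""
    total = sum(scores)
    return [f / total for f in scores]


def select_parent(population, scores, rng):
    """Draw one individual with probability proportional to its score."""
    return rng.choices(population, weights=scores, k=1)[0]


def crossover(parent_a, parent_b, p_c, rng):
    """With probability p_c, mate at a random cut point; otherwise copy."""
    if rng.random() < p_c:
        cut = rng.randint(1, len(parent_a) - 1)
        return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]
    return parent_a[:], parent_b[:]


def mutate(chromosome, p_m, rng):
    """Flip each bit (0 -> 1, 1 -> 0) independently with probability p_m."""
    return [1 - bit if rng.random() < p_m else bit for bit in chromosome]
```

Passing a seeded `random.Random` instance as `rng` keeps runs reproducible, which helps when comparing delay orders found on different data windows.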

Prediction Model
DNN is used to fit the correlation between the future steam_o and the historical reheater inlet variables using the accumulated data sets. Figure 3 shows the structure of the steam_o trend prediction model. Let m = τ_1 + τ_2 + ... + τ_k be the total input dimension of the DNN. The outputs of the DNN are n + 1 values of steam_o. The DNN has one input layer, two hidden layers, one output layer, and a large number of neurons. The hypothesis function is shown in Equation (4).
where X is a vector with m dimensions, Θ_1, Θ_2, and Θ_3 are the weight matrices between the four layers, and g(·) is the activation function. The cost function of the DNN is shown in Equation (5).
where m is the total number of samples, n is the total number of output variables, l_k is the number of neurons in the k-th layer, and h(X_i)_j is the predicted value of the j-th output for the i-th sample. Y_ij is the actual value of the j-th output for the i-th sample, λ is the regularization parameter, and the L2 term is the regularization term that limits over-fitting. The goal of the DNN is to minimize Equation (5) with the given sets of features and training samples.
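To make the roles of Θ_1, Θ_2, Θ_3, g(·), and λ concrete, the following sketch evaluates a four-layer network and a regularized cost in plain Python. It is not the authors' Equations (4) and (5): it assumes tanh hidden layers with a linear output layer, omits bias terms, and uses a squared-error data term with 1/(2m) scaling, all of which are assumptions made for illustration.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(x, theta1, theta2, theta3):
    """Two tanh hidden layers followed by a linear output layer (no biases)."""
    hidden1 = [math.tanh(z) for z in mat_vec(theta1, x)]
    hidden2 = [math.tanh(z) for z in mat_vec(theta2, hidden1)]
    return mat_vec(theta3, hidden2)


def cost(samples, targets, thetas, lam):
    """Assumed cost: squared-error data term plus an L2 penalty on all weights."""
    m = len(samples)
    data_term = 0.0
    for x, y in zip(samples, targets):
        prediction = forward(x, *thetas)
        data_term += sum((h - t) ** 2 for h, t in zip(prediction, y))
    penalty = sum(w * w for theta in thetas for row in theta for w in row)
    return data_term / (2 * m) + lam * penalty / (2 * m)
```

In practice a numerical library would replace the hand-written matrix product; the plain version is used here only to keep the example self-contained.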

Experiments and Discussion
The data for modeling are collected every 3 s from unit 3 and unit 4 by the distributed control system (DCS). Unit 3 and unit 4 are two ultra-super-critical 1000 MW power plants with the same structure. In our experiment, 7,084,800 records in total are used for evaluation, of which unit 3 and unit 4 each contribute 3,542,400 records from 1 May 2016 to 31 August 2016.

Data Preprocessing
In the data preprocessing process, two steps are taken: outlier removal and standardization.
Outlier removal: Outliers that violate physical or technical limitations might affect the model's performance and should be removed before modeling. (1) Points outside the normal physical or technical range are replaced with the average of the adjacent points. For instance, for a certain period, the temperature of steam_o should be around 600 °C; thus, points below 594 °C, which violate the steady change characteristics of the temperature, should be replaced. (2) Errors in the D_water control time should be corrected. Under normal circumstances, the D_water control time (when greater than 0) lasts a few minutes. If the collected data show that the control time lasts for several hours, the abnormal control time is capped at a maximum of 3 min.
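The two outlier rules can be sketched as follows. The function names are hypothetical, the upper bound passed in the usage below is an arbitrary illustrative value, and the sketch averages the nearest in-range neighbours on each side, which is one possible reading of "adjacent points".

```python
def replace_out_of_range(values, lower, upper):
    """Replace points outside [lower, upper] with the mean of the nearest
    in-range neighbours (one on each side, when available)."""
    def in_range(v):
        return lower <= v <= upper

    cleaned = list(values)
    for i, v in enumerate(values):
        if in_range(v):
            continue
        left = next((values[j] for j in range(i - 1, -1, -1) if in_range(values[j])), None)
        right = next((values[j] for j in range(i + 1, len(values)) if in_range(values[j])), None)
        neighbours = [n for n in (left, right) if n is not None]
        if neighbours:
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned


def clip_control_time(durations_s, max_seconds=180):
    """Cap abnormal spray control times at 3 min (180 s)."""
    return [min(d, max_seconds) for d in durations_s]
```

For example, `replace_out_of_range([600.0, 580.0, 602.0], 594.0, 610.0)` replaces the middle reading with the average of its two neighbours.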
Standardization: Different features might have different ranges of values. If these variables are used directly, features with small magnitudes may be ignored, while those with large magnitudes dominate. Therefore, the Z-score standardization technique [28] is used to scale each feature to a mean of 0 and a standard deviation of 1, which speeds up the convergence of the optimization.
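A minimal sketch of Z-score standardization is given below. Splitting it into a fit step and an apply step, so that the statistics of the training window can be reused on the test window, is a design choice of this sketch rather than something stated in the paper; the function names are hypothetical.

```python
import statistics


def fit_scaler(train_values):
    """Return the mean and population standard deviation of one feature."""
    return statistics.mean(train_values), statistics.pstdev(train_values)


def apply_scaler(values, mean, std):
    """Scale values to zero mean and unit variance; constant features map to 0."""
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```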

Experiment Settings
The parameters of the DNN and the GA are shown in Table 2. The DNN is a neural network with two hidden layers, and the learning rate is set to 0.001. The MAE, which is the average of the absolute differences between predictions and actual observations, is used to evaluate the modeling error. Tanh is chosen as the activation function since it achieves the smallest average MAE compared to other activation functions (e.g., identity, logistic, ReLU) for the chosen data set.
The 4-month data for unit 3 and unit 4 are divided into 20 different sets. Each set consists of training data from 7 days (about 201,600 records) and test data from 1 day (about 28,800 records).
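The record counts quoted above follow directly from the 3 s sampling period, as the short sketch below shows. The helper `split_window` is hypothetical, and how the start day of each of the 20 sets is chosen is not specified in the text, so it is left as a parameter.

```python
SAMPLE_PERIOD_S = 3
RECORDS_PER_DAY = 24 * 60 * 60 // SAMPLE_PERIOD_S  # 28,800 records per day


def split_window(records, start_day, train_days=7, test_days=1):
    """Slice one set: `train_days` of training data followed by `test_days` of test data."""
    start = start_day * RECORDS_PER_DAY
    split = start + train_days * RECORDS_PER_DAY
    end = split + test_days * RECORDS_PER_DAY
    return records[start:split], records[split:end]
```

With 123 days between 1 May and 31 August, each unit yields 123 × 28,800 = 3,542,400 records, which matches the totals reported for the experiment.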

Results and Discussion
The proposed method is evaluated from three different perspectives. Firstly, a one-round simulation is performed with one set of data to demonstrate its capability for finding the optimal delay order for different features. Secondly, the experiment is run on unit 3 and unit 4 at different times to demonstrate the adaptability of the presented method. Finally, the delay orders identified with data from unit 3 are used directly in the modeling process for unit 4 to check the method's capability for generalization.
(1) Results of the one-round simulation
To obtain the preliminary delay orders for unit 3, the data from 23 July 2016 to 30 July 2016 are selected as the experiment data. The changes in the MAE and the total number of selected orders during the iteration process are shown in Figure 4a. The accuracy level of the MAE is set to 0.001. In the early iterations, the MAE begins to decrease while the total delay order increases. Then the total delay order decreases until the MAE stabilizes at 0.13, i.e., the lower limit of the MAE. In the later iterations, both criteria remain constant, which indicates that the algorithm has converged. Figure 4b shows each feature's delay order. It can be seen that some features have a larger delay order, e.g., smoke_p, which indicates large hysteresis, while in contrast, the small order of D_water indicates a prompt but transient impact.

In Figure 5, the forecasting errors over a one-minute period (20 points) on 30 July 2016 are plotted in a box plot, which displays the distribution through five statistics: minimum, first quartile, median, third quartile, and maximum. Figure 5 shows that the MAE increases with the prediction time step. This is expected, as fast-responding factors, such as steam_p, smoke_t, and baffle_o, cannot be captured by the predictor. However, the median MAE within one minute is less than 0.3 °C, and the average is near 0.1 °C.
According to Figure 4b, the maximum delay order of the reheater steam temperature steam_o is 13. This means that the historical data of steam_o have a major impact on the accuracy of the model. It also shows that, in the current system, steam_o is not well controlled, as it should be kept steady around 600 °C.
(2) Comparisons of unit 3 and unit 4 from different perspectives
The feature selection method is tested for both unit 3 and unit 4 based on the operational data from 1 May 2016 to 31 August 2016. Since the records from some days contain too many abnormal data, the data from those days are not used for model training. As shown in Table 3, the results are close both in the intra-comparisons within unit 3 or unit 4 and in the inter-comparison between the two units. Table 3 shows that the delay orders of the seven studied features generally correspond to their inertia toward steam_o. For all 20 tests, there is no significant deviation in the MAE. This means that the designed DNN with the selected features as inputs achieves good convergence. It also shows that the delay orders of smoke_t and smoke_p are much larger than those of the inlet steam features steam_t and steam_p, as the smoke affects steam_o only indirectly. D_water has a very small delay order due to its fast temporal response toward steam_o. For certain periods, the delay orders of D_water are zero, e.g., in tests 9, 10, 16, and 18. The zero value is due to the lack of training data for D_water: in those periods, the action of spraying desuperheating water is seldom performed. In these tests, the numbers of sprays are 31, 22, 26, and 18, respectively, while the other tests have about 60 actions, because steam_o is comparatively more stable in those periods. A similar phenomenon can also be observed for the optimal delay order of baffle_o. These results show the importance of data coverage for the accuracy of feature selection.
Table 3. Results for both unit 3 and unit 4 (the value before "/" is for unit 3 and the value after it is for unit 4). MAE: mean absolute error.
For the purpose of keeping steam_o changes within the ideal range, properly finding the delay orders is crucial to accurately describing the hysteresis of features in a prediction model. The variations of the delay orders for each feature are shown in Figure 6; the shaded area ranges from the minimum to the maximum delay order. There is a large overlap between the two units, which indicates the existence of common delay orders. The medians of the overlap (2, 6, 10, 10, 2, 1, and 14) represent the general level of the intervals and may serve as references for the delay orders of the steam_o system of ultra-super-critical 1000 MW power plants.
The features with delay orders of 2, 6, 10, 10, 2, 1, and 14 generated from the data from unit 3 are used as the selected features for the reheater steam temperature prediction. We also adopt the same method to find the optimal delay orders directly for unit 4. Then, the two sets of results are compared on the datasets of test 1 to test 20 from unit 4. In Figure 7, the orange bars indicate the MAE obtained with the identified delay orders, and the blue bars show the directly calculated optimal solution. The comparison indicates that the MAEs of the two cases are approximately equal. The maximum error is only 0.9% (on the 16th day), which means that the result is almost the same as that of the optimal solution. This shows that the selected delay orders (2, 6, 10, 10, 2, 1, and 14) have good generalization capability and can, it is argued, represent the physical characteristics of the two reheaters.

Conclusions
For many industrial processes, it is important to find the best feature delay orders as well as the features that are most correlated with the prediction targets. In this paper, a delay order identification method based on GA and DNN is proposed. The method uses the GA to generate candidate feature sets, searching for the minimal number of features that keeps the MAE of the prediction model low enough. The DNN model is used to generate the multi-step predictions typically demanded in many industrial processes. The method is evaluated with experiments from different perspectives; data from two similar units are used to check whether the identified time delays indeed reflect the physical characteristics of the underlying systems. The experimental results indicate that the two units have similar delay orders and that the delay orders can be used directly for modeling similar devices with little loss of accuracy.
Of course, many interesting issues still need to be investigated. For instance, our solution is limited to temporal feature selection. It is important for the delay order selection method to support both spatial and temporal feature selection. We are investigating the use of an attention mechanism to find the optimal solution for both dimensions. In addition, the GA demands considerable computational resources. We are working to design more computationally efficient methods, e.g., filter-based feature selection for industrial feature processing.