Article

Modeling a Practical Dual-Fuel Gas Turbine Power Generation System Using Dynamic Neural Network and Deep Learning

1 Department of Electrical Engineering, King Abdullah II School of Engineering, Princess Sumaya University for Technology (PSUT), Amman 11941, Jordan
2 Department of Electrical Engineering, Graduate School of Engineering, University of Vermont, Burlington, VT 05405, USA
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(2), 870; https://doi.org/10.3390/su14020870
Submission received: 19 November 2021 / Revised: 25 December 2021 / Accepted: 6 January 2022 / Published: 13 January 2022
(This article belongs to the Section Energy Sustainability)

Abstract:
Accurate simulations of gas turbines’ dynamic performance are essential for improving their practical performance and advancing sustainable energy production. This paper presents highly accurate simulation models of a real dual-fuel gas turbine using two state-of-the-art neural network techniques: a dynamic neural network and a deep neural network. The dynamic neural network has been realized via a nonlinear autoregressive network with exogenous inputs (NARX) artificial neural network (ANN), and the deep neural network is based on a convolutional neural network (CNN). The outputs selected for simulation are the output power, the exhausted temperature and the turbine speed or system frequency, whereas the inputs are the natural gas (NG) control valve, the pilot gas control valve and the compressor variables. The datasets have been prepared in three essential formats for the training and validation of the networks: normalized data, standardized data and SI-unit data. Rigorous, wide-ranging trials have been carried out to tune the network structures and hyper-parameters, leading to highly satisfactory results for both models (overall, the minimum recorded MSE in the training of the MISO NARX was 6.2626 × 10−9, and the maximum MSE recorded for the MISO CNN was 2.9210 × 10−4, for more than 15 h of GT operation). The results show a comparably satisfactory performance for both the dynamic NARX ANN and the CNN, with a slight superiority of NARX. It can be newly argued that the dynamic ANN is better suited than the deep learning ANN for the time-based performance simulation of gas turbines (GTs).

1. Introduction

1.1. Aims and Motivations

Gas turbines’ share of the global power generation mix has increased progressively in recent decades due to the progress in their design specifications, efficiency and reliability [1,2]. The field of system modeling and identification has facilitated the path towards many notable improvements, including higher cycle efficiencies and reduced emissions; therefore, GT power generation technology has become an unavoidable choice for many developed and developing countries [3,4,5]. Before reviewing the literature, it is informative to provide adequate motivation and background for this research. The operating principle of a dual-fuel GT can be as simple as shown in Figure 1.
The air is discharged by the compressor (1–2) for more efficient combustion, and in the combustor, the air/fuel blend is fired and burned (2–3). Operation (1–2) is an isentropic process, whereas operation (2–3) is a constant-pressure (isobaric) one. The combusted gases expand through the gas turbine in an isentropic operation (3–4), which converts the thermal energy into useful mechanical energy that rotates the movable part of the synchronous generator. Operation (1–4) describes the exhausted heat, which is either used to feed the heat recovery steam generator (HRSG) associated with a combined cycle gas turbine unit (CCGT) or discharged to the atmosphere in open-cycle GTs. The synchronous generator converts the harvested mechanical power into electrical power in order to feed the grid with electrical energy. The combustion chamber is normally supplied with natural gas through two valves, the pilot valve and the NG control (premix) valve; however, during low loads, startups or instances where there is a shortage of NG, the combustion has to remain stable, so the pilot valve is supplied with fuel oil through a booster pump system. These operating modes are known by the operators as premix mode and premix/diffusion dual mode, in which the premix mode is active only in normal NG operation from 50% to the rated power, while the diffusion mode is possible over the entire load range (including startups and shutdowns). However, the details of the combustion process can be complicated and are even unnecessary in control-oriented system identification, machine learning and deep learning, because the simulation rules can be driven precisely by the data variables.
Modeling gas turbines serves many general aims, which represent the core objectives of our research, such as control system design, performance monitoring, prediction, emission reduction and grid code compliance. Furthermore, trustworthy modeling of gas turbines or engines can lead to significant sustainable development in these generation systems in terms of using other fuels formed from other materials and waste, which can be fired in the same device or by a similar principle, such as biogas, leading to promising, sustainable and flexible power generation scenarios [6,7,8,9,10]. The topic of GT power generation modeling is apparently an interdisciplinary research field that is deeply related to many disciplines during the research phase, but whose outcomes are extremely useful and are therefore released beyond the boundaries of any single discipline. Figure 2 shows the relevant disciplines and possible useful purposes of GT time-based dynamic simulations. The aforementioned purposes are general; however, the very specific objective that distinguishes our work is that the modeling procedure has been carried out with two of the most recent state-of-the-art techniques, the dynamic neural network and the deep learning convolutional neural network (CNN), with a satisfactory performance and a quantified analysis of the results.
The scientific merit of this article is discussed in the next subsection, together with the related literature.

1.2. Related Work and the Paper Contribution

The multidisciplinary/interdisciplinary nature can also be deduced from the literature review presented here; for a more detailed survey, the reader may refer to the recent critical review written by the corresponding author [5]. Recently published dynamic models of GTs, whether combined with the steam cycle as a CCGT or as a single unit, have been established by physical laws, system identification, artificial intelligence, machine learning or deep learning techniques. The review here is informative, with an emphasis on modeling via neural networks, machine learning and deep learning methodologies. Asgari et al. (2014, 2016, 2021) [11,12,13] have published NARX-type ANN models to simulate significant variables in the startup process, which have been used to simulate the behavior of an actual General Electric (GE) GT (PG 9351FA). The compression ratio gave the maximum error in the simulation, with an RMSE of 2.8% (0.028), and the minimum RMSE of 0.0004 was obtained in the speed response [11]. The same primary author has extended the work on GT modeling with a recurrent neural network with a single hidden layer, which achieved a comparable RMSE of approximately 0.22% (0.0022) for training and 2.6% (0.026) for testing [14]. Ibrahem (2020) [15] has offered a NARX ANN model for a Siemens SGT-A65 aero-derivative gas turbine engine in order to pave the way toward the design of a predictive control strategy; different ensemble and single MISO NARX network structures were trained and tested, and the minimum RMSE achieved for the turbine speed was 0 during the training phase but 0.0022 in the testing phase for one of the spool speeds. Mohamed et al. (2019) [4] have presented the performance of a feed-forward (FF) back-propagation ANN (BPNN) in simulation for the purpose of comparison with a physics-based model and a subspace system identification model; the FF ANN gave the minimum error of 0.05048 in the frequency or speed response. Rashid et al. (2015) [16] have presented a new model for a CCGT by training an FF ANN via particle swarm optimization (PSO), where the MSE is 1.019 × 10−4 for training and 0.0055 for testing. Rahmoune et al. (2020) [17] have developed a NARX model to identify the dynamic behavior of the gas turbine components under the influence of vibration phenomena; the results validated the capability of the NARX NN in determining the dynamic behavior of the gas turbine system, with simulation MSEs of 3.8414 × 10−3 for the high-pressure (HP) turbine and 1.29152 × 10−1 and 2.12090 × 10−4 for the gas and air control valves, respectively. In terms of deep learning, Cao et al. (2021) [18] have presented different deep learning techniques used to predict the changes in the efficiency and flow capacity of turbomachinery; the degradation predictions were established via the LSTM approach, with a high accuracy ranging from 81.65% to 93.65%.
From this review and previous critical reviews [5], it can readily be found that there are no constraints on the achievable accuracy, and therefore more accurate results are probably still attainable. On the other hand, it is unfair to claim numerical superiority in accuracy over the published literature, because accuracy depends on factors other than the NN structure design, such as differences in the on-site data from one GT, and one study, to another. To the best of the authors’ knowledge, deep learning techniques have not yet been studied in detail for the GT time-based dynamic simulation, and it is very interesting to know whether they are comparable, superior or less effective than a dynamic neural network with a shallow structure, especially the NARX ANN.
The convolutional neural network is a well-recognized example of deep learning tools, and the NARX ANN is a typical and extensively used example of a dynamic neural network; therefore, both are selected for this study. The scientific contributions of this manuscript are then:
(1) Two accurate methods for simulating a Siemens dual-fuel GT are presented, with an emphasis on the essential variables of the GT: one simulator is established using a dynamic NARX ANN, and the other is based on a deep-learning convolutional neural network;
(2) The models’ performances are depicted in MIMO and parallel MISO structures with highly accurate results; as overall indicators, the minimum recorded MSE in the training of the MISO NARX was 6.2626 × 10−9, with a corresponding testing MSE of 3.4983 × 10−7, while the maximum average MSE, recorded for the MISO CNN, was 2.9210 × 10−4, and both networks worked successfully for more than 15 operating hours of the GT;
(3) It is newly shown that the NARX dynamic ANN is slightly superior in accuracy to the deep neural network, which indicates that deep learning can be regarded as an alternative, but not a substitutional, tool for the simulation of heavy-duty power GTs; in other words, it shall not replace the dynamic ANN, even with shallow architectures. One feature that makes NARX ANNs superior is the adoption of past outputs as additional direct inputs, which increases their overall accuracy; this major advantage has no equivalent in deep convolutional networks in spite of the variety of their hyper-parameters.
The rest of the paper is organized as follows: Section 2 presents the data preparation of the adopted GT, the inputs/outputs selection, and the normalized, standardized and actual quantities. Section 3 presents the NARX ANN model development, Section 4 presents the CNN model establishment, Section 5 shows the simulation results of both methodologies with a comparison against the real measurements and a quantified analysis of the results and, finally, Section 6 concludes the research study and findings with some feasible future trends.

2. Data Curation and Analysis

The datasets utilized for this study have been collected from a real gas turbine generation unit and are provided by the corresponding author. They comprise long-term data representing 16 h of GT operation. According to Table 1 and Table 2, the collected datasets have been classified into GT input and output variables, with the operational range for each variable. As can be seen from the tables, four variables have been identified as the GT’s inputs (the NG valve, the pilot valve, the compressor outlet temperature and the compressor outlet pressure), whereas the remaining three variables (the output power, the exhausted temperature and the frequency, i.e., the speed of the rotor) have been appointed as the outputs of the system.
After defining the input and output parameters from the obtained datasets, the corresponding data have been divided into two groups, namely training and validation datasets; this eases the evaluation of the model’s generalization and prevents over-fitting during the training phase.
The first group of data has been used to train the model, whereas the other group, which comprises unseen data, i.e., samples that have not been utilized during the training process, has been applied to evaluate the models’ accuracy. The system formation, including the input and output variables, is shown in Figure 3.
It is worth mentioning that the usual practice is to include the compression ratio (CR) as an input instead of both the COT and COP; however, the two choices can be equivalent, and a notable improvement in accuracy has been observed during the testing phase of the GT.
Standardization and normalization are the most popular rescaling techniques. Both approaches rescale the features of the system data into a restricted range; leaving the features in widely differing ranges makes it very complex for the model to map inputs to outputs properly. However, the two techniques differ in the way they work, and each has special use cases. Based on this, the collected datasets from the GT unit have been pre-processed and rescaled into two main formats aside from the SI-unit data, normalized data and standardized data, in order to train and validate the built networks. A brief description of these two processes is valuable for understanding how and why the given data are normalized or standardized.

2.1. Data Normalization

This specifies the data between the 0 and 1 range or between the −1 and +1 range. Normalization is required when there is a large difference in the ranges of the system’s features; furthermore, this scaling approach can be beneficial when the collected data do not follow any particular distribution, such as a Gaussian distribution. Therefore, this technique can be very useful for neural network algorithms, since they do not assume any data distribution. This technique is also known as min–max scaling. Equation (1) presents the mathematical formula for the normalization approach [19,20,21]:
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
where x_max and x_min are the maximum and minimum values of the input or output feature of the model, respectively. From the above equation, it can be clearly noticed that the range of each feature falls between 0 and 1 according to the following three scenarios:
  • When x equals the minimum, then x_norm is 0;
  • On the other hand, when x is the maximum point in the array, then x_norm is 1;
  • However, if x is between the minimum and maximum, then x_norm will be between 0 and 1.
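As an illustration only (not part of the authors’ released code), the per-feature min–max rescaling of Equation (1) can be written in MATLAB along the following lines; the matrix X and its orientation are assumed for this sketch:

    % Minimal sketch, assuming X holds the GT records with one time sample per
    % row and one variable (e.g., NGV, PILTV, COP, COT) per column.
    Xmin  = min(X, [], 1);                % per-feature minimum in Equation (1)
    Xmax  = max(X, [], 1);                % per-feature maximum
    Xnorm = (X - Xmin) ./ (Xmax - Xmin);  % every feature rescaled to [0, 1]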

2.2. Data Standardization

This is another common rescaling approach that rescales the data about the mean with unit standard deviation (unit variance); that is, the resulting distribution has a mean of zero and a standard deviation of one. Standardization can be useful when the data have a Gaussian distribution; however, this does not have to be the case. Furthermore, in contrast to normalization, standardization has no bounded range; as such, if the data contain outliers, standardization will not confine them to a fixed range. Equation (2) shows the formula associated with the standardization technique [19,20,21]:
$$x_{\mathrm{stand}} = \frac{x - \mu}{\sigma} \tag{2}$$
where µ is the mean and σ is the standard deviation of the feature values. It can be noticed from the above equation that the input and output values are not restricted to a particular range.
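A corresponding sketch for Equation (2), again with an assumed data matrix X oriented as above, is:

    % Minimal sketch of z-score standardization per feature (Equation (2)).
    mu     = mean(X, 1);          % per-feature mean
    sigma  = std(X, 0, 1);        % per-feature standard deviation
    Xstand = (X - mu) ./ sigma;   % zero mean, unit standard deviation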
In conclusion, the choice between normalization and standardization ultimately depends on the type of data and the machine-learning technique to be employed; there is no hard and fast rule that states when data should be normalized or standardized. Fitting the model using the actual, normalized and standardized data and then comparing the performance among these three data formats can be a powerful criterion for the deployment of the final model of a GT power plant; see Figure 4, which summarizes the data curation in this study.

3. The NARX Model Setup

The mathematical expression of the NARX model can be given as [22]
$$\hat{y}(t) = f\big(u(t), u(t-1), \ldots, u(t-n_u),\ y(t-1), \ldots, y(t-n_y)\big) + e(t) \tag{3}$$
where y(t) and ŷ(t) are the target and predicted output variables, respectively; u(t) is the input variable of the network; n_u and n_y are the time delays of the input and output variables; and e(t) is the model error between the target and the prediction. In other words, y and u are the output and the externally determined variable in this equation, respectively. The next value of the dependent output signal y(t) is regressed on previous values of the output signal and an independent (exogenous) input signal.
To set up an accurate and reliable NARX model for the GT power plant with an acceptable predictive performance, much like with other dynamic neural networks, various architectures may be considered over a wide range of trials [13]. These different architectures are based on several factors, such as the number of inputs and outputs, i.e., the MIMO or MISO structure; the training algorithms; the number of hidden layers; the number of neurons in the hidden layer(s); the type of activation functions; the maximum number of epochs, i.e., iterations; the number of recurrent connections; and the time delays in the recurrent connections. In addition, another vital factor has been included in this study, which is the data type, i.e., the data format. Figure 5 shows the NARX structure constructed for this study, in which the tapped delay line (TDL) is employed to feed the network with the past values of the inputs and outputs. As can be seen from this figure, the proposed NARX model is composed of four inputs, one hidden layer and three outputs.
Here, the variables x0 to x4 represent the computer representation of the inputs, w0 to w4 are the connection weights (generalized later in the equations describing the NARX ANN), σ is the sigmoid activation function symbol, S is the linear activation function symbol and Ŷ(t) is the predicted output value. A thorough computer code has been developed in the MATLAB programming environment to set up and configure NARX models with sophisticated generalization properties. MATLAB is a versatile programming environment developed by MathWorks for numerical computation in engineering and scientific applications. The generated code sweeps several hyper-parameters for training and configuring the NARX models of the gas turbine generation unit; more precisely, the maximum number of iterations, the learning rate, the number of hidden-layer neurons, the time delays in the recurrent connections and the model structure, i.e., the MIMO and MISO configurations, as well as the data type (normalized, standardized and actual data), have all been considered in the developed code as a combination of a variety of settings.
In addition, this study employs a feed-forward multilayer dynamic neural network architecture with an input layer, one hidden layer with a sigmoid-type transfer function and an output layer with a linear activation function. Furthermore, the developed program has been used to train a wide range of NARX topologies, employing three training algorithms in the training step: the Levenberg–Marquardt (LM) algorithm, the Bayesian regularization (BR) algorithm and the scaled conjugate gradient (SCG) algorithm. Eventually, tweaking all of the hyper-parameters, in addition to the training algorithm, indicates the best performance and its corresponding NARX model. The mean squared error (MSE), which expresses the average squared error between the network outputs and the targets, is the default performance function for feed-forward networks and can be expressed as [23]:
$$E = \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} e_i^{\,2} = \frac{1}{N}\sum_{i=1}^{N}\big(y_i - \hat{y}_i\big)^2 \tag{4}$$
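For concreteness, Equation (4) reduces to a one-line computation; y and yhat below are assumed vectors of the targets and the network predictions:

    % Minimal sketch of the MSE performance index in Equation (4).
    e      = y - yhat;       % error between targets and network predictions
    mseVal = mean(e.^2);     % average squared error over the N samples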
The backpropagation technique, which involves executing computations backwards through the network, is used to determine the gradient and the Jacobian. However, it is difficult to predict which training method will be the most efficient for a given situation [23]; this is determined by a variety of parameters, including the problem’s complexity, the number of data points in the training set, the number of weights and biases in the network, the error target and whether the network is used for pattern recognition (discriminant analysis) or function approximation (regression) [23]. Therefore, the proposed NARX model of the GT power plant has been trained over a wide range of trials, including the three different optimization algorithms, in order to obtain the best performance and the most applicable NARX network. For more details about the Levenberg–Marquardt (LM), Bayesian regularization (BR) and scaled conjugate gradient (SCG) training algorithms, refer to [23]. According to the input variable u(t) in Equation (3), the output of the hidden layer at time t is computed as [22]:
$$H_t^i = f_1\left(\sum_{j=0}^{n_u} w_{ij}\, u(t-j) + \sum_{k=1}^{n_y} w_{ik}\, y(t-k) + a_i\right) \tag{5}$$
where w_ij is the connection weight between the input neuron u(t−j) and the i-th hidden neuron; w_ik is the connection weight between the i-th hidden neuron and the delayed output feedback loop; a_i is the bias of the hidden-layer neurons; and f_1(·) is the hidden-layer transfer function, i.e., the activation function [22]. As mentioned before, the sigmoid function has been used in the proposed code as the hidden-layer activation function. Equation (6) shows the mathematical expression of the sigmoid function [22]:
$$f_1(S) = \frac{1}{1 + e^{-S}} \tag{6}$$
The final NARX prediction value can eventually be obtained by integrating the hidden-layer outputs, as given in [22]:
$$\hat{y}_l(t) = f_2\left(\sum_{i=1}^{n_h} w_{li}\, H_t^i + b_l\right) \tag{7}$$
where w_li is the connection weight between the i-th hidden neuron and the l-th estimated output; b_l is the bias of the l-th predicted output; n_h is the number of hidden neurons; and f_2(·) is the output-layer activation function. The mathematical representation of the linear activation function f_2(·) is presented in Equation (8) [22]:
$$f_2(S) = S \tag{8}$$
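To make Equations (5)–(8) concrete, the forward pass for a single time step could be sketched as below; the weight matrices, the delayed-signal vectors and their dimensions are all assumed for illustration:

    % Minimal sketch of one NARX forward pass, Equations (5)-(8).
    S    = Wu*u_delayed + Wy*y_delayed + a;  % weighted delayed inputs/outputs plus bias
    H    = 1 ./ (1 + exp(-S));               % sigmoid hidden activations, Equation (6)
    yhat = Wl*H + b;                         % linear output layer, Equations (7) and (8)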
According to the written code, the stopping condition on the number of iterations, i.e., epochs, has been set to 1000. The datasets, in their three formats, have been divided into three subsets: the training set (70%) for training the model; the validation set (15%) to confirm that the network generalizes properly and to stop the training before overfitting; and the test set (15%), which is utilized as a completely independent test of network generalization. The divided datasets have been applied to train the open-loop NARX model to guarantee an efficient learning procedure, since the true outputs are available during the training process, as discussed before. After determining the optimal open-loop NARX model over a wide range of trials, the optimal open-loop network can then be transformed into closed-loop mode for multi-step prediction. In this study, eighteen open-loop NARX architectures with MIMO and parallel MISO structures have been trained with different parameters: the number of hidden-layer neurons, the training algorithm, the time delay in the recurrent connection and the data format.
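As a minimal sketch of this setup using MATLAB’s toolbox functions (the delay lengths and neuron count below correspond to only one of the trialed designs, and U and Y are assumed cell arrays holding the input and target sequences):

    % Open-loop NARX training sketch; hyper-parameter values are one trial only.
    net = narxnet(1:30, 1:30, 20);      % input delays, feedback delays, hidden neurons
    net.trainFcn = 'trainbr';           % Bayesian regularization (no validation stop)
    net.trainParam.epochs = 1000;       % stopping condition on iterations
    net.divideParam.trainRatio = 0.70;  % 70/15/15 split as described above
    net.divideParam.valRatio   = 0.15;
    net.divideParam.testRatio  = 0.15;
    [Xs, Xi, Ai, Ts] = preparets(net, U, {}, Y);  % shift sequences for the delay lines
    net  = train(net, Xs, Ts, Xi, Ai);            % open-loop training with true outputs
    netc = closeloop(net);              % closed-loop mode for multi-step prediction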
The next subsection explains the MIMO and MISO NARX models.

3.1. The MIMO Model

The model has been evaluated with one hidden layer, various numbers of hidden-layer neurons and time delays, and different data types. The network has a three-neuron output layer, which means that the output power, frequency and exhausted temperature are predicted simultaneously, one step ahead. Furthermore, the three learning approaches have been tested, i.e., Levenberg–Marquardt, Bayesian regularization and the scaled conjugate gradient. Due to the very high number of trials, it is infeasible to mention all of them here, but samples showing the performance MSE and regression parameter R of the resultant MIMO NARX models are tabulated in Table 3, with the best design marked.
According to the findings shown in Table 3, the MIMO NARX structure with fifteen hidden-layer neurons, a recurrent-connection time delay of thirty samples, the normalized data format and the Bayesian regularization training algorithm produced the best results on the test subset. Furthermore, the best regression coefficient was also found for the same network. The optimum performance and regression of the developed MIMO NARX model, with four inputs and three outputs at a time, are shown in Figure 6 and Figure 7, respectively. These graphs depict both the mean squared error (MSE) trend for the training and test sets and the regression coefficient R during the learning procedure.
The decrease in both the training and, especially, the test-set trends demonstrates that there is no over-fitting in the model. As the performance figure shows, the best training performance was obtained after 503 iterations (epochs), when the minimum gradient was reached, with an average MSE of 1.0732 × 10−6. Figure 8 represents the optimal open-loop MIMO NARX model based on fifteen neurons in the hidden layer.
It can be noticed from Figure 8 that the three outputs are fed into the input layer and output layer at the same time. Despite the relatively high performance and regression coefficients of the created MIMO NARX network, dealing with one output at a time is more efficient in the NARX network and results in a higher performance for the time prediction of each output parameter of the GT unit. Therefore, the MATLAB code has been further developed to create open-loop MISO NARX models that predict the GT parameters individually. The constructed MISO models and their performance are elaborated in the next section.

3.2. The Parallel MISO Model

The model has been evaluated with one hidden layer, various numbers of hidden-layer neurons and time delays, and different data types. The network has a one-neuron output layer, which means that the output power, frequency or exhausted temperature is predicted one step ahead, one output at a time in each trial. Furthermore, the three learning approaches have been tested, i.e., Levenberg–Marquardt, Bayesian regularization and the scaled conjugate gradient. Samples of the trials for establishing the MISO models, with the MSE performance and regression coefficients R of the resultant MISO NARX models for each parameter (output power, frequency and exhausted temperature), are tabulated in Table 4, Table 5 and Table 6, respectively.
The computational reason for the superiority of the BR training algorithm can be argued to be the fact that BR has no early stopping criteria such as those in the LM and SCG algorithms. In addition, the normalized data are handled much better by the NARX ANN than the actual and standardized data, because of the harmony in the upper and lower limits of all outputs of the GT in normalized values, and because the given dataset is a time-based measurement record that does not belong to the class of data that embeds a Gaussian distribution. From Table 6, the optimal average MSE and regression coefficient of the three GT parameters have been found for the structure with twenty hidden-layer neurons and a 30-sample time delay, employing the normalized data type and the Bayesian regularization training algorithm. The optimal training performance, with an average MSE of 8.46 × 10−7, was obtained after 1000 iterations (epochs), when the maximum number of epochs was reached. Furthermore, the best regression coefficient was also found for the same NARX network. Figure 9, Figure 10, Figure 11 and Figure 12 show the performance and the regression plot of each developed MISO NARX model based on four inputs and one output at a time for the three output variables. These figures illustrate both the mean squared error (MSE) trend for the training and test sets and the regression coefficient R during the learning procedure.
The decrease in the MSE trend demonstrates that there is no overfitting in the proposed MISO NARX model. The regression plots demonstrate that the model achieved optimum fits, since the data points lie along the line at which all outputs equal the targets. Figure 13 represents the optimal open-loop MISO NARX model with twenty neurons in the hidden layer. It can be noticed from this figure that one output is fed into the input layer and output layer at the same time.

4. The Deep Learning Convolutional Neural Network (CNN) Model Setup

In this section, it is valuable to elaborate on the gradient descent algorithm used for training the CNN and on how this technique works, in order to justify the GT data curation. Gradient descent is an optimization technique utilized when training a neural network model based on a convex function [19]. It tweaks the parameters of the CNN model to attain the minimum of the model’s cost function. This function quantifies the performance of the model by computing the error between the predictions and the actual data values and representing it as a single real number. In other words, gradient descent is a paramount technique in machine learning models that determines the function coefficients that minimize the cost function as far as feasible; more details regarding the gradient descent algorithm can be found in the Coursera course [20]. In machine learning and deep learning terms, the gradient can be regarded as the derivative of a function with more than one input [19]. The mathematical translation of the gradient descent technique is as follows [21]:
$$\omega_{j+1} = \omega_j - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \left(h_\omega\big(x^{(i)}\big) - y^{(i)}\right) x_j^{(i)} \tag{9}$$
This equation adjusts the weight values until reaching convergence (i.e., the minimum value of the cost function), where ω_{j+1} is the updated weight value; ω_j is the previous weight value; α is the learning rate; m is the number of training samples; h_ω is the hypothesis; x^{(i)} is the i-th training example; y^{(i)} is the corresponding target of the i-th example; and x_j^{(i)} is the j-th feature in a given training example.
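A minimal sketch of one update of Equation (9), assuming a linear hypothesis h_ω(x) computed as X*w with a design matrix X, target vector y and weight vector w, is:

    % One batch gradient descent step (Equation (9)); names are illustrative.
    alpha = 3e-3;                     % learning rate (the value used later for the CNN)
    m     = size(X, 1);               % number of training samples
    grad  = (1/m) * X' * (X*w - y);   % gradient of the squared-error cost
    w     = w - alpha*grad;           % move the weights toward the cost minimum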
As can be seen in Equation (9), the cost function initially depends on the initial value of the weight vector. These weights are adjusted iteratively using the gradient descent method over the given datasets in order to minimize the cost function of the generated model. From the aforementioned basics, it is clear that the variable x, which represents the input variables fed into the model, influences the gradient descent step size. Moreover, as mentioned before, the datasets used for the proposed models have been drawn from a practical GT generation unit, which in turn means that the system’s variables have a highly dynamic distribution. Therefore, the input and output datasets of the NARX- and CNN-based GT models may differ greatly in scale, range and distribution for each variable; for example, the deviation among the output power values and the exhausted temperature values is somewhat larger than the variation in the frequency instances. Differences in scale among the model’s parameters may exacerbate the difficulty of the modeled problem [19]. Large input and output values may result in a model that learns large weight values, and a model with large weight values is frequently unstable: it may perform poorly during the learning phase and may be sensitive to the input values, resulting in an increased mean squared error, i.e., generalization error. Therefore, a feature-rescaling technique needs to be applied to the GT’s variables in the data pre-processing step. Data pre-processing guarantees that the gradient descent of the model heads smoothly towards the minimum error and that the gradient descent steps are updated at the same rate for all parameters. Having the features of the data on a similar scale makes all input and output variables of the GT power plant equally important and easier to learn for the NARX and CNN models [21].
The convolutional neural network (CNN) is one of the most popular deep neural networks [24]. A CNN usually comprises various layers, such as convolutional layers, pooling layers and fully connected (dense) layers. Figure 14 represents a typical example of a CNN architecture.
According to this figure, the first type of layer, the convolutional layer, consists of filters and feature maps. The input to each filter is known as the receptive field [25], and it has a defined size. Each filter is slid across the previous layer, producing an output that is collected in the feature map. In other words, the CNN’s convolutional layer exploits local perception and weight sharing, which consequently improves its ability to extract the significant features [25,26]. It is informative to mention that the GT datasets used here have a one-dimensional structure; thus, the corresponding convolutional layer that deals with the given datasets is a 1D convolutional layer. The 1D CNN performs convolution across local regions of the input parameters to generate the corresponding features. Each kernel, i.e., filter, produces unique characteristics on the feature map at all locations. Since the 1D CNN utilizes the weight-sharing approach, as mentioned before, fewer parameters need to be learned than with conventional neural networks [26], which ensures that the 1D CNN converges earlier and faster. An example of a 1D convolutional operation is illustrated in Figure 15. The kernel size is set to 2, which means that the weights (w1, w2) are shared by every step over the input layer (x1, x2, …, xn) and the output (y1, y2, …, yn). In the kernel window (w1, w2), which represents the filter size, the input values are multiplied by the weights and the products are summed up in order to compute the value of the feature map; in the shown example, the value of y2 is obtained from y2 = (w1x1 + w2x2) [26]. The output of the convolution layer is provided as both the output and the input of the following layer, and it represents the features derived from the training samples using the convolution kernel.
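The sliding-window computation of Figure 15 can be reproduced in a few lines of MATLAB; the numeric values below are arbitrary, and only the kernel size of 2 is taken from the text:

    % Minimal sketch of the 1D convolution with a shared kernel of size 2.
    x = [0.2 0.5 0.7 0.4 0.9];        % example input sequence (arbitrary values)
    w = [0.6 0.3];                    % shared kernel weights w1, w2
    y = conv(x, fliplr(w), 'valid')   % y(k) = w(1)*x(k) + w(2)*x(k+1) for every window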
In order to obtain one-dimensional features, the 1D CNN performs convolution operations on the input signal in local areas, and various kernels extract certain features from the inputs. As illustrated in Figure 15, each kernel recognizes certain characteristics at any location on the input feature map, and weight sharing is performed on the same input feature map; this mechanism minimizes the number of parameters during training. If L_i is a 1D convolutional layer, its mathematical formula can be generally expressed as Equation (10) [26]:
$$x_j^l = f\left(b_j^l + \sum_{i=1}^{M} x_i^{l-1} * k_{ij}^l\right) \tag{10}$$
where k denotes the number of convolution kernels, j is the kernel size and M refers to the number of channels of the input x_i^{l−1}. The kernel bias is indicated by b, and the symbol (∗) is the convolution operator. f(·) represents the non-linear activation function; CNNs usually utilize the rectified linear unit (ReLU), i.e., f(x) = max(0, x), as the activation function [24]. Pooling layers are paramount for CNNs. Pooling methods can be thought of as down-sampling used to minimize the number of parameters while maintaining the major features in order to speed up the next stage, since there are more feature maps in the down-sampling phase, resulting in an increased data dimensionality that makes the calculations too complex [27]. Figure 16 illustrates the max-pooling operation used in this study.
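Similarly, the non-overlapping max pooling of Figure 16 reduces each pair of adjacent feature-map values to their maximum; a sketch with an arbitrary feature map f:

    % Minimal sketch of max pooling with pool size 2 and stride 2.
    f = [0.3 0.8 0.1 0.6];              % example feature map (arbitrary values)
    p = max(reshape(f, 2, []), [], 1)   % keeps the larger value of each adjacent pair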
The learning rate determines how fast or slow we move towards the optimal weights. If the learning rate is very large, we may skip over the optimal solution; if it is too small, we will need too many iterations to converge to the best values. Therefore, using a good learning rate is crucial.
Table 7 and Table 8 present some important samples of the CNN development attempts in the MIMO and MISO structures, with the best choices marked. The other design parameters, such as the number of filters, the filter size, the number of hidden layers, the number of neurons in the hidden layer and the number of convolutional layers, have been tuned widely with a learning rate of 3 × 10−3. To avoid confusion, instead of mentioning all of the trials that have been made, only the ultimate parameters that attained the lowest possible MSE are reported. The final CNN architectures with MIMO and MISO structures, and the final optimal MSE value for each, are given in Table 7.
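As an illustrative sketch only (the layer counts and filter numbers below are assumed placeholders, not the authors’ final Table 7 design), a 1D CNN of the kind described here can be assembled with MATLAB’s Deep Learning Toolbox (R2021b or later) as:

    % 1-D CNN regression sketch; sizes are placeholders, learning rate from the text.
    layers = [
        sequenceInputLayer(4)                            % four GT input variables
        convolution1dLayer(2, 16, 'Padding', 'causal')   % kernel size 2, 16 filters
        reluLayer                                        % ReLU, f(x) = max(0, x)
        maxPooling1dLayer(2, 'Stride', 2)                % max pooling as in Figure 16
        fullyConnectedLayer(32)                          % dense hidden layer
        reluLayer
        fullyConnectedLayer(3)                           % three outputs (MIMO case)
        regressionLayer];                                % MSE regression loss
    options = trainingOptions('adam', ...
        'InitialLearnRate', 3e-3, ...                    % learning rate used here
        'MaxEpochs', 100, 'Shuffle', 'never');
    % net = trainNetwork(XTrain, YTrain, layers, options);  % pooling halves the
    %                                                       % sequence length, so the
    %                                                       % targets must be aligned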

5. Time-Based Simulation Results and Discussion

This section depicts the simulation results of the two approaches and their architectures (Figure 17, Figure 18, Figure 19, Figure 20, Figure 21 and Figure 22). From the results and the previously tabulated MSEs, it is evident that the deep CNN and the dynamic NARX ANN show a satisfactory performance in their application to heavy-duty dual-fuel GTs. They can be used for short-term or long-term prediction, controller upgrading, performance monitoring during measurement-device malfunctions, estimating the fuel requirements for a given demand, characterizing the GT with different fuels and so on.
The trends are followed successfully by both techniques for the power (ranges: 0–1 normalized and 124.89–241.57 MW actual power range of load-down, then load-up), with very negligible errors (minimum MSE of 6.2626 × 10−9 and maximum MSE of 2.9210 × 10−4) over the adopted long operation time of the GT (more than 15 h of continuous operation), which indicates the robustness of both deep learning and shallow dynamic ANNs. Such accuracies in the responses of GTs are difficult to attain by physics-informed or other system identification techniques, because the power plant noises and uncertainties are high and vary increasingly with the changes in the operating conditions. In addition, the differences in the nature of the responses make the simulation far more challenging; for instance, the power variations appear to be slower than the changes in the temperature and frequency, whereas the latter responses change more severely, which makes the problem computationally over-complicated for models that must track all of these variation trends simultaneously. Nevertheless, the proposed techniques have easily handled such computational burdens and provided prediction capabilities for a longer time than previously published, covering more than 15 h (more than 54,000 s) of operation.
It can also be seen that the NARX ANN shows a slight superiority in the error values, and also when zooming into the results, for both structures (parallel MISO and MIMO); this could be due to the following reasons:
  • Its simplified structure directly involves the effects of the inputs and outputs; therefore, the inputs are reflected more realistically on the outputs;
  • The use of delayed feedback outputs as additional inputs, which increases the number of inputs utilized to depict the output more accurately. This important feature has no equivalent in the CNN, despite its sophistication in the variety and number of its layers.
It can generally be deduced that the dynamic ANN, even if recognized as a shallow ANN with a single hidden layer, is still a leading choice for the modeling and simulation of GTs, with negligible simulation errors and a high simulation performance for the variation trends of GT power plants. For other successful applications of CNNs and NARX ANNs beyond time-based simulations, the reader may refer to references [24,25,26,27,28].

6. Conclusions

Motivated by the most recently proposed future trends, simulation models based on a deep CNN and a dynamic NARX ANN have been presented, with extremely accurate results that confirm the scientific merits of deep learning and shallow dynamic ANNs for the emulation of GT power station performance. The main findings are as follows:
  • It is generally highly recommended to normalize the data of GTs, rather than dealing with actual quantities, when using ANNs for modeling;
  • The BR training algorithm outperforms the other training algorithms because of its later ultimate termination criteria, unlike the earlier-stopping LM and SCG algorithms;
  • The prediction capabilities of the NARX ANN and the CNN for the GT’s time-based dynamic performance are satisfactory, with negligible errors for both techniques.
Based on the aforementioned points, the paper’s goals have been generally achieved. Further points are important to mention based on the observation and investigation of the results:
  • There was a slight superiority of the dynamic NARX type in terms of accuracy. A new conclusion can be suggested: the main computational reason is the feedback delay element in NARX, which, despite the shallow structure, provides additional information alongside the other direct inputs and thereby improves the accuracy over the deep CNN, which has no delayed feedback element;
  • Based on the aforementioned results, deep learning can act as an alternative choice for modeling GTs in real applications, but cannot be a substitutional tool for the shallow dynamic ANN; both have shown successful performances and can be used reliably in real applications;
  • Despite the achieved targets of the paper, there are still deep learning techniques that have not been investigated in the literature; these techniques might have a comparable performance, which motivates mentioning some future research opportunities;
  • One of the clearer future trends is to use other deep learning techniques and to compare them appropriately with the developed/published ones; this may include advanced deep recurrent neural networks and locally connected neural networks;
  • Another possible future outcome is to include the fuel preparation system, especially for biogas firing in such turbines, and the gasification/digestion process, in order to quantify the amount of material converted to biogas and to link this with an enhanced control strategy with new objectives;
  • Another feasible future point is designing a supervisory controller for the developed ANN models and applying it to regulate the diffusion and premix modes, together with the objectives of higher efficiency and lower emissions. A comparative study with other modeling philosophies may be useful, such as physics-based models and other black-box and grey-box models, with an emphasis on many performance criteria rather than the mere numeric value of the accuracies.

Author Contributions

Conceptualization, M.A. and O.M.; methodology, M.A. and O.M.; software, M.A., O.M. and M.M.; validation, M.A. and O.M.; formal analysis, M.A. and O.M.; investigation, O.M.; resources, M.A., O.M. and M.M.; data curation, M.A. and O.M.; writing–original draft preparation, M.A. and O.M.; writing–review and editing, M.A. and O.M.; visualization, M.A. and O.M.; supervision, O.M.; project administration, O.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available; they are provided upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ADGTE	Aero-derivative gas turbine engine
ANN	Artificial neural network
BPNN	Back-propagation neural network
BR	Bayesian regularization
CCGT	Combined cycle gas turbine unit
CR	Compression ratio
COP	Compressor outlet pressure
COT	Compressor outlet temperature
CNN	Convolutional neural network
EXT	Exhausted temperature
FF	Feed-forward
Freq	Frequency
GT	Gas turbine
GE	General Electric
HRSG	Heat recovery steam generator
LM	Levenberg–Marquardt
LSTM	Long short-term memory
MSE	Mean squared error
MIMO	Multi-input multi-output
MISO	Multi-input single-output
NG	Natural gas
NGV	Natural gas control valve position
NARX	Nonlinear autoregressive network with exogenous inputs
Norm	Normalized data
1D	One-dimensional
P	Output power
PSO	Particle swarm optimization
PILTV	Pilot gas valve position
R	Regression parameter
RMSE	Root mean squared error
SCG	Scaled conjugate gradient
Stand	Standardized data
TDL	Tapped delay line

References

  1. Boyce, M.P. Gas Turbine Engineering Handbook; Elsevier: Amsterdam, The Netherlands, 2011. [Google Scholar]
  2. Rayaprolu, K. Boilers for Power and Process; CRC Press: Boca Raton, FL, USA, 2009. [Google Scholar]
  3. Mohamed, O.; Wang, J.; Khalil, A.; Limhabrash, M. Predictive control strategy of a gas turbine for improvement of combined cycle power plant dynamic performance and efficiency. SpringerPlus 2016, 5, 501. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Mohamed, O.; Za’ter, M. Comparative Study Between Three Modeling Approaches for a Gas Turbine Power Generation System. Arab. J. Sci. Eng. 2020, 45, 1803–1820. [Google Scholar] [CrossRef]
  5. Mohamed, O.K. Progress in Modeling and Control of Gas Turbine Power Generation Systems: A Survey. Energies 2020, 13, 2358. [Google Scholar] [CrossRef]
  6. Thangavelu, S.K.; Arthanarisamy, M. Experimental investigation on engine performance, emission, and combustion characteristics of a DI CI engine using tyre pyrolysis oil and diesel blends doped with nanoparticles. Environ. Prog. Sustain. Energy 2020, 39, e13321. [Google Scholar] [CrossRef]
  7. Murugesan, A.; Umarani, C.; Subramanian, R.; Nedunchezhian, N. Bio-diesel as an alternative fuel for diesel engines–A review. Renew. Sustain. Energy Rev. 2009, 13, 653–662. [Google Scholar] [CrossRef]
  8. Yaqoob, H.; Teoh, Y.H.; Ud Din, Z.; Sabah, N.U.; Jamil, M.A.; Mujtaba, M.A.; Abid, A. The potential of sustainable biogas production from biomass waste for power generation in Pakistan. J. Clean. Prod. 2021, 307, 127250. [Google Scholar] [CrossRef]
  9. Teoh, Y.H.; How, H.G.; Sher, F.; Le, T.D.; Nguyen, H.T.; Yaqoob, H. Fuel Injection Responses and Particulate Emissions of a CRDI Engine Fueled with Cocos nucifera Biodiesel. Sustainability 2021, 13, 4930. [Google Scholar] [CrossRef]
  10. Ayanoglu, A.; Yumrutas, R. Production of gasoline and diesel like fuels from waste tire oil by using catalytic pyrolysis. Energy 2016, 103, 456–468. [Google Scholar] [CrossRef]
  11. Asgari, H. Modelling, Simulation and Control of Gas Turbines Using Artificial Neural Networks. Ph.D. Thesis, University of Canterbury, Christchurch, New Zealand, 2014. [Google Scholar]
  12. Asgari, H.; Chen, X.; Morini, M.; Pinelli, M.; Sainudiin, R.; Rugerro Spina, P.; Venturini, M. NARX models for simulation of the start-up operation of a single-shaft gas turbine. Appl. Therm. Eng 2016, 93, 368–376. [Google Scholar] [CrossRef]
  13. Asgari, H.; Ory, E. Prediction of Dynamic Behavior of a Single Shaft Gas Turbine Using NARX Models. In Proceedings of the ASME Turbo Expo 2021: Turbomachinery Technical Conference and Exposition. Volume 6: Ceramics and Ceramic Composites; Coal, Biomass, Hydrogen, and Alternative Fuels; Microturbines, Turbochargers, and Small Turbomachines, Virtual, Online, 7–11 June 2021. V006T19A007. ASME. [Google Scholar]
  14. Asgari, H.; Ory, E.; Lappalainen, J. Recurrent Neural Network Based Simulation of a Single Shaft Gas Turbine. In Proceedings of the 61st SIMS Conference on Simulation and Modelling (SIMS 2020), Virtual Conference, 22–24 September 2020; Linköping Electronic Conference Proceedings; LiU Electronic Press: Linköping, Sweden, 2020; Volume 176, pp. 99–106. [Google Scholar] [CrossRef]
  15. Ibrahem, I.M.A. A Nonlinear Neural Network-Based Model Predictive Control for Industrial Gas Turbine. Ph.D. Thesis, Université Du Québec, Quebec, QC, Canada, 2020. [Google Scholar]
  16. Rashid, M.; Kamal, K.; Zafar, T.; Sheikh, A.; Shah, A.; Mathavan, S. Energy prediction of a combined cycle power plant using a particle swarm optimization trained feed-forward neural network. In Proceedings of the IEEE International Conference on Mechanical Engineering Automation and Control Systems, Tomsk, Russia, 1–4 December 2015; pp. 1–5. [Google Scholar]
  17. Rahmoune, M.B.; Hafaifa, A.; Kouzou, A.; Chen, X.; Chaibet, A. Gas turbine monitoring using neural network dynamic nonlinear autoregressive with external exogenous input modelling. Math. Comput. Simul. 2021, 179, 23–47. [Google Scholar] [CrossRef]
  18. Cao, Q.; Chen, S.; Zheng, Y.; Ding, Y.; Tang, Y.; Huang, Q.; Wang, K.; Xiang, W. Classification and prediction of gas turbine gas path degradation based on deep neural networks. Int. J. Energy Res. 2021, 45, 10513–10526. [Google Scholar] [CrossRef]
  19. Built In’s Expert Contributor Network. Gradient Descent: An Introduction to 1 of Machine Learning’s Most Popular Algorithms. Built In. 2021. Available online: https://builtin.com/data-science/gradient-descent (accessed on 21 October 2021).
  20. Ng, A.; Bensouda Mourri, Y.; Katanforoosh, K. DeepLearning.AI, Coursera. 2020. Available online: https://www.coursera.org/learn/deep-neural-network (accessed on 27 December 2021).
  21. Bhandari, A. Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization. Analytics Vidhya. 2020. Available online: https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/ (accessed on 27 December 2021).
  22. Liu, Q.; Wei, C.; Huosheng, H.; Quingyuan, Z.; Zhixiang, X. An Optimal NARX Neural Network Identification Model for a Magnetorheological Damper With Force-Distortion Behavior. Front. Mater. 2020, 7, 10. [Google Scholar] [CrossRef]
  23. Markova, M. Foreign Exchange Rate Forecasting by Artificial Neural Networks. AIP Conf. Proc. 2019, 2164, 060010. [Google Scholar] [CrossRef]
  24. Bai, M.; Yang, X.; Liu, J.; Liu, J.; Yu, D. Convolutional neural network-based deep transfer learning for fault detection of gas turbine combustion chambers. Appl. Energy. 2021, 302, 117509. [Google Scholar] [CrossRef]
  25. Wunsch, A.; Liesch, T.; Broda, S. Groundwater Level Forecasting with Artificial Neural Networks: A Comparison of LSTM, CNN and NARX. Hydrol. Earth Syst. Sci. 2021, 25, 1671–1687. [Google Scholar] [CrossRef]
  26. Ragab, M.G.; Abdulkadir, S.J.; Aziz, N.; Al-Tashi, Q.; Alyousifi, Y.; Alhussian, H.; Alqushaibi, A. A Novel One-Dimensional CNN with Exponential Adaptive Gradients for Air Pollution Index Prediction. Sustainability 2020, 12, 10090. [Google Scholar] [CrossRef]
  27. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Wang, S.; Chen, H. A novel deep learning method for the classification of power quality disturbances using deep convolutional neural network. Appl. Energy 2019, 235, 1126–1140. [Google Scholar] [CrossRef]
Figure 1. (a) General Subsystems of dual-fuel GT; (b) The T-S (temperature–entropy) diagram.
Figure 2. The interdisciplinary nature of the research topic of GT modeling and simulation.
Figure 3. Input and output parameters of the model developed in this study.
Figure 4. The flowchart that summarizes data curation in this study.
Figure 5. The structure of NARX model developed for the gas turbine generation unit.
Figure 6. Performance plot of the developed MIMO NARX model designed for the gas turbine power plant.
Figure 7. Regression plot for Bayesian regularization algorithm used for proposed MIMO model.
Figure 8. The open-loop MIMO NARX model for the gas turbine generation unit.
Figure 9. Performance plot of the optimal MISO NARX model designed for the output power.
Figure 10. Performance plot of the optimal MISO NARX structure for the system freq.
Figure 11. Performance plot of the MISO NARX structure for the EXT.
Figure 12. Regression plot for Bayesian regularization algorithm used for proposed MISO model.
Figure 13. The open-loop MISO NARX model of the gas turbine generation unit.
Figure 14. Typical example of CNN architecture with typical numbers of the filters and layers.
Figure 15. One-dimensional convolution operation of the CNN in this study.
Figure 16. Max-pooling operation for the adopted CNN in this study.
Figure 17. Normalized exhausted gas temperature (MIMO structure performance).
Figure 18. Normalized frequency or turbine speed (MIMO structure performance).
Figure 19. Normalized power (MIMO structure performance).
Figure 20. Normalized exhausted temperature (MISO structure performance).
Figure 21. Normalized frequency or turbine speed (MISO structure performance).
Figure 22. Normalized power (MISO structure performance).
Table 1. GT input variables.

| Variable | Abbreviation | Unit | Actual Operational Range |
|---|---|---|---|
| Pilot gas valve position | PILTV | % | [41.06–44.79] |
| Natural gas control valve position | NGV | % | [27.18–39.45] |
| Compressor outlet pressure | COP | bar | [11.45–16.75] |
| Compressor outlet temperature | COT | °C | [366.5–439.90] |
Table 2. GT output parameters.

| Variable | Abbreviation | Unit | Actual Operational Range |
|---|---|---|---|
| Output power | P | MW | [124.89–241.57] |
| Frequency | Freq | Hz | [49.91–50.14] |
| Exhausted temperature | EXT | °C | [558.72–559.47] |
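Tables 3–6 below compare three data formats: actual SI-unit values (Actual), standardized data (Stand) and normalized data (Norm). As an illustration of how such formats can be derived from the raw signals of Tables 1 and 2, the following minimal sketch (assuming the samples are held in a NumPy array; the sample values are hypothetical) applies min–max normalization and z-score standardization column-wise:

```python
import numpy as np

def minmax_normalize(X):
    """Scale each column to [0, 1] using its own min/max (min-max normalization)."""
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / (xmax - xmin)

def standardize(X):
    """Zero-mean, unit-variance scaling of each column (z-score standardization)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical samples; columns: PILTV (%), NGV (%), COP (bar), COT (degC)
X = np.array([[41.06, 27.18, 11.45, 366.5],
              [44.79, 39.45, 16.75, 439.9],
              [43.00, 33.00, 14.00, 400.0]])

X_norm = minmax_normalize(X)
X_stand = standardize(X)
```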
Table 3. Samples of results of MIMO NARX structures for the three system outputs (P, Freq and EXT). LM = Levenberg–Marquardt, BR = Bayesian regularization, SCG = scaled conjugate gradient; BR does not use a validation subset, hence the cells marked —.

| Hidden Layer Neurons | Time Delay | Training Algorithm | Data Format | Average MSE (Training) | Average MSE (Validation) | Average MSE (Test) | Regression (Training) | Regression (Validation) | Regression (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 2 | LM | Actual | 3.4998 × 10⁻⁶ | 4.3858 × 10⁻⁶ | 6.1132 × 10⁻⁶ | 0.99997 | 0.99996 | 0.99994 |
| 11 | 15 | LM | Stand | 3.1147 × 10⁻⁶ | 4.2107 × 10⁻⁶ | 8.6999 × 10⁻⁶ | 0.99998 | 0.99996 | 0.99991 |
| 20 | 20 | LM | Norm | 3.4393 × 10⁻⁶ | 4.5559 × 10⁻⁶ | 3.8095 × 10⁻⁶ | 0.99997 | 0.99995 | 0.99996 |
| 11 | 15 | BR | Stand | 2.3191 × 10⁻⁶ | — | 6.050 × 10⁻⁶ | 0.99998 | — | 0.99994 |
| 15 | 30 | BR | Norm | 1.0732 × 10⁻⁶ | — | 3.2062 × 10⁻⁶ | 0.99998 | — | 0.99997 |
| 20 | 20 | BR | Norm | 2.7990 × 10⁻⁶ | — | 3.2469 × 10⁻⁶ | 0.99997 | — | 0.99996 |
| 9 | 5 | SCG | Actual | 4.5996 × 10⁻⁵ | 4.4098 × 10⁻⁵ | 3.1814 × 10⁻⁵ | 0.99993 | 0.99993 | 0.99993 |
| 15 | 15 | SCG | Norm | 1.4164 × 10⁻⁴ | 1.6761 × 10⁻⁴ | 2.6512 × 10⁻⁴ | 0.99992 | 0.99994 | 0.99993 |
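The NARX structures compared in Tables 3–6 differ mainly in the hidden-layer size and the tapped-delay length. As a minimal sketch of how an open-loop (series-parallel) NARX training set can be assembled from delayed inputs and delayed measured outputs, assuming plain NumPy arrays (the actual models were built and trained with standard neural-network tooling rather than this hand-rolled code):

```python
import numpy as np

def narx_regressors(u, y, d_u, d_y):
    """Build the open-loop NARX regressor matrix.

    u: (N, n_inputs) exogenous inputs; y: (N,) measured output.
    Row t contains u(t-1 .. t-d_u) and y(t-1 .. t-d_y); the target is y(t).
    """
    start = max(d_u, d_y)
    X, T = [], []
    for t in range(start, len(y)):
        past_u = u[t - d_u:t].ravel()   # delayed exogenous inputs
        past_y = y[t - d_y:t]           # delayed (measured) outputs
        X.append(np.concatenate([past_u, past_y]))
        T.append(y[t])
    return np.array(X), np.array(T)

# Example with time delay 2 (cf. the first row of Table 3), using synthetic stand-ins
u = np.random.rand(100, 4)   # PILTV, NGV, COP, COT
y = np.random.rand(100)      # e.g., output power
X, T = narx_regressors(u, y, d_u=2, d_y=2)
print(X.shape, T.shape)      # (98, 10) (98,)
```

Because the regressors use the measured (not fed-back) past outputs, this corresponds to the open-loop training configuration shown in Figures 8 and 13.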
Table 4. Samples of the trial results of the MISO NARX structures for the output power P (MW).

| Hidden Layer Neurons | Time Delay | Training Algorithm | Data Format | MSE (Training) | MSE (Validation) | MSE (Test) | Regression (Training) | Regression (Validation) | Regression (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 9 | 5 | LM | Actual | 1.9266 × 10⁻⁷ | 2.0124 × 10⁻⁷ | 3.2844 × 10⁻⁶ | 0.99997 | 0.99996 | 0.99997 |
| 11 | 15 | LM | Stand | 2.8425 × 10⁻⁷ | 4.3205 × 10⁻⁷ | 3.4621 × 10⁻⁶ | 0.99996 | 0.99997 | 0.99991 |
| 20 | 30 | LM | Norm | 2.6179 × 10⁻⁷ | 3.0105 × 10⁻⁷ | 1.2145 × 10⁻⁶ | 0.99996 | 0.99997 | 0.99996 |
| 9 | 5 | BR | Actual | 2.5613 × 10⁻⁸ | — | 6.7436 × 10⁻⁶ | 0.99996 | — | 0.99989 |
| 11 | 15 | BR | Stand | 1.7642 × 10⁻⁸ | — | 3.3149 × 10⁻⁷ | 1 | — | 1 |
| 15 | 25 | BR | Norm | 1.4258 × 10⁻⁸ | — | 1.4642 × 10⁻⁷ | 1 | — | 1 |
| 20 | 30 | BR | Norm | 6.2626 × 10⁻⁹ | — | 3.4983 × 10⁻⁷ | 1 | — | 1 |
| 9 | 5 | SCG | Actual | 4.4996 × 10⁻⁵ | 4.2032 × 10⁻⁵ | 2.3214 × 10⁻⁵ | 0.99272 | 0.99097 | 0.99505 |
| 11 | 15 | SCG | Stand | 1.5093 × 10⁻⁴ | 2.6054 × 10⁻⁴ | 2.4232 × 10⁻⁴ | 0.99562 | 0.99341 | 0.99598 |
| 15 | 25 | SCG | Norm | 1.3164 × 10⁻⁴ | 2.7791 × 10⁻⁴ | 1.6512 × 10⁻⁴ | 0.95759 | 0.94021 | 0.95058 |
Table 5. Samples of the trial results of the MISO NARX structures for the system frequency (Freq).

| Hidden Layer Neurons | Time Delay | Training Algorithm | Data Format | MSE (Training) | MSE (Validation) | MSE (Test) | Regression (Training) | Regression (Validation) | Regression (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 9 | 5 | LM | Actual | 6.0188 × 10⁻⁶ | 2.0124 × 10⁻⁷ | 3.2844 × 10⁻⁶ | 0.99997 | 0.99996 | 0.99997 |
| 11 | 15 | LM | Stand | 7.5393 × 10⁻⁶ | 4.3205 × 10⁻⁷ | 3.4621 × 10⁻⁶ | 0.99996 | 0.99997 | 0.99991 |
| 20 | 30 | LM | Norm | 8.4340 × 10⁻⁶ | 3.0105 × 10⁻⁷ | 1.2145 × 10⁻⁶ | 0.99996 | 0.99997 | 0.99996 |
| 9 | 5 | BR | Actual | 3.7643 × 10⁻⁶ | — | 6.7436 × 10⁻⁶ | 0.99996 | — | 0.99989 |
| 11 | 15 | BR | Stand | 3.7362 × 10⁻⁶ | — | 3.3149 × 10⁻⁷ | 1 | — | 1 |
| 15 | 25 | BR | Norm | 1.5820 × 10⁻⁶ | — | 1.4642 × 10⁻⁷ | 1 | — | 1 |
| 20 | 30 | BR | Norm | 2.1999 × 10⁻⁶ | — | 3.4983 × 10⁻⁷ | 1 | — | 1 |
| 5 | 2 | SCG | Actual | 2.4112 × 10⁻⁴ | 3.1053 × 10⁻⁴ | 2.2242 × 10⁻⁴ | 0.99332 | 0.99379 | 0.99023 |
| 11 | 15 | SCG | Stand | 2.4386 × 10⁻⁴ | 2.6054 × 10⁻⁴ | 2.4232 × 10⁻⁴ | 0.99341 | 0.99341 | 0.99588 |
| 15 | 25 | SCG | Norm | 3.4042 × 10⁻⁵ | 2.7791 × 10⁻⁴ | 1.6512 × 10⁻⁴ | 0.95658 | 0.96020 | 0.95038 |
Table 6. Samples of the trial results of the MISO NARX structures for the exhausted temperature (EXT).

| Hidden Layer Neurons | Time Delay | Training Algorithm | Data Format | MSE (Training) | MSE (Validation) | MSE (Test) | Regression (Training) | Regression (Validation) | Regression (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 2 | LM | Actual | 1.8337 × 10⁻⁶ | 1.3246 × 10⁻⁶ | 6.1132 × 10⁻⁶ | 0.99997 | 0.99996 | 0.99994 |
| 20 | 30 | LM | Norm | 1.8947 × 10⁻⁶ | 1.5678 × 10⁻⁶ | 3.8095 × 10⁻⁶ | 0.99997 | 0.99995 | 0.99996 |
| 5 | 2 | BR | Actual | 1.4468 × 10⁻⁶ | — | 2.7617 × 10⁻⁶ | 0.99990 | — | 0.99979 |
| 15 | 25 | BR | Norm | 4.2333 × 10⁻⁶ | — | 2.9041 × 10⁻⁶ | 0.99998 | — | 0.99997 |
| 20 | 30 | BR | Norm | 3.3177 × 10⁻⁷ | — | 1.4889 × 10⁻⁶ | 1 | — | 1 |
| 9 | 5 | SCG | Actual | 1.1165 × 10⁻⁴ | 1.2062 × 10⁻⁴ | 2.1814 × 10⁻⁴ | 0.99219 | 0.99002 | 0.99108 |
| 10 | 10 | SCG | Stand | 7.8369 × 10⁻⁵ | 9.0913 × 10⁻⁵ | 7.1945 × 10⁻⁵ | 0.99444 | 0.99215 | 0.99519 |
| 15 | 25 | SCG | Norm | 7.4106 × 10⁻⁴ | 6.9733 × 10⁻⁴ | 7.0195 × 10⁻⁴ | 0.94648 | 0.94070 | 0.94048 |
Table 7. The effect of the learning rate during the search for the optimal solution (parallel MISO and MIMO CNNs). Each column pair gives the learning rate tried and the resulting average MSE; the first three pairs belong to the parallel MISO CNNs (normalized EXT, Freq and P) and the last pair to the MIMO CNN (all outputs).

| EXT Learning Rate | EXT Average MSE | Freq Learning Rate | Freq Average MSE | P Learning Rate | P Average MSE | MIMO Learning Rate | MIMO Average MSE |
|---|---|---|---|---|---|---|---|
| 1 | 0.1000023029 | 1 | 0.0486324 | 1 | 0.1000023029 | 1 | 0.0623 |
| 1 × 10⁻² | 0.0009281756 | 1 × 10⁻¹ | 0.0461746 | 1 × 10⁻² | 0.0000920723 | 1 × 10⁻² | 0.0010 |
| 1 × 10⁻³ | 0.0000754316 | 1 × 10⁻³ | 0.0059033 | 1 × 10⁻³ | 0.0000502320 | 1 × 10⁻³ | 0.0012 |
| 1 × 10⁻⁵ | 0.0009873912 | 1 × 10⁻⁶ | 0.0052704 | 1 × 10⁻⁵ | 0.0008763900 | 1 × 10⁻⁵ | 0.0142 |
| 1 × 10⁻⁷ | 0.3573216283 | 1 × 10⁻⁹ | 0.2094497 | 1 × 10⁻⁷ | 0.1232704280 | 1 × 10⁻⁷ | 0.1450 |
| 3 × 10⁻¹ | 0.0987941528 | 3 × 10⁻³ | 0.0015080 | 3 × 10⁻¹ | 0.0957923018 | 3 × 10⁻¹ | 0.0616 |
| 3 × 10⁻² | 0.0041973211 | 4 × 10⁻³ | 0.0018637 | 3 × 10⁻² | 0.0011962601 | 3 × 10⁻² | 0.00412842 |
| 3 × 10⁻³ | 0.0000452316 | 5 × 10⁻³ | 0.0026439 | 3 × 10⁻³ | 0.0000384926 | 3 × 10⁻³ | 0.00074576 |
| 3 × 10⁻⁵ | 0.0004128134 | 5 × 10⁻⁴ | 0.0022009 | 3 × 10⁻⁵ | 0.0003279034 | 1 | 0.01075099 |
| 3 × 10⁻⁷ | 0.0083392732 | 6 × 10⁻³ | 0.0035000 | 3 × 10⁻⁷ | 0.0040356721 | 1 × 10⁻² | 0.03599427 |
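In outline, the sweep of Table 7 amounts to retraining the same CNN under each candidate learning rate and recording the resulting MSE. The sketch below illustrates this with Keras, purely as an assumed framework (the paper does not tie itself to a specific library); `build_cnn` is a hypothetical factory returning a fresh copy of the architecture of Table 8, and the choice of the Adam optimizer is likewise an assumption.

```python
import tensorflow as tf

def sweep_learning_rates(build_cnn, X_train, y_train, rates,
                         epochs=1000, batch_size=32):
    """Retrain the same architecture under each learning rate and log the final MSE."""
    results = {}
    for lr in rates:
        model = build_cnn()  # fresh, untrained copy of the CNN
        # Optimizer choice (Adam) is an assumption; the loss is MSE as in Tables 7-8
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss='mse')
        hist = model.fit(X_train, y_train, epochs=epochs,
                         batch_size=batch_size, verbose=0)
        results[lr] = hist.history['loss'][-1]
    return results

# Candidate rates similar to those explored in Table 7
rates = [1e0, 1e-2, 1e-3, 1e-5, 3e-1, 3e-2, 3e-3]
```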
Table 8. Final version of the adjustable CNN with the final MSEs (normalized data, 1000 epochs and batch size 32).

| CNN Adjustable Design Parameter | MISO: Normalized EXT | MISO: Normalized Freq | MISO: Normalized Power | MIMO: All Outputs |
|---|---|---|---|---|
| No. of convolutional layers | 3 | 2 | 3 | 3 |
| Filter size for each convolutional layer | 2 | 2 | 2 | 2 |
| No. of filters in the convolutional layers | 64, 32, 256 | 100, 200 | 64, 32, 256 | 256, 32, 32 |
| No. of hidden layers | 1 | 1 | 1 | 1 |
| No. of neurons in the hidden layer | 70 | 100 | 64 | 70 |
| Max-pooling layers | 2 | 2 | 2 | 2 |
| Filter size in each max-pooling layer | 2 | 2 | 2 | 2 |
| Final MSE | 8.6124826 × 10⁻⁶ | 2.9210 × 10⁻⁴ | 8.3504346 × 10⁻⁶ | 1.6581 × 10⁻⁴ |
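For reference, a possible Keras realization of the "Normalized EXT" column of Table 8 (three convolutional layers with 64, 32 and 256 filters of size 2, two max-pooling layers with a pool size of 2, and one dense hidden layer of 70 neurons) is sketched below. The placement of the pooling layers, the activation functions, the padding and the input window length are assumptions, since Table 8 fixes only the counts and sizes.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_miso_ext_cnn(window_len=8, n_inputs=4):
    """1-D CNN per the 'Normalized EXT' column of Table 8 (ordering/activations assumed)."""
    model = models.Sequential([
        # Conv layers: 64, 32 and 256 filters, each with a filter size of 2 (Table 8)
        layers.Conv1D(64, kernel_size=2, activation='relu', padding='same',
                      input_shape=(window_len, n_inputs)),
        layers.MaxPooling1D(pool_size=2),     # first of the two max-pooling layers
        layers.Conv1D(32, kernel_size=2, activation='relu', padding='same'),
        layers.MaxPooling1D(pool_size=2),     # second max-pooling layer
        layers.Conv1D(256, kernel_size=2, activation='relu', padding='same'),
        layers.Flatten(),
        layers.Dense(70, activation='relu'),  # single hidden layer with 70 neurons
        layers.Dense(1),                      # normalized EXT output
    ])
    model.compile(optimizer='adam', loss='mse')  # trained for 1000 epochs, batch size 32
    return model
```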
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
