Article

Modeling a Practical Dual-Fuel Gas Turbine Power Generation System Using Dynamic Neural Network and Deep Learning

1 Department of Electrical Engineering, King Abdullah II School of Engineering, Princess Sumaya University for Technology (PSUT), Amman 11941, Jordan
2 Department of Electrical Engineering, Graduate School of Engineering, University of Vermont, Burlington, VT 05405, USA
* Author to whom correspondence should be addressed.
Sustainability 2022, 14(2), 870; https://doi.org/10.3390/su14020870
Submission received: 19 November 2021 / Revised: 25 December 2021 / Accepted: 6 January 2022 / Published: 13 January 2022
(This article belongs to the Section Energy Sustainability)

Abstract:
Accurate simulations of gas turbines’ dynamic performance are essential for improving their practical performance and advancing sustainable energy production. This paper presents highly accurate simulation models of a real dual-fuel gas turbine using two state-of-the-art neural network techniques: a dynamic neural network and a deep neural network. The dynamic neural network has been realized via a nonlinear autoregressive network with exogenous inputs (NARX) artificial neural network (ANN), and the deep neural network is based on a convolutional neural network (CNN). The outputs selected for simulation are the output power, the exhausted temperature and the turbine speed or system frequency, whereas the inputs are the natural gas (NG) control valve, the pilot gas control valve and the compressor variables. The datasets have been prepared in three essential formats for the training and validation of the networks: normalized data, standardized data and SI-unit data. Rigorous, wide-ranging trials have been carried out to tune the network structures and hyper-parameters, leading to highly satisfactory results for both models (overall, the minimum recorded MSE in the training of the MISO NARX was 6.2626 × 10−9, and the maximum MSE recorded for the MISO CNN was 2.9210 × 10−4, for more than 15 h of GT operation). The results show a comparably satisfactory performance for both the dynamic NARX ANN and the CNN, with a slight superiority of NARX. It can be newly argued that the dynamic ANN is better suited than the deep learning ANN for the time-based performance simulation of gas turbines (GTs).

1. Introduction

1.1. Aims and Motivations

Gas turbines’ share of the global power generation mix has increased progressively in recent decades due to the progress in their design specifications, efficiency and reliability [1,2]. The field of system modeling and identification has facilitated the path towards many notable improvements, including higher cycle efficiencies and reduced emissions; therefore, GT power generation technology has become an unavoidable choice for many developed and developing countries [3,4,5]. Before reviewing the literature, it is informative to provide adequate motivation and background for this research. The operating principle of a dual-fuel GT can be as simple as shown in Figure 1.
The air is discharged by the compressor (1–2) for more efficient combustion, and in the combustor, the air/fuel blend is fired and burned (2–3). Operation (1–2) is an isentropic process, whereas operation (2–3) is a constant-pressure (isobaric) one. The combusted gases expand through the gas turbine in an isentropic operation (3–4), which converts the thermal energy into useful mechanical energy that rotates the movable part of the synchronous generator. Operation (1–4) describes the exhausted heat, which is either used to feed the heat recovery steam generator (HRSG) associated with a combined cycle gas turbine unit (CCGT) or discharged to the atmosphere in open-cycle GTs. The synchronous generator converts the harvested mechanical power into electrical power in order to feed the grid with electrical energy. The combustion chamber is normally supplied with natural gas through two valves, the pilot valve and the NG control (premix) valve; however, during low loads, startups or instances where there is a shortage of NG, the combustion has to remain stable, so the pilot valve is supplied with fuel oil through a booster pump system. These operating modes are known by the operators as premix mode and premix/diffusion dual mode, in which the premix mode is active only in normal NG operation from 50% to the rated power, while the diffusion mode is possible over the entire load range (including startups and shutdowns). However, the details of the combustion process can be complicated and are even unnecessary in control-oriented system identification, machine learning and deep learning, because the simulation rules can be driven precisely by the data variables.
Modeling gas turbines serves many general aims, which represent the core objectives of our research, such as control system design, performance monitoring, prediction, emission reduction and grid code compliance. Furthermore, trustworthy modeling of gas turbines or engines can lead to significant sustainable development in these generation systems in terms of using other fuels formed from other materials and waste, which can be fired in the same device or by a similar principle, such as biogas, leading to promising, sustainable and flexible power generation scenarios [6,7,8,9,10]. The topic of GT power generation modeling is apparently an interdisciplinary research field that is deeply related to many disciplines during the research phase, but whose outcomes are extremely useful and are therefore released beyond the boundaries of any single discipline. Figure 2 shows the relevant disciplines and possible useful purposes of GT time-based dynamic simulations. The aforementioned purposes are general; however, the very specific objective that distinguishes our work is that the modeling procedure has been carried out with two of the most recent state-of-the-art techniques, the dynamic neural network and the deep learning convolutional neural network (CNN), with a satisfactory performance and a quantified analysis of the results.
The scientific merit of this article is discussed in the next subsection, together with the related literature.

1.2. Related Work and the Paper Contribution

The multidisciplinary/interdisciplinary nature can also be deduced from the literature review presented here; for a more detailed survey, the reader may refer to the recent critical review written by the corresponding author [5]. Recently published dynamic models of GTs, whether combined with the steam cycle as a CCGT or as a single unit, have been established by physical laws, system identification, artificial intelligence, machine learning or deep learning techniques. The review here is informative, with an emphasis on modeling via neural networks, machine learning and deep learning methodologies. Asgari et al. (2014, 2016, 2021) [11,12,13] have published NARX-type ANN models to simulate significant variables in the startup process, which have been used to simulate the behavior of an actual General Electric (GE) GT (PG 9351FA). The compression ratio gave the maximum error in the simulation, with an RMSE of 2.8% (0.028), and the minimum RMSE of 0.0004 was obtained in the speed response [11]. The same primary author has extended the work on GT modeling with a recurrent neural network with a single hidden layer, which achieved a comparable RMSE of approximately 0.22% (0.0022) for training and 2.6% (0.026) for testing [14]. Ibrahem (2020) [15] has offered a NARX ANN model for a Siemens SGT-A65 aero-derivative gas turbine engine in order to pave the way toward the design of a predictive control strategy; different ensemble and single MISO NARX network structures were trained and tested, and the minimum RMSE achieved for the turbine speed was 0 during the training phase but 0.0022 in the testing phase for one of the spool speeds. Mohamed et al. (2019) [4] have presented the performance of a feed-forward (FF) back-propagation ANN (BPNN) in simulation for the purpose of comparison with a physics-based model and a subspace system identification model; the FF ANN gave the minimum error of 0.05048 in the frequency or speed response. Rashid et al. (2015) [16] have presented a new model for a CCGT by training an FF ANN via particle swarm optimization (PSO), where the MSE is 1.019 × 10−4 for training and 0.0055 for testing. Rahmoune et al. (2020) [17] have developed a NARX model to identify the dynamic behavior of the gas turbine components under the influence of vibration phenomena; the results validated the capability of the NARX NN in determining the dynamic behavior of the gas turbine system, with simulation MSEs of 3.8414 × 10−3 for the high-pressure (HP) turbine and 1.29152 × 10−1 and 2.12090 × 10−4 for the gas and air control valves, respectively. In terms of deep learning, Cao et al. (2021) [18] have presented different deep learning techniques used to predict the changes in the efficiency and flow capacity of turbomachinery; the degradation predictions were established via the LSTM approach, with a high accuracy ranging from 81.65% to 93.65%.
From this review and previous critical reviews [5], it can readily be found that there are no constraints on the achievable accuracy, and therefore more accurate results are probably still attainable. On the other hand, it is unfair to claim numerical superiority in accuracy over the published literature, because accuracy depends on factors other than the NN structure design, such as differences in the on-site data from one GT, and one study, to another. To the best of the authors’ knowledge, deep learning techniques have not yet been studied in detail for the GT time-based dynamic simulation, and it is very interesting to know whether they are comparable, superior or less effective than a dynamic neural network with a shallow structure, especially the NARX ANN.
The convolutional neural network is a well-recognized example of deep learning tools, and the NARX ANN is a typical and extensively used example of a dynamic neural network; therefore, both are selected for this study. The scientific contributions of this manuscript are then:
(1) Two accurate methods for simulating a Siemens dual-fuel GT are presented, with an emphasis on the essential variables of the GT: one simulator is established using a dynamic NARX ANN, and the other is based on a deep-learning convolutional neural network;
(2) The models’ performances are depicted in MIMO and parallel MISO structures with highly accurate results; as overall indicators, the minimum recorded MSE in the training of the MISO NARX was 6.2626 × 10−9, with a corresponding testing MSE of 3.4983 × 10−7, while the maximum average MSE, recorded for the MISO CNN, was 2.9210 × 10−4, and both networks worked successfully for more than 15 operating hours of the GT;
(3) It is newly shown that the NARX dynamic ANN is slightly superior in accuracy to the deep neural network, which indicates that deep learning can be regarded as an alternative, but not a substitutional, tool for the simulation of heavy-duty power GTs; in other words, it shall not replace the dynamic ANN, even with shallow architectures. One feature that makes NARX ANNs superior is the adoption of past outputs as additional direct inputs, which increases their overall accuracy; this major advantage has no equivalent in deep convolutional networks in spite of the variety of their hyper-parameters.
The rest of the paper is organized as follows: Section 2 presents the data preparation of the adopted GT, the inputs/outputs selection, and the normalized, standardized and actual quantities. Section 3 presents the NARX ANN model development, Section 4 presents the CNN model establishment, Section 5 shows the simulation results of both methodologies with a comparison against the real measurements and a quantified analysis of the results and, finally, Section 6 concludes the research study and findings with some feasible future trends.

2. Data Curation and Analysis

The datasets utilized for this study have been collected from a real gas turbine generation unit and are provided by the corresponding author. They comprise long-term data representing 16 h of GT operation. According to Table 1 and Table 2, the collected datasets have been classified into GT input and output variables, with the operational range for each variable. As can be seen from the tables, four variables have been identified as the GT’s inputs (the NG valve, the pilot valve, the compressor outlet temperature and the compressor outlet pressure), whereas the remaining three variables (the output power, the exhausted temperature and the frequency, i.e., the speed of the rotor) have been appointed as the outputs of the system.
After defining the input and output parameters from the obtained datasets, the corresponding data have been divided into two groups, namely training and validation datasets; this eases the evaluation of the model’s generalization and prevents over-fitting during the training phase.
The first group of data has been used to train the model, whereas the other group, which comprises unseen data, i.e., samples that have not been utilized during the training process, has been applied to evaluate the models’ accuracy. The system formation, including the input and output variables, is shown in Figure 3.
It is worth mentioning that the usual practice is to include the compression ratio (CR) as an input instead of both the COT and COP; however, the two choices can be equivalent, and a notable improvement in accuracy has been observed during the testing phase of the GT.
Standardization and normalization are the most popular rescaling techniques. Both approaches rescale the features of the system data into a restricted range; leaving the features in widely differing ranges makes it very complex for the model to map inputs to outputs properly. However, the two techniques differ in the way they work, and each has special use cases. Based on this, the collected datasets from the GT unit have been pre-processed and rescaled into two main formats aside from the SI-unit data, normalized data and standardized data, in order to train and validate the built networks. A brief description of these two processes is valuable for understanding how and why the given data are normalized or standardized.

2.1. Data Normalization

This specifies the data between the 0 and 1 range or between the −1 and +1 range. Normalization is required when there is a large difference in the ranges of the system’s features; furthermore, this scaling approach can be beneficial when the collected data do not follow any particular distribution, such as a Gaussian distribution. Therefore, this technique can be very useful for neural network algorithms, since they do not assume any data distribution. This technique is also known as min–max scaling. Equation (1) presents the mathematical formula for the normalization approach [19,20,21]:
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
where x_max and x_min are the maximum and minimum values of the input or output feature of the model, respectively. From the above equation, it can be clearly noticed that the range of each feature falls between 0 and 1 according to the following three scenarios:
  • When x equals the minimum, then x_norm is 0;
  • On the other hand, when x is the maximum point in the array, then x_norm is 1;
  • However, if x is between the minimum and maximum, then x_norm will be between 0 and 1.
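As an illustration only (not part of the authors’ released code), the per-feature min–max rescaling of Equation (1) can be written in MATLAB along the following lines; the matrix X and its orientation are assumed for this sketch:

    % Minimal sketch, assuming X holds the GT records with one time sample per
    % row and one variable (e.g., NGV, PILTV, COP, COT) per column.
    Xmin  = min(X, [], 1);                % per-feature minimum in Equation (1)
    Xmax  = max(X, [], 1);                % per-feature maximum
    Xnorm = (X - Xmin) ./ (Xmax - Xmin);  % every feature rescaled to [0, 1]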

2.2. Data Standardization

This is another common rescaling approach that rescales the data about the mean with unit standard deviation (unit variance); that is, the resulting distribution has a mean of zero and a standard deviation of one. Standardization can be useful when the data have a Gaussian distribution; however, this does not have to be the case. Furthermore, in contrast to normalization, standardization has no bounded range; as such, if the data contain outliers, standardization will not confine them to a fixed range. Equation (2) shows the formula associated with the standardization technique [19,20,21]:
$$x_{\mathrm{stand}} = \frac{x - \mu}{\sigma} \tag{2}$$
where µ is the mean and σ is the standard deviation of the feature values. It can be noticed from the above equation that the input and output values are not restricted to a particular range.
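A corresponding sketch for Equation (2), again with an assumed data matrix X oriented as above, is:

    % Minimal sketch of z-score standardization per feature (Equation (2)).
    mu     = mean(X, 1);          % per-feature mean
    sigma  = std(X, 0, 1);        % per-feature standard deviation
    Xstand = (X - mu) ./ sigma;   % zero mean, unit standard deviation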
In conclusion, the choice between normalization and standardization ultimately depends on the type of data and the machine-learning technique to be employed; there is no hard and fast rule that states when data should be normalized or standardized. Fitting the model using the actual, normalized and standardized data and then comparing the performance among these three data formats can be a powerful criterion for the deployment of the final model of a GT power plant; see Figure 4, which summarizes the data curation in this study.

3. The NARX Model Setup

The mathematical expression of the NARX model can be given as [22]
$$\hat{y}(t) = f\big(u(t), u(t-1), \ldots, u(t-n_u),\ y(t-1), \ldots, y(t-n_y)\big) + e(t) \tag{3}$$
where y(t) and ŷ(t) are the target and predicted output variables, respectively; u(t) is the input variable of the network; n_u and n_y are the time delays of the input and output variables; and e(t) is the model error between the target and the prediction. In other words, y and u are the output and the externally determined variable in this equation, respectively. The next value of the dependent output signal y(t) is regressed on previous values of the output signal and an independent (exogenous) input signal.
To set up an accurate and reliable NARX model for the GT power plant with an acceptable predictive performance, much like with other dynamic neural networks, various architectures may be considered over a wide range of trials [13]. These different architectures are based on several factors, such as the number of inputs and outputs, i.e., the MIMO or MISO structure; the training algorithms; the number of hidden layers; the number of neurons in the hidden layer(s); the type of activation functions; the maximum number of epochs, i.e., iterations; the number of recurrent connections; and the time delays in the recurrent connections. In addition, another vital factor has been included in this study, which is the data type, i.e., the data format. Figure 5 shows the NARX structure constructed for this study, in which the tapped delay line (TDL) is employed to feed the network with the past values of the inputs and outputs. As can be seen from this figure, the proposed NARX model is composed of four inputs, one hidden layer and three outputs.
Here, the variables x0 to x4 represent the computer representation of the inputs, w0 to w4 are the connection weights (generalized later in the equations describing the NARX ANN), σ is the sigmoid activation function symbol, S is the linear activation function symbol and Ŷ(t) is the predicted output value. A thorough computer code has been developed in the MATLAB programming environment to set up and configure NARX models with sophisticated generalization properties. MATLAB is a versatile programming environment developed by MathWorks for numerical computation in engineering and scientific applications. The generated code sweeps several hyper-parameters for training and configuring the NARX models of the gas turbine generation unit; more precisely, the maximum number of iterations, the learning rate, the number of hidden-layer neurons, the time delays in the recurrent connections and the model structure, i.e., the MIMO and MISO configurations, as well as the data type (normalized, standardized and actual data), have all been considered in the developed code as a combination of a variety of settings.
In addition, this study employs a feed-forward multilayer dynamic neural network architecture with an input layer, one hidden layer with a sigmoid-type transfer function and an output layer with a linear activation function. Furthermore, the developed program has been used to train a wide range of NARX topologies, employing three training algorithms in the training step: the Levenberg–Marquardt (LM) algorithm, the Bayesian regularization (BR) algorithm and the scaled conjugate gradient (SCG) algorithm. Eventually, tweaking all of the hyper-parameters, in addition to the training algorithm, indicates the best performance and its corresponding NARX model. The mean squared error (MSE), which expresses the average squared error between the network outputs and the targets, is the default performance function for feed-forward networks and can be expressed as [23]:
$$E = \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} e_i^{\,2} = \frac{1}{N}\sum_{i=1}^{N}\big(y_i - \hat{y}_i\big)^2 \tag{4}$$
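For concreteness, Equation (4) reduces to a one-line computation; y and yhat below are assumed vectors of the targets and the network predictions:

    % Minimal sketch of the MSE performance index in Equation (4).
    e      = y - yhat;       % error between targets and network predictions
    mseVal = mean(e.^2);     % average squared error over the N samples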
The backpropagation technique, which involves executing computations backwards through the network, is used to determine the gradient and the Jacobian. However, it is difficult to predict which training method will be the most efficient for a given situation [23]; this is determined by a variety of parameters, including the problem’s complexity, the number of data points in the training set, the number of weights and biases in the network, the error target and whether the network is used for pattern recognition (discriminant analysis) or function approximation (regression) [23]. Therefore, the proposed NARX model of the GT power plant has been trained over a wide range of trials, including the three different optimization algorithms, in order to obtain the best performance and the most applicable NARX network. For more details about the Levenberg–Marquardt (LM), Bayesian regularization (BR) and scaled conjugate gradient (SCG) training algorithms, refer to [23]. According to the input variable u(t) in Equation (3), the output of the hidden layer at time t is computed as [22]:
$$H_t^i = f_1\left(\sum_{j=0}^{n_u} w_{ij}\, u(t-j) + \sum_{k=1}^{n_y} w_{ik}\, y(t-k) + a_i\right) \tag{5}$$
where w_ij is the connection weight between the input neuron u(t−j) and the i-th hidden neuron; w_ik is the connection weight between the i-th hidden neuron and the delayed output feedback loop; a_i is the bias of the hidden-layer neurons; and f_1(·) is the hidden-layer transfer function, i.e., the activation function [22]. As mentioned before, the sigmoid function has been used in the proposed code as the hidden-layer activation function. Equation (6) shows the mathematical expression of the sigmoid function [22]:
$$f_1(S) = \frac{1}{1 + e^{-S}} \tag{6}$$
The final NARX prediction value can eventually be obtained by integrating the hidden-layer outputs, as given in [22]:
$$\hat{y}_l(t) = f_2\left(\sum_{i=1}^{n_h} w_{li}\, H_t^i + b_l\right) \tag{7}$$
where w_li is the connection weight between the i-th hidden neuron and the l-th estimated output; b_l is the bias of the l-th predicted output; n_h is the number of hidden neurons; and f_2(·) is the output-layer activation function. The mathematical representation of the linear activation function f_2(·) is presented in Equation (8) [22]:
$$f_2(S) = S \tag{8}$$
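To make Equations (5)–(8) concrete, the forward pass for a single time step could be sketched as below; the weight matrices, the delayed-signal vectors and their dimensions are all assumed for illustration:

    % Minimal sketch of one NARX forward pass, Equations (5)-(8).
    S    = Wu*u_delayed + Wy*y_delayed + a;  % weighted delayed inputs/outputs plus bias
    H    = 1 ./ (1 + exp(-S));               % sigmoid hidden activations, Equation (6)
    yhat = Wl*H + b;                         % linear output layer, Equations (7) and (8)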
According to the written code, the stopping condition on the number of iterations, i.e., epochs, has been set to 1000. The datasets, in their three formats, have been divided into three subsets: the training set (70%) for training the model; the validation set (15%) to confirm that the network generalizes properly and to stop the training before overfitting; and the test set (15%), which is utilized as a completely independent test of network generalization. The divided datasets have been applied to train the open-loop NARX model to guarantee an efficient learning procedure, since the true outputs are available during the training process, as discussed before. After determining the optimal open-loop NARX model over a wide range of trials, the optimal open-loop network can then be transformed into closed-loop mode for multi-step prediction. In this study, eighteen open-loop NARX architectures with MIMO and parallel MISO structures have been trained with different parameters: the number of hidden-layer neurons, the training algorithm, the time delay in the recurrent connection and the data format.
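As a minimal sketch of this setup using MATLAB’s toolbox functions (the delay lengths and neuron count below correspond to only one of the trialed designs, and U and Y are assumed cell arrays holding the input and target sequences):

    % Open-loop NARX training sketch; hyper-parameter values are one trial only.
    net = narxnet(1:30, 1:30, 20);      % input delays, feedback delays, hidden neurons
    net.trainFcn = 'trainbr';           % Bayesian regularization (no validation stop)
    net.trainParam.epochs = 1000;       % stopping condition on iterations
    net.divideParam.trainRatio = 0.70;  % 70/15/15 split as described above
    net.divideParam.valRatio   = 0.15;
    net.divideParam.testRatio  = 0.15;
    [Xs, Xi, Ai, Ts] = preparets(net, U, {}, Y);  % shift sequences for the delay lines
    net  = train(net, Xs, Ts, Xi, Ai);            % open-loop training with true outputs
    netc = closeloop(net);              % closed-loop mode for multi-step prediction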
The next subsection explains the MIMO and MISO NARX models.

3.1. The MIMO Model

The model has been evaluated with one hidden layer, various numbers of hidden-layer neurons and time delays, and different data types. The network has a three-neuron output layer, which means that the output power, frequency and exhausted temperature are predicted simultaneously, one step ahead. Furthermore, the three learning approaches have been tested, i.e., Levenberg–Marquardt, Bayesian regularization and the scaled conjugate gradient. Due to the very high number of trials, it is infeasible to mention all of them here, but samples showing the performance MSE and regression parameter R of the resultant MIMO NARX models are tabulated in Table 3, with the best design marked.
According to the findings shown in Table 3, the MIMO NARX structure with fifteen hidden-layer neurons, a recurrent-connection time delay of thirty samples, the normalized data format and the Bayesian regularization training algorithm produced the best results on the test subset. Furthermore, the best regression coefficient was also found for the same network. The optimum performance and regression of the developed MIMO NARX model, with four inputs and three outputs at a time, are shown in Figure 6 and Figure 7, respectively. These graphs depict both the mean squared error (MSE) trend for the training and test sets and the regression coefficient R during the learning procedure.
The decrease in both the training and, especially, the test-set trends demonstrates that there is no over-fitting in the model. As the performance figure shows, the best training performance was obtained after 503 iterations (epochs), when the minimum gradient was reached, with an average MSE of 1.0732 × 10−6. Figure 8 represents the optimal open-loop MIMO NARX model based on fifteen neurons in the hidden layer.
It can be noticed from Figure 8 that the three outputs are fed into the input layer and output layer at the same time. Despite the relatively high performance and regression coefficients of the created MIMO NARX network, dealing with one output at a time is more efficient in the NARX network and results in a higher performance for the time prediction of each output parameter of the GT unit. Therefore, the MATLAB code has been further developed to create open-loop MISO NARX models that predict the GT parameters individually. The constructed MISO models and their performance are elaborated in the next section.

3.2. The Parallel MISO Model

The model has been evaluated with one hidden layer, various numbers of hidden-layer neurons and time delays, and different data types. The network has a one-neuron output layer, which means that the output power, frequency or exhausted temperature is predicted one step ahead, one output at a time in each trial. Furthermore, the three learning approaches have been tested, i.e., Levenberg–Marquardt, Bayesian regularization and the scaled conjugate gradient. Samples of the trials for establishing the MISO models, with the MSE performance and regression coefficients R of the resultant MISO NARX models for each parameter (output power, frequency and exhausted temperature), are tabulated in Table 4, Table 5 and Table 6, respectively.
The computational reason for the superiority of the BR training algorithm can be argued to be the fact that BR has no early stopping criteria such as those in the LM and SCG algorithms. In addition, the normalized data are handled much better by the NARX ANN than the actual and standardized data, because of the harmony in the upper and lower limits of all outputs of the GT in normalized values, and because the given dataset is a time-based measurement record that does not belong to the class of data that embeds a Gaussian distribution. From Table 6, the optimal average MSE and regression coefficient of the three GT parameters have been found for the structure with twenty hidden-layer neurons and a 30-sample time delay, employing the normalized data type and the Bayesian regularization training algorithm. The optimal training performance, with an average MSE of 8.46 × 10−7, was obtained after 1000 iterations (epochs), when the maximum number of epochs was reached. Furthermore, the best regression coefficient was also found for the same NARX network. Figure 9, Figure 10, Figure 11 and Figure 12 show the performance and the regression plot of each developed MISO NARX model based on four inputs and one output at a time for the three output variables. These figures illustrate both the mean squared error (MSE) trend for the training and test sets and the regression coefficient R during the learning procedure.
The decrease in the MSE trend demonstrates that there is no overfitting in the proposed MISO NARX model. The regression plots demonstrate that the model achieved optimum fits, since the data points lie along the line at which all outputs equal the targets. Figure 13 represents the optimal open-loop MISO NARX model with twenty neurons in the hidden layer. It can be noticed from this figure that one output is fed into the input layer and output layer at the same time.

4. The Deep Learning Convolutional Neural Network (CNN) Model Setup

In this section, it is valuable to elaborate on the gradient descent algorithm used for training the CNN and on how this technique works, in order to justify the GT data curation. Gradient descent is an optimization technique utilized when training a neural network model based on a convex function [19]. It tweaks the parameters of the CNN model to attain the minimum of the model’s cost function. This function quantifies the performance of the model by computing the error between the predictions and the actual data values and representing it as a single real number. In other words, gradient descent is a paramount technique in machine learning models that determines the function coefficients that minimize the cost function as far as feasible; more details regarding the gradient descent algorithm can be found in the Coursera course [20]. In machine learning and deep learning terms, the gradient can be regarded as the derivative of a function with more than one input [19]. The mathematical translation of the gradient descent technique is as follows [21]:
$$\omega_{j+1} = \omega_j - \alpha\, \frac{1}{m} \sum_{i=1}^{m} \left(h_\omega\big(x^{(i)}\big) - y^{(i)}\right) x_j^{(i)} \tag{9}$$
This equation adjusts the weight values until reaching convergence (i.e., the minimum value of the cost function), where ω_{j+1} is the updated weight value; ω_j is the previous weight value; α is the learning rate; m is the number of training samples; h_ω is the hypothesis; x^{(i)} is the i-th training example; y^{(i)} is the corresponding target of the i-th example; and x_j^{(i)} is the j-th feature in a given training example.
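A minimal sketch of one update of Equation (9), assuming a linear hypothesis h_ω(x) computed as X*w with a design matrix X, target vector y and weight vector w, is:

    % One batch gradient descent step (Equation (9)); names are illustrative.
    alpha = 3e-3;                     % learning rate (the value used later for the CNN)
    m     = size(X, 1);               % number of training samples
    grad  = (1/m) * X' * (X*w - y);   % gradient of the squared-error cost
    w     = w - alpha*grad;           % move the weights toward the cost minimum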
As can be seen in Equation (9), the cost function initially depends on the initial value of the weight vector. These weights are adjusted iteratively using the gradient descent method over the given datasets in order to minimize the cost function of the generated model. From the aforementioned basics, it is clear that the variable x, which represents the input variables fed into the model, influences the gradient descent step size. Moreover, as mentioned before, the datasets used for the proposed models have been drawn from a practical GT generation unit, which in turn means that the system’s variables have a highly dynamic distribution. Therefore, the input and output datasets of the NARX- and CNN-based GT models may differ greatly in scale, range and distribution for each variable; for example, the deviation among the output power values and the exhausted temperature values is somewhat larger than the variation in the frequency instances. Differences in scale among the model’s parameters may exacerbate the difficulty of the modeled problem [19]. Large input and output values may result in a model that learns large weight values, and a model with large weight values is frequently unstable: it may perform poorly during the learning phase and may be sensitive to the input values, resulting in an increased mean squared error, i.e., generalization error. Therefore, a feature-rescaling technique needs to be applied to the GT’s variables in the data pre-processing step. Data pre-processing guarantees that the gradient descent of the model heads smoothly towards the minimum error and that the gradient descent steps are updated at the same rate for all parameters. Having the features of the data on a similar scale makes all input and output variables of the GT power plant equally important and easier to learn for the NARX and CNN models [21].
The convolutional neural network (CNN) is one of the most popular deep neural networks [24]. A CNN usually comprises various layers, such as convolutional layers, pooling layers and fully connected (dense) layers. Figure 14 represents a typical example of a CNN architecture.
According to this figure, the first type of layer, the convolutional layer, consists of filters and feature maps. The input to each filter is known as the receptive field [25], and it has a defined size. Each filter is slid across the previous layer, producing an output that is collected in the feature map. In other words, the CNN’s convolutional layer exploits local perception and weight sharing, which consequently improves its ability to extract the significant features [25,26]. It is informative to mention that the GT datasets used here have a one-dimensional structure; thus, the corresponding convolutional layer that deals with the given datasets is a 1D convolutional layer. The 1D CNN performs convolution across local regions of the input parameters to generate the corresponding features. Each kernel, i.e., filter, produces unique characteristics on the feature map at all locations. Since the 1D CNN utilizes the weight-sharing approach, as mentioned before, fewer parameters need to be learned than with conventional neural networks [26], which ensures that the 1D CNN converges earlier and faster. An example of a 1D convolutional operation is illustrated in Figure 15. The kernel size is set to 2, which means that the weights (w1, w2) are shared by every step over the input layer (x1, x2, …, xn) and the output (y1, y2, …, yn). In the kernel window (w1, w2), which represents the filter size, the input values are multiplied by the weights and the products are summed up in order to compute the value of the feature map; in the shown example, the value of y2 is obtained from y2 = (w1x1 + w2x2) [26]. The output of the convolution layer is provided as both the output and the input of the following layer, and it represents the features derived from the training samples using the convolution kernel.
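The sliding-window computation of Figure 15 can be reproduced in a few lines of MATLAB; the numeric values below are arbitrary, and only the kernel size of 2 is taken from the text:

    % Minimal sketch of the 1D convolution with a shared kernel of size 2.
    x = [0.2 0.5 0.7 0.4 0.9];        % example input sequence (arbitrary values)
    w = [0.6 0.3];                    % shared kernel weights w1, w2
    y = conv(x, fliplr(w), 'valid')   % y(k) = w(1)*x(k) + w(2)*x(k+1) for every window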
In order to obtain one-dimensional features, the 1D CNN performs convolution operations on the input signal in local areas, and various kernels extract certain features from the inputs. As illustrated in Figure 15, each kernel recognizes certain characteristics at any location on the input feature map, and weight sharing is performed on the same input feature map; this mechanism minimizes the number of parameters during training. If L_i is a 1D convolutional layer, its mathematical formula can be generally expressed as Equation (10) [26]:
$$x_j^l = f\left(b_j^l + \sum_{i=1}^{M} x_i^{l-1} * k_{ij}^l\right) \tag{10}$$
where k denotes the number of convolution kernels, j is the kernel size and M refers to the number of channels of the input x_i^{l−1}. The kernel bias is indicated by b, and the symbol (∗) is the convolution operator. f(·) represents the non-linear activation function; CNNs usually utilize the rectified linear unit (ReLU), i.e., f(x) = max(0, x), as the activation function [24]. Pooling layers are paramount for CNNs. Pooling methods can be thought of as down-sampling used to minimize the number of parameters while maintaining the major features in order to speed up the next stage, since there are more feature maps in the down-sampling phase, resulting in an increased data dimensionality that makes the calculations too complex [27]. Figure 16 illustrates the max-pooling operation used in this study.
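Similarly, the non-overlapping max pooling of Figure 16 reduces each pair of adjacent feature-map values to their maximum; a sketch with an arbitrary feature map f:

    % Minimal sketch of max pooling with pool size 2 and stride 2.
    f = [0.3 0.8 0.1 0.6];              % example feature map (arbitrary values)
    p = max(reshape(f, 2, []), [], 1)   % keeps the larger value of each adjacent pair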
The learning rate determines how fast or slow we move towards the optimal weights. If the learning rate is very large, we may skip over the optimal solution; if it is too small, we will need too many iterations to converge to the best values. Therefore, using a good learning rate is crucial.
Table 7 and Table 8 present some important samples of the CNN development attempts in the MIMO and MISO structures, with the best choices marked. The other design parameters, such as the number of filters, the filter size, the number of hidden layers, the number of neurons in the hidden layer and the number of convolutional layers, have been tuned widely with a learning rate of 3 × 10−3. To avoid confusion, instead of mentioning all of the trials that have been made, only the ultimate parameters that attained the lowest possible MSE are reported. The final CNN architectures with MIMO and MISO structures, and the final optimal MSE value for each, are given in Table 7.
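As an illustrative sketch only (the layer counts and filter numbers below are assumed placeholders, not the authors’ final Table 7 design), a 1D CNN of the kind described here can be assembled with MATLAB’s Deep Learning Toolbox (R2021b or later) as:

    % 1-D CNN regression sketch; sizes are placeholders, learning rate from the text.
    layers = [
        sequenceInputLayer(4)                            % four GT input variables
        convolution1dLayer(2, 16, 'Padding', 'causal')   % kernel size 2, 16 filters
        reluLayer                                        % ReLU, f(x) = max(0, x)
        maxPooling1dLayer(2, 'Stride', 2)                % max pooling as in Figure 16
        fullyConnectedLayer(32)                          % dense hidden layer
        reluLayer
        fullyConnectedLayer(3)                           % three outputs (MIMO case)
        regressionLayer];                                % MSE regression loss
    options = trainingOptions('adam', ...
        'InitialLearnRate', 3e-3, ...                    % learning rate used here
        'MaxEpochs', 100, 'Shuffle', 'never');
    % net = trainNetwork(XTrain, YTrain, layers, options);  % pooling halves the
    %                                                       % sequence length, so the
    %                                                       % targets must be aligned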

5. Time-Based Simulation Results and Discussion

This section depicts the simulation results of the two approaches and their architectures (Figure 17, Figure 18, Figure 19, Figure 20, Figure 21 and Figure 22). From the results and the previously tabulated MSEs, it is evident that the deep CNN and the dynamic NARX ANN show a satisfactory performance in their application to heavy-duty dual-fuel GTs. They can be used for short-term or long-term prediction, controller upgrading, performance monitoring during measurement-device malfunctions, estimating the fuel requirements for a given demand, characterizing the GT with different fuels and so on.
The trends are followed successfully by both techniques for the power (ranges: 0–1 normalized and 124.89–241.57 MW actual power range of load-down, then load-up), with very negligible errors (minimum MSE of 6.2626 × 10−9 and maximum MSE of 2.9210 × 10−4) over the adopted long operation time of the GT (more than 15 h of continuous operation), which indicates the robustness of both deep learning and shallow dynamic ANNs. Such accuracies in the responses of GTs are difficult to attain by physics-informed or other system identification techniques, because the power plant noises and uncertainties are high and vary increasingly with the changes in the operating conditions. In addition, the differences in the nature of the responses make the simulation far more challenging; for instance, the power variations appear to be slower than the changes in the temperature and frequency, whereas the latter responses change more severely, which makes the problem computationally over-complicated for models that must track all of these variation trends simultaneously. Nevertheless, the proposed techniques have easily handled such computational burdens and provided prediction capabilities for a longer time than previously published, covering more than 15 h (more than 54,000 s) of operation.
It can also be seen that the NARX ANN shows a slight superiority in the error values, and also when zooming into the results, for both structures (parallel MISO and MIMO); this could be due to the following reasons:
  • Its simplified structure directly involves the effects of the inputs and outputs; therefore, the inputs are reflected more realistically on the outputs;
  • The use of delayed feedback outputs as additional inputs, which increases the number of inputs utilized to depict the output more accurately. This important feature has no equivalent in the CNN, despite its sophistication in the variety and number of its layers.
It can generally be deduced that the dynamic ANN, even if recognized as a shallow ANN with a single hidden layer, is still a leading choice for the modeling and simulation of GTs, with negligible simulation errors and a high simulation performance for the variation trends of GT power plants. For other successful applications of CNNs and NARX ANNs beyond time-based simulations, the reader may refer to references [24,25,26,27,28].

6. Conclusions

Motivated by the most recently proposed future trends, simulation models based on a deep CNN and a dynamic NARX ANN have been presented, with extremely accurate results that confirm the scientific merits of deep learning and shallow dynamic ANNs for the emulation of GT power station performance. The main findings are as follows:
  • It is generally highly recommended to normalize the data of GTs, rather than dealing with actual quantities, when using ANNs for modeling;
  • The BR training algorithm outperforms the other training algorithms because of its later ultimate termination criteria, unlike the earlier-stopping LM and SCG algorithms;
  • The prediction capabilities of the NARX ANN and the CNN for the GT’s time-based dynamic performance are satisfactory, with negligible errors for both techniques.
Based on the aforementioned points, the paper’s goals have been generally achieved. Further points are important to mention based on the observation and investigation of the results:
  • There was a slight superiority of the dynamic NARX type in terms of accuracy. A new conclusion can be suggested: the main computational reason is the feedback delay element in NARX, which, despite the shallow structure, provides additional information alongside the other direct inputs and thereby improves the accuracy over the deep CNN, which has no delayed feedback element;
  • Based on the aforementioned results, deep learning can act as an alternative choice for modeling GTs in real applications, but cannot be a substitutional tool for the shallow dynamic ANN; both have shown successful performances and can be used reliably in real applications;
  • Despite the achieved targets of the paper, there are still deep learning techniques that have not been investigated in the literature; these techniques might have a comparable performance, which motivates mentioning some future research opportunities;
  • One of the clearer future trends is to use other deep learning techniques and to compare them appropriately with the developed/published ones; this may include advanced deep recurrent neural networks and locally connected neural networks;
  • Another possible future outcome is to include the fuel preparation system, especially for biogas firing in such turbines, and the gasification/digestion process, in order to quantify the amount of material converted to biogas and to link this with an enhanced control strategy with new objectives;
  • Another feasible future point is designing a supervisory controller for the developed ANN models and applying it to regulate the diffusion and premix modes, together with the objectives of higher efficiency and lower emissions. A comparative study with other modeling philosophies may be useful, such as physics-based models and other black-box and grey-box models, with an emphasis on many performance criteria rather than the mere numeric value of the accuracies.

Author Contributions

Conceptualization, M.A. and O.M.; methodology, M.A. and O.M.; software, M.A., O.M. and M.M.; validation, M.A. and O.M.; formal analysis, M.A. and O.M.; investigation, O.M.; resources, M.A., O.M. and M.M.; data curation, M.A. and O.M.; writing–original draft preparation, M.A. and O.M.; writing–review and editing, M.A. and O.M.; visualization, M.A. and O.M.; supervision, O.M.; project administration, O.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available; they are provided upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ADGTE	Aero-derivative gas turbine engine
ANN	Artificial neural network
BPNN	Back-propagation neural network
BR	Bayesian regularization
CCGT	Combined cycle gas turbine unit
CR	Compression ratio
COP	Compressor outlet pressure
COT	Compressor outlet temperature
CNN	Convolutional neural network
EXT	Exhausted temperature
FF	Feed-forward
Freq	Frequency
GT	Gas turbine
GE	General Electric
HRSG	Heat recovery steam generator
LM	Levenberg–Marquardt
LSTM	Long short-term memory
MSE	Mean squared error
MIMO	Multi-input multi-output
MISO	Multi-input single-output
NG	Natural gas
NGV	Natural gas control valve position
NARX	Nonlinear autoregressive network with exogenous inputs
Norm	Normalized data
1D	One-dimensional
P	Output power
PSO	Particle swarm optimization
PILTV	Pilot gas valve position
R	Regression parameter
RMSE	Root mean squared error
SCG	Scaled conjugate gradient
Stand	Standardized data
TDL	Tapped delay line

References

  1. Boyce, M.P. Gas Turbine Engineering Handbook; Elsevier: Amsterdam, The Netherlands, 2011. [Google Scholar]
  2. Rayaprolu, K. Boilers for Power and Process; CRC Press: Boca Raton, FL, USA, 2009. [Google Scholar]
  3. Mohamed, O.; Wang, J.; Khalil, A.; Limhabrash, M. Predictive control strategy of a gas turbine for improvement of combined cycle power plant dynamic performance and efficiency. SpringerPlus 2016, 5, 501. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  4. Mohamed, O.; Za’ter, M. Comparative Study Between Three Modeling Approaches for a Gas Turbine Power Generation System. Arab. J. Sci. Eng. 2020, 45, 1803–1820. [Google Scholar] [CrossRef]
  5. Mohamed, O.K. Progress in Modeling and Control of Gas Turbine Power Generation Systems: A Survey. Energies 2020, 13, 2358. [Google Scholar] [CrossRef]
  6. Thangavelu, S.K.; Arthanarisamy, M. Experimental investigation on engine performance, emission, and combustion characteristics of a DI CI engine using tyre pyrolysis oil and diesel blends doped with nanoparticles. Environ. Prog. Sustain. Energy 2020, 39, e13321. [Google Scholar] [CrossRef]
  7. Murugesan, A.; Umarani, C.; Subramanian, R.; Nedunchezhian, N. Bio-diesel as an alternative fuel for diesel engines–A review. Renew. Sustain. Energy Rev. 2009, 13, 653–662. [Google Scholar] [CrossRef]
  8. Yaqoob, H.; Teoh, Y.H.; Ud Din, Z.; Sabah, N.U.; Jamil, M.A.; Mujtaba, M.A.; Abid, A. The potential of sustainable biogas production from biomass waste for power generation in Pakistan. J. Clean. Prod. 2021, 307, 127250. [Google Scholar] [CrossRef]
  9. Teoh, Y.H.; How, H.G.; Sher, F.; Le, T.D.; Nguyen, H.T.; Yaqoob, H. Fuel Injection Responses and Particulate Emissions of a CRDI Engine Fueled with Cocos nucifera Biodiesel. Sustainability 2021, 13, 4930. [Google Scholar] [CrossRef]
  10. Ayanoglu, A.; Yumrutas, R. Production of gasoline and diesel like fuels from waste tire oil by using catalytic pyrolysis. Energy 2016, 103, 456–468. [Google Scholar] [CrossRef]
  11. Asgari, H. Modelling, Simulation and Control of Gas Turbines Using Artificial Neural Networks. Ph.D. Thesis, University of Canterbury, Christchurch, New Zealand, 2014. [Google Scholar]
  12. Asgari, H.; Chen, X.; Morini, M.; Pinelli, M.; Sainudiin, R.; Rugerro Spina, P.; Venturini, M. NARX models for simulation of the start-up operation of a single-shaft gas turbine. Appl. Therm. Eng 2016, 93, 368–376. [Google Scholar] [CrossRef]
  13. Asgari, H.; Ory, E. Prediction of Dynamic Behavior of a Single Shaft Gas Turbine Using NARX Models. In Proceedings of the ASME Turbo Expo 2021: Turbomachinery Technical Conference and Exposition. Volume 6: Ceramics and Ceramic Composites; Coal, Biomass, Hydrogen, and Alternative Fuels; Microturbines, Turbochargers, and Small Turbomachines, Virtual, Online, 7–11 June 2021. V006T19A007. ASME. [Google Scholar]
  14. Asgari, H.; Ory, E.; Lappalainen, J. Recurrent Neural Network Based Simulation of a Single Shaft Gas Turbine. In Proceedings of the 61st SIMS Conference on Simulation and Modelling (SIMS 2020), Virtual Conference, 22–24 September 2020; Linköping Electronic Conference Proceedings; LiU Electronic Press: Linköping, Sweden, 2020; Volume 176, pp. 99–106. [Google Scholar] [CrossRef]
  15. Ibrahem, I.M.A. A Nonlinear Neural Network-Based Model Predictive Control for Industrial Gas Turbine. Ph.D. Thesis, Université Du Québec, Quebec, QC, Canada, 2020. [Google Scholar]
  16. Rashid, M.; Kamal, K.; Zafar, T.; Sheikh, A.; Shah, A.; Mathavan, S. Energy prediction of a combined cycle power plant using a particle swarm optimization trained feed-forward neural network. In Proceedings of the IEEE International Conference on Mechanical Engineering Automation and Control Systems, Tomsk, Russia, 1–4 December 2015; pp. 1–5. [Google Scholar]
  17. Rahmoune, M.B.; Hafaifa, A.; Kouzou, A.; Chen, X.; Chaibet, A. Gas turbine monitoring using neural network dynamic nonlinear autoregressive with external exogenous input modelling. Math. Comput. Simul. 2021, 179, 23–47. [Google Scholar] [CrossRef]
  18. Cao, Q.; Chen, S.; Zheng, Y.; Ding, Y.; Tang, Y.; Huang, Q.; Wang, K.; Xiang, W. Classification and prediction of gas turbine gas path degradation based on deep neural networks. Int. J. Energy Res. 2021, 45, 10513–10526. [Google Scholar] [CrossRef]
  19. Built In’s Expert Contributor Network. Gradient Descent: An Introduction to 1 of Machine Learning’s Most Popular Algorithms. Built In. 2021. Available online: https://builtin.com/data-science/gradient-descent (accessed on 21 October 2021).
  20. Ng, A.; Bensouda Mourri, Y.; Katanforoosh, K. DeepLearning.AI, Coursera. 2020. Available online: https://www.coursera.org/learn/deep-neural-network (accessed on 27 December 2021).
  21. Bhandari, A. Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization. Analytics Vidhya. 2020. Available online: https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/ (accessed on 27 December 2021).
  22. Liu, Q.; Wei, C.; Huosheng, H.; Quingyuan, Z.; Zhixiang, X. An Optimal NARX Neural Network Identification Model for a Magnetorheological Damper With Force-Distortion Behavior. Front. Mater. 2020, 7, 10. [Google Scholar] [CrossRef]
  23. Markova, M. Foreign Exchange Rate Forecasting by Artificial Neural Networks. AIP Conf. Proc. 2019, 2164, 060010. [Google Scholar] [CrossRef]
  24. Bai, M.; Yang, X.; Liu, J.; Liu, J.; Yu, D. Convolutional neural network-based deep transfer learning for fault detection of gas turbine combustion chambers. Appl. Energy. 2021, 302, 117509. [Google Scholar] [CrossRef]
  25. Wunsch, A.; Liesch, T.; Broda, S. Groundwater Level Forecasting with Artificial Neural Networks: A Comparison of LSTM, CNN and NARX. Hydrol. Earth Syst. Sci. 2021, 25, 1671–1687. [Google Scholar] [CrossRef]
  26. Ragab, M.G.; Abdulkadir, S.J.; Aziz, N.; Al-Tashi, Q.; Alyousifi, Y.; Alhussian, H.; Alqushaibi, A. A Novel One-Dimensional CNN with Exponential Adaptive Gradients for Air Pollution Index Prediction. Sustainability 2020, 12, 10090. [Google Scholar] [CrossRef]
  27. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  28. Wang, S.; Chen, H. A novel deep learning method for the classification of power quality disturbances using deep convolutional neural network. Appl. Energy 2019, 235, 1126–1140. [Google Scholar] [CrossRef]
Figure 1. (a) General Subsystems of dual-fuel GT; (b) The T-S (temperature–entropy) diagram.
Figure 2. The interdisciplinary nature of the research topic of GT modeling and simulation.
Figure 3. Input and output parameters of the model developed in this study.
Figure 4. The flowchart that summarizes data curation in this study.
Figure 5. The structure of NARX model developed for the gas turbine generation unit.
Figure 6. Performance plot of the developed MIMO NARX model designed for the gas turbine power plant.
Figure 7. Regression plot for Bayesian regularization algorithm used for proposed MIMO model.
Figure 8. The open-loop MIMO NARX model for the gas turbine generation unit.
Figure 9. Performance plot of the optimal MISO NARX model designed for the output power.
Figure 10. Performance plot of the optimal MISO NARX structure for the system freq.
Figure 11. Performance plot of the MISO NARX structure for the EXT.
Figure 12. Regression plot for Bayesian regularization algorithm used for proposed MISO model.
Figure 13. The open-loop MISO NARX model of the gas turbine generation unit.
Figure 14. Typical example of CNN architecture with typical numbers of the filters and layers.
Figure 15. One-dimensional convolution operation of the CNN in this study.
Figure 16. Max-pooling operation for the adopted CNN in this study.
Figure 17. Normalized exhausted gas temperature (MIMO structure performance).
Figure 18. Normalized frequency or turbine speed (MIMO structure performance).
Figure 19. Normalized power (MIMO structure performance).
Figure 20. Normalized exhausted temperature (MISO structure performance).
Figure 21. Normalized frequency or turbine speed (MISO structure performance).
Figure 22. Normalized power (MISO structure performance).
Table 1. GT input variables.

| Variable | Abbreviation | Unit | Actual Operational Range |
|---|---|---|---|
| Pilot gas valve position | PILTV | % | [41.06–44.79] |
| Natural gas control valve position | NGV | % | [27.18–39.45] |
| Compressor outlet pressure | COP | bar | [11.45–16.75] |
| Compressor outlet temperature | COT | °C | [366.5–439.90] |
Table 2. GT output parameters.

| Variable | Abbreviation | Unit | Actual Operational Range |
|---|---|---|---|
| Output power | P | MW | [124.89–241.57] |
| Frequency | Freq | Hz | [49.91–50.14] |
| Exhausted temperature | EXT | °C | [558.72–559.47] |
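Tables 3–6 below compare three data formats: actual SI-unit values (Actual), standardized data (Stand) and normalized data (Norm). As an illustration of how such formats can be derived from the raw signals of Tables 1 and 2, the following minimal sketch (assuming the samples are held in a NumPy array; the sample values are hypothetical) applies min–max normalization and z-score standardization column-wise:

```python
import numpy as np

def minmax_normalize(X):
    """Scale each column to [0, 1] using its own min/max (min-max normalization)."""
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return (X - xmin) / (xmax - xmin)

def standardize(X):
    """Zero-mean, unit-variance scaling of each column (z-score standardization)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical samples; columns: PILTV (%), NGV (%), COP (bar), COT (degC)
X = np.array([[41.06, 27.18, 11.45, 366.5],
              [44.79, 39.45, 16.75, 439.9],
              [43.00, 33.00, 14.00, 400.0]])

X_norm = minmax_normalize(X)
X_stand = standardize(X)
```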
Table 3. Samples of results of MIMO NARX structures for the three system outputs (P, Freq and EXT). LM = Levenberg–Marquardt, BR = Bayesian regularization, SCG = scaled conjugate gradient; BR does not use a validation subset, hence the cells marked —.

| Hidden Layer Neurons | Time Delay | Training Algorithm | Data Format | Average MSE (Training) | Average MSE (Validation) | Average MSE (Test) | Regression (Training) | Regression (Validation) | Regression (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 2 | LM | Actual | 3.4998 × 10⁻⁶ | 4.3858 × 10⁻⁶ | 6.1132 × 10⁻⁶ | 0.99997 | 0.99996 | 0.99994 |
| 11 | 15 | LM | Stand | 3.1147 × 10⁻⁶ | 4.2107 × 10⁻⁶ | 8.6999 × 10⁻⁶ | 0.99998 | 0.99996 | 0.99991 |
| 20 | 20 | LM | Norm | 3.4393 × 10⁻⁶ | 4.5559 × 10⁻⁶ | 3.8095 × 10⁻⁶ | 0.99997 | 0.99995 | 0.99996 |
| 11 | 15 | BR | Stand | 2.3191 × 10⁻⁶ | — | 6.050 × 10⁻⁶ | 0.99998 | — | 0.99994 |
| 15 | 30 | BR | Norm | 1.0732 × 10⁻⁶ | — | 3.2062 × 10⁻⁶ | 0.99998 | — | 0.99997 |
| 20 | 20 | BR | Norm | 2.7990 × 10⁻⁶ | — | 3.2469 × 10⁻⁶ | 0.99997 | — | 0.99996 |
| 9 | 5 | SCG | Actual | 4.5996 × 10⁻⁵ | 4.4098 × 10⁻⁵ | 3.1814 × 10⁻⁵ | 0.99993 | 0.99993 | 0.99993 |
| 15 | 15 | SCG | Norm | 1.4164 × 10⁻⁴ | 1.6761 × 10⁻⁴ | 2.6512 × 10⁻⁴ | 0.99992 | 0.99994 | 0.99993 |
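The NARX structures compared in Tables 3–6 differ mainly in the hidden-layer size and the tapped-delay length. As a minimal sketch of how an open-loop (series-parallel) NARX training set can be assembled from delayed inputs and delayed measured outputs, assuming plain NumPy arrays (the actual models were built and trained with standard neural-network tooling rather than this hand-rolled code):

```python
import numpy as np

def narx_regressors(u, y, d_u, d_y):
    """Build the open-loop NARX regressor matrix.

    u: (N, n_inputs) exogenous inputs; y: (N,) measured output.
    Row t contains u(t-1 .. t-d_u) and y(t-1 .. t-d_y); the target is y(t).
    """
    start = max(d_u, d_y)
    X, T = [], []
    for t in range(start, len(y)):
        past_u = u[t - d_u:t].ravel()   # delayed exogenous inputs
        past_y = y[t - d_y:t]           # delayed (measured) outputs
        X.append(np.concatenate([past_u, past_y]))
        T.append(y[t])
    return np.array(X), np.array(T)

# Example with time delay 2 (cf. the first row of Table 3), using synthetic stand-ins
u = np.random.rand(100, 4)   # PILTV, NGV, COP, COT
y = np.random.rand(100)      # e.g., output power
X, T = narx_regressors(u, y, d_u=2, d_y=2)
print(X.shape, T.shape)      # (98, 10) (98,)
```

Because the regressors use the measured (not fed-back) past outputs, this corresponds to the open-loop training configuration shown in Figures 8 and 13.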
Table 4. Samples of the trial results of the MISO NARX structures for the output power P (MW).

| Hidden Layer Neurons | Time Delay | Training Algorithm | Data Format | MSE (Training) | MSE (Validation) | MSE (Test) | Regression (Training) | Regression (Validation) | Regression (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 9 | 5 | LM | Actual | 1.9266 × 10⁻⁷ | 2.0124 × 10⁻⁷ | 3.2844 × 10⁻⁶ | 0.99997 | 0.99996 | 0.99997 |
| 11 | 15 | LM | Stand | 2.8425 × 10⁻⁷ | 4.3205 × 10⁻⁷ | 3.4621 × 10⁻⁶ | 0.99996 | 0.99997 | 0.99991 |
| 20 | 30 | LM | Norm | 2.6179 × 10⁻⁷ | 3.0105 × 10⁻⁷ | 1.2145 × 10⁻⁶ | 0.99996 | 0.99997 | 0.99996 |
| 9 | 5 | BR | Actual | 2.5613 × 10⁻⁸ | — | 6.7436 × 10⁻⁶ | 0.99996 | — | 0.99989 |
| 11 | 15 | BR | Stand | 1.7642 × 10⁻⁸ | — | 3.3149 × 10⁻⁷ | 1 | — | 1 |
| 15 | 25 | BR | Norm | 1.4258 × 10⁻⁸ | — | 1.4642 × 10⁻⁷ | 1 | — | 1 |
| 20 | 30 | BR | Norm | 6.2626 × 10⁻⁹ | — | 3.4983 × 10⁻⁷ | 1 | — | 1 |
| 9 | 5 | SCG | Actual | 4.4996 × 10⁻⁵ | 4.2032 × 10⁻⁵ | 2.3214 × 10⁻⁵ | 0.99272 | 0.99097 | 0.99505 |
| 11 | 15 | SCG | Stand | 1.5093 × 10⁻⁴ | 2.6054 × 10⁻⁴ | 2.4232 × 10⁻⁴ | 0.99562 | 0.99341 | 0.99598 |
| 15 | 25 | SCG | Norm | 1.3164 × 10⁻⁴ | 2.7791 × 10⁻⁴ | 1.6512 × 10⁻⁴ | 0.95759 | 0.94021 | 0.95058 |
Table 5. Samples of the trial results of the MISO NARX structures for the system frequency (Freq).

| Hidden Layer Neurons | Time Delay | Training Algorithm | Data Format | MSE (Training) | MSE (Validation) | MSE (Test) | Regression (Training) | Regression (Validation) | Regression (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 9 | 5 | LM | Actual | 6.0188 × 10⁻⁶ | 2.0124 × 10⁻⁷ | 3.2844 × 10⁻⁶ | 0.99997 | 0.99996 | 0.99997 |
| 11 | 15 | LM | Stand | 7.5393 × 10⁻⁶ | 4.3205 × 10⁻⁷ | 3.4621 × 10⁻⁶ | 0.99996 | 0.99997 | 0.99991 |
| 20 | 30 | LM | Norm | 8.4340 × 10⁻⁶ | 3.0105 × 10⁻⁷ | 1.2145 × 10⁻⁶ | 0.99996 | 0.99997 | 0.99996 |
| 9 | 5 | BR | Actual | 3.7643 × 10⁻⁶ | — | 6.7436 × 10⁻⁶ | 0.99996 | — | 0.99989 |
| 11 | 15 | BR | Stand | 3.7362 × 10⁻⁶ | — | 3.3149 × 10⁻⁷ | 1 | — | 1 |
| 15 | 25 | BR | Norm | 1.5820 × 10⁻⁶ | — | 1.4642 × 10⁻⁷ | 1 | — | 1 |
| 20 | 30 | BR | Norm | 2.1999 × 10⁻⁶ | — | 3.4983 × 10⁻⁷ | 1 | — | 1 |
| 5 | 2 | SCG | Actual | 2.4112 × 10⁻⁴ | 3.1053 × 10⁻⁴ | 2.2242 × 10⁻⁴ | 0.99332 | 0.99379 | 0.99023 |
| 11 | 15 | SCG | Stand | 2.4386 × 10⁻⁴ | 2.6054 × 10⁻⁴ | 2.4232 × 10⁻⁴ | 0.99341 | 0.99341 | 0.99588 |
| 15 | 25 | SCG | Norm | 3.4042 × 10⁻⁵ | 2.7791 × 10⁻⁴ | 1.6512 × 10⁻⁴ | 0.95658 | 0.96020 | 0.95038 |
Table 6. Samples of the trial results of the MISO NARX structures for the exhausted temperature (EXT).

| Hidden Layer Neurons | Time Delay | Training Algorithm | Data Format | MSE (Training) | MSE (Validation) | MSE (Test) | Regression (Training) | Regression (Validation) | Regression (Test) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 2 | LM | Actual | 1.8337 × 10⁻⁶ | 1.3246 × 10⁻⁶ | 6.1132 × 10⁻⁶ | 0.99997 | 0.99996 | 0.99994 |
| 20 | 30 | LM | Norm | 1.8947 × 10⁻⁶ | 1.5678 × 10⁻⁶ | 3.8095 × 10⁻⁶ | 0.99997 | 0.99995 | 0.99996 |
| 5 | 2 | BR | Actual | 1.4468 × 10⁻⁶ | — | 2.7617 × 10⁻⁶ | 0.99990 | — | 0.99979 |
| 15 | 25 | BR | Norm | 4.2333 × 10⁻⁶ | — | 2.9041 × 10⁻⁶ | 0.99998 | — | 0.99997 |
| 20 | 30 | BR | Norm | 3.3177 × 10⁻⁷ | — | 1.4889 × 10⁻⁶ | 1 | — | 1 |
| 9 | 5 | SCG | Actual | 1.1165 × 10⁻⁴ | 1.2062 × 10⁻⁴ | 2.1814 × 10⁻⁴ | 0.99219 | 0.99002 | 0.99108 |
| 10 | 10 | SCG | Stand | 7.8369 × 10⁻⁵ | 9.0913 × 10⁻⁵ | 7.1945 × 10⁻⁵ | 0.99444 | 0.99215 | 0.99519 |
| 15 | 25 | SCG | Norm | 7.4106 × 10⁻⁴ | 6.9733 × 10⁻⁴ | 7.0195 × 10⁻⁴ | 0.94648 | 0.94070 | 0.94048 |
Table 7. The effect of the learning rate during the search for the optimal solution (parallel MISO and MIMO CNNs). Each column pair gives the learning rate tried and the resulting average MSE; the first three pairs belong to the parallel MISO CNNs (normalized EXT, Freq and P) and the last pair to the MIMO CNN (all outputs).

| EXT Learning Rate | EXT Average MSE | Freq Learning Rate | Freq Average MSE | P Learning Rate | P Average MSE | MIMO Learning Rate | MIMO Average MSE |
|---|---|---|---|---|---|---|---|
| 1 | 0.1000023029 | 1 | 0.0486324 | 1 | 0.1000023029 | 1 | 0.0623 |
| 1 × 10⁻² | 0.0009281756 | 1 × 10⁻¹ | 0.0461746 | 1 × 10⁻² | 0.0000920723 | 1 × 10⁻² | 0.0010 |
| 1 × 10⁻³ | 0.0000754316 | 1 × 10⁻³ | 0.0059033 | 1 × 10⁻³ | 0.0000502320 | 1 × 10⁻³ | 0.0012 |
| 1 × 10⁻⁵ | 0.0009873912 | 1 × 10⁻⁶ | 0.0052704 | 1 × 10⁻⁵ | 0.0008763900 | 1 × 10⁻⁵ | 0.0142 |
| 1 × 10⁻⁷ | 0.3573216283 | 1 × 10⁻⁹ | 0.2094497 | 1 × 10⁻⁷ | 0.1232704280 | 1 × 10⁻⁷ | 0.1450 |
| 3 × 10⁻¹ | 0.0987941528 | 3 × 10⁻³ | 0.0015080 | 3 × 10⁻¹ | 0.0957923018 | 3 × 10⁻¹ | 0.0616 |
| 3 × 10⁻² | 0.0041973211 | 4 × 10⁻³ | 0.0018637 | 3 × 10⁻² | 0.0011962601 | 3 × 10⁻² | 0.00412842 |
| 3 × 10⁻³ | 0.0000452316 | 5 × 10⁻³ | 0.0026439 | 3 × 10⁻³ | 0.0000384926 | 3 × 10⁻³ | 0.00074576 |
| 3 × 10⁻⁵ | 0.0004128134 | 5 × 10⁻⁴ | 0.0022009 | 3 × 10⁻⁵ | 0.0003279034 | 1 | 0.01075099 |
| 3 × 10⁻⁷ | 0.0083392732 | 6 × 10⁻³ | 0.0035000 | 3 × 10⁻⁷ | 0.0040356721 | 1 × 10⁻² | 0.03599427 |
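In outline, the sweep of Table 7 amounts to retraining the same CNN under each candidate learning rate and recording the resulting MSE. The sketch below illustrates this with Keras, purely as an assumed framework (the paper does not tie itself to a specific library); `build_cnn` is a hypothetical factory returning a fresh copy of the architecture of Table 8, and the choice of the Adam optimizer is likewise an assumption.

```python
import tensorflow as tf

def sweep_learning_rates(build_cnn, X_train, y_train, rates,
                         epochs=1000, batch_size=32):
    """Retrain the same architecture under each learning rate and log the final MSE."""
    results = {}
    for lr in rates:
        model = build_cnn()  # fresh, untrained copy of the CNN
        # Optimizer choice (Adam) is an assumption; the loss is MSE as in Tables 7-8
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss='mse')
        hist = model.fit(X_train, y_train, epochs=epochs,
                         batch_size=batch_size, verbose=0)
        results[lr] = hist.history['loss'][-1]
    return results

# Candidate rates similar to those explored in Table 7
rates = [1e0, 1e-2, 1e-3, 1e-5, 3e-1, 3e-2, 3e-3]
```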
Table 8. Final version of the adjustable CNN with the final MSEs (normalized data, 1000 epochs and batch size 32).

| CNN Adjustable Design Parameter | MISO: Normalized EXT | MISO: Normalized Freq | MISO: Normalized Power | MIMO: All Outputs |
|---|---|---|---|---|
| No. of convolutional layers | 3 | 2 | 3 | 3 |
| Filter size for each convolutional layer | 2 | 2 | 2 | 2 |
| No. of filters in the convolutional layers | 64, 32, 256 | 100, 200 | 64, 32, 256 | 256, 32, 32 |
| No. of hidden layers | 1 | 1 | 1 | 1 |
| No. of neurons in the hidden layer | 70 | 100 | 64 | 70 |
| Max-pooling layers | 2 | 2 | 2 | 2 |
| Filter size in each max-pooling layer | 2 | 2 | 2 | 2 |
| Final MSE | 8.6124826 × 10⁻⁶ | 2.9210 × 10⁻⁴ | 8.3504346 × 10⁻⁶ | 1.6581 × 10⁻⁴ |
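For reference, a possible Keras realization of the "Normalized EXT" column of Table 8 (three convolutional layers with 64, 32 and 256 filters of size 2, two max-pooling layers with a pool size of 2, and one dense hidden layer of 70 neurons) is sketched below. The placement of the pooling layers, the activation functions, the padding and the input window length are assumptions, since Table 8 fixes only the counts and sizes.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_miso_ext_cnn(window_len=8, n_inputs=4):
    """1-D CNN per the 'Normalized EXT' column of Table 8 (ordering/activations assumed)."""
    model = models.Sequential([
        # Conv layers: 64, 32 and 256 filters, each with a filter size of 2 (Table 8)
        layers.Conv1D(64, kernel_size=2, activation='relu', padding='same',
                      input_shape=(window_len, n_inputs)),
        layers.MaxPooling1D(pool_size=2),     # first of the two max-pooling layers
        layers.Conv1D(32, kernel_size=2, activation='relu', padding='same'),
        layers.MaxPooling1D(pool_size=2),     # second max-pooling layer
        layers.Conv1D(256, kernel_size=2, activation='relu', padding='same'),
        layers.Flatten(),
        layers.Dense(70, activation='relu'),  # single hidden layer with 70 neurons
        layers.Dense(1),                      # normalized EXT output
    ])
    model.compile(optimizer='adam', loss='mse')  # trained for 1000 epochs, batch size 32
    return model
```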
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
