Deep-Learning Model Selection and Parameter Estimation from a Wind Power Farm in Taiwan †

Abstract: Deep learning networks (DLNs) use multilayer neural networks for multiclass classification and exhibit better results in wind-power forecasting applications. However, improving the training process through proper hyperparameter settings and techniques such as regularisation and Adam-based optimisation remains a challenge in the design of DLNs for processing time-series data. Moreover, finding the most appropriate parameters for a DLN model that solves the wind-power forecasting problem requires considering many training-related choices, such as the optimiser, activation function, batch size, and dropout. Reinforcement learning (RL) schemes constitute a smart approach to explore the proper initial parameters for the developed DLN model, considering a balance between exploration and exploitation. Therefore, the present study focuses on determining the proper hyperparameters for DLN models using a Q-learning scheme for four developed models. To verify the effectiveness of the developed temporal convolution network (TCN) models, experiments were conducted with five different sets of initial parameters for the TCN model, determined from the output of the Q-learning computation. The experimental results showed that the TCN accuracy for 168 h wind power prediction reached a mean absolute percentage error of 1.41%. To evaluate the effectiveness of the hyperparameter selection for the proposed model, the performance of four DLN-based prediction models for power forecasting (TCN, long short-term memory (LSTM), recurrent neural network (RNN), and gated recurrent unit (GRU) models) was compared. Overall, the TCN model exhibited higher prediction accuracy than canonical recurrent networks (i.e., the GRU, LSTM, and RNN models).


Introduction
Wind power forecasting techniques estimate the expected electrical output of one or more wind turbines of a wind farm in the near future, up to one year ahead. The prediction method involves model pre-training and power prediction based on collected meteorological information and historical wind power generation data. Meteorological historical data include temperature, humidity, average wind speed, and wind direction. Conventionally, meteorological engineers use physical methods for wind forecasting associated with numerical weather prediction (NWP) to identify weather changes. Treating weather changes as deterministic events is not a trivial task. In fact, it requires that meteorological engineers handle numerical weather data updates for immediate weather changes. Basically, forecasting of wind power generation becomes imprecise when the NWP results are poor. Moreover, statistical schemes, such as autoregressive (AR) and autoregressive integrated moving average (ARIMA) models [1], have been proposed to determine the relationship between weather features and predicted power, where weather changes are considered random processes. Statistical models generally perform well in simple time series forecasting, but their ability to process nonlinear data in dynamic states remains insufficient [2], and the fitting performance is limited.
Recently, numerous machine learning (ML) algorithms [3][4][5] have been proposed to help engineers distinguish the correlations between climate features and wind power outputs by accumulating historical meteorological information and wind power generation data. For example, artificial neural networks (ANNs), genetic algorithms, fuzzy regression, cluster analysis (K-means), and support vector machines have been introduced to solve the problems of wind-power forecasting in climate change.
However, traditional ML models for wind-power forecasting face several challenges, including long-term stable prediction under climate change, as their complex learning architecture may result in low efficiency, long training time, and under-fitting [6]. Thus, developing a precise and robust approach to wind power forecasting across a wide range of meteorological phenomena remains a challenge.
To address the aforementioned problems in model training that often occur in the thin-layer network architecture of ML approaches, deep learning network (DLN) approaches use multilayer neural networks to extract more complex climate features from historical weather information, which helps engineers accurately predict the power generation (kw/h) of wind turbines. Typical deep learning models for sequence-to-sequence data include recurrent neural networks (RNN) [7], long short-term memory (LSTM) [8,9], gated recurrent units (GRU) [10], and temporal convolution networks (TCN) [11][12][13]. These four models have been used as learning models for the DLN to establish a classifier after their architecture hyperparameters and features are considered.
Practically, when attempting to build a DLN model, several questions should be answered: (1) What is the best model for the wind power prediction problem to be solved? (2) How should the hyperparameters associated with the model be selected? Hyperparameters include model architecture-related parameters, such as the size of hidden layers. In addition, one should decide on training-related parameters, such as the activation function, batch size, learning rate, and stopping time, given the alternative optimizers, such as RMSprop, SGD, and Adam [14].
We attempt to answer these questions in this study and achieve high prediction precision with low convergence errors of the cost function in the predicted output.
To answer the first question, there are a number of options with regard to DLN model choice; for example, if the input data form a sequence with dependencies between the elements of the sequence, then an RNN is required. However, the RNNs suffered from gradient exploding and vanishing problems with long sequence inputs [8,9]. The choice of an appropriate model for wind-power forecasting is discussed later.
For the second question, the process of setting the hyperparameters requires expertise and extensive trial and error. Some drawbacks of the developments in wind power forecasting have been discussed, such as the use of manual trial and error for setting the initial parameters for the model in the training process. There are no simple and easy ways to set the hyperparameters. Recently, a reinforcement learning (RL)-based Q-learning algorithm [6,15,16] has shown great success in tuning hyperparameters in controlled environments, such as energy forecasting, Atari, robotic manipulation, and Go. The Q-learning algorithm uses a model predictive control process to optimise a reward by an agent interacting with the environment states and makes better decisions compared to the traditional trial and error approaches.
To improve forecasting accuracy for wind turbines, this study proposes a DLN-based model [7][8][9][10][11][12][13] with a Q-learning algorithm for wind power forecasting, which determines the hyperparameters for four competing models by minimising the prediction error. There are several options with regard to DLN model choice; here, four sequence-to-sequence deep learning models were compared to examine model performance and to derive recommendations for wind power forecasting. In the experiment, the four competing models were pre-trained using real historical climate data and power generation outputs of wind turbines from a wind power farm in the Chang-Bin industrial zone of Taiwan. The experimental results show that the TCN exhibited higher prediction accuracy than recognized recurrent networks, such as GRUs, LSTMs, and RNNs. In developing the proposed model, our work focused on three important aspects: (1) The proper setting of the hyperparameters for the developed model was investigated using the Q-learning algorithm by minimising the loss of the cost function in the learning process. (2) Five initial parameters for the developed model were analysed by incorporating the Q-learning algorithm in the learning process: (i) number of stacks, (ii) filter size, (iii) dilation coefficient, (iv) activation function, and (v) optimiser. These were used to decide the proper hyperparameter settings for model training on a 1-year dataset from the wind farm in the Chang-Bin industrial zone (CBTP) of Tai-Power Corporation in Taiwan. (3) In our experiment, for forecasting 168 h (7 days) ahead, the TCN model had the highest prediction accuracy (mean absolute percentage error (MAPE) = 1.41%), followed by the GRU (MAPE = 3.72%), LSTM (MAPE = 6.99%), and RNN (MAPE = 28.18%) models.
The remainder of this paper is organised as follows. Section 2 reviews previous studies on the DLNs for sequence-to-sequence data and a reinforcement learning algorithm. Section 3 introduces the reinforcement learning framework and the proposed Q-learning approach for determining the proper hyperparameters for the wind-power forecast model. A performance analysis of the experimental results is presented in Section 4. Finally, Section 5 draws the conclusions.

Related Work
In this section, we review convolutional neural networks for sequence-to-sequence data and a reinforcement learning algorithm.

Convolutional Neural Networks for Sequence-to-Sequence Data
Recently, researchers have developed DLNs for time-series data processing, including RNN, LSTM, GRU, and TCN models, which handle sequence-to-sequence inputs, perform feature extraction on long-term sequences of climate data, and estimate the expected electrical outputs of wind turbines [7][8][9][10][11][12][13].
Typically, an RNN is a class of artificial neural networks where connections between nodes form a directed graph along a sequence [7]. RNNs focus on sequential data forecasting, speech recognition, text generation, stock market prediction, and language translation. This specific design allows the network to process sequences of inputs using their internal state. This makes them suitable for tasks such as unsegmented connected handwriting recognition and speech recognition. However, RNNs suffer from the problem of vanishing gradients and have limited learning in long memory applications.
Due to the problem of exploding and vanishing gradients with long sequences, LSTMs were designed with three specific gates that control the cell state. Basically, LSTM units are units of an RNN developed to deal with the exploding and vanishing gradient problems that can be encountered when training traditional RNNs. Their relative insensitivity to gap length gives LSTM networks a key advantage over RNNs, hidden Markov models, and other sequence learning methods in numerous applications. The compact forms of the equations for an LSTM unit are as follows [8,9]:

Updating rules for the forget gate, input gate, and output gate in an LSTM block:

$$f_t = \sigma\left(W_f [h_{t-1}, x_t] + b_f\right)$$
$$i_t = \sigma\left(W_i [h_{t-1}, x_t] + b_i\right)$$
$$o_t = \sigma\left(W_o [h_{t-1}, x_t] + b_o\right)$$

Candidate (memory) cell state:

$$\tilde{c}_t = \tanh\left(W_c [h_{t-1}, x_t] + b_c\right)$$

Memory cell and hidden state (cell output):

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$

where $x_t$ denotes the input vector to the neural network, $f$ is the forget gate (a neural network (NN) with sigmoid), $\tilde{c}$ the candidate (memory) cell state (an NN with tanh function), $i$ the input gate (an NN with sigmoid), $W$ the weight matrices and $b$ the bias vector parameters that need to be learned during training, $o$ the output gate (an NN with sigmoid), $h$ the hidden state (a vector), $c$ the memory state (a vector), and $\sigma$ the sigmoid activation function in an LSTM unit. Notably, $b_f$, $b_i$, $b_o$, and $b_c$ represent the bias vector parameters to be learned during training in the forget gate, input gate, output gate, and memory cell, respectively.

Later, the GRU was proposed to handle simpler problems with shorter sequential data by reducing the number of gates. Unlike the LSTM, the GRU replaces the forget gate and input gate with an update gate. Engineers usually choose the GRU when computing power and the time overhead of the hardware are a concern [10].
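As an illustration of the gate computations described above, the following is a minimal NumPy sketch of a single LSTM step. It assumes the common formulation in which each gate acts on the concatenation of the previous hidden state and the current input; the dictionary layout of `W` and `b`, the layer sizes, and the random inputs are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W and b are dicts keyed by gate name (assumed layout)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate
    i = sigmoid(W["i"] @ z + b["i"])         # input gate
    o = sigmoid(W["o"] @ z + b["o"])         # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c = f * c_prev + i * c_tilde             # memory cell update
    h = o * np.tanh(c)                       # hidden state (cell output)
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3  # hypothetical sizes, e.g. four weather features per time step
W = {g: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for g in "fioc"}
b = {g: np.zeros(n_hid) for g in "fioc"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):       # a random 5-step input sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

Because the hidden state is the product of a sigmoid gate and a tanh, every component of `h` stays strictly inside (-1, 1) regardless of the input scale.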
In 2017, Lea et al. [11] proposed a new neural framework for sequence modelling tasks, namely temporal convolutional networks (TCNs), to address the problem of long sequential data of uncertain length. A TCN consists of three parts: dilated causal convolution, a nonlinear activation function, and residual connections. The TCN was designed using two basic principles: (i) causal convolutions: the convolutions are causal, meaning that there is no information leakage from the future to the past; (ii) one-dimensional fully convolutional network architecture: the architecture can take a sequence of any length and map it to an output sequence of the same length [12]. In other words, TCNs use one-dimensional (1D) separable convolutions, a form of factorised convolution that splits a standard convolution into a depth-wise and a pointwise convolution to reduce the massive memory requirements of deep learning models [13]. Further, the TCN extends the memory available for processing long inputs by using a dilated causal convolution architecture and residual connections, which helps preserve prediction accuracy.
A TCN block was regularly selected using the combinational 1D fully convolutional architecture followed by a nonlinear activation function (rectified linear unit, ReLU) to determine the weights of an input sequence that improve classification accuracy in model prediction, then adding a dropout function to reduce overfitting. In other words, a dropout function in a dense layer decreases the number of parameters to be learned and helps reduce overfitting. From a design perspective, the TCN is capable of handling a series of uncertain length and outputs it with the same length. Experimental results [17][18][19][20] have shown that TCNs exhibit better performance with sequences of uncertain length, where the TCN architecture performs well in very long sequences of inputs. Sequence data prediction is the most frequently encountered issue in the TCN model, which is illustrated as follows [11].
Consider a given training dataset $D(x_t, y_t)$ in a sequence modelling task, where $x_t$ denotes the input sequence ($x_t \in \mathbb{R}^n$, $t = 1, \ldots, T$) and $\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_T$ is the prediction sequence for the corresponding sequential input $\{x_0, x_1, \ldots, x_T\}$.
The mapping function $f$ is defined for a temporal convolutional network as follows:

$$\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_T = f(x_0, x_1, \ldots, x_T)$$

where $\hat{y}_t$ depends only on the observations $x_0, x_1, \ldots, x_t$ available up to time $t$. The network is trained by supervised learning to find the function $f$ that minimizes the error between the actual outputs $(y_0, \ldots, y_T)$ and the predictions $(\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_T)$. Under the causal constraint, the TCN uses dilated causal convolutions to expand the receptive field exponentially, taking more historical information into consideration. The dilated convolution operation $F$ on element $s$ of the 1-D sequence $x \in \mathbb{R}^n$ for a filter $f: \{0, 1, \ldots, k-1\} \rightarrow \mathbb{R}$ is formulated as Equation (9) [13]:

$$F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i)\, x_{s - d \cdot i} \tag{9}$$

where $k$ is the filter size, $d$ is the dilation factor, and the symbol $*$ is the convolution operator. The dilated causal convolution used in this paper is illustrated in Figures 1 and 2.
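The dilated causal convolution can be made concrete with a small NumPy sketch. It assumes the usual definition in which output element s is a weighted sum of the inputs at positions s, s - d, ..., s - (k - 1)d, with positions before the start of the sequence treated as zero padding; the function name and the toy filter are illustrative assumptions.

```python
import numpy as np

def dilated_causal_conv1d(x, f, d):
    """F(s) = sum_{i=0}^{k-1} f[i] * x[s - d*i]; indices below 0 count as zero."""
    k = len(f)
    y = np.zeros(len(x), dtype=float)
    for s in range(len(x)):
        for i in range(k):
            j = s - d * i          # only current and past positions are read
            if j >= 0:
                y[s] += f[i] * x[j]
    return y

x = np.arange(8, dtype=float)      # toy input sequence 0, 1, ..., 7
f = np.array([1.0, -1.0])          # filter size k = 2 (a difference filter)

y1 = dilated_causal_conv1d(x, f, d=1)   # x[s] - x[s-1]
y2 = dilated_causal_conv1d(x, f, d=2)   # x[s] - x[s-2]
print(y1)  # [0. 1. 1. 1. 1. 1. 1. 1.]
print(y2)  # [0. 1. 2. 2. 2. 2. 2. 2.]
```

Changing a later input such as `x[5]` leaves `y[:5]` unchanged, which is the "no leakage from the future" property of causal convolutions.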

Hyperparameter Optimization for Machine Learning Models
The performance of deep learning models strongly depends on choosing a set of optimal hyperparameters. Because the prediction performance of a deep learning network model is influenced heavily by this choice, a reinforcement learning optimisation process can be incorporated to improve search efficiency in model development. Hyperparameter tuning is an optimization problem in which the objective function is unknown, so traditional optimization techniques such as the Newton method or gradient descent cannot be applied.
Practically, data engineers search for a set of hyperparameter values in the training phase to achieve the best performance for machine learning models within a reasonable amount of time. They repeatedly adjust the hyperparameters by selecting different sets of values for a learning algorithm and then compare the resulting model performance to choose the best model. This process is called hyperparameter optimization.
Typically, there are two main kinds of hyperparameter optimization methods: manual search and automatic search. Manual search identifies the important parameters based on the experience of expert users, so the tuning process is not easily reproducible. Moreover, as the number of hyperparameters and the range of values increase, it becomes quite difficult to manage [21]. To overcome the drawbacks of manual search, automatic search algorithms such as grid search have been proposed [22][23][24][25]. Grid search trains machine learning models with different hyperparameter values on the training set and compares their performance according to evaluation metrics. Finally, grid search outputs the hyperparameters that give the best performance. The two hyperparameter optimization approaches are compared in Table 1.

Table 1. Comparison of two hyperparameter optimization methods.

Method: Manual search (MS)
Description: The MS depends on the experience of expert users to identify the important parameters. It is suitable for expert users handling simple or low-dimensional hyperparameter tuning problems using visualization tools.
Limitations: When the number of hyperparameters and the range of values increase, it becomes quite difficult to manage, since humans are not good at handling high-dimensional data.

Method: Automatic search (AS)
Description: The AS trains machine learning models and selects the hyperparameters that result in the most precise model performance.
Limitations: Basically, the AS is a brute-force, exhaustive search approach, and it suffers from a high computational load and time cost.
Automatically tuning hyperparameters for the ideal model structure is a challenging task. Numerous automatic search techniques comprising machine learning approaches with optimised algorithms have been used for achieving high-precision performance. These AS schemes are summarised in Table 2.

Table 2. Machine learning approaches for hyperparameter tuning.

Scheme: Lee, Park, Sim (2018) [21]
Achievement: Proposed a novel approach to improve CNN performance by hyperparameter tuning in the feature extraction step using a parameter-setting-free harmony search (PSF-HS) approach. Two simulations showed that performance can be improved by tuning the hyperparameters of CNN architectures, with reference to the previously proposed CifarNet and the Cifar-10 dataset.
Limitations: A host needs relatively high computational capabilities to train the PSF-HS algorithms for image recognition.

Scheme: Wu, Chen, et al. (2019) [22]
Achievement: Proposed a hyperparameter tuning algorithm for machine learning models based on Bayesian optimization to find a set of hyperparameters that achieve the best performance. Experimental results show that the proposed method can find the best hyperparameters for widely used machine learning models, such as the random forest algorithm and neural networks.
Limitations: The model accuracy relies heavily on large amounts of prior information to tune the hyperparameters of the model under development.

Scheme: (entry missing)
Achievement: Proposed a method to tune hyperparameters for machine learning algorithms using Grey Wolf Optimization (GWO) and Genetic algorithm (GA) metaheuristics. Experimental results show that in all trials, the performance of the training phases is improved. Additionally, GWO demonstrates better performance with a p-value of 2.6E
Limitations: A ready-to-use library and packages for different platforms, as well as an online web tool, are needed in practice to investigate a wider range of machine learning training algorithms and metaheuristics.
Although the automatic tuning methods in Table 2, like grid search, can theoretically obtain near-optimal hyperparameter values, they suffer from a high computational load and time cost.
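The cost of exhaustive search can be illustrated with a short sketch of grid search over the five hyperparameter types considered in this study. The candidate values and the `toy_loss` function below are hypothetical stand-ins chosen only for illustration; in practice each evaluation would require training and validating a full model, which is what makes the approach expensive.

```python
from itertools import product

# Hypothetical candidate values (not the ranges used in this study).
search_space = {
    "stacks": [1, 2, 3],
    "filter_size": [2, 3, 4],
    "dilations": [(1, 2, 4), (1, 2, 4, 8)],
    "activation": ["relu", "tanh"],
    "optimizer": ["adam", "rmsprop", "sgd"],
}

def toy_loss(cfg):
    """Synthetic stand-in for a validation error; a real run would train a model."""
    return (abs(cfg["stacks"] - 2)
            + abs(cfg["filter_size"] - 2)
            + (0 if cfg["dilations"] == (1, 2, 4, 8) else 1)
            + (0 if cfg["activation"] == "relu" else 1)
            + (0 if cfg["optimizer"] == "adam" else 1))

# Enumerate every combination of candidate values (the full grid).
grid = [dict(zip(search_space, values))
        for values in product(*search_space.values())]

best = min(grid, key=toy_loss)   # one evaluation per grid point
print(len(grid))                 # 108 = 3 * 3 * 2 * 2 * 3
print(best)
```

Even this small grid needs 108 evaluations, and the count multiplies with every additional hyperparameter or candidate value.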

Reinforcement Learning Algorithm
To solve the problem of expensive cost in grid search, a dynamic programming-based reinforcement learning algorithm has been proposed to assist engineers in finding the optimal performance for specific problems by adjusting the definite hyperparameters for the developed model.
Notably, the reinforcement learning-based algorithm has shown great success in many real-world hyperparameter optimization problems, including hyperparameter tuning in controlled environments for deep learning network modelling [6,15,16,26,27,28]. Many researchers have focused on neural architecture search (NAS) and hyperparameter optimization using reinforcement learning approaches. For example, Zoph and Le addressed hyperparameter optimization of RNN and LSTM using reinforcement learning to design automation of deep learning networks [26]. Later, Iranfar et al. used reinforcement learning to tune the hyperparameters for a multi-layer perceptron (MLP) and CNN [27]. Jalali et al. used an ensemble learning scheme with reinforcement learning strategy to increase the prediction accuracy of wind power forecasting [28].
In the following, we review the basic principle of reinforcement learning algorithm.
In real applications, decision makers seldom observe the complete state of the system; it is more common to receive observations of the state that are noisy or incomplete. Markov decision processes (MDPs) with stochastic theory have been proposed to solve these decision problems in incomplete states. The MDP focuses on finding a good policy that yields the maximum cumulative reward by taking a series of actions with respect to specific states of the observed system. In the MDP problem model, an agent (decision maker) interacts with its environment and changes its behavior by choosing an action from the set of actions A available in the states S. The dynamic process responds at the next time step by randomly moving into a new state and giving the decision maker a corresponding reward R [6].
A major approach in reinforcement learning, namely the Q-value approximation, is to select an action $a_t$ that maximises the expected reward at any state $s_t$. The goal of the agent in Q-value approximation is to find an optimal policy in the sense of maximizing the expected value of the total reward over any and all successive steps, starting from the current state. Therefore, this algorithm calculates the quality of a state-action pair as [26]:

$$Q: S \times A \rightarrow \mathbb{R}$$
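The update rule for the Q-value approximation is

$$Q^{new}(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{11}$$

Here, $\alpha \in (0, 1)$ is the learning rate, $r_t$ is the reward received when moving from state $s_t$ to state $s_{t+1}$, and $\gamma$ is the discount factor. A factor of value '0' lets the agent not learn anything, while a factor of value '1' lets the agent consider only the most recent information. Equation (11) indicates that the Q-value approximation is a multi-decision process based on the MDP framework to provide a generalised framework for optimal behaviour based on actions of the agent to maximise the cumulative rewards of a series of decisions.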
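A short numerical sketch may help to see how the learning rate shapes a single Q-value update. It assumes the standard temporal-difference form of the rule, in which the old estimate is moved towards the reward plus the discounted best value of the next state; the function name and the numbers are illustrative assumptions.

```python
import math

def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """Move Q(s, a) towards the target: reward + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * max_q_next
    return q_sa + alpha * (td_target - q_sa)

# Old estimate 0.5, reward 1.0, best next-state value 2.0, discount 0.9.
q_partial = q_update(0.5, 1.0, 2.0, alpha=0.1, gamma=0.9)  # 0.5 + 0.1 * (2.8 - 0.5)
q_frozen = q_update(0.5, 1.0, 2.0, alpha=0.0, gamma=0.9)   # alpha = 0: nothing learned
q_latest = q_update(0.5, 1.0, 2.0, alpha=1.0, gamma=0.9)   # alpha = 1: only newest target

print(round(q_partial, 2), q_frozen, round(q_latest, 2))   # 0.73 0.5 2.8
assert math.isclose(q_partial, 0.73)
```

With `alpha=0` the old estimate is returned unchanged, and with `alpha=1` it is replaced entirely by the newest target, matching the two extremes described in the text.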
Recently, Iranfar et al. addressed hyperparameter optimization of CNNs through a Q-learning-based approach with a multi-agent system [27]. In their multi-agent Q-learning-based approach, agents are able to communicate through a novel definition of state-action pairs, Q-tables, and a Q-table update rule. Thus, a Q-learning scheme was adopted in this study for hyperparameter optimization and for examining the prediction accuracy of the model, in order to overcome the expensive-cost problem of grid search.

Determining the Hyperparameters for the Developed Model with Q-Learning Scheme
This section presents our proposed TCN model design for wind power forecasting 24-168 h ahead, incorporating a reinforcement learning algorithm for long-term wind power forecasting. The proposed model comprises three phases: the architecture design phase, the hyperparameter selection phase, and the overall process for model development phase.
Appl. Sci. 2022, 12, 7067

Architecture Design for the TCN Model
For a TCN model design for wind power forecasting 24-168 h ahead, it is expected that the resulting deep features will consist of climate features obtained through the convolutional feature learning process from historical weather information, thereby reducing the prediction errors of power generation (kw/h) for wind turbines as much as possible.
Adopting TCNs for wind power forecasting has two advantages. First, in the training phase, TCNs automatically learn the correlations to capture complex variations with wind power output by leveraging a large amount of training data. Second, during the learning and testing phase, deep learning networks can be easily parallelised on GPU cores for acceleration, allowing the results to be quickly obtained [17]. Thus, we incorporated a convolutional network architecture that combines causal convolution with residual connections to construct a stable TCN-based prediction model for 24-168 h wind power forecasting.
In the present study, a wind power forecasting technique was developed based on a TCN model, with the goal of achieving high predictive accuracy with the minimum convergence error where the input sequence comprises historical wind speed and direction observations over past periods, while the actual output is the current wind power output.
First, the TCN is used to learn the correlations between meteorological features and wind power generation from various historical data. In the case of wind power forecasting, the TCN model can be defined as follows [11]:

$$x_t^{l} = \sigma\left(W_x^{l}\, F_d\left(x_t^{l-1}\right) + b_x^{l}\right)$$

where $F_d(\cdot)$ is the dilated convolution function with dilation factor $d$; $x_t^{l}$ is the value of the neuron of the $l$-th hidden layer at time $t$; $W_x^{l}$ and $b_x^{l}$ are the weights and bias corresponding to the $l$-th hidden layer, respectively; and $\sigma$ is the activation function.
A detailed workflow of the wind power forecasting using dilated residual blocks to extend a large receptive field for processing historical data is shown in Figure 1. In theory, TCN has two effective approaches to increase the receptive field size, that is, increase the dilation factor d and choose a larger filter size k.
As shown in Figure 1, we propose a TCN-based dilated causal convolution design with dilation factors d = 1, 2, 4, 8 and filter size k = 2. In the residual block, the TCN uses a two-layer dilated causal convolution, the activation function of the convolution is a rectified linear unit (ReLU), and weight normalization is used to normalize the convolution filter. Further, we inserted a dropout layer after each convolutional layer to avoid overfitting.
In essence, the TCN model was composed of one or more 1D convolutional layers with dilated residual blocks for input processing. Through use of the residual block in the convolutional layers in TCN, deep and large TCNs can achieve further stabilisation. TCNs are useful for all types of long sequence problems because they can perform feature extraction and correlation analyses. An example of the residual block (d = 8) used between each layer in the TCN to accelerate convergence and enable the training of deeper models is shown in Figure 2.
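To see how far back such a stack of dilated causal convolutions can look, the receptive field can be computed directly. The sketch below assumes that every causal convolution with filter size k and dilation d extends the receptive field by (k - 1) * d time steps and that each residual block contains two such convolutions; the helper name is hypothetical.

```python
def receptive_field(filter_size, dilations, convs_per_block=2):
    """Number of input time steps that can influence one output time step."""
    return 1 + convs_per_block * (filter_size - 1) * sum(dilations)

# Filter size k = 2 with dilation factors d = 1, 2, 4, 8.
rf_two_layer = receptive_field(2, [1, 2, 4, 8])                        # 1 + 2 * 1 * 15
rf_one_layer = receptive_field(2, [1, 2, 4, 8], convs_per_block=1)     # 1 + 1 * 1 * 15
print(rf_two_layer, rf_one_layer)  # 31 16
```

Under these assumptions, doubling the dilation at each level roughly doubles the covered history, whereas increasing the filter size k scales the contribution of every level at once.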

Hyperparameter Selection for the TCN Model Using Q-Learning Algorithm
Inspired by Iranfar, Zapater, and Atienza in [27], we adapted Q-learning agents to split the design space into independent smaller design sub-spaces such that the agents fine-tune the hyperparameters for the assigned layer and provide faster, yet accurate design space search. Thus, the optimal parameters for the developed wind power prediction model with Q-learning algorithm are analyzed as follows.
In the training phase, the present study used a TCN model pre-trained on the experimental dataset containing wind turbine data collected from the Chang-Bin wind power plant in Taiwan from 2020 to 2021 (2-year period) and incorporated a Q-learning algorithm to determine the optimal parameters for the developed models in the training process (Figure 3). As shown in Figure 4, the selection process for architecture parameters is modelled as an MDP.
where ( , ; θ) represents the reward value of the current state s, θ is the parameter from the previous iteration, s is the state after performing action , and is a number between 0 and 1, 0 γ 1, called the discount factor, which trades off the importance of sooner against later rewards. Further, may also be interpreted as the likelihood of succeeding at every step k. Q(s, ; θ) denotes the quantity of a state-action pair (s, ). By performing action a, the agent can move from a state s to a new state s , in which each state provides the agent with a reward. As described above, the agent is used for maximising its total reward by learning the optimal action for each state.
To minimise the errors between the predicted values and actual outputs (i.e., maximise the reward to minimise the prediction errors between prediction value and real output) for wind power forecasting, we define the reward (R) between state s and new state as In finite state-action spaces, the Q-learning approach can solve for Q-value using an update rule as follows (Figure 4).  In summary, the Q-learning computation was performed in the following four substeps [21,22] Calculate L(θ), and then stop updating when ∆L(θ) .
Selection of the optimal parameters (θ) for the developed TCN model is illustrated in Figure 5. To solve the problem of exploring the proper setting of hyperparameters for the TCN model, this study proposed a prediction model involving Q-learning with the reinforcement learning approach as follows.
Initially, the loss function of the Q-learning model, L(θ), is given by where r k (s, a; θ) represents the reward value of the current state s, θ is the parameter from the previous iteration, s is the state after performing action a, and γ is a number between 0 and 1, 0 ≤ γ ≤ 1, called the discount factor, which trades off the importance of sooner against later rewards. Further, γ may also be interpreted as the likelihood of succeeding at every step k. Q(s, a ; θ) denotes the quantity of a state-action pair (s, a). By performing action a, the agent can move from a state s to a new state s , in which each state provides the agent with a reward. As described above, the agent is used for maximising its total reward by learning the optimal action for each state.
To minimise the errors between the predicted values and the actual outputs for wind power forecasting (i.e., to maximise the reward), we define a reward (R) for the transition from state s to the new state s′. In finite state-action spaces, the Q-learning approach can solve for the Q-value using an iterative update rule (Figure 4).
In this update rule, Q_{k+1}(s, a; θ) is the new Q-value of the state-action pair for state s; Q_k(s, a; θ) is the Q-value of the state-action pair for the current state s; r_k(s, a; θ) is the reward value of the state-action pair for the current state s; and Q_k(s′, a′; θ) is the Q-value of the state-action pair for the new state s′. In summary, the Q-learning computation was performed in five substeps [21,22], concluding with the calculation of L(θ) and a check of ∆L(θ) against the stopping criterion. Selection of the optimal parameters (θ) for the developed TCN model is illustrated in Figure 5. The detailed Q-learning algorithm for wind power forecasting with the TCN learning model, described in PDL, is given as Algorithm 1.

Repeat
  The state vector S consists of N independent states, S = [s_1, s_2, ..., s_N]
  Set R = 0
  For each state s_i in S
    Initialize Q(s_i, a_i), ∀ s_i ∈ S_i, a_i ∈ A(s_i), arbitrarily, and Q(terminal-state, 0) = 0
    Repeat (for each episode of the ith state s_i)
      Initialize S_i
      Repeat (for each step of episode)
        Choose A from S_i using policy derived from Q
        Take action A, observe R, S_i′
        S = s_1, ..., s_i, ...,
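The loop structure of Algorithm 1 can be illustrated with a small tabular sketch. This is an interpretation under stated assumptions, not the implementation used in this study: each hyperparameter is treated as one independent state s_i, an action selects the next candidate value, and the learning rate `alpha`, exploration rate `epsilon`, episode counts, and the `toy_reward` function are all hypothetical stand-ins (in practice the reward would come from training and scoring a model).

```python
import random


def q_learning_search(space, evaluate, episodes=200, steps=5,
                      alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over one hyperparameter (state s_i) at a time.

    space:    dict mapping a hyperparameter name to its candidate values.
    evaluate: callable(config dict) -> reward, where higher is better.
    alpha, gamma, epsilon are assumed values, not taken from the paper.
    """
    rng = random.Random(seed)
    names = list(space)
    # Current candidate index of every hyperparameter (start at the first).
    current = {n: 0 for n in names}
    # One Q-table per hyperparameter, indexed as q[name][state][action].
    q = {n: [[0.0] * len(space[n]) for _ in space[n]] for n in names}
    best_cfg, best_reward = None, float("-inf")

    for _ in range(episodes):
        name = rng.choice(names)  # which state s_i this episode explores
        n_actions = len(space[name])
        for _ in range(steps):
            s = current[name]
            # Epsilon-greedy policy derived from Q.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda j: q[name][s][j])
            current[name] = a  # taking action a moves to state s' = a
            cfg = {n: space[n][current[n]] for n in names}
            r = evaluate(cfg)
            # Q-learning update: move Q(s, a) toward r + gamma * max Q(s', .).
            td_target = r + gamma * max(q[name][a])
            q[name][s][a] += alpha * (td_target - q[name][s][a])
            if r > best_reward:
                best_cfg, best_reward = cfg, r
    return best_cfg, best_reward


def toy_reward(cfg):
    """Hypothetical reward standing in for a negative validation error."""
    return -abs(cfg["stacks"] - 3) - abs(cfg["filters"] - 16) / 8


if __name__ == "__main__":
    space = {"stacks": [2, 3, 4], "filters": [8, 16, 32]}
    print(q_learning_search(space, toy_reward))
```

Because the Q-tables start at zero and the toy rewards are never positive, untried actions look attractive at first, which encourages early exploration before the greedy choice settles.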

Overall Process for Model Operations
A detailed flowchart of the developed model for wind power forecasting is shown in Figure 6. The figure also illustrates the three subphases of the model operation process for the proposed TCN model: (i) data pre-processing, (ii) model training (including determination of proper hyperparameter settings), and (iii) model validation.

Table 3.

Step 1. Data pre-processing
Once the wind turbine site is located, the forecasting algorithm can include weather forecast data from the Central Weather Bureau (CWB) in Taiwan. Forecasts on the CWB website are issued four times daily and provide 7-day-ahead weather forecasts for various areas of Taiwan. Moreover, the historical data contain real weather observations and wind turbine power outputs, including anomalous data, so engineers need to pre-process the wind farm datasets before performing model pre-training.
Step 2. Model training
Step 2.1. Initial setting of hyperparameters from GitHub
The proposed model then uses a set of pre-trained parameters from keras-tcn on GitHub [29] as the initial parameters for model training. keras-tcn was used as the transfer learning (TL) model to fine-tune the developed model and improve learning speed and training accuracy. In the experiment, five crucial architecture-related parameters (hyperparameters) were selected from the transfer learning cases in the TCN predictor, namely (i) number of stacks, (ii) filter size, (iii) dilation coefficient, (iv) activation function, and (v) optimiser. The accuracy (%) associated with the distinct model parameters for wind power prediction was examined using the Q-learning algorithm.
Step 2.2. Hyperparameter tuning of the model
In this phase, the Q-learning algorithm (Algorithm 1) was used to determine the proper parameters of the training model for wind power forecasting by exploring the possible solutions and deciding on the proper parameter selection.
Step 3. Model validation
Once trained, the system provides a quick response for wind power forecasting by using the neural network weights of the four trained DLN models. To validate the accuracy of the four experimental models, the mean absolute percentage error (MAPE) was chosen as the evaluation index of prediction performance. The MAPE can be defined as

$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{t=1}^{N} \left| \frac{O_t - \hat{O}_t}{O_t} \right|$$

where O_t is the real wind power value, Ô_t is the estimated wind power output, and N is the number of data points. As presented in Equation (15), the MAPE is calculated by dividing the estimation error by the real value of wind power generation.
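As a small illustration of the evaluation index, the MAPE can be computed as in the sketch below. The function name and the choice to skip time steps whose real output is zero (where the percentage error is undefined) are assumptions made for this example, not details given in the text.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    actual:    real wind power values O_t.
    predicted: estimated wind power values for the same time steps.
    Time steps with a real value of zero are skipped (an assumption),
    because the percentage error is undefined there.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = [(o, p) for o, p in zip(actual, predicted) if o != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(abs((o - p) / o) for o, p in pairs) / len(pairs)


if __name__ == "__main__":
    # Both forecasts are off by 10% of the real value.
    print(mape([100.0, 200.0], [90.0, 220.0]))
```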

Experimental Results
In this section, the performance of the proposed TCN-based model is demonstrated using an example of wind power prediction. The prediction system was executed in a Linux environment, and the installed software packages are listed in Table 4.
Step 1. Data pre-processing phase
In the experiment, model pre-training used a project dataset of wind farms from the Chang-Bin industrial zone (CBTP) for Tai […] ates with that wind speed, which is generated by the turbine manufacturer.
To increase the prediction accuracy, null fields were replaced with corrected values to build a clean dataset for the experiment. During data cleaning, the experimental data were manually pre-processed to remove null fields, which were then filled by linear interpolation between the neighbouring observations. The training dataset used samples collected every 10 min from 1 January 2018 to 31 December 2018. The test dataset covered 7 days in total, from 1 January to 7 January 2019.
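The gap-filling step described above can be sketched as follows. This is an illustrative version under assumptions: missing 10-min samples are represented as `None`, and gaps at the start or end of the series, which have only one valid neighbour, simply copy that neighbour. The function name is hypothetical.

```python
def fill_gaps_linear(values):
    """Fill None entries by linear interpolation between valid neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        raise ValueError("series contains no valid observations")
    # Leading and trailing gaps have a single neighbour, so copy it
    # (an assumption; the text does not say how edge gaps were handled).
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    # Interior gaps: weight the two neighbours by distance.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled


if __name__ == "__main__":
    # Two missing samples between 1.0 and 4.0.
    print(fill_gaps_linear([1.0, None, None, 4.0]))
```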
Step 2. Model training phase
To examine model efficiency, the experiment incorporated four deep neural models for sequence data processing, namely RNN, LSTM, GRU, and TCN, to conduct wind power forecasting 24-168 h (1-7 days) ahead.
First, a series of pre-training experiments was conducted on the training dataset to investigate the effectiveness of the TCN-based model. The learning results served as the basis for the proper selection of model parameters, including five important hyperparameters for structure design: the number of filters, dilation coefficient, activation function, optimiser, and number of stacks.
To achieve proper selection of hyperparameters in the TCN model, we used the reward function defined in Equation (10) for neural network generation and hyperparameter tuning based on a Markov decision process. The decision process for generating a TCN architecture with the optimal parameters on the neural network layers using the Q-learning algorithm is shown in Figure 7.
To improve input processing capability, the TCN model was developed to generate a receptive field suited to sequence data inputs. In particular, TCNs address these problems through dilated causal convolutions and residual connections. In theory, the receptive field size of the TCN depends on the network depth n (number of stacks), filter size k, and dilation factor d; making the TCN deeper and larger is important to obtain a sufficiently large receptive field. Empirically, making the network deep and narrow, that is, stacking a large number of layers and choosing a thin filter, results in an effective architecture [31].
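The dependence of the receptive field on n, k, and d can be made concrete with a short calculation. The sketch below uses a commonly quoted formula for TCNs built from residual blocks that each contain two dilated causal convolutions; treating the filter size k as the convolution kernel size, and the default of two convolutions per block, are assumptions for illustration.

```python
def receptive_field(kernel_size, nb_stacks, dilations, convs_per_block=2):
    """Receptive field (in time steps) of a stacked, dilated causal TCN.

    Each residual block adds convs_per_block * (kernel_size - 1) * d steps
    for its dilation d, and the whole dilation list repeats nb_stacks times.
    """
    return 1 + convs_per_block * (kernel_size - 1) * nb_stacks * sum(dilations)


if __name__ == "__main__":
    # Smallest and largest combinations of the candidate values
    # considered in the experiment.
    print(receptive_field(8, 2, (1, 2, 4)))
    print(receptive_field(32, 4, (1, 2, 4, 8, 16)))
```

Under these assumptions, adding a stack or extending the dilation list enlarges the receptive field multiplicatively, which is why depth matters more than a wide filter.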
(i) Number of stacks: First, we need to decide the number of stacks for the TCN model. In essence, the network architecture of a TCN is an extension of a 1D CNN in which a series of 1D convolutional layers are stacked on one another [24]. In the experiment, we analysed three different stack numbers n (i.e., 2, 3, and 4) from the transfer learning cases in the TCN predictor.
(ii) Filter size: Similarly, we analysed three different filter sizes k (i.e., 8, 16, and 32) for the receptive field size of the designed TCN-based prediction model using iterations of the Q-learning computation.
(iii) Dilation coefficient: As described before, residual blocks were used between the layers of the TCN to accelerate convergence and enable the training of deeper models. Following [14], we analysed three different combinations of the dilation coefficient d (i.e., [1, 2, 4], [1, 2, 4, 8], and [1, 2, 4, 8, 16]) for the receptive field size of the model input.
(iv) Activation function: We analysed two major types of nonlinear activation functions, namely Norm_relu and the Tanh*Sigmoid activation function used in WaveNet. We then calculated the corresponding convergence error of the cost function for the two activation functions.
(v) Optimiser: To minimise loss during model training, an optimiser was adopted to improve the predictive accuracy of the model by adjusting the filter weights. We assessed three popular optimisers for the TCN model: Adam, SGD, and RMSprop.
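Taken together, the five candidate lists above define a discrete search space. The sketch below enumerates it to show its size; the dictionary keys and the lower-case labels for the activation functions and optimisers are hypothetical names chosen for this example.

```python
from itertools import product

# Candidate values from items (i)-(v); key names are illustrative only.
SEARCH_SPACE = {
    "nb_stacks": [2, 3, 4],
    "filter_size": [8, 16, 32],
    "dilations": [(1, 2, 4), (1, 2, 4, 8), (1, 2, 4, 8, 16)],
    "activation": ["norm_relu", "tanh_sigmoid"],
    "optimiser": ["adam", "sgd", "rmsprop"],
}


def all_configs(space):
    """Return every combination of candidate values as a list of dicts."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in product(*(space[n] for n in names))]


if __name__ == "__main__":
    configs = all_configs(SEARCH_SPACE)
    print(len(configs))  # 3 * 3 * 3 * 2 * 3 = 162 combinations
```

Training a model for each of the 162 combinations would amount to an exhaustive grid search, which is the cost that a guided search such as Q-learning tries to avoid.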
In summary, the computation process for the proper selection of TCN model parameters is as follows:
1. Initialise the parameters for the model. The architectural components of the proposed TCN model from GitHub [24] were regarded as the initial values and are listed in Table 5.
   a. The parameter Q-table is iterated until the termination condition is met.
   b. The Q-table is updated using Equation (15), as shown in Table 6.

In the experiment, the architecture parameters were determined by computing the Q-table (Figure 6). Finally, the architecture parameters were selected according to the results of the Q-learning computation and are listed in Table 7. The selected hyperparameters for the TCN were then used to draw comparisons and calculate the predictive precision of the proposed scheme. As shown in Figure 8, we evaluated the prediction accuracy of the model to examine the expected outcome and whether appropriate model parameters had been selected. The optimal hyperparameters for the TCN can be determined from the error value and the error reduction speed of the loss function. Specifically, the model parameters may require adjustment if the convergence error does not decrease smoothly. As Figure 8 shows, the TCN accuracy for wind power prediction increased as the number of training iterations increased, eventually reaching stable convergence.
To compare the predictive accuracy of DLNs for sequence data processing, three further deep neural models, namely RNN, LSTM, and GRU, were incorporated, and the proper settings of their model parameters were examined. First, the parameters obtained for the three models from transfer learning were regarded as the initial values and are listed in Table 8. In the experiment, the architecture parameters were determined based on the results of the Q-learning computation process. Tables 9 and 10 and Figures 9 and 10 show the corresponding convergence errors of the cost function for the RNN, LSTM, GRU, and TCN prediction models for wind power forecasting 72 and 168 h ahead. Finally, the hyperparameters for each model were selected based on the results of the Q-learning computation and are listed in Table 11.
Once the optimal parameters for the RNN, LSTM, and GRU models were selected, the three predictive models were validated with the same experimental data to conduct wind power forecasting 72-168 h ahead using the selected model parameters listed in Tables 7 and 11.
Step 3. Model validation phase
In the experiment, 1-year historical data were used to train the four DNNs on wind power data, and the prediction results for 72, 120, and 168 h ahead are shown in Figures 11-13, respectively. Figure 13 shows the accuracy of the four prediction models: for 168 h (7 d) ahead, the TCN model had the highest prediction accuracy (MAPE = 1.41%), followed by the GRU model (MAPE = 3.72%), the LSTM model (MAPE = 6.99%), and the RNN model (MAPE = 28.18%).

Figure 13. Experimental results for 168 h wind power forecasting.

Method Comparisons
As shown in Figure 14a, the precision of the four prediction models increased as the convergence error of the cost function decreased with each iteration of model training. More specifically, the convergence error of the cost function fell rapidly as the number of iterations (epochs) increased. After 60 iterations, the loss function tended to be stable when the TCN and GRU prediction models were applied (Figure 14b). However, the errors for the RNN and LSTM models converged more slowly. Notably, the convergence error of the TCN model decreased more than that of the GRU model near the fifteenth epoch, and it continued to decrease steadily as the number of iterations increased. The experimental results revealed that the prediction error of the TCN model decreased the most steadily among the four models, followed by those of the GRU and LSTM models.
To distinguish the performance of the four prediction models in long-term prediction, different lengths of the prediction period were tested using the same amount of training data. The experiment varied the length of the prediction period to examine the accuracy of the convergence error in practice. Table 12 lists the prediction error (i.e., MAPE) of each model, sorted by test sets with distinct prediction-period lengths and the same data size.
Experimental results show that the prediction error of wind power output generally increased with the length of the prediction period. In particular, the prediction error of the TCN-based model remained below 2% as the prediction period lengthened, with forecast errors of 0.07%, 1.71%, and 1.41% for 72, 120, and 168 h, respectively. In summary, the average prediction error of the proposed TCN-based model was approximately 1.063% based on 1-year historical data over the three prediction periods. In other words, the TCN model achieved high prediction accuracy with consistent and stable results using the parameters selected by the Q-learning algorithm.
Table 13 lists a detailed comparison with other studies on wind power forecasting, including LSTM [17], LSTM with grid search (LSTM-grid) [18], bidirectional LSTM with grid search (BiLSTM-grid) [18], GRU [17], TCN [17], and the ensemble learning-based AAA aggregation model [19]. Notably, the project of the BSI electric power company used an AAA aggregation model based on an ensemble learning algorithm for 120 h forecasts, aggregating numerous deep learning models in different climate scenarios, and reached approximately 2% prediction error in 2016.
Overall, the proposed TCN-based model with the Q-learning algorithm provides a lower prediction error, and thus higher prediction accuracy, than earlier deep learning schemes such as the LSTM, BiLSTM, TCN, and AAA aggregation models [32] in studies of wind power forecasting.

Conclusions
This paper presented a TCN-based model built on a causal convolution architecture with a Q-learning algorithm to efficiently learn the correlations between meteorological features and wind power generation. The experimental results show that the Q-learning algorithm efficiently helps data engineers determine appropriate parameters for temporal convolutional networks. Additionally, the TCN has a strong capability for feature extraction from long-term sequence data and retains higher prediction accuracy than GRU and LSTM. Overall, the proposed TCN-based approach outperformed GRU, LSTM, and RNN in our experiments for wind power forecasting.