Review on Deep Learning Research and Applications in Wind and Wave Energy

Wind energy and wave energy are considered to have enormous potential as renewable energy sources and could contribute greatly to the transition from fossil fuels to renewable energy. However, the uncertain, erratic and complicated scenarios, as well as the tremendous amount of information and corresponding parameters, associated with wind and wave energy harvesting are difficult to handle. In the field of big data handling and mining, artificial intelligence plays a critical and efficient role in energy system transition, harvesting and related applications. Deep learning and its derivative and extended structures have matured across many fields of application in the last decade. Even though both wind and wave energy are characterized by instability, more and more applications using these two renewable energy sources have been implemented with the support of deep learning methods. This paper systematically reviews and summarizes the different models, methods and applications where deep learning has been applied in wind and wave energy. The accuracy and effectiveness of different methods on similar applications are compared. This paper concludes that applications supported by deep learning have enormous potential in terms of energy optimization, harvesting, management, forecasting, behavior exploration and identification.


Introduction
Renewable energy has attracted researchers' attention for many years, and its exploitation, harvesting, energy management, efficiency improvement and applications have been a focus of energy research. Comprehensive applications of renewable energy are expected to gradually take the place of traditional fossil fuels. Among them, wave and wind energy undoubtedly have huge potential. Wave energy has drawn attention for years because the huge potential energy of the oceans may support energy transition and sustainability. The global wave energy market is expected to increase from USD 44 million to USD 107 million by 2025. However, design, construction and planning applications require an all-round understanding of wave energy behaviors, which are among the most intermittent, unstable and uncertain. In both short-term and real-time interval scenarios, wave behaviors are predominantly non-deterministic. The wave height and wave period are the most dynamic and critical properties of wave energy, and they change not only temporally but also spatially. The wave period condition could affect wave energy converters (WECs). Hence, to some extent, the characteristics of the wave height and period both have a vital influence on wave energy prediction. Many models have been built to simulate wave energy and to make its harvesting, forecasting and optimization more accurate. Accurate prediction is important for energy reliability and performance, and it affects the cost and the entire energy management system [1].
Compared with the still-developing field of wave energy, wind energy has expanded and become a more mature form of renewable energy application. The application conditions and periods vary, and the paper attempts to grasp the core factors while summarizing the relevant information.
This review and summary of methods used in wind and wave energy not only reveals the development of deep learning methods but also explores new possible applications in wind and wave energy and tracks how the current trends may extend. Continually reorganizing existing research results allows us to identify effective methods and experience, avoid redundancy and repeated mistakes, and extend novel combinations of methods to a wider range of applications. It could also provide and motivate insights into further research directions. It is reasonable to conclude that applications of renewable energy based on deep learning could develop rapidly and gradually mature in technique and cost management as the corresponding supporting techniques grow sharply. The literature review of current research in wave and wind energy applications also provides a valuable evaluation of the datasets available from existing case studies, which is a necessary support for future research in similar fields that explores the effectiveness and efficiency of different applications.

Deep Learning Applications of Wind and Wave Energy
Wind energy and wave energy have become critical clean sources of renewable energy. Deep learning has been used in the field of renewable energy since it can handle not only linear correlations but also nonlinear dynamic prediction and correlation processes. As the utilization and exploration of marine energy increase, wave energy has gradually taken a leading role, and its harvesting, control, behavior exploration, movement tracking, stability and generation offer many possibilities for growth.
In order to gradually utilize wave energy and expand its commercial scale, one option is to combine wind and wave energy and explore offshore wind and wave energy by analyzing their behavior and correlations. Another option is to summarize and analyze onshore and offshore wind energy applications in different models to encourage their application to wave energy optimization, visualization and forecasting. This approach can promote the development of deep learning for wave energy in many contexts. In the meantime, stability and resilience should always be considered in both current and future research. Wind and wave energy are similar in their variation, and this variation is critical for a stable grid power demand. If the fluctuation goes over a certain threshold, it may cause grid failure or system restart. With the increase in renewable installations since 2009, balancing different renewable energy sources is another effective way to meet the demand from the grid. However, the existing variation in wind and wave energy could be reduced and even canceled out in the time series and spatial dimensions [13,14] using the hybrid method, which could lead to meaningful development of both energy sources. Wind and wave energy applications were chosen to be reviewed and discussed together in this paper based on publications in the last 5 years. The Google Scholar, ResearchGate and ScienceDirect search engines were used to search the literature for this paper. The keywords used included wind/wave energy, deep learning, RNN, LSTM, GRU, etc. Articles were selected and reviewed based on a time range starting from 2016 and their pertinence to the theme.

Forecasting of Wind and Wave Energy
Even though both wind and wave conditions are predictable, the actual wind and wave power output is difficult to predict well enough to match the demand. Applying a deep learning structure to prediction provides an efficient way to process a large amount of historical data. Forecasting methods have been developed in different ways, including traditional statistical methods [15,16], such as the Kalman filter [17] and regression models [18], physical models, artificial intelligence (AI) techniques and hybrids of different structures [19]. Physical models usually rely on numerical weather prediction (NWP) or time series models with a number of related variables considered to predict the wind speed or wind power. Research shows that physical models may perform better in wind speed or wind power forecasting 6 h to 1 day ahead, and they have usually been applied in power system management and trading systems [20,21]. The statistical method is the most common in applications forecasting less than 6 h ahead, which could benefit wind turbine control and tracking [22,23].
AI-based models, such as support vector machines (SVMs), back propagation, fuzzy logic methods and artificial neural networks (ANNs), have been implemented in many forecasting fields. With their rapid uptake and growing appeal, different structures of deep neural networks have been developed for different applications. The ability to handle enormous datasets and nonlinear correlations with more flexibility and resilience has resulted in the remarkable development of renewable energy applications. Meanwhile, existing methods can also be divided into nonhybrid models and hybrid models [24][25][26][27][28] based on current research. Bootstrap is another method used in wind speed forecasting (WSF), in which the dataset is small and is not divided into a training set and a testing set. One of its advantages is that it generates multiple training datasets, which may benefit ensemble learning methods. Due to the importance of wind and wave energy forecasting in power systems and the electricity supply market, forecasting accuracy has become an increasingly critical factor in forecasting models. Hence, intelligent forecasting models have been widely and frequently implemented because they address correlations among variables better than statistical methods or physical models.
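The bootstrap idea mentioned above, generating several training sets from one small dataset, can be sketched as follows. This is a minimal illustration only: the wind speed values are hypothetical, the sample mean stands in for a real base learner, and plain resampling with replacement ignores temporal ordering, which a real forecasting study would need to handle (for example with block resampling).

```python
import random


def bootstrap_samples(series, n_sets, seed=0):
    """Draw n_sets training sets by sampling with replacement,
    each with the same length as the original series."""
    rng = random.Random(seed)
    n = len(series)
    return [[series[rng.randrange(n)] for _ in range(n)] for _ in range(n_sets)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical wind speed observations in m/s (illustrative values only).
wind_speed = [5.1, 6.3, 4.8, 7.2, 6.9, 5.5, 8.0, 7.4]

training_sets = bootstrap_samples(wind_speed, n_sets=5)

# Each base learner would be fitted on one resampled set; here the sample
# mean is a stand-in "model", and the ensemble averages the five outputs.
ensemble_forecast = mean([mean(s) for s in training_sets])
print(round(ensemble_forecast, 3))
```

Because every resampled set is drawn from the same observations, the ensemble output always stays within the range of the original data.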
In the last five years, the following deep learning models have been developed and used: convolutional neural network [29]; recurrent neural network; long short-term memory [30]; deep belief network [31]; stacked auto-encoder; deep neural network [32]; gated recurrent network [33]; and deep hybrid models. In previous research, deep learning-based models performed better than statistical models and physical models [34]. Forecasting models based on deep neural networks not only improve the accuracy of forecasting but also reduce the operational cost, increasing the competitiveness of wind or wave energy compared to other forms of renewable energy. Based on recent research, wind power and wave power forecasting can be performed based on hindcast wind or wave power data, or it can rely on directly related variables, such as wind speed, wave height and wave period, to calculate the wind/wave power through the power curves of specific harvesting devices. Table 1 summarizes the recent applications of deep learning models in the wind and wave forecasting field. For example, Francisco [35] and Gu [36] applied the power curves of the Vesta 90 wind turbine and the Pelamis 750 kW wave energy converter to calculate the available wind and wave power from wind speed, wave height and period. Different harvesting devices have slightly different power curves, which may cause slight differences [37]. For problems with nonlinear relationships, models based on deep learning perform better than traditional statistical models; thus, deep learning neural networks have been widely considered in forecasting research.
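As an illustration of how a forecast wind speed can be converted to power through a device power curve, the sketch below interpolates linearly between tabulated curve points. The curve values, the cut-in and cut-out speeds, and the function name `wind_power_kw` are hypothetical placeholders, not manufacturer data for any turbine cited above.

```python
from bisect import bisect_right

# Hypothetical power curve (wind speed in m/s, power in kW).
# These points are illustrative and do not describe a real device.
CURVE_SPEED = [3.0, 5.0, 8.0, 11.0, 13.0, 25.0]
CURVE_POWER = [0.0, 200.0, 900.0, 1800.0, 2000.0, 2000.0]


def wind_power_kw(speed):
    """Look up power by linear interpolation on the tabulated curve.
    Below the first point (cut-in) and above the last (cut-out) the
    output is zero."""
    if speed < CURVE_SPEED[0] or speed > CURVE_SPEED[-1]:
        return 0.0
    i = bisect_right(CURVE_SPEED, speed) - 1
    if i == len(CURVE_SPEED) - 1:
        return CURVE_POWER[-1]
    s0, s1 = CURVE_SPEED[i], CURVE_SPEED[i + 1]
    p0, p1 = CURVE_POWER[i], CURVE_POWER[i + 1]
    return p0 + (p1 - p0) * (speed - s0) / (s1 - s0)


# Convert a short series of forecast wind speeds to power.
forecast_speeds = [2.0, 4.0, 6.5, 12.0, 30.0]
print([round(wind_power_kw(v), 1) for v in forecast_speeds])
```

A wave energy converter would need a two-dimensional lookup over wave height and period instead of a single speed axis; the same interpolation idea applies.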

Differences on Datasets Used
Typically, the forecasting objectives in the published studies include wind speed, wind power, wave period and wave power, and it is common to forecast them using deep learning methods. The data used in forecasting might come from historical meteorological, remote sensing and geographic datasets covering wind speed, wind direction, wave height, wave period and wave direction, along with many corresponding factors, including temperature, device parameters, weather conditions, sea surface salinity, ocean depth, pressure, humidity, orography and atmospheric dynamics, as computation resources. The dataset could also include different time series, spatial data or remote sensing images. If the reviewed paper uses a historical dataset with meteorological parameters, the wave and wind power need to be calculated through equations from critical parameters, such as wave height, wave period, wind speed, wind turbine hub height and the pitch angle among the device parameters.

Table 1. Summary of deep learning models in wind and wave energy forecasting.
Applications
Wave height (buoy), wind speed [38]
Wave height/period, wind speed/direction, sea level pressure, gust speed, air pressure, sea surface temperature, buoy data [39]
Mean wave period (wave buoy data) [37]
Offshore wind speed (light detection and ranging and seashore meteorological mast) [40]
Wave height/period/direction (buoy station from NOAA) [41]
Daily ocean wave height prediction [42]
Wind power generation [43]
Wind power forecast [44]
Wind speed forecasting [45][46][47][48]
Wind forecasting [49]
Wind farm cluster power prediction [50]
Surface wind forecast [51]
Prediction of directly obtained or measured parameters is more flexible and easier to achieve than prediction of calculated power. However, indirect factors such as temperature, salinity, pressure and precipitation might be used for forecasting and prediction or for the exploration of correlations [13]. From the perspective of calculation and workload, if the corresponding factors can be used for forecasting instead of direct parameters, this may be a useful and supportive method for irregular data mitigation and model improvement. Lin implemented 11 features as inputs of the predictive model, which included four different wind speeds at different heights and three different blade pitch angles, as well as the parameters of nacelle orientation, atmosphere temperature and error [52].
Chen et al. [29] used a combined auto-encoder of CNN and LSTM to perform 2-D wind plane prediction. The dataset in the case study comprised meteorological data from the Wind Integration National Dataset by NREL for a 10 by 10 wind array located in Indiana, US, from 2010 to 2012. The raw data had a 5 min resolution and were converted to a 2 h time interval. The raw data include 1,314,000 points from this three-year period. The training and testing data were split 4:1 to feed and assess the CNN-LSTM model. Wei and Chang [38] chose a study area in the coastal waters of the Keelung and Kaohsiung ports in Taiwan, where buoy and radar image datasets were collected for typhoons during 2013-2019. The dataset was typically time-series-sensitive. Wave height, wind speed, air pressure at sea level, surface temperature, surface wind speed/direction and instantaneous maximum surface wind speed/direction were used as the dataset group and attribute group for the gated recurrent unit (GRU) neural network and CNN feature and time series extraction. Two wind-speed-prediction models were built, and four wave-height-prediction models were then derived based on the wind results [38]. The comparison was conducted using two different groups of inputs: one group included the raw meteorological data and the significant wave height dataset separately, while the other combined group one with the outcomes of the two wind speed prediction models, respectively.
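The two data-preparation steps described above for the CNN-LSTM case study, coarsening a 5 min series to a 2 h interval and splitting it 4:1 into training and testing data, can be sketched as follows. The source does not state how the 2 h values were derived or whether the split was chronological, so block averaging and a chronological cut are assumptions here, and the input series is synthetic.

```python
def aggregate_mean(samples, factor):
    """Average consecutive non-overlapping blocks of `factor` samples.
    An incomplete final block is dropped."""
    n_blocks = len(samples) // factor
    return [
        sum(samples[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


def chronological_split(series, train_parts=4, test_parts=1):
    """Split a series into an earlier training part and a later testing
    part in the ratio train_parts:test_parts."""
    cut = len(series) * train_parts // (train_parts + test_parts)
    return series[:cut], series[cut:]


# Synthetic 5 min wind speeds for 10 days (288 samples per day).
five_min = [6.0 + 0.01 * (i % 288) for i in range(10 * 288)]

two_hour = aggregate_mean(five_min, factor=24)  # 24 x 5 min = 2 h
train, test = chronological_split(two_hour)
print(len(two_hour), len(train), len(test))
```

Keeping the test segment strictly later than the training segment avoids leaking future information into training, which is why a chronological cut is assumed rather than a shuffled one.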
Bento et al. [53] collected buoy data in four different months from 11 sites of the US National Data Buoy Center (NDBC) and two sites of the Canadian Integrated Science Data Management (ISDM) on the Pacific and Atlantic coasts and the Gulf of Mexico to forecast the wave energy flux directly and indirectly through wave height and wave period forecasting, respectively, based on DNN [54]. The wave dataset used had both temporal and spatial properties, which were used to reveal the correlations between the predicted values and the raw variables. Mousavi et al. [55] proposed a model based on LSTM to predict wave power. The dataset comprised experimental data from He [56], collected from a Fetch experiment by Flow-3D simulation, to investigate the wave power and its relationship with wave height based on the LSTM structure. Fan et al. [57] implemented a bidirectional gated recurrent unit (BiGRU) network to predict the wave height at tropical cyclone lead times of 3 h, 6 h, 12 h and 24 h with more accurate feature extraction and compared it with another deep learning-based model of a new typhoon. Fan et al. [57] also applied LSTM to predict the wave height 6 h in advance using 9 years (2010-2018) of data from 14 buoys on wind-wave tropical cyclones to predict the typhoon trajectory. Pirhooshyaran et al. [39] chose an LSTM structure to conduct multistep-to-multistep prediction of wave parameters and feature selection. These researchers used hindcast historical data from the National Oceanic and Atmospheric Administration (NOAA), and the dataset included the related wave and wind features. However, hindcast data may have missing values in specific time periods, which need to be reconstructed to complete the dataset. Cornejo et al. [58] applied evolutionary algorithms and Bayesian optimization to address the reconstruction of missing data. Chen L. et al. [59] considered a multiperiod-ahead stacked denoising auto-encoder model, which is unsupervised and uses unlabeled reconstructed data as inputs to conduct wind speed forecasting. The training samples amounted to 20,000 within each wind speed series from 15 min to 24 h, and the period covered was six months.
Shi et al. [60] adopted continuous wavelet transforms to detect the spatial and temporal correlations of wind speed and focused on an SC-LSTM (temporal-spatial correlation LSTM) network with data collected from the Buck City wind farm located in Washington State in the US. The comparison was conducted with a conventional back-propagation model and a support vector machine (SVM) using the RMSE, MAE and MAPE metrics. The dataset came from wind turbines at the wind farm and covered a period within 2010 at 5 min resolution. There were 10,656 groups of data, comprising training data, validation data and testing data in the proportion 30:2:5 (days), respectively. Wei [41] demonstrated a wind wave forecasting model using LSTM models with 2 years of historical buoy station data from NOAA; the input parameters consisted of wind speed, atmospheric pressure and surface temperature data, which were used to predict the significant wave height, wave period and mean wave direction in the short-term forecast. The LSTM improved the prediction accuracy of the short-term forecast. Ahmad and Zhang [61] demonstrated the use of a sequence-to-sequence LSTM regression model to predict the wind speed using high-quality data preprocessing to obtain a stable, robust and accurate result. Data from Belgian, DSO and Elia wind farms at three sites and in different seasons were collected monthly and annually for the forecasting, and the MAE, MAPE and RMSE were compared to achieve higher accuracy. The coefficient of variation of the error was used to validate the forecasting results.
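The three error metrics used for comparison in the studies above have standard definitions, which the following minimal sketch implements. The example values are hypothetical, and MAPE as written assumes that no observed value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error in percent.
    Assumes every actual value is nonzero."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical observed and forecast wind speeds in m/s.
observed = [10.0, 20.0, 40.0]
forecast = [12.0, 18.0, 40.0]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```

RMSE penalizes large errors more strongly than MAE, while MAPE is scale-free but becomes unstable when the observed values approach zero, for example during calm periods.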
Wang [62] implemented a deep neural network model to calculate the wave period from altimeter observation parameters. The global wave reanalysis was addressed using the DNN model of altimeters and complied with the distribution of mean wave period and steepness, which achieved a lower bias compared to the buoy data. The significant wave height data from altimeters were denoised to improve the accuracy of capturing wave-current interaction. The evaluation still focused on the comparison of mean absolute error (MAE), root mean square error (RMSE) and scatter index (SI). Saxena [39] compared six deep learning-based models for short-term forecasting of offshore wind speed at two study sites in India. Data were preprocessed using the ensemble empirical mode decomposition (EEMD) method [40] to denoise them and reduce the forecasting error, which also increased the accuracy of the models. Different hub heights were chosen for the two sites to compare and evaluate the performance indicators and explain the reasons for different accuracies in datasets with different hub heights.
Different parameters have different impacts on prediction models, and the dataset used can also lead to different results from forecasting models. Meanwhile, since the weights on each variable are diverse, they affect the feature extraction and the variables. The smaller the weight, the less impact the variable has on deep learning models. Tables 2-4 summarize the wave and wind forecasting models reviewed in this paper.
Raw data need to be pre-processed before being used as the input of network models. Raw data usually contain missing values, incorrect values, noise, combinations of data at different resolutions, inactive time series records and large unit differences among variables. These abnormal values need to be handled through denoising, normalization, filling in estimated data, deleting inactive or incorrect values, or augmentation processes.
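Two of the preprocessing steps listed above, filling in missing values and normalization, can be sketched as follows. Linear interpolation and min-max scaling are chosen here only as simple, common examples; the reviewed papers use a variety of techniques, and the series below is hypothetical.

```python
def fill_missing_linear(values):
    """Replace None entries by linear interpolation between the nearest
    valid neighbours. Gaps at either end copy the nearest valid value."""
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        raise ValueError("series has no valid values")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical significant wave heights in m with missing records.
raw = [2.0, None, 4.0, None, None, 1.0]
filled = fill_missing_linear(raw)
scaled = min_max_normalize(filled)
print(filled)
print(scaled)
```

Normalization matters when variables with very different units, such as pressure in hPa and wave height in metres, are fed to the same network, since unscaled inputs can dominate the weight updates.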
Data preprocessing techniques are commonly used for feature extraction, regression and error analysis of raw data, either in time series or spatially, at different frequencies or as visualized images. Preprocessing has been implemented in many published research papers to reduce noise and data discrepancies, reconstruct missing data, combine noncontinuous data, delete repeated and redundant data and correct the data to keep the dataset consistent. A clean and smooth dataset can reduce the forecasting error, which is critical to model performance. Wang et al. [37] divided pre-processing methods into two groups. The first is signal processing, which decomposes the data into several series to reduce noise. The other group is outlier detection. The wavelet-based method was used to decompose the raw data into different series for processing rather than keeping them at the same level [32]. Ensemble EMD [34,52], empirical mode decomposition (EMD) [65] and variational mode decomposition (VMD) [77] are decomposition-based methods. Xiang [78] applied a secondary decomposition (SD) preprocessing method, which is a signal processing method, in a deep learning-based model with higher efficiency for wind speed and wind power forecasting. On the other hand, Wang et al. [52] chose a support vector machine to identify outliers in the raw data and replaced them with estimated values. That approach focused on abnormal values in the raw data, while signal processing methods were implemented for data decomposition and denoising. Some of the preprocessing methods are listed in Table 2.

Evaluation and Comparison Methods
Convolutional neural networks (CNNs) have been used for feature extraction in pattern recognition, autonomous driving, image processing and so on. The structure of CNNs enhances the ability to extract features from the target variables. A CNN usually consists of different layers and is also called a "black box" with an input and an output. The black box (hidden layers) portion actually includes a padding layer, a convolution layer, a pooling layer, a fully connected layer (preceded by a flatten layer), dropout and activation functions (ReLU/Sigmoid). Many studies have combined CNNs with other model structures due to their excellent feature extraction capability, which could reduce the computing cost to some extent. As Chen et al. [29] explained in their article, the CNN module worked as a feature extractor that translated the raw data into intrinsic deep features. Thus, the spatial features were extracted by the CNN structure, and the time series features could be extracted by LSTM, RNN and GRU, which are equipped with memory gates to record the sequence data. The preprocessing required by a CNN is modest compared with other classification methods because its filters learn to extract characteristics during training.
On the other hand, based on the input dataset, the CNN structure can be divided into one-dimensional, two-dimensional or three-dimensional structures [32]. If the input data are vectors, the convolution kernel (filter) is straightforward: it is placed over the vector data, multiplies each covered value by its weight, sums the results and adds a bias. The filter moves in one direction. When the input data are a matrix, such as an image, the matrix can be extracted or visualized from the historical meteorological data. Images can also be input as matrices through the color channels; meanwhile, the temporal information can be collected as time-sequential feature maps. The convolutional filter follows the dimensions of the input data and moves in two directions when performing the filtering. A three-dimensional structure extracts spatial information together with temporal data using a filter that moves in three directions. The three-dimensional structure improves the accuracy of forecasting but costs more computing time. In the existing literature, CNN has been applied as a feature extractor due to its better performance in extracting deeper information. Zhu et al. [79] predicted wind power based on a CNN with historical wind farm data as the input, which is an example of the use of a CNN with two-dimensional matrix data. Zhao et al. [66] adopted a one-dimensional CNN on the input matrix and convolutional filters, which are usually effective at identifying simple patterns and include few samples in each channel. The input wind speed can be converted into two-dimensional form, which is also suitable for a dataset combining multistep wind speed and turbulent standard deviation. Tian et al. [80] introduced a form of RNN structure in which the performance of the forecasting model can be evaluated comprehensively using a set of evaluation criteria.
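The one-dimensional case described above, in which a filter slides in a single direction over a vector, can be sketched in a few lines. The kernel weights and the wind speed values are hypothetical; in a trained CNN the weights and bias would be learned rather than set by hand.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide the kernel over the signal with stride 1 and no padding.
    At each position, multiply the covered values by the kernel weights,
    sum them and add the bias."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Rectified linear activation: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


# Hypothetical wind speed sequence in m/s.
wind = [4.0, 5.0, 7.0, 6.0, 3.0]

# Hand-set kernel that responds to an increase across three samples.
kernel = [-1.0, 0.0, 1.0]

feature_map = conv1d(wind, kernel)
activated = relu(feature_map)
print(feature_map, activated)
```

The two-dimensional and three-dimensional cases follow the same multiply-and-sum rule, with the kernel sliding along two or three axes instead of one.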
Gu and Li used visualized images from the wave data as the input of a regional convolutional neural network (R-CNN) to explore the correlations between variables [13]. The CNN architecture and fine-tuning support made it possible to deal with the huge 40-year dataset. Figure 1 explains how the CNN structure works to extract features. The wave density hotspot images (1979-2009) were applied as the input training dataset. More than 2000 candidate regions, called regions of interest (RoIs), were proposed for each image, where the score of each region was determined by the value of intersection-over-union (IoU). Only the regions with values greater than the predefined threshold were selected as the pool, as shown in Figure 1. The threshold of the IoU overlap rate is set according to the application; IoU is the area of overlap between the ground truth (labeled) region and the region proposal divided by the area of their union, and a value greater than 0.5 is usually considered an acceptable candidate. The remaining proposal areas were warped to a square dimension in order to feed the convolutional neural network and produce a multidimensional feature vector. These vectors are then ready to be fed into a classifier. An SVM (support vector machine) and a fully connected layer are the most commonly used classifiers. The advantages of CNN include its extraction abilities and customizable flexibility, which avoid time-consuming and inefficient work.
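The IoU-based selection of region proposals described above can be illustrated with axis-aligned boxes. The box coordinates below are hypothetical, and the `(x1, y1, x2, y2)` corner format is an assumption made for this sketch; the 0.5 threshold is the commonly used value mentioned in the text.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union


def select_proposals(ground_truth, proposals, threshold=0.5):
    """Keep only the proposals whose IoU with the labeled region
    exceeds the threshold."""
    return [p for p in proposals if iou(ground_truth, p) > threshold]


# Hypothetical labeled hotspot region and three candidate regions.
labeled = (0.0, 0.0, 4.0, 4.0)
candidates = [
    (0.0, 0.0, 4.0, 3.0),   # IoU = 12 / 16 = 0.75, kept
    (2.0, 2.0, 6.0, 6.0),   # IoU = 4 / 28, rejected
    (5.0, 5.0, 7.0, 7.0),   # no overlap, rejected
]
print(select_proposals(labeled, candidates))
```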
Recurrent neural networks (RNNs) have been applied in time-series-sensitive applications, whose inputs are usually temporally and spatially sequential datasets, such as speech prediction, wind speed/power forecasting, audio recognition and weather forecasting. The output data can be fed back as the input of the next step. Meanwhile, the network is trained using back-propagation and is derived from a feed-forward network with updated weights. Hence, an RNN supports temporal dynamics scenarios with sequence datasets. The current input of an RNN considers not only the current time series data but also the features learned from previous steps, which is the core of the so-called memory that distinguishes it from traditional feed-forward networks. The individual units shown in Figure 2 include the input, output, weight and bias. By sending the previous input to the next hidden layer, the input data can be given to the next layer and "memorized". Depending on the application, recurrent units can be added at any step without losing previous weights. Back-propagation provides support in updating the weights during the training process. An RNN combined with a CNN can increase the training efficiency of feature extraction. However, the activation function used in an RNN may prevent training from retaining memory over a longer period of time. Another disadvantage of RNNs is the vanishing gradient, in which the gradient shrinks until it is too small, which can result in ineffective training and lost memory of previous steps. Hence, an RNN usually has short-term memory, as depicted in the scheme. The gradient can be regarded as the slope of the function: a steeper slope results in faster learning, while if the slope approaches zero, learning stops. The weight change related to the error in each back-propagation loop is also governed by the gradient. Hence, solving the vanishing issue could improve the performance and keep the memory advantage of the RNN structure. The LSTM was therefore created to address this issue in RNNs.
LSTM has excellent performance in dealing with long-term datasets due to the gate and memory cell of its architecture, which is stable and better able to overcome the overfitting problem. Hence, time series datasets that require either long-term or short-term memory can be modeled using LSTM. Hu et al. [65] presented a novel nonlinear algorithm that combines the LSTM with an extreme learning machine and applied a differential evolution algorithm (DE) to optimize the number of hidden layers to balance the structure complexity and performance. Altan et al. [26] also chose LSTM combined with a decomposition method and an optimizer to conduct wind speed forecasting. The weights were estimated and optimized using the grey wolf optimizer (GWO), while the data were processed using a weighted moving average (WMA) before being input into the model. They mentioned that the missing data were reconstructed using Kalman filters and filled through interpolation to avoid offsetting the system accuracy.
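The vanishing-gradient behavior of a plain RNN discussed above can be made concrete with a scalar recurrence. The sketch below assumes a single-unit RNN of the form h_t = tanh(w * h_{t-1} + u * x_t), with hypothetical weights, and tracks how strongly the final state still depends on the initial state as the sequence grows.

```python
import math


def gradient_through_time(w, u, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN
    h_t = tanh(w * h_{t-1} + u * x_t).
    By the chain rule, each step multiplies the gradient by
    w * (1 - h_t ** 2), the derivative of tanh times the recurrent weight."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)
    return grad


# Hypothetical weights and a constant input sequence.
w, u = 0.9, 0.5
short_term = abs(gradient_through_time(w, u, [1.0] * 5))
long_term = abs(gradient_through_time(w, u, [1.0] * 50))
print(short_term, long_term)
```

Since every factor has magnitude below one here, the product shrinks geometrically with sequence length, so early inputs have almost no influence on the weight update. The gates and memory cell of an LSTM give the gradient an additive path that avoids this repeated shrinking.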
Even though the LSTM model alleviates the overfitting issue and has longer memory, it still has some drawbacks. It requires more time and a bigger dataset for training, and its results are sensitive to different weight initializations.
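The vanishing gradient problem that motivates the LSTM can be illustrated numerically. The sketch below assumes a scalar recurrent unit h_t = tanh(w * h_{t-1} + x_t), which is a simplification chosen for illustration and not a model from the reviewed studies; the function name and parameter values are hypothetical.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return abs(grad)


short_grad = gradient_through_time(0.5, [0.1] * 5)    # 5 time steps
long_grad = gradient_through_time(0.5, [0.1] * 50)    # 50 time steps
print(short_grad, long_grad)
```

Because every factor in the product is below 0.5 in magnitude here, the gradient shrinks geometrically with sequence length, which is why early inputs stop influencing the weight updates in a plain RNN.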
Gated recurrent units (GRUs) [33] are a popular variant of LSTM, which replace the forget gate and the input gate with a single update gate. Because GRU has fewer parameters, training is slightly faster and requires less data to generalize, and GRU achieves a similar performance in multiple tasks with less computation. Ding et al. [33] demonstrated a bidirectional GRU model to address wind power prediction by extracting the wind speed error and feeding it to the model as a time series weight to correct the wind speed. The evaluation criteria again focused on RMSE and MAE to assess the forecasting performance. The bidirectional GRU model has two gates for incorporating the new input and controlling the previous memory. Nam et al. [19] introduced a case study in Korea, in which they built a GRU model for electricity demand forecasting. Kuremoto et al. [81] constructed a deep belief network (DBN) for time series forecasting.
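To make the gate structure concrete, the following is a minimal sketch of one GRU step with a scalar input and a scalar hidden state. The parameter names and values are illustrative assumptions, and the update convention used here (the update gate weights the candidate state) is one of two equivalent conventions found in the literature. The helper `n_params` uses the common count of one weight matrix and one bias per gate block, which is also an assumption about the layer layout.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, p):
    """One scalar GRU step; p maps gate names to (w_x, w_h, bias) triples."""
    wz, uz, bz = p["update"]
    wr, ur, br = p["reset"]
    wc, uc, bc = p["candidate"]
    z = sigmoid(wz * x + uz * h_prev + bz)                 # update gate
    r = sigmoid(wr * x + ur * h_prev + br)                 # reset gate
    h_tilde = math.tanh(wc * x + uc * (r * h_prev) + bc)   # candidate state
    # Blend the previous memory with the candidate state
    return (1.0 - z) * h_prev + z * h_tilde


def n_params(n_blocks, input_size, hidden_size):
    """Weights plus biases for n_blocks gate blocks of a recurrent layer."""
    return n_blocks * (hidden_size * (input_size + hidden_size) + hidden_size)


params = {
    "update": (0.5, 0.5, 0.0),
    "reset": (0.5, 0.5, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}
h = 0.0
for x in [0.2, 0.4, 0.1, 0.3]:   # hypothetical normalized wind speeds
    h = gru_step(x, h, params)

gru_count = n_params(3, 8, 32)    # update, reset, candidate
lstm_count = n_params(4, 8, 32)   # input, forget, output, candidate
print(h, gru_count, lstm_count)
```

With the same input and hidden sizes, the GRU layer has three gate blocks against four for the LSTM, which accounts for the smaller parameter count noted above.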
The hybrid model is a trend that can be used to improve forecasting accuracy and explore the advantages of different combinations of models. Combining a feature extraction method with forecasting models allows spatial-temporal data to be handled, which could improve the efficiency and reduce the computation cost of forecasting applications. Both stacking-based models [82] and weighted-based models embed different base models. Hong [25] developed a combined model of CNN with a radial basis function neural network and a double Gaussian function (DGF) to handle 24 h ahead (short-term) wind power forecasting. CNN was used for feature extraction using convolutional layers, filters, pooling layers and fully connected layers. The data were collected from an operating wind farm, and the uncertainty was handled with the Gaussian function.
For multistep deep learning models, a feature extraction tool combined with LSTM was an excellent choice to improve performance. Variational mode decomposition (VMD) and singular spectrum analysis (SSA) were both used to obtain the low-frequency and high-frequency sub-layers from the raw data. LSTM was adopted for forecasting the low-frequency layers from VMD-SSA, while the extreme learning machine model had the target of forecasting the high-frequency layers, so that prediction was performed separately for different frequencies [83]. A hybrid convolutional LSTM model was introduced in research on multi-step-ahead short-term wind speed forecasting. The model was improved by using time varying filter-based empirical mode decomposition (TVF-EMD) as a filter to smooth noise interference [84]. Liu et al. [83] proposed a hybrid deep learning model for wind speed prediction, in which LSTM and an Elman neural network were combined to obtain high-precision forecasts. Empirical wavelet transformation was used to decompose the raw wind data into several sub-layers, which were then input into an Elman neural network.
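The decompose-then-forecast idea described above can be sketched in a few lines. This is a structural illustration only, under loud assumptions: a trailing moving average stands in for VMD-SSA, a persistence forecast stands in for the LSTM on the low-frequency layer, and a windowed mean stands in for the extreme learning machine on the high-frequency layer. None of these stand-ins is taken from the reviewed studies.

```python
def moving_average(series, window):
    """Trailing moving average used here as a crude low-pass filter."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(series, window=6):
    """Split a series into a low-frequency layer and a high-frequency residual."""
    low = moving_average(series, window)
    high = [s - l for s, l in zip(series, low)]
    return low, high


def forecast_low(sub_series):
    """Stand-in for the low-frequency forecaster: repeat the last value."""
    return sub_series[-1]


def forecast_high(sub_series, window=6):
    """Stand-in for the high-frequency forecaster: mean of the recent residuals."""
    tail = sub_series[-window:]
    return sum(tail) / len(tail)


def hybrid_forecast(series, window=6):
    """One-step-ahead forecast: predict each layer separately, then recombine."""
    low, high = decompose(series, window)
    return forecast_low(low) + forecast_high(high, window)


# Hypothetical hourly wind speeds (m/s)
wind = [5.1, 5.4, 6.0, 5.8, 6.3, 6.9, 7.2, 6.8, 7.5, 7.9, 8.1, 7.7]
low_layer, high_layer = decompose(wind)
next_value = hybrid_forecast(wind)
print(next_value)
```

The point of the structure is that each sub-layer can be given the forecaster best suited to its frequency content, with the final prediction obtained by summing the per-layer forecasts.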
Mi et al. [67] chose CNN and LSTM with a decomposition method (empirical wavelet transform) as a CNN-LSTM-EWT model to conduct wind speed prediction, and singular spectrum analysis was applied for denoising and preprocessing the dataset. Two case studies were conducted to test and compare the performance with EMD-BP, EMD-Elman and EMD-BP models using RMSE, MAE and MAPE indicators. Table 5 summarizes the deep learning models reviewed with their advantages, disadvantages and potential applications. To evaluate the efficiency and performance of each model, a set of indicators was usually adopted and calculated according to the results, as summarized in Table 6.

Table 5 (surviving entry, potential applications): Wind speed/power and wave scenario forecasting; typhoon/hurricane forecasting; power system optimization; energy storage size optimization.

Table 6. Estimate criteria in recent publications.

Indicator | Equation
Mean absolute error (MAE) | $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| X_i - \hat{X}_i \right|$

Note: $X_i$ and $\hat{X}_i$ are the measured and forecast wind speeds at time i, and N is the number of test data.

RMSE represents the standard deviation of the differences between the prediction output and the raw observation values, while MAE measures the average absolute difference between the two continuous variables. MAPE expresses the error relative to the measured value, where a lower value indicates better performance. Because datasets, models and parameters differ between studies, comparisons across studies have relied on R and SI [57]. The performance of different deep learning models is summarized in Table 7.
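The indicators discussed above can be computed as in the following sketch. The formulas are the commonly used textbook definitions and are assumed here rather than copied from Table 6; in particular, the scatter index (SI) is taken as RMSE divided by the mean observation, which is one common convention. The sample values are hypothetical.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error in percent; observations must be nonzero."""
    return 100.0 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))


def scatter_index(obs, pred):
    """RMSE normalized by the mean observation (assumed convention)."""
    return rmse(obs, pred) / (sum(obs) / len(obs))


measured = [4.0, 5.0, 6.0, 5.0]   # hypothetical measured wind speeds (m/s)
forecast = [5.0, 5.0, 4.0, 5.0]   # hypothetical model output (m/s)

print(mae(measured, forecast))            # 0.75
print(rmse(measured, forecast))           # about 1.118
print(mape(measured, forecast))           # about 14.58
print(scatter_index(measured, forecast))  # about 0.2236
```

RMSE penalizes the single 2 m/s miss more heavily than MAE does, which is why the two indicators are usually reported together.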

Optimization Application on Wind and Wave Energy
Application of deep learning to optimization has also been widely extended into different fields, including onshore/offshore wind farm site selection, layout optimization, wind turbine design, wave converter distribution optimization, harvesting device optimization and hybrid energy storage management. Hybrid models have also been proposed to optimize the parameters of day-ahead forecasting for wind energy with storage power plants, balancing the energy price against the available electricity generated. This topic has been raised by many researchers in the last two decades. With the growing number of installations, the commercial utility of renewable energy applications has been increasing considerably. Balancing and optimizing the available renewable energy are critical for power generation, energy market management and forecasting for the day-ahead market. Deep learning techniques support handling the uncertainty in the energy price market; in one study, this was built and simulated with a recurrent neural network for wind energy forecasting, using a dataset of hourly time series provided in a percentile format with an upper bound [85].
On the other hand, optimization of wind and wave harvesting devices plays an important role in the energy generation field. "Searaser" is a wave energy converter that has been used for modeling the prediction of power generation that is lower in price and involves no gas emission. With the support of simulated data from FLOW-3D, the correlation between wind speed and the power generated by the specific converter was addressed [55]. One of the applications for harvesting devices is to improve and maximize short-term wave power generation, which was addressed through a deep learning-based model that increases the absorption of the wave energy converter and reduces the prediction error to limit the negative effect on energy generation [86]. Dong [43] chose LSTM to improve the efficiency and regulation of the wind power generation system. A Gaussian mixture model (GMM) was implemented to consider the correlation between wind speed and wind direction, while a preprocessed 24 h ahead dataset was used as the input of the LSTM prediction network. Combined with an appropriate control strategy, this can keep the output smooth and stable. Hence, aligning wind energy generation with real-time demand relied on a deep learning network to help solve the rolling optimization problem. The memory of each step was retained, so that the network could refer to the previous step's result and explore the next several steps using the recorded sequence to increase the accuracy.

Pattern Recognition and Correlations Identification
Traditional image processing studies form the majority of applications in the classification research field. The growth of deep learning applications has increasingly drawn the interest of researchers in the field of robotics with artificial intelligence. Pattern recognition is a separate field from deep learning in terms of image processing, and it was popular in the 1970s. Even though it was the first concept used to introduce image processing, pattern recognition has gradually evolved with deep learning and big data techniques to improve performance in the image processing field. Hence, the original motivation of pattern recognition is presently focused on deep learning-based applications. In most cases, pattern recognition needs labeled training data as the input of the neural network. This process includes data conversion, segmentation, feature extraction, classification and postprocessing. Typically, the main objective of pattern recognition in supervised learning is to provide trained predictors for analyzing future data in the model. However, semi-supervised and unsupervised models can also obtain classifiers [87]. James et al. [88] applied a supervised training network to predict the ocean condition using a significant wave height and wave period dataset. Pattern recognition can also be used for object detection and image classification combined with a deep learning-based model, which helps predict and analyze the trends in objective forecasting, decision-making and correlation exploration using trained predictors and classifiers. The dataset is not limited to images, videos or digital signals; it can also include numerical multidimensional data. This also applies to deep learning-based wind and wave applications for feature extraction, data preprocessing and correlation discovery.
Gu and Li [13] explored wind and wave energy correlations using pattern recognition, where the dataset was image-based and visualized using hindcast wind and wave data from NOAA WAVEWATCH III. The historical data were converted into images and analyzed with a regional CNN deep learning network. Hence, the correlation of variables could be addressed as a pattern recognition problem. Pattern recognition based on deep learning can provide a deeper understanding of the correlation problem and different ways to approach it. Furthermore, it may support quantifying the correlation. Yang et al. [89] performed damage detection for a wind turbine blade using image recognition. The blade images were segmented to reduce the background effect on the identification, and an ensemble learning classifier and fine tuning were adopted to increase the efficiency and improve the accuracy of the model. The CNN model performed better than the SVM method, as verified with an unmanned aerial vehicle.
Shi et al. [60] introduced Wavelet Coherence Transformation Analysis (WCT), instead of Pearson's correlation method, to reveal the cross-correlation between the wind speed series of adjacent turbines. Fang et al. [90] mentioned in their research on inter-seasonal risk analysis that the characteristics of both the frequency and time series of the wind speed could be effectively and deeply depicted. The correlations between wind turbines have been analyzed in spatial and temporal dimensions. Statistical correlation between parameters is commonly represented by the Pearson coefficient (r):

$$ r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}} $$

where N is the size of the dataset, i is the index, $X_i$ and $Y_i$ are the individual variables, and $\bar{X}$ and $\bar{Y}$ are the mean values of the individual parameters.
The Pearson coefficient is within the range of [−1, 1], and the r value represents a small correlation within [0.1, 0.3], medium correlation within [0.3, 0.5], and strong correlation within [0.5, 1]. Negative r values, down to −1, indicate negative correlations with the same division of ranges applied to the magnitude. Pattern recognition is not only involved in many applications in wind and wave energy fields, but also plays a critical role in seismic analysis, healthcare, fingerprint identification and computer vision. A large number of hybrid deep learning-based models have been explored in many fields and have been implemented in many real-time situations.
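The Pearson coefficient and the strength ranges quoted above can be sketched as follows. The sample wind and wave values are hypothetical, the label "negligible" for magnitudes below 0.1 is an assumption (the text does not name that range), and values falling exactly on a shared boundary are assigned to the stronger class, which is also an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_strength(r):
    """Map r to the small/medium/strong divisions, with sign."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        label = "strong"
    elif magnitude >= 0.3:
        label = "medium"
    elif magnitude >= 0.1:
        label = "small"
    else:
        return "negligible"          # assumed label for |r| < 0.1
    return label + (" positive" if r > 0 else " negative")


wind_speed = [3.0, 5.0, 7.0, 9.0]    # hypothetical wind speeds (m/s)
wave_height = [0.5, 1.0, 1.4, 2.1]   # hypothetical significant wave heights (m)

r = pearson_r(wind_speed, wave_height)
print(r, correlation_strength(r))
```

For these sample values r is close to 1, so the pair falls in the strong positive range; reversing the sign of one series gives the same magnitude with a negative label.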

Challenges and Future Research Directions
This paper has focused on reviewing the most recently published studies on wind and wave energy applications based on deep learning models. Challenges always exist with wind and wave energy development. Because variables related to wind and wave energy have strong uncertainty at temporal and spatial scales, handling raw data with different data preprocessing methods provides many possibilities for exploring, extending and smoothing datasets. Hence, the limitation of the processing method sets an upper bound on the uncertainty of data processing. Many mathematical methods can be used to deal with raw data before using them as the structure input. Meanwhile, due to the good performance of convolutional neural networks (CNNs) in feature extraction, CNN has usually been implemented as the extraction supporter and combined with other forecasting models or higher-efficiency models in order to improve the accuracy of the structure. Thus, efficiency of feature extraction is another challenge to improving the forecasting accuracy or the abilities of the classifier. Deep learning-based feature extraction models are able to reduce complexity and computation time, and increase efficiency. On the other hand, exploring efficient, fast and accurate feature extraction structures will always be a challenge in conducting and improving the capacity of forecasting, optimization and classification. The quality of the dataset determines the accuracy and computation time to some extent according to the reviewed publications, and the size and integrity of the training data also restrict the performance of the model. The configuration of the structure supports the performance. The correlations of variables play a critical role in the process of optimizing the structure of forecasting and exploration of nonlinear correlations [91].
Future research trends and strategies can be shaped by exploring, developing and extending the scope of applications of deep learning-based models. Future research should aim at obtaining more practical structures and finding effective solutions to the above challenges in application. The uncertainty and fluctuation need to be analyzed sufficiently, which includes either developing new preprocessing models or combining existing methods. Furthermore, the majority of the applications have used CNN as the feature extraction tool, which is not enough to capture complete and complex data features in different applications. Even though fine tuning increases the support for different specific uses, it is far from expanding the possibilities of renewable energy growth. Currently, most wind energy forecasting studies focus on onshore applications, while offshore wind energy applications concentrate on ocean oil and gas platforms and combination with wave energy. Identifying the appropriate configuration for each application should be considered one of the primary tasks due to the complexity of wind and wave energy research.

Conclusions
Forecasting and prediction form the majority of recent applications. In wind and wave energy exploitation research, energy market price and distribution forecasting, as well as the sizing and configuration of device layout and power dispatch, are also involved. The similar nature of wind and waves makes it possible to analyze and compare them with each other. Combining wind and wave energy in a hybrid setting could reduce both the uncertainty and the variation. The other reason for analyzing and comparing them is that the wide implementation of wind energy forecasting and management could inspire the application of similar methods to wave energy, which could promote and accelerate the development of wave energy [92]. Correlation analysis between wind and wave energy is further motivated by the fact that this relationship could help reduce the complexity of wave energy prediction by using wind energy parameters, which can be obtained more easily than wave variables. This approach could also reduce computational cost and time. Overall, it may help to reduce variation and promote the development of wave energy. A hybrid model could be an effective solution to the aforementioned challenges [58,93,94]. Stable energy output is the final and most urgent target.
The paper reviewed recently published papers on wave and wind applications in forecasting, optimization and pattern recognition, which are the main foci of these two types of renewable energy. The differences in datasets are listed with the data preprocessing methods. The majority of the feature extraction methods concentrated on CNN, which is flexible and consists of unsupervised and supervised structures, together known as fine tuning. Following the data preprocessing methods, the paper reviewed the forecasting models that use the processed datasets, namely RNN, LSTM, GRU, DBN and hybrid models. The advantages and disadvantages were listed to help choose an appropriate model for future research. The estimation indicators used to evaluate the performance of models include RMSE, MAE, MAPE, scatter index and correlation coefficient [9,74]. The optimization of wind and wave applications could extend to many fields, including device optimization, layout optimization and energy management. Pattern recognition can be considered an improved technique of image processing, but it focuses on the structure of the models [95] instead of being limited to images. The future development of wind and wave energy may move offshore and explore more accurate models with lower computation cost. This review provides a summary and inspiration for future research applications.