1. Introduction
With the excessive exploitation of fossil fuels, global environmental pollution and the energy crisis are becoming increasingly severe. Wind power, as a clean and renewable energy source, not only aligns with the objectives of the Paris Agreement [1] but also plays a pivotal role in the global transition toward sustainable, low-carbon energy systems. Offshore wind energy generation, particularly in deep waters, presents a promising and scalable solution owing to strong and consistent wind resources, the feasibility of transporting large engineering components by sea, and reduced noise and visual impact on the public [2]. Floating offshore wind turbines (FOWTs) are especially suited to deep-water sites where seabed-fixed turbines are not feasible. Dynamic analysis of FOWT responses is therefore essential for optimizing structural design and ensuring reliability.
However, accurate and efficient response analysis of FOWTs is challenging because it must account for the coupled effects of aerodynamics, structural dynamics, hydrodynamics, servo control, and mooring systems. Researchers have made significant efforts to address this issue. OpenFAST, the open-source successor of the fatigue, aerodynamics, structures, and turbulence (FAST) program developed for the dynamic analysis of wind turbines, has gained widespread application owing to its open-source nature. Within FAST, Jonkman [3,4] developed a hydrodynamic module that enables coupled aero-hydro-servo-elastic analysis of FOWTs. Nevertheless, OpenFAST computes platform hydrodynamic loads primarily from frequency-domain (linear potential-flow) coefficients and commonly relies on quasi-static mooring models that overlook the dynamic effects of moorings, which can lead to inaccuracies in highly nonlinear wave conditions or in scenarios involving large structural motions. To overcome these limitations, other simulation tools have been integrated to enhance hydrodynamic modeling capabilities [5,6,7,8,9,10,11]. For instance, the advanced quantitative wave analysis (AQWA) software has been integrated with FAST to simulate the floating foundation and mooring systems of FOWTs, leveraging AQWA's ability to handle the nonlinear dynamics of substructures [11]. While these combined approaches can produce accurate results, the underlying finite element analysis (FEA) and computational fluid dynamics (CFD) are computationally slow owing to their implicit nature. This limitation becomes particularly pronounced in fatigue and ultimate response analyses, which require numerous simulations and therefore prohibitive computational times. Hence, there is an urgent need for methods that are more efficient while maintaining the same level of accuracy in the dynamic analysis of FOWTs.
In recent years, machine learning (ML) has gained prominence as a promising approach to tackling the highly nonlinear and complex coupled relationships inherent in engineering problems. Among the various ML techniques, artificial neural networks (ANNs) have found extensive application in offshore engineering. An efficient hybrid procedure combining ANNs with the finite element method (FEM) has been proposed that uses the surge, sway, and heave motions of floating production storage and offloading (FPSO) units as inputs to accurately predict the top tension of mooring lines [12]. Another study examined the dynamic response of a buoy supporting riser (BSR) and successfully employed an ANN to estimate its tension under varying sea states, achieving a small relative error compared with traditional FEM analyses [13]. Further investigations have shown the utility of ANNs in predicting tensions and bending moments for the fatigue analysis of steel catenary risers (SCRs), indicating their versatility across different wave environments [14,15,16]. A similar methodology has been used to conduct comprehensive fatigue analyses of mooring lines by predicting their top tensions from surge, sway, heave, roll, pitch, and yaw motions; notably, this approach achieved a deviation of only 1.6% in predicted fatigue life compared with FEM results [17]. Building on these advances, a new ANN-based procedure was proposed that enables new predictions without an additional dynamic analysis [18]. Further enhancements have enabled the model to account for the directionality of environmental loads [19,20]. In addition, three novel schemes have been introduced to improve the generalization of ANN models [21].
As an important branch of ML, ensemble models such as random forest and gradient boosting have also gained significant traction owing to their strong generalization capabilities and computational efficiency. Random forest constructs multiple independent decision trees and aggregates their predictions to reduce variance and improve robustness. In contrast, gradient boosting builds trees sequentially, with each new tree fitted to the residual errors of the previous ones, which often leads to superior predictive accuracy. Both methodologies have been applied effectively to forecasting wind power and modeling structural responses across diverse environmental conditions [22,23,24]. However, the main limitation of traditional ML methods is their limited ability to handle large numbers of input features. Their performance may degrade for highly nonlinear and complex systems such as FOWTs, in which multiple interacting variables, whose contributions are not known beforehand, influence the system's response.
Deep learning (DL), a rapidly emerging branch of machine learning driven by advances in computational power, has revolutionized predictive modeling, particularly for complex and high-dimensional problems. DL excels at extracting features and capturing intricate patterns across multiple layers. This enables it to model highly nonlinear, coupled relationships among numerous variables effectively, making it well suited to applications such as time-series forecasting, computer vision, and natural language processing [25]. As a relatively new technology, DL has so far seen limited application in predicting the dynamic response of FOWTs. For instance, a stacked sequential model comprising five layers with a total of 500 neurons has been employed to predict the mooring tension of a FOWT model [26]. Similarly, long short-term memory (LSTM) networks have been tested for predicting the dynamic behavior of mooring systems [27,28,29]. Additionally, a multilayer perceptron (MLP) model with three layers of 768 neurons has accurately identified the tower-top acceleration and tower-root force [30]. A gated recurrent unit (GRU) model was subsequently applied and found to outperform both backpropagation neural networks (BPNNs) and LSTM networks in predicting flapwise moments and platform pitch [31]. Recently, the hybrid convolutional neural network-GRU (CNN-GRU) model has gained attention for its ability to predict the dynamic responses of FOWTs. However, existing approaches exhibit notable limitations. One study is restricted to shutdown conditions and uses wave loads as the sole input for response prediction, so it is not directly applicable to FOWTs under operational conditions [32]. Another approach incorporates additional features into the deep learning model but excludes past response values, relying solely on current features [33]. Moreover, its inclusion of the hydrodynamic forces on the platform as inputs adds computational cost, because these forces must be computed before each new prediction.
This paper presents a novel CNN-GRU deep learning model specifically designed to predict the pitch response of FOWTs. In contrast to previous studies, the model not only selects multiple relevant input features but also incorporates wave elevation and historical response values. Because wave elevation is used instead of computed hydrodynamic forces, no additional force computation is required for each new prediction, which enables fast predictions once training is complete. Meanwhile, the CNN component extracts the coupling relationships among the input features, while the GRU component captures the temporal dependencies between the input features and the output response. Dynamic analyses are performed by coupling the FAST and AQWA programs to generate accurate baseline data for the model's predictions. In this study, two FOWTs with distinct floating foundations, namely OC4 and Umaine, are examined and compared using the same deep learning model. Several configurations, including different memory lengths, sample sizes, and optimization algorithms, are tested to identify the optimal one. Finally, the optimal model is interpreted using SHapley Additive exPlanations (SHAP), providing insights into the contribution of each feature to the predictions.
The case studies demonstrate that the proposed CNN-GRU approach predicts the pitch response of FOWTs accurately and efficiently, with strong correlation and minimal discrepancies. Additionally, the same optimal configuration is identified for both platforms, indicating its suitability for different FOWTs. Furthermore, the most relevant features are identified through SHAP interpretation, which can guide researchers in selecting fewer features to simplify or optimize models in future studies.
2. Methodology
This section outlines the framework of the deep learning model employed for the response prediction of FOWTs. It begins with a review of CNNs and GRUs to explain their underlying mechanisms. Following this, the proposed hybrid method is presented in detail, showcasing its integration and application.
2.1. Review of Convolutional Neural Networks
Convolutional neural networks [34] are a specialized class of neural networks most commonly used for analyzing grid-structured data such as time series and images. Time-series data can be regarded as a 1D grid of samples taken at regular time intervals, whereas image data can be thought of as a 2D grid of pixels. Convolutional neural networks preserve the grid structure of the input, enabling the network to capture spatial or temporal dependencies in the data.
A typical CNN layer consists of three stages of data processing, namely the convolution stage, the detector stage, and the pooling stage, which transform the input data into a refined set of output features.
2.1.1. Convolution Stage
The first stage applies an affine transformation to the input data. In this process, a set of learnable filters, known as kernels, is convolved over the input, extracting feature maps that highlight important characteristics such as edges, textures, or other local features. For two-dimensional input data, the convolution operation is defined as follows:
$$S(i,j) = (I * K)(i,j) = \sum_{m}\sum_{n} I(m,n)\,K(i-m,\,j-n) \tag{1}$$
where $I$ is the input data, $K$ is the filter (or kernel), and $S$ is the output, referred to as the feature map.
In traditional signal processing, the convolution operation flips the kernel relative to the input: as $m$ increases, the index into the input increases, whereas the index into the kernel decreases. In deep learning frameworks, however, this flipping is typically omitted, and the kernel is applied directly to the input using a related function called cross-correlation:
$$S(i,j) = (K \star I)(i,j) = \sum_{m}\sum_{n} I(i+m,\,j+n)\,K(m,n) \tag{2}$$
During the convolution stage, each output is computed as a weighted sum of the input values within the filter's receptive field, as illustrated in Figure 1. This process leverages three key principles that enhance the efficiency and performance of deep learning systems: sparse interactions, parameter sharing, and equivariant representations. Sparse interactions mean that each filter interacts only with a small local region of the input, which reduces computational cost and captures local features more effectively. Parameter sharing means that the same filter is applied across different regions of the input, leading to fewer parameters and improved generalization. Equivariant representations mean that if the input shifts, the output shifts accordingly, allowing CNNs to detect features regardless of their location in the input. These properties make convolution highly effective for tasks such as image and time-series analysis.
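The distinction between convolution and cross-correlation can be checked with a small sketch. The pure-Python code below (function names are illustrative, not from any library) computes both operations in "valid" mode, where the kernel fully overlaps the input; in that mode, convolution is simply cross-correlation with the kernel rotated by 180°. The nested loops also make sparse interactions and parameter sharing explicit: each output reads only a kernel-sized patch, and the same kernel weights are reused at every position.

```python
from typing import List

Matrix = List[List[float]]


def cross_correlate2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation: S(i, j) = sum_m sum_n I(i+m, j+n) K(m, n).

    Each output only reads a kh x kw patch of the input (sparse interactions),
    and the same kernel is reused at every position (parameter sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n] for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def flip(kernel: Matrix) -> Matrix:
    """Rotate a kernel by 180 degrees (reverse both the rows and the columns)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D convolution: cross-correlation with the flipped kernel."""
    return cross_correlate2d(image, flip(kernel))


if __name__ == "__main__":
    I = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    K = [[1, 0], [0, -1]]
    print(cross_correlate2d(I, K))  # [[-4, -4], [-4, -4]]
    print(convolve2d(I, K))         # [[4, 4], [4, 4]]
```

Because the kernel weights are learned, a network trained with cross-correlation simply learns the flipped version of the kernel it would have learned with convolution, so omitting the flip loses nothing.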
2.1.2. Detector Stage
After the convolution operation, the detector stage introduces nonlinearity by applying an activation function, typically a rectified linear unit (ReLU) or another nonlinear function such as sigmoid or tanh. The ReLU function is defined as f(x) = max(0, x), which replaces all negative values with zero while leaving positive values unchanged. Compared with sigmoid or tanh, ReLU is more computationally efficient because it requires only a threshold comparison rather than exponential evaluations. Moreover, it does not saturate for positive values, which helps mitigate the vanishing gradient problem during backpropagation when training deep networks. However, if a neuron receives only negative inputs during training, its gradient is zero, so it “dies” and stops updating its weights. To address this “dead neuron” problem, an improved function called Leaky ReLU was proposed [35]. It is defined as follows:
$$f(x) = \begin{cases} x, & x > 0 \\ \alpha x, & x \le 0 \end{cases} \tag{3}$$
where $\alpha$ is a small constant, typically set to 0.01, which ensures that negative values are not completely ignored.
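The “dead neuron” argument comes down to the gradients. The minimal scalar sketch below (illustrative function names) contrasts ReLU and Leaky ReLU, using the α = 0.01 value mentioned above, together with their derivatives; the derivative at exactly x = 0 is fixed by convention.

```python
def relu(x: float) -> float:
    """ReLU: f(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """ReLU derivative: zero for every non-positive input, so a neuron that
    only sees negative inputs receives no weight updates."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: x for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU derivative: never zero, so the neuron can keep learning."""
    return 1.0 if x > 0 else alpha


if __name__ == "__main__":
    for x in (-2.0, 0.5):
        print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```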
2.1.3. Pooling Stage
Generally, the last stage of a CNN layer is the pooling stage. This stage performs a down-sampling operation that reduces the spatial dimensions (height and width) of the feature maps, either by selecting the maximum value within a defined region, known as max pooling, or by computing the average of the values in that region, referred to as average pooling. Max pooling captures the most prominent features by selecting the maximum value, whereas average pooling provides a smoother representation but may dilute significant information. Because pooling shrinks the inputs passed to subsequent layers, it reduces the computational complexity and, for subsequent fully connected layers, the number of parameters. Meanwhile, the risk of overfitting is alleviated by concentrating on the most important features. Pooling also makes the representation approximately invariant to small translations of the input, which makes CNNs more robust. Dropout is another technique aimed at preventing overfitting: it randomly “drops out,” or deactivates, a fraction of neurons (typically 20–50%) during training, forcing the model to learn more generalized representations of the input.
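Pooling and dropout can be sketched in a few lines of pure Python. The 2 × 2 window and the function names are illustrative assumptions: the text does not specify the pooling window used in the proposed model, nor whether dropout is applied in it. The dropout function implements inverted dropout, the variant used by common deep learning frameworks, which rescales the kept activations during training so that nothing needs to change at inference time.

```python
import random
from typing import List

Matrix = List[List[float]]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def pool2d(x: Matrix, size: int = 2, mode: str = "max") -> Matrix:
    """Non-overlapping pooling (stride equal to the window size) over a 2D map.

    Rows or columns that do not fill a complete window are dropped.
    """
    if mode == "max":
        agg = max
    elif mode == "avg":
        agg = _mean
    else:
        raise ValueError("mode must be 'max' or 'avg'")
    out_h, out_w = len(x) // size, len(x[0]) // size
    return [
        [
            agg([x[i * size + m][j * size + n] for m in range(size) for n in range(size)])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def dropout(values: List[float], rate: float, rng: random.Random) -> List[float]:
    """Inverted dropout, applied during training only.

    Each unit is zeroed with probability `rate`; survivors are scaled by
    1 / (1 - rate) so that the expected activation is unchanged.
    """
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    print(pool2d(fmap, mode="max"))  # [[6, 8], [14, 16]]
    print(pool2d(fmap, mode="avg"))  # [[3.5, 5.5], [11.5, 13.5]]
    print(dropout([1.0] * 8, rate=0.5, rng=random.Random(0)))
```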
2.2. Review of Gated Recurrent Units
Gated recurrent units (GRUs), proposed by Cho et al. [36], are a powerful variant of recurrent neural networks (RNNs) for handling sequential data. GRUs introduce a gating mechanism that regulates the flow of information through the sequence by deciding what information to discard and what to pass on to future time steps. Owing to this gating mechanism, GRUs mitigate the vanishing gradient problem that limits the ability of traditional RNNs to learn long-term dependencies in sequential data. Compared with long short-term memory (LSTM) networks, a more complex RNN architecture with three gates and a separate cell state, GRUs use only two gates and therefore have fewer parameters, making them more computationally efficient and helping to reduce the risk of overfitting to a certain extent.
The gating mechanism of GRUs is governed by two main gates, the update gate and the reset gate, as illustrated in Figure 2. The update gate determines the amount of information from the previous hidden state to be carried over to the current hidden state. The update gate, $z_t$, is defined as follows:
$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \tag{4}$$
where $h_{t-1}$ is the previous hidden state, $x_t$ is the current input, and $[h_{t-1}, x_t]$ denotes their concatenation. $W_z$ is the weight matrix, which is learned during the training of the neural network; bias terms are omitted for brevity. $\sigma$ represents the sigmoid function, $\sigma(x) = 1/(1 + e^{-x})$, which increases monotonically within the range (0, 1) with an “S” shape.
The reset gate controls how much of the past information to omit and is defined by a similar equation:
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \tag{5}$$
where $W_r$ is the weight matrix to be learned during training. In this formulation, when $r_t$ is close to 0, the previous hidden state is disregarded, forcing the network to focus on the current input; conversely, when $r_t$ is close to 1, the previous hidden state is fully considered.
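Next, the candidate activation $\tilde{h}_t$ is computed as:
$$\tilde{h}_t = \tanh\left(W_h \cdot [r_t \odot h_{t-1}, x_t]\right) \tag{6}$$
where $W_h$ is the weight matrix to be learned, $\odot$ represents element-wise multiplication, and $\tanh$ is the hyperbolic tangent function, with range (−1, 1).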
The actual activation $h_t$ is calculated as a weighted sum of the previous hidden state and the candidate activation:
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t \tag{7}$$
Owing to their ability to capture temporal dependencies in sequential data, GRUs have been successfully employed in time-series prediction, natural language processing, and various other fields.
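To make the gate equations concrete, the sketch below implements one GRU time step in pure Python. The update gate, reset gate, candidate activation, and final interpolation are written as described in this subsection, with bias terms omitted and each weight matrix acting on the concatenated vector [h_{t−1}, x_t]. The helper names are hypothetical. Library layers such as PyTorch's `nn.GRU` and Keras's `GRU` use the same convention of letting z_t weight the previous state, but they add biases and, by default, apply the reset gate after the recurrent matrix product, so their outputs will not match this sketch exactly.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for nested lists."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def sigmoid(a: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def gru_step(h_prev: Vector, x_t: Vector, w_z: Matrix, w_r: Matrix, w_h: Matrix) -> Vector:
    """One GRU time step without biases.

    Each weight matrix has shape (hidden, hidden + input) and multiplies
    the concatenated vector [h_prev, x_t].
    """
    hx = h_prev + x_t                                  # concatenation [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(w_z, hx)]          # update gate
    r = [sigmoid(a) for a in matvec(w_r, hx)]          # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]     # r_t ⊙ h_{t-1}
    h_cand = [math.tanh(a) for a in matvec(w_h, gated + x_t)]  # candidate activation
    # z_t decides how much of the previous state is kept; (1 - z_t) admits the candidate.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, h_cand)]


def run_gru(xs: List[Vector], hidden_size: int, w_z: Matrix, w_r: Matrix, w_h: Matrix) -> Vector:
    """Unroll the cell over a sequence from a zero initial state; return the final state."""
    h = [0.0] * hidden_size
    for x_t in xs:
        h = gru_step(h, x_t, w_z, w_r, w_h)
    return h


if __name__ == "__main__":
    zeros = [[0.0] * 3 for _ in range(2)]  # hidden size 2, input size 1
    print(gru_step([1.0, -2.0], [0.5], zeros, zeros, zeros))  # [0.5, -1.0]
```

With all weights set to zero, both gates equal 0.5 and the candidate is tanh(0) = 0, so one step simply halves the previous hidden state, which makes a convenient sanity check.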
2.3. CNN-GRU Hybrid Deep Learning Model
Considering the complexity of predicting FOWT responses in the time domain, this paper proposes a CNN-GRU model that integrates CNNs and GRUs to leverage the strengths of both deep learning techniques. The detailed architecture is depicted in Figure 3. CNNs are employed to extract the coupling relationships between different inputs, while GRUs handle the temporal dependencies between the inputs and the structural response.
Nine time-series variables are selected as inputs to the deep learning model and are rearranged into a 3 × 3 matrix. The CNN part includes three two-dimensional convolutional layers, each with a kernel size of 2 × 2 and a stride of 1, so that the receptive field of the final layer covers the entire input matrix. The padding is set to “same,” so that each convolution's output keeps the spatial size of its input and edge information is fully preserved. The numbers of filters in the three layers are 8, 16, and 32, respectively; the progressively increasing number of filters enables the model to learn more complex features as the network goes deeper. In addition, each CNN layer incorporates a Leaky ReLU activation function and a max pooling operation.
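The claim that three 2 × 2, stride-1 convolutions cover the whole 3 × 3 input can be checked with the standard receptive-field recursion r_l = r_{l−1} + (k_l − 1) j_{l−1}, where j is the cumulative stride. The sketch below (hypothetical helper names) ignores the max pooling operations, which can only enlarge the receptive field, and also counts the convolutional parameters under two assumptions not stated in the text: the 3 × 3 input has a single channel, and each filter has one bias term.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field (along one axis) of the last layer in a stack of
    (kernel_size, stride) layers."""
    r, jump = 1, 1
    for kernel, stride in layers:
        r += (kernel - 1) * jump  # each layer widens the field by (k - 1) input strides
        jump *= stride            # cumulative stride seen by the next layer
    return r


def conv2d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus one bias per filter for a square-kernel 2D convolution."""
    return (kernel * kernel * c_in + 1) * c_out


if __name__ == "__main__":
    convs = [(2, 1)] * 3                 # three 2x2, stride-1 convolutions
    print(receptive_field(convs))        # 4, i.e. >= 3, so the 3x3 input is covered
    filters = [1, 8, 16, 32]             # assumed single-channel input
    print([conv2d_params(2, a, b) for a, b in zip(filters, filters[1:])])  # [40, 528, 2080]
```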
In the GRU part, the GRU layer has 64 hidden units, providing adequate pattern recognition capability while keeping the complexity manageable.
A fully connected network is typically used as the final stage of deep learning models, mapping the features extracted by the preceding layers to the prediction. In this model, the final fully connected layer consists of 128 neurons and is responsible for outputting the predicted structural response.
During training, the loss function is a key component that measures how well the model's predictions match the actual target values. Based on this measure, the optimizer adjusts the model's parameters (i.e., the weight matrices) to minimize the difference between the predictions and the true values. Cross-entropy loss is commonly used for classification tasks, whereas mean squared error (MSE) is typically chosen for regression problems because it penalizes large errors heavily (which also makes it sensitive to outliers) and provides smooth gradients during backpropagation.
In addition to MSE, four metrics are used for model evaluation: the correlation coefficient (CC), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), and coefficient of determination (R²), as presented in Equations (8)–(11), respectively. Here, the correlation coefficient refers to Pearson's correlation coefficient, the most widely used measure of linear correlation. It ranges from −1 to +1, where values closer to +1 or −1 indicate a stronger linear relationship and values near 0 indicate little to no linear relationship, which makes it useful for evaluating how closely predictions track the true response. MAPE measures accuracy as the average absolute percentage difference between predicted and actual values. SMAPE treats overestimation and underestimation symmetrically because it divides by the average of the absolute actual and predicted values. The coefficient of determination, R², is widely used to assess how well the model captures the relationship between the predictors and the actual response. A value close to 1 indicates a strong fit, meaning the model explains most of the variance in the response variable. Conversely, a value close to 0 suggests that the model's predictions are no better than simply using the mean value of the response variable, and a negative R² indicates that the model performs worse than this simple mean model, often pointing to issues such as model misspecification or overfitting.
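$$\mathrm{CC} = \frac{\sum_{i=1}^{n}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}\,\sqrt{\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})^2}} \tag{8}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \tag{9}$$
$$\mathrm{SMAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\frac{|y_i-\hat{y}_i|}{(|y_i|+|\hat{y}_i|)/2} \tag{10}$$
$$R^2 = 1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \tag{11}$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $\bar{y}$ and $\bar{\hat{y}}$ are the mean values of the true and predicted values, respectively, and $n$ is the number of samples.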
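A direct transcription of MSE and the four evaluation metrics of Equations (8)–(11) into plain Python is sketched below. The function names are illustrative, and the ×100 percentage scaling of MAPE and SMAPE is an assumption about how the values are reported. MAPE is undefined whenever a true value is zero and becomes unstable for responses that oscillate close to zero, which is one practical reason to report SMAPE alongside it.

```python
import math
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum (y_i - y_hat_i)^2."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def pearson_cc(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Pearson correlation between true and predicted values."""
    my, mp = sum(y) / len(y), sum(y_hat) / len(y_hat)
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_hat))
    sy = sum((a - my) ** 2 for a in y)
    sp = sum((b - mp) ** 2 for b in y_hat)
    return cov / math.sqrt(sy * sp)


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; undefined if any y_i == 0."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


def smape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Symmetric MAPE in percent; the denominator is the mean of |y_i| and |y_hat_i|."""
    return 100.0 * sum(
        abs(a - b) / ((abs(a) + abs(b)) / 2.0) for a, b in zip(y, y_hat)
    ) / len(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SSE / SST."""
    my = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    sst = sum((a - my) ** 2 for a in y)
    return 1.0 - sse / sst


if __name__ == "__main__":
    y_true, y_pred = [1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5]
    for fn in (mse, pearson_cc, mape, smape, r2):
        print(fn.__name__, round(fn(y_true, y_pred), 4))
```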
4. Results and Discussion
The preprocessed data mentioned above are fed into the CNN-GRU model to optimize the weight matrices. During the training process, the data are split into two parts: 80% is allocated for training and 20% for validation. The validation data are utilized to gauge the model’s performance on unseen data and to mitigate the risk of overfitting.
Figure 7a,b depicts the changes in training and validation loss as the number of epochs increases. The model with the lowest validation loss is saved during training for subsequent predictions. For the saved models, the training and validation losses are 2.6657 × 10⁻⁴ and 2.6098 × 10⁻⁴ for OC4, and 4.5337 × 10⁻⁴ and 2.1228 × 10⁻⁴ for Umaine, respectively. The validation losses in both cases are quite low, even lower than the training losses, suggesting that the models capture generalizable patterns and perform well on new data. One simulation per case is then fed into the corresponding model to check the accuracy of the pitch prediction.
Figure 8 shows that the predictions for both cases closely align with the trend of the real values, with only minimal discrepancies.
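The data split and checkpoint rule described in this section can be sketched as follows. The text does not say whether the 80/20 split is random or made by simulation, so the shuffled split below is an assumption, as are the function names. When samples are overlapping windows cut from the same simulation, splitting by simulation rather than by window avoids near-duplicate windows appearing in both sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(samples: Sequence[T], train_frac: float = 0.8,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples with a fixed seed and split into train/validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def best_epoch(val_losses: Sequence[float]) -> int:
    """Index of the epoch with the lowest validation loss, i.e. the checkpoint to keep."""
    return min(range(len(val_losses)), key=lambda e: val_losses[e])


if __name__ == "__main__":
    train, val = train_val_split(list(range(100)))
    print(len(train), len(val))                          # 80 20
    print(best_epoch([0.50, 0.30, 0.35, 0.29, 0.31]))    # 3
```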
To investigate the effect of different model settings on the two types of floating wind turbines, this study explores different memory lengths, training data sizes, and optimizers. The trained models are then tested on the same set of 10 simulations for comparison. Prediction accuracy is measured using five metrics: correlation coefficient (CC), mean squared error (MSE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), and coefficient of determination (R²).
4.1. Different Memory Length
Different memory lengths, namely 50 s, 40 s, 30 s, 20 s, and 10 s, are used to train the CNN-GRU model with a uniform sample size of 40 to predict the pitch response of the OC4 and Umaine cases, as listed in Table 4. The values of the five metrics show that the deep learning models with different memory lengths yield accurate predictions for both FOWTs, with strong correlations and very small MSE values. In both cases, as the memory length increases from 10 s to 40 s, MSE, MAPE, and SMAPE decrease while R² increases. However, when the memory length is extended from 40 s to 50 s, MSE, MAPE, and SMAPE start to rise, accompanied by a decrease in R². This trend may be attributed to an increase in parameters leading to overfitting. Meanwhile, the correlation coefficient remains relatively stable, fluctuating around 0.99, as the memory length varies. Thus, the optimal memory length is determined to be 40 s for both cases.
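A memory length in seconds becomes a number of time steps once a sampling interval is fixed, and each training sample is then a window of that many steps. The sketch below assumes a hypothetical time step `dt_s`, a target equal to the pitch at the step immediately after the window, and a row-wise ordering of the nine variables in the 3 × 3 grid; none of these details is specified in this section. It also shows that, for a simulation of N steps, a memory of L steps yields N − L windows, so longer memories leave fewer samples per simulation.

```python
from typing import List, Sequence, Tuple

Window = List[List[float]]


def memory_steps(memory_length_s: float, dt_s: float) -> int:
    """Convert a memory length in seconds into a whole number of time steps."""
    return int(round(memory_length_s / dt_s))


def to_grid(features_t: Sequence[float]) -> List[List[float]]:
    """Arrange the nine per-step variables row-wise into a 3 x 3 grid (ordering assumed)."""
    if len(features_t) != 9:
        raise ValueError("expected nine input variables")
    return [list(features_t[r * 3:(r + 1) * 3]) for r in range(3)]


def make_windows(features: Sequence[Sequence[float]], target: Sequence[float],
                 n_steps: int) -> Tuple[List[Window], List[float]]:
    """Slide a window of n_steps over one simulation.

    X[k] holds the feature vectors at steps k .. k + n_steps - 1 and y[k] is the
    target at step k + n_steps, so a series of N steps yields N - n_steps samples.
    """
    X: List[Window] = []
    y: List[float] = []
    for k in range(len(target) - n_steps):
        X.append([list(f) for f in features[k:k + n_steps]])
        y.append(target[k + n_steps])
    return X, y


if __name__ == "__main__":
    dt = 0.5  # hypothetical sampling interval in seconds
    print(memory_steps(40.0, dt))  # 80
    series = [[float(t)] * 9 for t in range(10)]
    X, y = make_windows(series, [float(t) for t in range(10)], n_steps=4)
    print(len(X), y[0], to_grid(series[0])[2])  # 6 4.0 [0.0, 0.0, 0.0]
```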
4.2. Different Sample Size
While keeping the memory length at 40 s, the sample size of the model is varied from 10 to 40. The results are presented in Table 5. In both cases, MSE, MAPE, and SMAPE generally decrease while CC and R² increase with sample size, which aligns with expectations: with more samples, the model can better learn the relationship between inputs and outputs. Considering both accuracy and training time, a sample size of 40 is selected as the final choice.
4.3. Different Optimizer
After setting the memory length to 40 s and the sample size to 40, three optimizers, i.e., stochastic gradient descent (SGD), adaptive moment estimation (Adam), and Nesterov-accelerated adaptive moment estimation (Nadam), are applied to the CNN-GRU model to evaluate their prediction performance. The results are provided in Table 6. In both cases, the model trained with Nadam is the most accurate, followed by that trained with Adam, while SGD consistently produces the least accurate results. This result demonstrates the advantage of Nadam, which combines adaptive learning rates with Nesterov momentum. Adaptive learning rates scale each parameter's update by running estimates of its gradient moments, improving convergence efficiency. Simultaneously, the anticipatory nature of Nesterov momentum enables the optimizer to “look ahead” along the direction of the accumulated momentum, helping to mitigate overshooting and oscillation. Adam also uses adaptive learning rates and classical momentum, but it lacks the anticipatory Nesterov update. Conversely, SGD, even when configured with Nesterov momentum, applies a single fixed learning rate to all parameters, which limits its flexibility. These combined advantages allow Nadam to converge to a better solution more efficiently, which is particularly important given the complexity of this problem.
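The update rules compared above can be sketched on a one-dimensional quadratic. The code below is illustrative only and is not the training code used in this study: the learning rates and step counts are arbitrary, and the Nadam branch uses a common simplified form of Dozat's update (constant β₁, no momentum-decay schedule); library versions such as Keras `Nadam` and PyTorch `NAdam` add a momentum schedule and differ in detail. The SGD routine follows the Nesterov formulation of PyTorch's `SGD` with zero dampening.

```python
import math
from typing import Callable

Grad = Callable[[float], float]


def sgd(grad: Grad, x: float, lr: float, steps: int,
        momentum: float = 0.0, nesterov: bool = False) -> float:
    """SGD with one fixed learning rate and an optional (Nesterov) momentum buffer."""
    buf = 0.0
    for _ in range(steps):
        g = grad(x)
        buf = momentum * buf + g
        # Nesterov: step along the current gradient plus the look-ahead momentum term
        update = g + momentum * buf if nesterov else buf
        x -= lr * update
    return x


def adam(grad: Grad, x: float, lr: float, steps: int, beta1: float = 0.9,
         beta2: float = 0.999, eps: float = 1e-8, nesterov: bool = False) -> float:
    """Adam; with nesterov=True, a simplified Nadam (constant beta1, no momentum schedule)."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g          # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias corrections
        v_hat = v / (1 - beta2 ** t)
        if nesterov:
            # Look ahead: blend the corrected momentum with the current gradient
            m_hat = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
        # The effective step size adapts through v_hat
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    def grad(x: float) -> float:
        return 2.0 * (x - 3.0)  # gradient of f(x) = (x - 3)^2

    print(sgd(grad, 0.0, lr=0.1, steps=200))
    print(sgd(grad, 0.0, lr=0.05, steps=500, momentum=0.9, nesterov=True))
    print(adam(grad, 0.0, lr=0.1, steps=1000))                 # Adam
    print(adam(grad, 0.0, lr=0.1, steps=1000, nesterov=True))  # simplified Nadam
```

On such a well-conditioned toy problem all three methods reach the minimum at x = 3; the differences reported in Table 6 arise in the high-dimensional, nonconvex training of the CNN-GRU network, where per-parameter step sizes matter.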
4.4. Feature Contribution Evaluation
In traditional machine learning models, the contribution of features is often more straightforward to assess. For example, in a linear regression model with standardized features, a large absolute weight indicates a strong influence on the output, and a positive weight indicates a positive effect on the output. In contrast, evaluating the impact of features in a deep learning model is challenging, because its multiple layers and nonlinear interactions between features make it act as a black box.
In this study, an interpretability method known as SHAP (SHapley Additive exPlanations) is used to “open” the black box and provide insights into the contribution of each feature to the prediction. The method is based on Shapley values from cooperative game theory. The core idea is to assess the marginal contribution of a feature by measuring how much the prediction changes when the feature is included versus excluded, averaged over all possible subsets (coalitions) of the other features. The resulting values are additive: for each sample, the SHAP values of all features sum to the difference between the model's prediction and a baseline (expected) prediction.
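The coalition-averaging idea behind Shapley values can be illustrated with an exact, brute-force computation for a handful of features. In this sketch (hypothetical function names), a feature absent from a coalition is replaced by a single baseline value, which is a simplification: SHAP explainers such as `KernelExplainer` or `DeepExplainer` approximate the same quantity using a background dataset, and the text does not state which explainer was used. The enumeration needs on the order of 2ⁿ model evaluations per feature, so it only suits small n, although nine features would still be tractable for a single sample.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature outside a coalition is set to its baseline value. Every coalition
    of the other features is enumerated, so the cost grows as 2^n.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                coalition = set(members)
                # Marginal contribution of feature i to this coalition
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


if __name__ == "__main__":
    def f(z: Sequence[float]) -> float:
        return z[0] * z[1] + z[2]

    phi = shapley_values(f, [2.0, 3.0, 5.0], [0.0, 0.0, 0.0])
    print(phi)  # approximately [3.0, 3.0, 5.0]; the sum equals f(x) - f(baseline) = 11
```

For f(x) = x₀x₁ + x₂ with a zero baseline, the interaction term is split equally between x₀ and x₁, and the values sum to f(x) − f(baseline), the additivity property on which SHAP relies.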
The input features of the CNN-GRU models are evaluated using SHAP. Figure 9a,b illustrates the feature interaction heatmaps for OC4 and Umaine, respectively. The SHAP values of the nine features are computed for each sample, with the corresponding feature values represented by distinct colors. Additionally, the mean absolute SHAP value of each feature is calculated to compare their contributions to the output, as shown in Figure 10. Although the contributions of features differ between the two cases, the past pitch response of the floating platform, fore-aft bending at the base of the tower, fore-aft shear at the base of the tower, and wave elevation emerge as the most significant features influencing the present pitch response. In contrast, tower fore-aft displacement, fore-aft shear at the top of the tower, rotor torque, side-to-side bending at the top of the tower, and rotor thrust have a comparatively smaller impact.
The identified significant features are critical because they directly relate to the structural dynamics and environmental conditions that govern the platform pitch response. Conversely, features such as tower fore-aft displacement and rotor torque contribute less, suggesting that they play more indirect or less critical roles in pitch prediction. This distinction in feature importance underscores the need to focus on relevant factors for accurate modeling and prediction, particularly in complex systems such as floating wind turbines, where environmental interactions are paramount. These insights can guide future research and the optimization of predictive models in similar applications.
4.5. Robustness and Comparative Evaluation
To assess the robustness and reliability of the optimized CNN-GRU model, two additional representative sea states off the east coast of the USA [39] (denoted as sea state 1 and sea state 3) were considered, alongside the sea state used in the preceding parametric studies (denoted as sea state 2), as detailed in Table 7. The simulated data of the FOWTs under the two additional sea states are used in the same manner for training and prediction, ensuring consistency across all cases. As shown in Table 8, the proposed model exhibits outstanding performance for both types of FOWTs across all sea states, with an average correlation coefficient (CC) of 0.9962, an average coefficient of determination (R²) of 0.9864, and consistently low values of MSE, MAPE, and SMAPE.
The efficiency of the proposed model is further evaluated by comparing its performance with two ensemble models, namely random forest (RF) and gradient boosting (GB), under sea state 2. Both the RF and GB models use 100 trees (estimators), a minimum of 2 samples to split a node, and a minimum of 1 sample per leaf; the RF has no maximum tree depth, whereas the GB uses a maximum tree depth of 3 and a learning rate of 0.1. As shown in Table 9, the CNN-GRU model outperforms both RF and GB, with a higher CC and lower MSE, MAPE, and SMAPE. Notably, the CNN-GRU model also achieves a significantly higher coefficient of determination (R²), demonstrating its suitability for accurately predicting the dynamic response of FOWTs. As Figure 11 shows, the CNN-GRU model also performs better at both peaks and valleys. Furthermore, the CNN-GRU model is computationally more efficient, with a training time approximately half that of the RF and GB models on a system equipped with an AMD Ryzen 5 5600X 6-core CPU, an NVIDIA RTX 2080 Super GPU with 8 GB of VRAM, and 16 GB of RAM. This can be attributed to the inherent parallelism of deep learning models, which are highly optimized for parallel computation, particularly on GPUs, thereby accelerating training. In contrast, RF and GB are typically trained on the CPU: GB must build its trees sequentially, since each tree is fitted to the residuals of the previous ones, and although RF trees are independent, the training time of both models grows with the number of trees, contributing to their longer computational times.
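For reference, the baseline hyperparameters listed above map directly onto scikit-learn's ensemble regressors, and they coincide with scikit-learn's current defaults. The sketch below assumes scikit-learn (the text gives the hyperparameters but not the library) and assumes that each input window is flattened into one row, because tree ensembles expect a two-dimensional feature matrix; the helper names and the toy target are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor


def build_ensemble_baselines(seed: int = 0) -> Tuple[RandomForestRegressor, GradientBoostingRegressor]:
    """RF and GB regressors with the hyperparameters stated in the text."""
    rf = RandomForestRegressor(
        n_estimators=100,      # 100 independent trees, predictions averaged
        max_depth=None,        # no maximum tree depth
        min_samples_split=2,
        min_samples_leaf=1,
        random_state=seed,
    )
    gb = GradientBoostingRegressor(
        n_estimators=100,      # 100 trees built sequentially on residuals
        max_depth=3,
        learning_rate=0.1,
        min_samples_split=2,
        min_samples_leaf=1,
        random_state=seed,
    )
    return rf, gb


def flatten_windows(X: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Turn each (time step x feature) window into a single feature row for the trees."""
    return [[value for step in window for value in step] for window in X]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: 200 windows of 4 steps x 9 features
    windows = [[[rng.uniform(-1.0, 1.0) for _ in range(9)] for _ in range(4)] for _ in range(200)]
    target = [sum(w[-1]) for w in windows]
    rows = flatten_windows(windows)
    for model in build_ensemble_baselines():
        model.fit(rows, target)
        print(type(model).__name__, round(model.score(rows, target), 3))
```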
However, the CNN-GRU model still has certain limitations. Its convolutional and recurrent layers require high-performance GPUs to run efficiently. Additionally, compared with ensemble models such as RF and GB, its internal mechanisms are more intricate and lack straightforward interpretability, so understanding the importance of individual input variables often requires supplementary methods, such as SHAP, to assess feature contributions.
5. Conclusions
This paper presents a novel deep learning-based approach, referred to as the CNN-GRU model, for predicting the response of floating wind turbines in the time domain. This model harnesses the strengths of both convolutional neural networks (CNNs) and gated recurrent units (GRUs). The CNN component is specifically designed to extract the coupling relationships among various features, while the GRU component excels at capturing the temporal dependencies between input features and output responses. This approach is successfully applied to two different types of floating wind turbines, i.e., OC4 and Umaine, achieving high accuracy with minimal discrepancies and strong correlation.
To identify the optimal model, multiple configurations of memory length, sample size, and optimizer were explored. The accuracies of the different settings were compared using five metrics: correlation coefficient (CC), mean squared error (MSE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (SMAPE), and coefficient of determination (R²). The optimal memory length for both OC4 and Umaine was found to be 40 s; shorter memory lengths slightly reduced accuracy, while longer memory lengths tended to cause overfitting, which also degraded accuracy. Additionally, accuracy increased with sample size, and a sample size of 40 was adopted to balance accuracy and training speed. Furthermore, three optimizers, i.e., SGD, Adam, and Nadam, were tested. The Nadam optimizer demonstrated the best performance, benefiting from both adaptive learning rates and the anticipatory nature of Nesterov momentum.
The optimal model was interpreted using SHAP to provide insights into the contribution of each feature to the prediction. The results indicate that the past pitch response of the floating platform, fore-aft bending at the base of the tower, fore-aft shear at the base of the tower, and wave elevation are the most significant features influencing the present pitch response. These features are directly related to structural dynamics and environmental conditions, in contrast to tower fore-aft displacement, fore-aft shear at the top of the tower, rotor torque, side-to-side bending at the top of the tower, and rotor thrust. These insights can guide researchers in selecting fewer features for simplicity or the optimization of models in future studies.
To evaluate the robustness and reliability of the proposed model, three distinct sea states were considered, and the model consistently produced accurate predictions for both types of FOWTs. Furthermore, in a comparative analysis with random forest (RF) and gradient boosting (GB) models, the CNN-GRU model exhibited superior performance, highlighting its suitability for accurately predicting the dynamic response of FOWTs.
The proposed method has the advantage of rapid prediction once training is complete, which can be attributed to its explicit formulation: each prediction is a single forward pass through the trained network. In contrast, traditional finite element methods are substantially slower owing to their implicit formulation. Therefore, the proposed method is well suited to rapidly generating the numerous samples required for fatigue or ultimate response analysis. Furthermore, the framework, encompassing feature selection, data processing, CNN-GRU model construction and optimization, and SHAP interpretation, can be extended to a wide range of engineering problems solved with deep learning models.