1. Introduction
Energy is widely recognized as a cornerstone of economic growth and environmental sustainability, playing a vital role in mitigating climate change and enabling the global energy transition [1,2]. With the gradual depletion of fossil fuels and growing international emphasis on environmental protection, renewable energy has been increasingly prioritized worldwide [3,4]. Among various renewable sources, wind power has emerged as one of the most promising due to its extensive availability, low environmental impact, and potential for carbon emission reduction [5,6,7]. Consequently, global installed wind power capacity continues to expand significantly [8,9,10].
Nevertheless, integrating intermittent wind power into conventional power systems poses significant operational challenges due to its stochastic and highly volatile nature [11,12]. Stable grid operation, therefore, critically depends on precise ultra-short-term wind power forecasting, typically covering a forecasting horizon from a few minutes to several hours ahead [13,14,15]. Accurate predictions enable optimized scheduling, efficient market participation, and improved power system stability [16,17].
Traditional forecasting approaches primarily rely on statistical methods, including autoregressive integrated moving average (ARIMA) and persistence models, which are characterized by simple structures but often yield inadequate performance under volatile conditions due to linear modelling constraints [18,19,20]. Subsequently, nonlinear machine learning methods, such as support vector machines (SVMs), random forests (RFs), and extreme learning machines (ELMs), have gained prominence due to their superior capability in handling nonlinear patterns [21,22,23]. However, the performance of these single-model approaches typically deteriorates in scenarios characterized by rapid meteorological fluctuations or insufficient historical data [24,25].
Recently, deep learning methods such as long short-term memory (LSTM) networks, convolutional neural networks (CNNs), and temporal convolutional networks (TCNs) have substantially improved forecasting performance by effectively capturing complex temporal and nonlinear dependencies within wind data [26,27,28]. In particular, LSTM networks have delivered significant accuracy improvements by modelling temporal dependencies within wind power data [27]. Nonetheless, single deep learning models may encounter instability or overfitting, thus limiting their generalization ability in scenarios involving high variability or data scarcity [29,30].
Hybrid forecasting frameworks that integrate complementary models have demonstrated improvement over single-model methods in short-term wind power prediction [31,32]. For example, pairing a deep residual network with a bidirectional LSTM [5] or applying graph-based spatial–temporal learning [14] yields notable accuracy gains. However, the full complexities inherent in the variety of wind power operational modes remain only partially represented in these approaches.
To effectively manage varying operational conditions, clustering methods have received significant attention in recent studies as effective approaches to identifying and categorizing distinct operational modes [12,33]. Among these, clustering based on dynamic time warping (DTW) has shown effectiveness due to its ability to temporally align sequences, enabling accurate identification of operational regimes that significantly enhance forecasting accuracy [12,34]. Considerable forecasting performance gains have been achieved by applying DTW-based clustering to categorize historical wind power sequences into distinct operational patterns, providing valuable insights for subsequent predictive modelling [12].
The feature-refinement process, namely identifying and selecting the most relevant meteorological and power-related variables, is a critical driver of forecasting accuracy [11,35]. Significant gains have been reported when correlation-based nonlinear regression is used to weight and filter predictors for wind speed estimation [11], while granule-based clustering and direct optimization further enhance performance by pinpointing the most informative features [13]. Departing from traditional ultra-short-term studies that rely almost exclusively on historical power at 15 min resolution, the present work shortens both meteorological forecasts and power samples to a sub-hour scale, thereby bolstering model reliability in the face of abrupt weather changes within the ultra-short-term horizon.
Despite these advancements, several critical challenges remain. Most existing models treat historical wind power sequences as independent inputs without considering aligned temporal structures across different days or seasons. As a result, these models often overlook latent regime-specific dynamics that arise due to varying weather patterns, operational constraints, or site-specific characteristics. Moreover, while some recent studies have introduced clustering strategies [36], they typically rely on Euclidean distance and assume static temporal alignment, limiting their effectiveness in capturing the intrinsic variability and phase-shifting nature of wind power curves.
In addition, many deep learning methods, including LSTM [37,38] and CNN-based hybrid models [39,40], demonstrate improved accuracy over traditional statistical techniques; however, their robustness under high-volatility conditions and generalization under limited training samples are often under-reported. For example, LSTM-based models are prone to overfitting and suffer from vanishing gradient issues at ultra-short-term resolutions, especially when the forecasting horizon is less than 30 min [41]. Similarly, graph-based spatiotemporal models [42] enhance spatial correlation capture but may incur substantial computational overhead, making real-time deployment challenging.
To bridge these gaps, this study proposes a novel ultra-short-term wind power forecasting framework that integrates DTW-based temporal alignment clustering and relevance-driven feature refinement. The former allows for effective categorization of operational patterns with non-uniform temporal evolution, while the latter ensures regime-specific feature selection tailored to each identified cluster.
Compared with traditional forecasting models, the main contributions of this paper are as follows:
- (1) A temporal-alignment clustering approach based on DTW–K-means is proposed to identify and characterize distinct operational patterns in historical wind power data, enhancing the model’s adaptability and predictive accuracy under varying operational scenarios.
- (2) The proposed relevance-driven feature refinement method systematically analyzes the correlations among meteorological variables and historical power sequences, facilitating effective selection and weighting of critical features, thus improving the predictive performance of the forecasting model.
- (3) A robust hybrid forecasting framework combining TCN and ELM is developed to effectively capture complex temporal dynamics and nonlinear characteristics inherent in wind power sequences, demonstrating superior prediction accuracy and robustness compared to conventional single-model approaches.
The rest of this paper is organized as follows:
Section 2 establishes theoretical methods and model structure;
Section 3 introduces the dataset and evaluation metrics;
Section 4 gives and analyzes the prediction results; and
Section 5 summarizes the paper.
2. Methodology
This section presents the core techniques developed in this paper. First, data preprocessing and feature selection are described, including the clustering of wind power patterns and grey relational feature ranking. Then, the hybrid temporal convolutional network–extreme learning machine (TCN-ELM) architecture is detailed, with emphasis on novel design elements and error correction mechanisms.
2.1. Dataset Preprocessing and Feature Selection
Accurate input features are crucial for the performance of forecasting models. Several preprocessing steps were applied to the raw dataset to ensure the effectiveness of the model, including feature construction, selection, and normalization. The key steps are as follows:
- (1) Wind speed is the most direct and crucial meteorological factor influencing wind power generation. Considering that wind speed measurements at turbine hub heights may introduce inaccuracies or might not fully reflect vertical variations within the wind farm, we selected wind speed data at multiple typical heights (10 m, 30 m, 50 m, and 70 m), as well as at turbine hub height. This multi-height wind speed data offers a more detailed depiction of wind shear profiles, providing richer vertical structure information for model training.
- (2) Relying solely on wind speed magnitude and direction may fail to fully capture wind vector characteristics and their interaction with turbine blades. To better reflect wind dynamics, we calculated horizontal (typically east–west, referred to as the $u$ component) and vertical (typically north–south, referred to as the $v$ component) wind speed components using wind speed and direction data at various heights. The horizontal component $u$ and vertical component $v$ can be computed from wind speed ($V$) and wind direction ($\theta$, in degrees), using the following formulas:
This vector decomposition method provides the forecasting models with input features of greater physical significance, aiding the models in better capturing the wind energy conversion processes.
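As a rough illustration of the decomposition, the sketch below assumes the meteorological convention, in which the direction is the bearing the wind blows from, measured clockwise from north. The source formulas are not reproduced in this text, so the sign convention, the function name `wind_components`, and the sample values are assumptions rather than the authors' definitions.

```python
import math

def wind_components(speed, direction_deg):
    """Split a wind vector into east-west (u) and north-south (v) parts.

    Assumption: direction_deg is the bearing the wind blows FROM, clockwise
    from north (meteorological convention). Other conventions flip the signs.
    """
    theta = math.radians(direction_deg)
    u = -speed * math.sin(theta)  # positive u points east
    v = -speed * math.cos(theta)  # positive v points north
    return u, v

# A 10 m/s wind from the west (270 degrees) blows toward the east.
u, v = wind_components(10.0, 270.0)
```

Whatever convention is chosen, the decomposition preserves the wind speed magnitude, since `math.hypot(u, v)` equals `speed`.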
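Feature selection is critical for reducing model complexity and improving prediction accuracy. Grey relational analysis (GRA) was utilized to rank the importance of features based on their relational grade with respect to the target variable (wind power output). The relational grade $\gamma(x_0, x_i)$ between a reference sequence $x_0$ (the wind power output) and a comparison sequence $x_i$ (each feature) is calculated as follows:
$$\gamma(x_0, x_i) = \frac{1}{n}\sum_{k=1}^{n} \frac{\min_i \min_k \left|x_0(k) - x_i(k)\right| + \rho \max_i \max_k \left|x_0(k) - x_i(k)\right|}{\left|x_0(k) - x_i(k)\right| + \rho \max_i \max_k \left|x_0(k) - x_i(k)\right|}$$
where $x_0(k)$ and $x_i(k)$ are the values of the reference and comparison sequences at the $k$-th time step, $n$ is the sequence length, and $\rho \in (0, 1)$ is the distinguishing coefficient. Features with higher relational grades are selected for the model, effectively reducing dimensionality and improving the overall prediction accuracy.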
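The ranking step can be sketched as follows, using the standard form of the grey relational grade with a distinguishing coefficient of 0.5. The function name, the toy sequences, and the feature labels are hypothetical, and the inputs are assumed to be already scaled to comparable ranges.

```python
def grey_relational_grade(reference, comparison_sets, rho=0.5):
    """Return one grey relational grade per comparison sequence.

    rho is the distinguishing coefficient; 0.5 is a common default and an
    assumption here, not a value taken from the paper.
    """
    deltas = [[abs(r - c) for r, c in zip(reference, seq)]
              for seq in comparison_sets]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:  # every feature matches the reference exactly
        return [1.0 for _ in deltas]
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

# Toy, already-normalised sequences (illustrative values only).
power = [0.1, 0.4, 0.8, 0.6]
features = {
    "hub_speed": [0.1, 0.5, 0.8, 0.6],    # closely tracks the power curve
    "temperature": [0.9, 0.2, 0.1, 0.7],  # weakly related
}
grades = grey_relational_grade(power, list(features.values()))
ranking = sorted(zip(features, grades), key=lambda kv: kv[1], reverse=True)
```

In this toy case the hub-height speed obtains the higher grade and would be retained first; how many top-ranked features to keep is a separate choice that the sketch does not make.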
To ensure that features with different scales did not disproportionately influence the model, min–max normalization was applied to all features. This step standardized the data by transforming all feature values into the range [0, 1], ensuring that each feature contributed equally to the model during training. The normalization formula used is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original feature value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature across the dataset, respectively. Normalization helped improve the convergence rate and stability of the training process.
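A minimal sketch of the scaling step is given below. The helper names are hypothetical. Separating the fitting of the bounds from their application is a design choice of this sketch, so that the same bounds can be reused on later samples; whether the bounds come from the full dataset, as the text suggests, or from a training split only is left to the caller.

```python
def fit_min_max(column):
    """Return the (min, max) bounds of one feature column."""
    return min(column), max(column)

def apply_min_max(column, lo, hi):
    """Scale values into [0, 1] using previously fitted bounds."""
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Illustrative wind speeds in m/s (not taken from the dataset).
train_speed = [3.0, 7.5, 12.0, 5.0]
lo, hi = fit_min_max(train_speed)
scaled = apply_min_max(train_speed, lo, hi)
```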
The sole forecasting objective of the models is the actual power output from a wind farm. All models are trained with the aim of minimizing the prediction errors between forecasted and actual power values. By constructing the features described above, a comprehensive set of 14 original input features was obtained, including raw wind speeds and horizontal and vertical wind speed components.
2.2. Hybrid Temporal Convolutional Network–Extreme Learning Machine
The hybrid TCN-ELM architecture forms the core model developed in this study, combining the temporal feature extraction capabilities of TCNs with the rapid learning and simplicity of ELMs. It was designed to leverage the strengths of both techniques to address the challenges of accurate and efficient ultra-short-term wind power forecasting. The TCN extracts temporal features through dilated convolutions of the following form:
$$y_t = \sum_{k=0}^{K-1} w_k \, x_{t - d \cdot k} + b$$
where $y_t$ is the output at time step $t$, $x_{t - d \cdot k}$ are the input values at the dilated time steps $t - d \cdot k$ (with kernel size $K$ and dilation factor $d$), $w_k$ are the learnable weights, and $b$ is the bias term. This allows the model to learn long-range dependencies without recursion.
The use of TCNs helps overcome the vanishing gradient problem typically encountered by RNNs, and their ability to learn multi-scale temporal features via dilated convolutions enables the model to capture both short- and long-term dependencies in wind power data.
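To make the dilated convolution concrete, the following sketch evaluates a single causal filter on a short series and computes the receptive field of a stack of such layers. The zero padding on the left, the assumption of one convolution per level, and both function names are illustrative choices, not details taken from the paper.

```python
def dilated_causal_conv(x, weights, bias, dilation):
    """y[t] = bias + sum_k weights[k] * x[t - dilation * k]."""
    out = []
    for t in range(len(x)):
        acc = bias
        for k, w in enumerate(weights):
            idx = t - dilation * k
            if idx >= 0:  # samples before the series start count as zero
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Number of past steps visible to the last layer of a stack,
    assuming one convolution per dilation level."""
    return 1 + (kernel_size - 1) * sum(dilations)

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = dilated_causal_conv(series, weights=[0.5, 0.5], bias=0.0, dilation=2)
```

With dilations 1, 2, and 4 and a kernel size of 2, `receptive_field` returns 8, which illustrates how doubling the dilation at each level widens the temporal context without recurrence.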
ELM is an efficient learning algorithm based on a single hidden-layer feedforward neural network. Unlike traditional neural networks, an ELM requires no iterative training process; it uses randomly assigned weights and biases in the hidden layer and computes output weights analytically, making it computationally efficient. In the hybrid model, ELM was used to map the temporal features extracted by the TCN to the final wind power predictions.
The output of the ELM model is given by:
$$\hat{Y} = g(XW + b)\,\beta$$
where $\hat{Y}$ is the predicted output, $X$ is the input feature matrix, $W$ and $b$ are the randomly initialized input weights and biases, $g(\cdot)$ is the activation function, and $\beta$ is the output weight matrix.
The rapid learning capability of ELM is particularly important for real-time forecasting applications. By using ELM, the model significantly reduces training time, making it suitable for scenarios where quick predictions are essential.
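The non-iterative training described above can be sketched with NumPy as follows. The tanh activation, the hidden-layer size, the use of the Moore–Penrose pseudo-inverse for the output weights, and the synthetic data are assumptions made for illustration; the paper does not state these choices in this section.

```python
import numpy as np

def train_elm(X, y, n_hidden=64, seed=0):
    """Fit an ELM: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, never updated
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # least-squares output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Synthetic regression problem standing in for TCN-extracted features.
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1.0, 1.0, size=(200, 3))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2
W, b, beta = train_elm(X_demo, y_demo, n_hidden=64)
residual = predict_elm(X_demo, W, b, beta) - y_demo
train_rmse = float(np.sqrt(np.mean(residual ** 2)))
```

Because only `beta` is solved for, training reduces to one matrix factorization, which is the source of the speed advantage noted in the text.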
The TCN-ELM hybrid model operates in two stages:
- 1. Feature Extraction Stage: The TCN processes the input features (such as wind speed, direction, and other meteorological parameters) and extracts temporal features from the wind power data.
- 2. Prediction Stage: The features extracted by the TCN are passed to the ELM, which maps these high-level features to the wind power predictions.
The combination of TCN’s deep feature extraction abilities and ELM’s rapid learning creates a balanced model that achieves high accuracy while maintaining computational efficiency, making it well-suited for ultra-short-term wind power forecasting.
2.3. Improved K-Means
The high volatility and complex temporal dynamics inherent in ultra-short-term wind power forecasting pose significant challenges for traditional forecasting models. Standard clustering algorithms, such as K-means, are often limited in their ability to account for the temporal misalignment and variability present in wind power data. To address these limitations, an improved K-means algorithm that integrates DTW is proposed. This modification aims to better align time series data and improve the quality of clustering for enhanced prediction accuracy.
In traditional K-means clustering, the Euclidean distance $d(x_i, c_j)$ is used to assess the similarity between data points $x_i$ and centroids $c_j$, as defined by:
$$d(x_i, c_j) = \sqrt{\sum_{t=1}^{T} \left(x_i(t) - c_j(t)\right)^2}$$
where $x_i$ and $c_j$ represent the wind power time series data point and centroid, respectively, and $T$ is the length of the time series.
However, this distance metric assumes that the time series data are aligned in time, which is often not the case for wind power time series where fluctuations may occur at different time points. To address this, DTW distance $D_{\mathrm{DTW}}$ is introduced. DTW calculates the optimal alignment between two sequences, allowing for time shifts and varying lengths, making it particularly suitable for wind power data. The DTW distance $D_{\mathrm{DTW}}(X, Y)$ between two time series $X$ and $Y$ is computed as follows:
$$D_{\mathrm{DTW}}(X, Y) = \min_{W} \sum_{(m, n) \in W} \left(x_m - y_n\right)^2$$
where $W$ represents the warping path that minimizes the cumulative squared differences between the two sequences while considering their alignment over time. The warping path is constrained such that it follows a monotonic progression from the beginning to the end of the sequences.
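The dynamic-programming recursion behind the DTW distance can be sketched as below. The function returns the cumulative squared difference along the optimal monotonic path without taking a square root or applying a window constraint; both are assumptions, since the text does not settle either point, and the toy curves are illustrative.

```python
def dtw_distance(a, b):
    """Cumulative squared difference along the optimal warping path."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Monotonic steps only: match, or stretch one of the sequences.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

# The same ramp, observed one step later in the second curve.
ramp = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
delayed = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]
euclid_sq = sum((p - q) ** 2 for p, q in zip(ramp, delayed))
dtw_sq = dtw_distance(ramp, delayed)
```

For these two curves the pointwise squared distance is 4.0 while the DTW cost is 1.0, which shows how warping absorbs a phase shift that the Euclidean metric penalizes.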
In the proposed DTW–K-means clustering algorithm, the DTW distance measure replaces the traditional Euclidean distance. The algorithm begins by selecting initial centroids based on the DTW distance from the set of historical wind power time series. The centroids are updated iteratively as follows:
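- 1. Cluster Assignment: Each historical wind power curve $x_i$ is assigned to the cluster $C_j$ whose centroid $c_j$ minimizes the DTW distance as follows:
$$j^{*} = \arg\min_{j} D_{\mathrm{DTW}}(x_i, c_j)$$
- 2. Centroid Update: Once the assignments are made, the centroid $c_j$ of each cluster is recalculated as follows by averaging the aligned time series within the cluster:
$$c_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} \tilde{x}_i$$
where $\tilde{x}_i$ denotes the curve $x_i$ after DTW alignment to the current centroid.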
This process continues iteratively until convergence, where the centroid no longer changes significantly.
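The two steps above can be sketched as a small loop. This is one simple reading of the centroid update (each member is aligned to the current centroid and the aligned values are averaged per centroid index, similar in spirit to DTW barycenter averaging); the initialization from the first k curves, the function names, and the toy curves are placeholders rather than the authors' procedure.

```python
def dtw_path(a, b):
    """Return (cost, path); path lists aligned index pairs (i, j)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end to recover the optimal alignment.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    return cost[n][m], path[::-1]

def dtw_kmeans(curves, k, n_iter=10):
    centroids = [list(c) for c in curves[:k]]  # naive initialisation (assumption)
    labels = [0] * len(curves)
    for _ in range(n_iter):
        # Cluster assignment: nearest centroid under DTW.
        new_labels = [min(range(k), key=lambda j: dtw_path(c, centroids[j])[0])
                      for c in curves]
        # Centroid update: average member values aligned to each centroid index.
        for j in range(k):
            members = [c for c, lab in zip(curves, new_labels) if lab == j]
            if not members:
                continue  # keep the old centroid for an empty cluster
            sums = [0.0] * len(centroids[j])
            counts = [0] * len(centroids[j])
            for c in members:
                _, path = dtw_path(c, centroids[j])
                for ci, ki in path:
                    sums[ki] += c[ci]
                    counts[ki] += 1
            centroids[j] = [s / cnt for s, cnt in zip(sums, counts)]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, centroids

# Three time-shifted peaks and two nearly flat curves (toy data).
curves = [
    [0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.2, 1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0],
]
labels, centroids = dtw_kmeans(curves, k=2)
```

In this toy run the shifted peaks share one cluster despite peaking at different times, and their aligned average retains the full peak height instead of being smeared as a pointwise mean would be.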
The integration of DTW into the K-means algorithm provides a more accurate clustering of wind power time series by aligning the data temporally before clustering. By using the DTW distance metric, the algorithm is able to account for misalignments and variations in the temporal dynamics of wind power data, leading to the identification of consistent operational patterns.
These clusters form the basis for the subsequent feature refinement process, where meteorological and power sequence features are selected and weighted according to their relevance within each cluster. The refined feature set is then used as input for the TCN-ELM hybrid model, improving the overall prediction performance.
In summary, integrating DTW into K-means yields clusters that better reflect the temporal alignment of historical wind power curves, which strengthens the forecasting framework’s ability to handle the high variability of ultra-short-term wind power and improves its robustness and stability across different operational scenarios.
2.4. The Proposed Forecasting Model
The model follows a structured process, as illustrated in Figure 1, and the forecasting workflow proceeds in three successive stages:
- (1) Data preparation and temporal clustering: Historical wind-power and meteorological records are cleaned, normalized, and then partitioned into operational regimes by the DTW–K-means method described in Section 2.3. The resulting regime labels supply scenario context for all subsequent steps.
- (2) Cluster-specific feature refinement: Within each regime, grey relational analysis (Section 2.1) ranks candidate variables; the highest-ranked subset constitutes the model input, ensuring that each predictor focuses on its most influential information.
- (3) Hybrid forecasting: The selected features are fed into a temporal convolutional network, which extracts long-range temporal patterns, and are subsequently mapped to power output by an extreme learning machine (architecture in Section 2.2). The combined TCN–ELM predictor generates 15 min-ahead wind power forecasts.
4. Case Studies
4.1. Cluster-Level Wind Power Data Analysis Results
All 35,040 SCADA records collected at 15 min intervals from a 148 MW wind farm in Ningxia (1 January–31 December 2017) were first reorganized into 365 daily power output curves (96 points per day). Dynamic-time-warping K-means (best K = 4) was adopted to group these curves in a temporal-alignment space. The result of this clustering is illustrated in Table 1, which presents the representative daily wind power regimes identified by the DTW–K-means method.
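The reorganization into daily curves amounts to slicing the record stream into blocks of 96 samples. A minimal sketch follows, with placeholder values standing in for the SCADA power readings and a hypothetical helper name.

```python
POINTS_PER_DAY = 24 * 60 // 15  # 96 samples per day at 15 min resolution

def to_daily_curves(records, points_per_day=POINTS_PER_DAY):
    """Split a flat list of readings into one list per day."""
    if len(records) % points_per_day != 0:
        raise ValueError("record count is not a whole number of days")
    return [records[i:i + points_per_day]
            for i in range(0, len(records), points_per_day)]

records = [0.0] * 35_040  # placeholder for one year of power readings
daily = to_daily_curves(records)
```

The check on the record count is a precaution of this sketch: 35,040 readings divide exactly into 365 days of 96 points, and a dataset with missing intervals would need gap handling before this step.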
Figure 2.
Daily wind power curves and the four representative operational regimes identified through DTW–K-means clustering. (a) Raw daily power output sequences over the full year (365 days, 15 min resolution), showing substantial variability across seasons. Different colors are only used to distinguish individual days and do not carry specific physical meanings. (b) Cluster 1: Moderate nocturnal generation followed by afternoon decay, typically associated with synoptic-scale winter flows. (c) Cluster 2: Clear morning ramp with a stable daytime power plateau, reflecting relatively regular daily wind patterns. (d) Cluster 3: Gradual increase from low early-morning values to peak output in the evening, indicative of thermally driven diurnal circulation. (e) Cluster 4: Low baseline punctuated by irregular gust peaks, representing unstable wind conditions and high short-term variability. For (b–e), the red line indicates the mean daily profile of the cluster, while the gray lines represent the individual daily curves within the cluster.
Table 1 summarizes the different temporal patterns observed across the year, highlighting the distinct seasonal and diurnal characteristics of the wind power output, as categorized into four clusters. These clusters represent varying wind conditions and their corresponding power generation profiles, offering valuable insights into the wind farm’s operational behaviour throughout the year.
The alignment property of DTW ensured that peaks and valleys were synchronized before averaging, so the red centroids captured the intrinsic temporal evolution rather than simple arithmetic means. The regime count (four) offered a good balance between intra-cluster cohesion and inter-cluster separation, as validated by the silhouette index of 0.61.
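For reference, a silhouette index can be computed from any precomputed pairwise distance matrix, which in this setting would presumably hold DTW distances between daily curves. The sketch below uses one-dimensional toy points with absolute differences instead, and the function name is hypothetical.

```python
def silhouette_index(dist, labels):
    """Mean silhouette coefficient from a pairwise distance matrix."""
    clusters = sorted(set(labels))
    scores = []
    for i, li in enumerate(labels):
        same = [dist[i][j] for j, lj in enumerate(labels)
                if lj == li and j != i]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dist[i][j] for j, lj in enumerate(labels) if lj == c)
            / labels.count(c)
            for c in clusters if c != li
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
score = silhouette_index(dist, [0, 0, 1, 1])
```

Values close to 1 indicate compact, well-separated clusters, so repeating the computation for several candidate K values is one way to compare regime counts.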
The model was trained separately on each of the resulting clusters; the hyper-parameter settings employed for each cluster are summarized in Table 2.
The parameter selection strategy follows the principle of matching model complexity to pattern complexity: Cluster 4 (weak wind with sporadic gusts) requires the most sophisticated configuration with 6 TCN layers and 128 ELM neurons to capture sudden power fluctuations, while Cluster 3 (thermally driven diurnal patterns) uses a lightweight configuration with 3 TCN layers and 64 ELM neurons due to its relatively predictable behaviour. Clusters 1 and 2 adopt intermediate configurations that balance accuracy and computational efficiency.
4.2. Deterministic Prediction Performance for Typical Daily Pattern 1
Addressing the critical challenge of ultra-short-term wind power forecasting, this research systematically compared the performance of TCN, ELM, LSTM, and the novel hybrid TCN-ELM model using real-world wind farm data. During the data preprocessing stage, an innovative clustering of daily wind power curves based on DTW-based K-means was performed to distinguish different generation patterns. Simultaneously, grey relational analysis was applied to select original features, effectively reducing dimensionality and enhancing input feature quality. Experimental results clearly demonstrated the hybrid TCN-ELM model’s superior predictive accuracy and stability, as indicated by core metrics such as RMSE and capacity-normalized MAPE, significantly outperforming standalone models. This finding strongly validates the hybrid model’s ability to leverage deep learning’s robust temporal feature extraction with shallow learning’s efficient and rapid nonlinear mapping capabilities, providing notable advantages for complex sequential forecasting tasks.
The bar statistics given earlier (Table 1) are visualized in Figure 3a, where green diamonds represent the ground truth and coloured markers the competing models. It is evident that the naïve persistence baseline missed virtually every large ramp, the stand-alone ELM and LSTM reproduced ramp timing but suffered amplitude bias, and the proposed TCN-ELM (red squares) consistently overlapped the green trace, especially during the six major up-ramps sampled at indices 120, 240, 360, 480, 600, and 720.
Quantitatively, the hybrid attains a MAD of 2.532 MW, an RMSE of 3.700 MW, and a MAPE of 1.706%. Relative to the best single deep model (LSTM), the errors are reduced by 41.5%, 29.7%, and 41.5%, respectively, while the coefficient of determination rises from 0.973 to 0.992. Although the stand-alone ELM performs better than the plain TCN, its RMSE remains ≈ 6% higher than that of the hybrid, confirming the added value of the convolutional front-end.
Figure 3b shows the one-to-one scatter between the TCN-ELM predictions and the actual outputs. The fitted regression line (blue) almost coincides with the 45° reference, and 97.5% of the points fall within ±10 MW of the diagonal. A slight underestimation is observed for extreme high-power samples (>120 MW), owing to their limited representation in the training set. Nevertheless, the overall explanatory strength remains high (R² = 0.992; RMSE = 3.700 MW), confirming that the hybrid model meets the ±10 MW accuracy margin required for 15 min dispatch decisions.
Table 3 reports the deterministic errors obtained in the 15 min-ahead test set. Five evaluation indicators are reported to assess the accuracy of different models, including RMSE, MAD, MAPE, R², and derived accuracy. The results indicate that the hybrid TCN-ELM model consistently achieves lower errors and higher agreement with actual values, highlighting its robustness in short-term forecasting tasks.
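The five indicators can be sketched as follows. The definitions are assumptions, since the section that formally defines them is not part of this text: MAPE is taken as the mean absolute deviation divided by installed capacity, and accuracy as 100 minus that percentage, a reading that appears consistent with the values reported for Patterns 2 to 4 (for example, a MAPE of 2.617% alongside an accuracy of 97.38%). The sample values are illustrative.

```python
import math

def forecast_metrics(actual, predicted, capacity_mw):
    """Return MAD, RMSE, capacity-normalised MAPE, R2, and accuracy."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mad = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * mad / capacity_mw  # percent of installed capacity
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAD": mad, "RMSE": rmse, "MAPE": mape,
            "R2": r2, "accuracy": 100.0 - mape}

# Illustrative power values in MW for a 148 MW farm.
actual = [20.0, 40.0, 60.0, 80.0]
predicted = [22.0, 38.0, 63.0, 79.0]
m = forecast_metrics(actual, predicted, capacity_mw=148.0)
```

Normalizing by capacity instead of by the actual value keeps the percentage error finite during near-zero generation, which matters for low-wind regimes.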
Relative to the plain TCN, the hybrid TCN-ELM lowers MAD by 76.8%, RMSE by 72.7%, and MAPE by 76.8%, while R² improves from 0.891 to 0.992, and the accuracy rises from 90.83% to 97.50%.
The stand-alone ELM, although markedly better than TCN, still shows an RMSE that is ≈6.4% higher and a MAD that is ≈9.5% higher than those of the hybrid, demonstrating the added value of the convolutional front-end. Both convolution-based models outperform the sequence-to-sequence LSTM by more than an order of magnitude; the latter’s high errors stem from severe overfitting and vanishing-gradient issues at the 15 min scale.
The one-to-one scatter in Figure 3b thus corroborates the numerical results (RMSE = 3.700 MW, R² = 0.992), underlining the model’s explanatory strength.
Although the ELM performs well on this specific cluster, the hybrid TCN-ELM retains two practical advantages:
Consequently, the hybrid achieves the best balance between accuracy and robustness, trimming the baseline TCN error by roughly one-third and satisfying the grid operator’s ±10 MW tolerance for 15 min scheduling decisions.
4.3. Deterministic Prediction Performance for Typical Daily Pattern 2
In this section, the performance of four models—TCN, TCN-ELM, ELM, and LSTM—is evaluated for Typical Daily Pattern 2 using real-world wind farm data. During the preprocessing stage, daily wind power curves were clustered using DTW-based K-means, allowing for the identification of distinct generation patterns. Simultaneously, grey relational analysis was applied to select relevant features, effectively reducing dimensionality and improving the quality of the input data.
Cluster 2 (as defined in Table 1) revealed a steady morning increase in wind power, followed by a plateau of approximately 90 MW until dusk, with persistent synoptic-scale inflow typical of spring. These springtime characteristics were effectively captured by the models, especially the TCN-ELM hybrid.
Table 4 reports the deterministic errors obtained on the 15 min-ahead test set. Five evaluation indicators, including the derived accuracy (%), are listed; all power-valued metrics are expressed in megawatts (MW).
The results show that the TCN-ELM hybrid model significantly outperforms the standalone models in terms of predictive accuracy and stability. The evaluation metrics, including RMSE, capacity-normalized MAPE, and R², indicate superior performance by the hybrid model, validating its ability to combine the deep temporal feature extraction capabilities of TCN with the efficient nonlinear mapping of ELM. This makes the hybrid model highly effective for complex sequential forecasting tasks.
The TCN-ELM hybrid model achieved a MAD of 3.884 MW, an RMSE of 5.720 MW, and a MAPE of 2.617%. Relative to the best performing deep model, LSTM, the TCN-ELM reduced errors by 41.5% in MAD, 29.7% in RMSE, and 41.5% in MAPE, while the R² increased from 0.920 to 0.967.
The TCN-ELM hybrid model delivered the highest accuracy (97.38%) and lowest RMSE (5.720 MW), proving to be the most effective model for forecasting wind power in Typical Daily Pattern 2. The results emphasize the model’s strength in handling the steady morning increase and the plateau phase observed during the spring months. By combining TCN’s deep feature extraction with ELM’s efficient mapping, the hybrid model proves capable of handling complex temporal patterns, making it a valuable tool for ultra-short-term wind power forecasting.
4.4. Deterministic Prediction Performance for Typical Daily Pattern 3
The performance of four forecasting models—TCN, TCN-ELM, ELM, and LSTM—was evaluated for Typical Daily Pattern 3 using real-world wind farm data. In the data preprocessing phase, DTW-based K-means clustering was employed to group daily wind power curves, facilitating the recognition of unique generation patterns. Furthermore, grey relational analysis was utilized to identify the most important features, which helped to reduce dimensionality and improve the quality of the input data.
Cluster 3 (as defined in Table 1) is characterized by very low night-time power, followed by a noon-to-evening ramp to approximately 50 MW, caused by thermally driven diurnal breezes in summer. These characteristics, typical of summer wind patterns, were effectively captured by all models, with the TCN-ELM hybrid model showing the best results.
Table 5 reports the deterministic errors obtained on the 15 min-ahead test set. Five evaluation indicators, including the derived accuracy (%), are listed; all power-valued metrics are expressed in megawatts (MW).
The TCN-ELM hybrid model achieved a MAD of 1.552 MW, an RMSE of 2.915 MW, and a MAPE of 1.046%. Relative to LSTM, the best single deep model, the TCN-ELM reduced errors by 68.2% in MAD, 46.5% in RMSE, and 54.6% in MAPE, while the R² increased from 0.943 to 0.984. While the ELM model performed well, its RMSE of 3.268 MW remained approximately 12% higher than that of the hybrid model, demonstrating the added benefit of the TCN front-end.
The TCN-ELM hybrid model demonstrated the highest accuracy (98.95%) and lowest RMSE (2.915 MW), proving to be the most effective model for forecasting wind power in Typical Daily Pattern 3. The TCN-ELM hybrid’s ability to capture the low night-time power followed by a ramp to approximately 50 MW makes it an ideal choice for handling this type of wind power pattern. The model’s performance underscores its utility in ultra-short-term wind power forecasting, real-time grid operations, and decision-making processes.
4.5. Deterministic Prediction Performance for Typical Daily Pattern 4
Four forecasting approaches (TCN, TCN-ELM, ELM, and LSTM) were analyzed for their predictive performance on Typical Daily Pattern 4, which was based on real wind farm measurements. DTW-based K-means clustering was adopted in the data preprocessing step to segment daily wind power trajectories, facilitating the discovery of distinct generation modes. Important features were selected using grey relational analysis, leading to enhanced input data quality through dimensionality optimization.
Cluster 4, as outlined in Table 1, exhibits a quasi-flat low-power trace (≤25 MW) with sporadic gust spikes, indicative of weak wind conditions primarily influenced by frontal passages. These conditions, which are typical of weak wind backgrounds, were accurately captured by all the forecasting models, with the TCN-ELM hybrid model providing the highest accuracy in prediction.
Table 6 reports the deterministic errors obtained on the 15 min-ahead test set. Five classical criteria and the derived accuracy (%) are listed; all power-based metrics are expressed in megawatts (MW).
The TCN-ELM hybrid model achieved a MAD of 0.752 MW, an RMSE of 1.606 MW, and a MAPE of 0.507%, resulting in the highest accuracy of 99.49%. This represents a significant improvement over the standalone TCN model, which achieved a MAPE of 4.157% and an accuracy of 95.83%. Compared to the LSTM, the TCN-ELM reduced errors by 68.2% in MAD, 46.5% in RMSE, and 54.6% in MAPE, while increasing the R² value from 0.978 to 0.989.
The TCN-ELM hybrid model’s performance underscores its strength in accurately predicting low-power periods typically observed during weak-wind conditions. It delivers the lowest RMSE and the highest accuracy, making it the most effective model for forecasting wind power in such regimes. Its superior ability to handle low-wind power dynamics, along with its robustness across varying intra-day conditions, makes it highly suitable for real-time operational forecasting and decision-making in wind power systems.
4.6. Enhancing TCN-ELM Forecasting Accuracy with DTW–K-Means Clustering
In Sections 4.1–4.5, the dataset was partitioned using DTW–K-means into four representative daily operating regimes. Within each regime, the TCN-ELM model was independently trained and evaluated, resulting in RMSE values between 1.6 MW and 5.7 MW (arithmetic mean ≈ 3.5 MW) and an average coefficient of determination of approximately 0.98. These results demonstrated that, once segmentation was applied, the convolutional encoder and the randomized ELM output layer were able to capture regime-specific temporal dynamics with high fidelity.
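The role of the randomized ELM output layer can be illustrated with a short sketch. It assumes the standard ELM formulation: hidden weights are drawn at random and kept fixed, and only the output weights are solved in closed form, here with a small ridge term for numerical stability. The TCN encoder is not implemented; the input `X` stands in for whatever feature vectors the encoder produces. The class name `ELMRegressor` and all hyperparameter values are hypothetical.

```python
import numpy as np

class ELMRegressor:
    """Minimal extreme learning machine for regression (illustrative only)."""

    def __init__(self, n_hidden=64, ridge=1e-6, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge                      # assumed regularization strength
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random nonlinear projection; W and b are never trained.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Closed-form ridge solution for the output weights.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

Because fitting reduces to one linear solve, a separate model per regime is cheap to retrain, which fits the per-cluster training described above.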
The application of DTW–K-means clustering was compared with that of traditional K-means clustering in ultra-short-term wind power forecasting. Traditional K-means clustering typically uses Euclidean distance to measure the similarity between data points, which does not account for the temporal variations and high volatility inherent in wind power data. In contrast, DTW–K-means, by employing DTW to calculate the similarity of time series, takes into consideration temporal variations and alignment and thus allows for a more accurate capture of operational patterns and temporal characteristics in wind power data.
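The difference between the two similarity measures can be made concrete with the classic dynamic-programming form of DTW. The sketch below uses the absolute difference as the local cost and no warping-window constraint; both choices are assumptions, since the exact DTW settings are not given in this section, and `dtw_distance` is a hypothetical helper name.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # a[i-1] repeats a match with b[j-1]
                                 cost[i, j - 1],      # b[j-1] repeats a match with a[i-1]
                                 cost[i - 1, j - 1])  # both series advance
    return cost[n, m]

# Two identical ramps, one delayed by a single time step.
early = np.array([0.0, 1.0, 2.0, 3.0, 3.0, 3.0])
late = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 3.0])

euclidean = np.linalg.norm(early - late)   # penalizes the time shift
warped = dtw_distance(early, late)         # DTW aligns the ramps exactly: 0.0
```

In this toy case the Euclidean distance is nonzero purely because of the one-step delay, whereas DTW treats the two curves as the same shape. This is the property that lets DTW-based clustering group daily curves by pattern and not by exact timing.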
Building on this regime-level performance, this section focuses on an experiment in which the DTW–K-means clustering stage was replaced by traditional K-means clustering, and a single TCN-ELM model was trained on the entire dataset. The errors resulting from this experiment were compared with those from the DTW–K-means clustered case to quantify the accuracy gains attributable to DTW–K-means segmentation in ultra-short-term wind power forecasting.
To visualize the quantitative impact of replacing DTW–K-means clustering with traditional K-means clustering, the principal deterministic error metrics obtained under the two experimental settings are presented in Table 7. This table contrasts the performance of the DTW–K-means clustered model with that of the traditional K-means clustered model, providing an at-a-glance assessment of the accuracy improvements achieved by DTW–K-means segmentation.
As shown in Table 7, the TCN-ELM model performs markedly better when DTW–K-means clustering is applied. The RMSE values for DTW–K-means clustering are consistently lower than those for traditional K-means clustering: 3.700 MW (Cluster 1), 5.720 MW (Cluster 2), 2.915 MW (Cluster 3), and 1.606 MW (Cluster 4), compared with 10.229 MW (Cluster 1), 14.936 MW (Cluster 2), 16.570 MW (Cluster 3), and 19.604 MW (Cluster 4), respectively.
Moreover, the accuracy of the TCN-ELM model using DTW–K-means clustering is significantly higher across all clusters, with accuracy ranging from 98.29% to 99.49%, compared to 88.01% to 93.91% for the model using traditional K-means clustering. These results demonstrate that DTW–K-means clustering not only enhances the accuracy of the model but also improves its robustness, especially in scenarios with high volatility and complex temporal patterns.
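The size of the improvement can be expressed as a relative RMSE reduction per cluster. The short calculation below uses only the RMSE values quoted from Table 7; the variable and function names are illustrative.

```python
# RMSE values (MW) quoted from Table 7, keyed by cluster index.
dtw_rmse = {1: 3.700, 2: 5.720, 3: 2.915, 4: 1.606}
kmeans_rmse = {1: 10.229, 2: 14.936, 3: 16.570, 4: 19.604}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

reductions = {c: relative_reduction(kmeans_rmse[c], dtw_rmse[c])
              for c in dtw_rmse}
mean_dtw_rmse = sum(dtw_rmse.values()) / len(dtw_rmse)

for c, r in reductions.items():
    print(f"Cluster {c}: RMSE reduced by {r:.1f}%")
print(f"Mean RMSE with DTW-K-means: {mean_dtw_rmse:.3f} MW")
```

With these inputs the reductions are roughly 64%, 62%, 82%, and 92% for Clusters 1 to 4, and the mean RMSE under DTW–K-means is about 3.49 MW, in line with the mean of approximately 3.5 MW reported earlier in this section.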
Overall, DTW–K-means clustering improves forecasting accuracy, particularly in operational regimes characterized by high volatility and complex temporal patterns, because DTW captures the temporal alignment and dynamic changes in the wind power curves. Traditional K-means, which does not account for temporal alignment, yields markedly higher RMSE values and lower accuracy in these regimes. These results support the use of DTW–K-means clustering in ultra-short-term wind power forecasting.
5. Conclusions
To address the challenge of ultra-short-term wind power forecasting, this study systematically compared the performance of TCN, ELM, LSTM, and the proposed hybrid TCN-ELM model using real-world wind farm data. In the data preprocessing stage, daily wind power curves were clustered with DTW-based K-means to distinguish different generation patterns, and grey relational analysis was applied to select input features, reducing dimensionality and improving input feature quality. Under the clustered setting, the hybrid TCN-ELM delivered the lowest RMSE and capacity-normalized MAPE while maintaining a coefficient of determination near 0.98; when DTW–K-means segmentation was not applied, deterministic errors increased markedly, confirming the contribution of regime-aware modelling to predictive fidelity. These results indicate that coupling a deep convolutional encoder for temporal pattern extraction with a lightweight ELM output layer for rapid nonlinear mapping offers a balanced trade-off between accuracy and computational efficiency. Although the reported gains were observed on a specific dataset and may vary under different degrees of wind regime heterogeneity, the proposed method offers a practicable route toward more reliable real-time forecasting, providing technical support for grid dispatch, market participation, and wind farm operational scheduling. Future work could explore adaptive clustering thresholds, integrate probabilistic postprocessing, and examine explainability tools to enhance model transparency and operator trust.