Article

PatchTST Coupled Reconstruction RFE-PLE Multitask Forecasting Method Based on RCMSE Clustering for Photovoltaic Power

School of Electrical Engineering, Southwest Jiaotong University, Chengdu 610031, China
Electronics 2025, 14(23), 4613; https://doi.org/10.3390/electronics14234613
Submission received: 4 September 2025 / Revised: 12 November 2025 / Accepted: 19 November 2025 / Published: 24 November 2025

Abstract

With the rapid growth of photovoltaic (PV) installed capacity, accurate prediction of PV power is crucial for the safe and flexible operation of power grids. However, PV output sequences exhibit strong non-stationarity and a superposition of high-frequency disturbances and low-frequency trends, resulting in multi-frequency aliasing. Traditional models struggle to capture both long-term dependencies and short-term details, while multi-task learning (MTL) often suffers from negative transfer, limiting prediction accuracy. This paper proposes a hybrid PV power forecasting framework integrating complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN), PatchTST reconstruction, and progressive layered extraction (PLE) MTL. First, conventional models tend to prioritize learning low-frequency features while ignoring weak high-frequency signals under multi-frequency aliasing, which cannot meet the requirement for precise frequency-sensitive PV power prediction. To address this problem, CEEMDAN is employed to decompose the PV sequence into intrinsic mode functions (IMFs). Next, the fluctuation complexity of each IMF is quantified via RCMSE and the IMFs are grouped via K-means clustering: high-frequency components are captured using small patches to preserve details, while low-frequency components use larger patches to learn long-term trends. Subsequently, a PatchTST-BiLSTM reconstruction network with patch partitioning and multi-head attention is adopted to capture temporal dependencies and optimize data representation, overcoming the bottleneck caused by the imbalance between long-term and short-term features. Finally, recursive feature elimination (RFE) combined with a PLE multi-task network coordinates expert models to mitigate negative transfer and enhance high-frequency response capability.
Experiments on the Alice Springs dataset show that the proposed method significantly outperforms conventional deep learning models and recent multi-task models in terms of the mean absolute error (MAE) and root mean square error (RMSE). The results show that, compared with the MTL_Attention_LSTM method, the proposed method reduces the average MAE by 45.9% and RMSE by 44.6%, achieving more accurate forecasting of PV power.

Graphical Abstract

1. Introduction

With the continuous growth of photovoltaic (PV) installed capacity [1], the penetration of PV generation in power grids is steadily increasing. To ensure stable grid operation and efficient dispatching, accurate PV power forecasting has become essential. As a typical intermittent energy source, the power output of PV plants is strongly influenced by meteorological conditions, sunlight duration, and cloud cover, leading to significant variability and uncertainty. Therefore, conducting high-precision PV power forecasting research [2] is not only beneficial for enhancing the flexibility of grid scheduling and reducing the reserve capacity of thermal power generation but also provides strong data support for the optimal scheduling of integrated energy systems [3].
PV power forecasting is a critical component of power system scheduling and energy management but has long been constrained by the nonlinear and nonstationary nature of the data [4], which poses challenges for achieving high forecasting accuracy with traditional models. Existing methods for PV power forecasting can generally be divided into three categories.
The first category includes statistical time series modeling methods, such as the autoregressive integrated moving average [5] and exponential smoothing models [6]. These methods have solid theoretical foundations and can achieve reasonable results when PV power data exhibit clear periodic or trending patterns. However, these methods generally assume linearity and stationarity, thus struggling to tackle frequent nonlinear disturbances and high-frequency fluctuations in PV power sequences, which reduces forecasting accuracy.
The second category consists of early single-model machine learning approaches, including support vector machines (SVMs) [7], random forests (RFs) [8], and k-nearest neighbors (KNNs) [9]. These methods possess strong nonlinear modeling capabilities and relatively fast training speeds. However, they often rely on manually engineered features, and a single model may fail to capture complex nonlinear relationships, making them prone to overfitting or underfitting.
The third category is the rapidly developing deep learning methods [10,11,12,13,14], with typical representatives encompassing Convolutional Neural Networks (CNNs) [15], Long Short-Term Memory Networks (LSTMs) [16], Gated Recurrent Units (GRUs) [17], Temporal Convolutional Networks (TCNs) [18], and Transformer networks [19]. Such models can automatically extract features and efficiently learn complex nonlinear structures in time series data. Signal decomposition techniques can decompose the original non-stationary, highly fluctuating PV power sequences into multiple subsequences with strong stationarity and clear regularity, effectively separating redundant noise from core feature information and reducing data complexity. The hybrid forecasting methods formed by combining these two approaches [20,21,22] not only leverage the deep feature extraction capabilities of deep learning but also optimize the quality of input data through signal decomposition techniques. As a result, this has become the focus of much research into PV power forecasting.

1.1. Current Status and Analysis of PV Power Forecasting Research

In existing research on PV power forecasting, the LSTM and its variants have been widely applied. LSTM can effectively capture long-term dependencies in time series and possesses strong temporal memory capabilities, making it suitable for PV power data with complex temporal patterns. However, when confronted with high-frequency disturbances and short-term sharp fluctuations, LSTM models tend to respond slowly and struggle to model multiple frequency components simultaneously, resulting in biases in ultrashort-term forecasts.
To address the limitations of LSTM in spatial feature modeling, hybrid CNN-LSTM structures have been proposed [23], in which CNNs extract latent local spatial features from power sequences and LSTMs capture temporal dependencies. This approach improves the recognition of fluctuation trends. Nevertheless, when dealing with high-dimensional, multivariate inputs, CNN-LSTM models often suffer from large parameter scales, high training complexity, feature redundancy, and restricted gradient propagation, which degrade both training efficiency and generalization performance.
Some researchers introduce mutual information entropy (MIE) [24] to filter highly correlated features and combine intelligent optimization algorithms such as cuckoo search (CS) [25,26] for hyperparameter tuning. These methods can partially alleviate feature redundancy and accelerate training; however, their efficacy still depends on feature engineering quality and remains insufficiently adaptive to dynamically changing temporal structures.
The transformer architecture has recently broken the dependency bottleneck of recurrent networks in long-sequence modeling via its global attention mechanism. Transformers can model entire sequences in parallel, substantially improving the capture of long-range dependencies. However, when applied to high-resolution time series, transformers still encounter challenges in extracting local patterns and handling ultrashort-term high-frequency variations, which can constrain their predictive accuracy.
Signal decomposition techniques are widely used to address PV output nonstationarity and multi-frequency characteristics. Traditional empirical mode decomposition (EMD) [27] and ensemble EMD (EEMD) [28] often suffer from mode mixing, which reduces the accuracy of feature extraction. Complementary ensemble EMD with adaptive noise (CEEMDAN) [29] improves decomposition quality, producing intrinsic mode functions (IMFs) with clearer physical meaning and better preservation of nonlinear characteristics. By clustering subsequences via refined composite multiscale entropy (RCMSE) [30] and K-means, high-frequency and low-frequency components can be more accurately represented, addressing the challenge of capturing multi-frequency fine-grained variations.
To evaluate the correlations among decomposed subsequences, multi-task learning (MTL) architectures such as multigate mixture-of-experts (MMoE) [31] and progressive layered extraction (PLE) [32] have been applied. These architectures balance shared and task-specific expert networks, improving generalization and forecasting performance. Nevertheless, negative transfer remains a major limitation, particularly when modeling high-frequency fine-grained subsequences, reducing ultrashort-term forecast responsiveness and limiting overall accuracy.
Despite these advances, several critical challenges remain unresolved. First, although CEEMDAN decomposition can mitigate mode mixing and extract multi-frequency features, there is no unified approach for handling the decomposed IMFs. Traditional methods often treat heterogeneous IMFs in a blind or ad hoc manner. Second, existing approaches tend to focus on either LSTM-based short-term dependencies or Transformer-based long-term dependencies, leading to an imbalance between long-term and short-term feature modeling. Third, current MTL frameworks struggle with negative transfer, particularly when coordinating multiple subsequences with different frequency characteristics.

1.2. Study Contribution and Paper Layout

On the basis of the aforementioned issues, this study proposes a novel multi-task decoupling hybrid forecasting method for PV power, integrating CEEMDAN decomposition, PatchTST reconstruction, and a multi-head attention-enhanced RFE-PLE architecture, following the framework of decomposition–aggregation, PatchTST-based decoupling, temporal modeling, and multi-task collaboration.
First, to address the complex nonstationarity and multi-frequency characteristics of PV power data, CEEMDAN decomposition is applied, and the resulting power subsequences are clustered via RCMSE to reconstruct nonstationary multi-frequency power components. Then, PatchTST combined with a transformer encoder is employed to perform local segmentation and long-term dependency modeling of the time series. Through the patching mechanism, features from different frequency components are coupled and reconstructed, thereby enhancing the ability of the model to capture the variations in PV power. Finally, in the MTL stage, recursive feature elimination (RFE) is applied for key feature selection, and an improved PLE structure is adopted. By coordinating shared experts and task-specific experts, the proposed framework improves model stability and prediction accuracy, mitigates negative transfer among power subsequence forecasting tasks, and effectively captures high-frequency fine-grained PV power features, thereby enhancing the high-frequency responsiveness of the forecasting model.
The main innovations and contributions of this work are as follows:
  • Multi-frequency feature grouping: In existing PV power time series studies, some literature employs CEEMDAN decomposition to address multi-frequency mode mixing. However, the handling of decomposed subsequences varies, and no unified approach has been established. We introduce “CEEMDAN decomposition + RCMSE–K-means clustering”, transforming the multi-frequency mixing problem into independently grouped feature clusters quantified by fluctuation complexity, thereby avoiding the blind treatment of heterogeneous IMFs in traditional methods.
  • Unified modeling of long- and short-term dependencies: Existing studies often focus on LSTM for short-term dependencies or Transformer for long-term dependencies, which can lead to imbalances between long-term and short-term features. Here, a patching mechanism with an improved segmentation strategy is introduced. High-frequency IMFs (large RCMSE) are processed with small time-scale patches, while low-frequency IMFs (small RCMSE) use large time-scale patches, enabling simultaneous adaptation to both long-term and short-term temporal dependencies.
  • RFE–PLE multi-task forecasting framework: RFE is used to remove redundant features that may interfere with the multi-task prediction of each power subsequence. High-frequency and low-frequency subsequence forecasting tasks are assigned to dedicated private and shared expert networks, while inter-expert interactions are considered, resulting in more balanced predictions across all subsequences.
  • Validation on multi-season, multi-parameter, and high-frequency disturbance datasets: Experiments on the publicly available Alice Springs dataset demonstrate that the proposed method outperforms neural networks, SVM, LSTM, GRU, and common MMoE MTL models in terms of the mean absolute error (MAE) and root mean square error (RMSE), providing a novel approach for PV power forecasting research.
The remainder of this paper is organized as follows:
Section 2 presents the CEEMDAN decomposition, RCMSE–K-means aggregation, and PatchTST model, along with their theoretical foundations. Section 3 constructs the overall forecasting framework, details the RFE method, and describes the PLE model structure and module workflows. Section 4 presents the empirical analysis and comparative evaluations against mainstream methods. Section 5 summarizes the research findings and outlines future research directions.

2. Coupled Feature Extraction and Feature Selection of Decomposed Power Subsequences

2.1. Decomposition and Aggregation of Power Data

2.1.1. CEEMDAN Decomposition of Power Signals

In the present study, complementary ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is employed to decompose the PV power series. Through an iterative process, the original nonstationary power data are decomposed into multiple IMF components at different time scales, along with a residual term. This process ensures that each component exhibits greater local stationarity, thereby reducing the learning difficulty for the forecasting model [29].
The original PV power data are decomposed by CEEMDAN into several subsequences, each possessing a distinct physical meaning:
$$P(t) = \sum_{i=1}^{n} IMF_i(t) + r_N(t) \tag{1}$$
where $n$ is the number of IMFs, $IMF_i(t)$ denotes the $i$-th IMF, which represents oscillatory components at different scales, and $r_N(t)$ is the residual term, which represents the trend component.
Each IMF component obtained from the CEEMDAN decomposition is used as the input to the PatchTST–BiLSTM model, providing a solid foundation for extracting coupled features from the data. This setup enables more effective capture of information across different frequency bands, thereby enhancing the model’s learning capability and generalization performance.
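To make the decomposition step concrete, the sketch below verifies the reconstruction identity of Equation (1) on a synthetic PV-like series. A full pipeline would use a real CEEMDAN implementation (e.g., the PyEMD package); here a simple moving-average split stands in for the decomposition, so the function name, window size, and toy series are illustrative assumptions only:

```python
import numpy as np

def toy_two_scale_decomposition(p, window=24):
    """Illustrative stand-in for CEEMDAN: split a series into one fast
    oscillatory component and a slow trend (residual) so that the
    reconstruction identity P(t) = sum_i IMF_i(t) + r_N(t) holds exactly."""
    kernel = np.ones(window) / window
    trend = np.convolve(p, kernel, mode="same")  # slow residual r_N(t)
    imfs = [p - trend]                           # fast oscillatory part
    return imfs, trend

# Synthetic PV-like series: a daily cycle plus cloud-induced noise.
t = np.arange(512)
rng = np.random.default_rng(0)
p = np.maximum(0.0, np.sin(2 * np.pi * t / 96)) + 0.05 * rng.standard_normal(512)

imfs, residual = toy_two_scale_decomposition(p)
print(np.allclose(sum(imfs) + residual, p))  # True: identity holds by construction
```

Whatever decomposition is used, the same additivity check is a useful sanity test before feeding the components downstream.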

2.1.2. K-Means Clustering Based on RCMSE

In the PatchTST–BiLSTM forecasting model, directly using all IMF components for modeling may result in excessive computational complexity and degraded fitting performance. Therefore, it is necessary to further identify and group power components with similar characteristics so that the PatchTST–BiLSTM model can more effectively learn the coupling relationships among power loads. In this study, RCMSE is employed to analyze the similarity of power subsequences. The K-means clustering method is subsequently used to group subsequences with similar complexity features, thereby enhancing the consistency of the input features for the PatchTST–BiLSTM model.
The RCMSE evaluates the complexity of a time series through multiscale decomposition, calculation of the sample entropy, and a composite strategy. The detailed calculation process is described as follows:
(1) Multiscale decomposition
For each power subsequence $IMF_i(t)$, a sliding window method is used to construct multiple time scales:
$$P_\tau(t) = \frac{1}{\tau} \sum_{j=0}^{\tau-1} IMF_i(t+j) \tag{2}$$
where $P_\tau(t)$ denotes the resampled sequence at time scale $\tau$, and $\tau$ is the scale factor controlling the degree of data smoothing.
Equation (1) decomposes the original power series into IMF components of different frequencies and a residual term, while Equation (2) performs multiscale resampling on each IMF using a sliding window to generate sequences at different time scales for complexity calculation and clustering. Thus, Equation (2) is a multiscale processing step applied to the results of Equation (1).
(2) Calculation of the Sample Entropy
For the time-scaled sequence $P_\tau(t)$, the sample entropy (SampEn) is given by:
$$SampEn(e_d, r, N) = \ln\frac{C^{e_d}(r)}{C^{e_d+1}(r)} \tag{3}$$
where $N$ denotes the length of the time series $P_\tau(t)$, $e_d$ is the embedding dimension (typically set to 2), $r$ is the tolerance threshold (generally set to 0.2 times the standard deviation of the data), and $C^{e_d}(r)$ represents the number of matching templates of dimension $e_d$.
(3) Composite Strategy
To increase the stability of the calculation, a composite strategy is adopted, in which the SampEn values across multiple scales are averaged with equal weighting:
$$RCMSE(P_i) = \frac{1}{\tau_{max}} \sum_{\tau=1}^{\tau_{max}} SampEn(P_\tau(t), e_d, r) \tag{4}$$
where $\tau_{max}$ is the maximum time scale (set to 10 herein) and $RCMSE(P_i)$ represents the final complexity measure of the $i$-th power subsequence.
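The full complexity measure can be sketched in numpy as below: sliding-window coarse-graining, sample entropy with $e_d = 2$ and $r = 0.2\sigma$, and equal-weight averaging over scales. The match counting is a compact approximation of the standard estimator, and the two toy inputs are assumptions for illustration:

```python
import numpy as np

def coarse_grain(x, tau):
    """Sliding-window (overlapping) coarse-graining at scale tau, as in Eq. (2)."""
    return np.array([x[t:t + tau].mean() for t in range(len(x) - tau + 1)])

def sample_entropy(x, e_d=2, r=None):
    """SampEn: log ratio of template-match counts at dimensions e_d and e_d + 1
    under the Chebyshev distance (self-matches excluded)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()
    def count_matches(m):
        n = len(x) - m + 1
        templates = np.array([x[i:i + m] for i in range(n)])
        d = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        return np.sum(d < r) - n          # subtract the n self-matches
    b, a = count_matches(e_d), count_matches(e_d + 1)
    return np.log(b / a) if a > 0 else np.inf

def rcmse(x, tau_max=10, e_d=2):
    """Equal-weight average of SampEn over scales 1..tau_max (composite strategy)."""
    return np.mean([sample_entropy(coarse_grain(x, tau), e_d) for tau in range(1, tau_max + 1)])

rng = np.random.default_rng(1)
noisy = rng.standard_normal(300)                  # high-frequency-like component
smooth = np.sin(np.linspace(0, 6 * np.pi, 300))   # trend-like component
print(rcmse(noisy) > rcmse(smooth))  # True: noise scores as more complex
```

As expected, the irregular series receives a markedly larger RCMSE than the smooth one, which is exactly the property the clustering step exploits.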
(4) K-means Clustering
On the basis of the feature similarity calculated via RCMSE, K-means clustering is applied to group similar power subsequences by minimizing within-cluster distances. The number of clusters is set to K = 3, meaning that all power subsequences are divided into three categories, each representing a set of subsequences with similar complexity characteristics. These three clustered power sequences are then used as inputs to the PatchTST–BiLSTM model, enabling PatchTST to focus on power components with similar physical characteristics when learning coupling relationships, thereby improving both the accuracy and stability of predictions.
In summary, by computing the complexity of power subsequences via RCMSE and grouping them via K-means clustering, the final power sequence inputs to PatchTST–BiLSTM are more physically meaningful and better aligned with load coupling characteristics. This step not only optimizes the quality of PatchTST’s input data but also enhances the ability of the model to capture coupling relationships among multiple subsequences, laying a solid foundation for the subsequent power forecasting stage.
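The grouping step can be sketched with a minimal 1-D K-means over the RCMSE values (the `rcmse_vals` numbers below are hypothetical; a production pipeline might instead call scikit-learn's KMeans):

```python
import numpy as np

def kmeans_1d(values, k=3, iters=50):
    """Minimal 1-D K-means (quantile-initialized for determinism) for grouping
    power subsequences by RCMSE complexity. Labels are remapped so that
    0 = lowest-complexity cluster and k-1 = highest."""
    v = np.asarray(values, dtype=float)
    centroids = np.quantile(v, np.linspace(0, 1, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(v[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = v[labels == j].mean()
    order = np.argsort(centroids)
    remap = np.empty(k, dtype=int)
    remap[order] = np.arange(k)
    return remap[labels], centroids[order]

# Hypothetical RCMSE values for 8 IMFs: high-, mid-, and low-frequency groups.
rcmse_vals = [2.1, 1.9, 2.0, 1.0, 0.9, 0.15, 0.1, 0.05]
labels, centers = kmeans_1d(rcmse_vals, k=3)
print(labels.tolist())  # [2, 2, 2, 1, 1, 0, 0, 0]
```

Each resulting label then defines one clustered power sequence (Cluster_1 to Cluster_3) fed into the PatchTST–BiLSTM model.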

2.2. PatchTST–BiLSTM Coupled Reconstruction

2.2.1. PatchTST Encoder: Local Perception and Temporal Enhancement

To exploit the long-term influence and short-term disturbances of meteorological features on PV power fluctuations to the fullest extent, a time series encoder module based on PatchTST, which enables both local enhancement and global modeling of the input meteorological sequences, is established. Unlike conventional transformer architectures that model time points sequentially, PatchTST introduces a patch slicing mechanism for the first time. This mechanism divides a long time series into multiple fixed-length subsequences (patches) and models them at the subsequence level. By doing so, it reduces computational complexity while enhancing the sensitivity of the model to local disturbance patterns.
The structural diagram of the PatchTST–BiLSTM coupled reconstruction network is shown in Figure 1.
As shown in Figure 1, the PatchTST–BiLSTM coupled network reconstructs the coupling relationships among the subsequences obtained after the decomposition–aggregation processing of PV power data. The framework consists of the following components:
(1) Patch segmentation mechanism and input structure
Let the input meteorological feature matrix be defined as:
$$X_{meteo} \in \mathbb{R}^{T \times D_{feat}}$$
where $T$ denotes the length of the historical observation window (set to $T = 512$ herein), and $D_{feat} = 6$ represents the dimensionality of the meteorological variables, including temperature, relative humidity, global horizontal radiation, diffuse horizontal radiation, wind direction, and daily rainfall. PatchTST divides $X_{meteo}$ into equally spaced segments of length $L_{patch}$, producing a sequence of patches:
$$\{S^{(1)}, S^{(2)}, \ldots, S^{(N_{seq})}\}, \quad S^{(i)} \in \mathbb{R}^{L_{patch} \times D_{feat}}$$
where $N_{seq} = \lfloor T / L_{patch} \rfloor$ is the total number of patches and $L_{patch} = 16$ herein.
(2) Patch embedding mapping and temporal information incorporation
Each patch subsequence $S^{(i)}$ is first projected into a unified feature space through a linear mapping layer:
$$Z_{patch}^{(i)} = S^{(i)} \cdot W_{emb} + b_{emb}, \quad Z_{patch}^{(i)} \in \mathbb{R}^{L_{patch} \times D_{model}}$$
In this step, each patch subsequence $S^{(i)}$ of length $L_{patch}$ is linearly projected into a unified feature space of dimension $D_{model} = 64$ using the embedding weight matrix $W_{emb}$ and bias $b_{emb}$. This transforms each time step in the patch into a $D_{model}$-dimensional feature vector, enabling the model to represent local temporal patterns consistently and capture complex short-term and long-term dependencies in the subsequent transformer or BiLSTM layers.
After the embedding process, all patches are concatenated:
$$Z_{patch} = [Z_{patch}^{(1)}; Z_{patch}^{(2)}; \ldots; Z_{patch}^{(N_{seq})}] \in \mathbb{R}^{N_{seq} \times L_{patch} \times D_{model}}$$
where $N_{seq}$ denotes the number of patch subsequences.
To enhance the ability of the model to learn temporal order, each patch embedding incorporates both fixed positional encoding and learnable timestamp embeddings.
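The patching and embedding steps above can be sketched in numpy with the stated settings ($T = 512$, $D_{feat} = 6$, $L_{patch} = 16$, $D_{model} = 64$); the random inputs and weights are untrained placeholders, not the paper's fitted parameters:

```python
import numpy as np

# Patch slicing and linear embedding with the paper's settings.
T, D_feat, L_patch, D_model = 512, 6, 16, 64
rng = np.random.default_rng(0)
x_meteo = rng.standard_normal((T, D_feat))          # X_meteo in R^{T x D_feat}

n_seq = T // L_patch                                 # N_seq = 32 patches
patches = x_meteo.reshape(n_seq, L_patch, D_feat)    # S^(1) .. S^(N_seq)

W_emb = rng.standard_normal((D_feat, D_model)) * 0.02  # embedding weights
b_emb = np.zeros(D_model)
z_patch = patches @ W_emb + b_emb                    # Z_patch, per-patch embeddings

print(patches.shape, z_patch.shape)  # (32, 16, 6) (32, 16, 64)
```

Positional and timestamp encodings would then be added to `z_patch` before the transformer encoder.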
(3) Multi-head self-attention mechanism and global dependency modeling
The embedded patch sequence is fed into a multilayer transformer encoder for modeling. The transformer encoder consists primarily of multi-head self-attention (MHSA) layers and feed-forward networks (FFNs). The attention layer is responsible for capturing dependency structures among different patches, and its core formulation is as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$
where $Q$, $K$, and $V$ represent the query, key, and value vectors, respectively, all obtained by linearly projecting the embedded matrix $Z_{patch}$, and $d_k$ is the dimension of the key vectors $K$. Multiple attention heads are computed in parallel and concatenated:
$$\mathrm{MHSA}(Z) = \mathrm{Concat}(head_1, \ldots, head_h) \cdot W_{out}$$
where $W_{out}$ is the output projection weight matrix, each $head_i$ is a separate attention head that independently computes attention scores over the input sequence, and $\mathrm{MHSA}(Z)$ denotes multi-head self-attention applied to the input $Z$.
Each transformer layer output is processed with residual connections and layer normalization to improve the training stability and facilitate gradient flow. In the present study, the transformer encoder is configured with two layers and h = 4 attention heads, ensuring strong global modeling capability while keeping the parameter count moderate.
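A compact numpy sketch of the scaled dot-product attention and head concatenation described above, with $h = 4$ heads over patch-level embeddings; the projection weights are random placeholders rather than trained parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(z, h=4, seed=0):
    """Minimal MHSA sketch over patch embeddings z of shape (n_seq, d_model)."""
    n, d = z.shape
    d_k = d // h
    rng = np.random.default_rng(seed)
    w_q, w_k, w_v, w_out = (rng.standard_normal((d, d)) * 0.02 for _ in range(4))
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    heads = []
    for i in range(h):                               # one slice of width d_k per head
        sl = slice(i * d_k, (i + 1) * d_k)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(d_k)  # (n, n) patch-to-patch scores
        heads.append(softmax(scores) @ v[:, sl])
    return np.concatenate(heads, axis=-1) @ w_out    # Concat(head_1..head_h) W_out

z = np.random.default_rng(1).standard_normal((32, 64))  # 32 patches, d_model = 64
out = multi_head_self_attention(z)
print(out.shape)  # (32, 64)
```

In the full model this operation would be wrapped with residual connections, layer normalization, and a feed-forward network, and stacked for two encoder layers.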
(4) Patch representation output and dimensionality reduction
The transformer encoder produces the following output:
$$H_{tst} \in \mathbb{R}^{N_{seq} \times L_{patch} \times D_{model}}$$
To acquire a high-level semantic representation of the overall time series, the last time step of each patch sequence is extracted as its representative feature, forming the final encoding vector $Z_{final}$. This vector preserves both the local structural information within each patch and the interpatch correlation information and is subsequently fed into the BiLSTM decoder to perform coupled modeling and reconstruction forecasting of the target power subsequences.
In summary, the PatchTST encoder serves as the core time series representation learning module in this framework, offering strong local disturbance modeling capability, comprehensive global dependency capture, efficient parameter control, and high compatibility with the CEEMDAN-decomposed subsequence structure. This design enhances the ability of the model to capture global structural patterns under multidimensional meteorological driving conditions.

2.2.2. BiLSTM Decoder: Bidirectional Coupling and Feature Reconstruction

As shown in Figure 1, after temporal modeling by the PatchTST encoder and embedding representation of the multidimensional meteorological inputs, a bidirectional long short-term memory (BiLSTM) network is introduced as the decoding module to further enhance the modeling of coupling relationships among multi-frequency components of PV power. Owing to its bidirectional information flow mechanism, BiLSTM offers strong contextual awareness in time series prediction tasks, enabling more precise fitting of subtle disturbances and coupled variations in power sequences.
The BiLSTM models sequence dependencies in both the forward and backward temporal directions, thereby retaining richer contextual information. This is particularly advantageous for PV power forecasting scenarios characterized by asymmetric disturbance patterns and lagged response behaviors. Unlike a standard LSTM, BiLSTM incorporates both the forward hidden state $\overrightarrow{h}$ and the backward hidden state $\overleftarrow{h}$, which are concatenated or summed to form the complete temporal representation:
$$h_{bi} = [\overrightarrow{h}; \overleftarrow{h}] \in \mathbb{R}^{2H}$$
where $H$ denotes the dimensionality of the unidirectional hidden state.
In Figure 1, the final output vector $Z_{final}$ from the PatchTST encoder is first reshaped into a sequential format suitable for the LSTM input and then fed into the BiLSTM. The final prediction for each power subsequence is obtained by linear transformation:
$$\hat{y} = W_{out} \cdot h_{bi} + b_{out}, \quad \hat{y} \in \mathbb{R}^{B \times D_{target}}$$
where $b_{out}$ is the bias vector of the linear layer, $B$ denotes the batch size, and $D_{target} = 3$ herein, corresponding to the three clustered power subsequences (Cluster_1, Cluster_2, Cluster_3).
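The output head above can be sketched as follows, with the recurrent dynamics abstracted away: `h_fwd` and `h_bwd` are random stand-ins for the last forward and backward hidden states a trained BiLSTM would produce, and the sizes are illustrative assumptions:

```python
import numpy as np

# Sketch of the BiLSTM output head: concatenate forward/backward hidden
# states into h_bi, then apply a linear layer to get the three cluster outputs.
B, H, D_target = 8, 32, 3        # batch size, unidirectional hidden size, 3 clusters
rng = np.random.default_rng(0)
h_fwd = rng.standard_normal((B, H))   # placeholder forward hidden state
h_bwd = rng.standard_normal((B, H))   # placeholder backward hidden state

h_bi = np.concatenate([h_fwd, h_bwd], axis=-1)   # h_bi in R^{B x 2H}
W_out = rng.standard_normal((2 * H, D_target)) * 0.02
b_out = np.zeros(D_target)
y_hat = h_bi @ W_out + b_out                     # (B, 3): Cluster_1..Cluster_3

print(h_bi.shape, y_hat.shape)  # (8, 64) (8, 3)
```

In practice the recurrence itself would come from a framework layer such as a bidirectional LSTM in PyTorch or Keras; only the concatenation and linear head are shown here.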
The modeling approach depicted in Figure 1 integrates spatial–local modeling, global temporal modeling, and multitarget coupled prediction, thereby ensuring that the model maintains strong responsiveness and high predictive accuracy when handling high-frequency disturbances and multiscale dynamic variations in PV power outputs.

3. Modeling Process

3.1. Model Framework

A solar power forecasting framework integrating CEEMDAN decomposition–RCMSE aggregation, PatchTST-BiLSTM secondary decoupling, and RFE-PLE MTL is proposed. The core objective is to leverage a progressive “decomposition-aggregation-decoupling-modeling” architecture that fully utilizes multiscale decomposition techniques and time series decoupling models. This approach captures coupling characteristics among aggregated subsequences, explores multiscale features and inter-subsequence coupling relationships in PV power sequences, and incorporates an improved MTL strategy to achieve accurate and stable power forecasting. The framework, consisting of three main components, is implemented via the following steps:
1. Multiscale decomposition and initial decoupling
The Central Australia Alice Springs PV power dataset is obtained, and complete time segments of both feature and power data are extracted for analysis. According to the seasonal characteristics and coupling relationships within the data, the power series can be decomposed via CEEMDAN to obtain sequence components of different frequency characteristics. The RCMSE of each power subsequence is computed, and K-means clustering is utilized to group components with similar dynamic characteristics into three categories, thereby achieving effective separation of the power data features. This process is referred to as first-stage decoupling.
2. PatchTST-based reconstruction and secondary decoupling
PatchTST is employed for power data reconstruction on the basis of the coupling relationships among the three power subsequences obtained in the previous step. The patching mechanism segments the time series into fixed-length windows, whereas the transformer encoder captures long-range temporal dependencies. A random masking strategy is incorporated to enhance the generalizability of the model. The output is a set of reconstructed power subsequences, which serves as the second-stage decoupling of the power series, providing stable inputs for subsequent modeling.
3. RFE–PLE MTL prediction
The RFE wrapper method is integrated into the PLE MTL architecture to identify key feature variables and their coupling relationships with each power subsequence. A MTL prediction model is then built for the subsequences, employing both shared experts and task-specific experts to improve the forecasting accuracy while mitigating negative transfer across tasks. Finally, the predictions of all the subsequences are aggregated to produce the final PV power forecast. The overall prediction framework is illustrated in Figure 2.

3.2. RFE–PLE Multi-Task Predictive Model

In multisubsequence forecasting tasks for PV power, power subsequences from different frequency bands often exhibit potential coupling relationships while also retaining distinct characteristics. Effectively balancing the sharing of information among tasks and the preservation of task-specific representations is a key challenge in MTL.
The conventional MMoE architecture enables the sharing of low-level expert representations across multiple tasks through a gating mechanism. However, its structure typically relies on a single-layer shared expert pool. In scenarios where task coupling is weak or differences between tasks are pronounced, this unified structure can lead to task interference or negative transfer, thereby degrading overall forecasting performance.
To address the problem arising from coupling between PV power subsequences of different frequency bands, this study incorporates the concept of progressive feature extraction into the MMoE framework and adopts the PLE structure to establish a multi-task prediction framework for the PV power subsequences, as illustrated in Figure 3.
Compared with MMoE, the PLE model in Figure 3 introduces explicit task-specific expert pathways and employs a hierarchical gating mechanism to separately control the combination of shared experts and task-specific experts. This structural design avoids information confusion among PV power subsequence prediction tasks of different frequency bands caused by hard parameter sharing.
(1) Incorporation of a wrapper-based feature selection mechanism to optimize the multi-task input structure
In multi-task PV power forecasting, the selection of input features directly influences the modeling effectiveness of both shared experts and task-specific experts. Therefore, at the multi-task input stage, a wrapper-based RFE method, which uses a linear regressor as the base learner, is introduced to optimize the multi-task input structure globally.
The core idea of RFE is to rank the importance of each feature according to the model’s evaluation and iteratively remove the least important ones, thereby obtaining the optimal feature subset $S^* = \{x_m, \ldots, x_i, \ldots, x_k\}$ for multi-task forecasting. Specifically, in each iteration, the current feature set $S^{(k)} = \{x_1, x_2, \ldots, x_n\}$ is used to train a multi-task linear regression model, and the aggregated prediction loss is given by:
L S = 1 T t = 1 T L t ( S )
where T denotes the number of tasks (e.g., T = 3 corresponds to three prediction tasks), and L t ( S ) represents the prediction loss (such as the MAE or RMSE) of feature set S for task t .
The importance of feature x j is calculated on the basis of the absolute value of its linear regression coefficients:
I x j = 1 T t = 1 T | w j , t |
where w j , t is the regression coefficient of feature x j in the prediction model for task t .
In each iteration, the r least important features are removed as follows:
S ( k + 1 ) = S ( k ) \ arg min I ( x j ) ,       x j S ( k )
This process is repeated until the remaining number of features reaches the predefined threshold. Finally, the feature subset with the minimum aggregated prediction loss is selected:
S * = a r g S min L ( S )
This method not only effectively removes redundant and noisy features but also enhances the sparsity and generalization capability of the input set, thereby reducing the modeling burden on the downstream PLE network. In this study, the feature sets filtered by RFE achieved higher prediction accuracy across all three forecasting tasks of PV power subsequences, with particularly significant improvements in modeling high-frequency fluctuations.
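The elimination loop described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it fits all tasks with ordinary least squares, ranks features by the mean absolute coefficient across tasks, and drops the least important feature per iteration (r = 1). The toy data and the function name `rfe_multitask` are illustrative assumptions.

```python
import numpy as np

def rfe_multitask(X, Y, n_keep):
    """Minimal RFE sketch: rank features by mean |coefficient| across tasks
    and drop the least important one per iteration (r = 1)."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        # Least-squares fit of all tasks at once: W has one column per task.
        W, *_ = np.linalg.lstsq(X[:, keep], Y, rcond=None)
        importance = np.abs(W).mean(axis=1)   # I(x_j) = (1/T) sum_t |w_{j,t}|
        keep.pop(int(np.argmin(importance)))  # S^(k+1) = S^(k) \ arg min I
    return keep

# Toy example: 3 tasks driven mainly by features 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.stack([2 * X[:, 0] + X[:, 2], X[:, 0] - X[:, 2], 3 * X[:, 2]], axis=1)
Y += 0.01 * rng.normal(size=Y.shape)
print(rfe_multitask(X, Y, n_keep=2))  # [0, 2]
```

A production version would additionally record L(S) at every iteration and return the subset with the minimum aggregated loss, as in the final selection step above.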
(2)
Integrating Transformer Encoders to Enhance Expert Output Representation
Unlike the shallow expert modules in conventional PLE frameworks, which typically consist of a single-layer multilayer perceptron (MLP), this approach embeds a multilayer stacked transformer encoder within each expert module to reinforce contextual dependency modeling between experts. By leveraging the multi-head self-attention (MHSA) mechanism, transformer-enhanced experts can more accurately capture multiscale temporal patterns in PV power sequences.
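For reference, the MHSA operation at the core of each transformer-enhanced expert can be written out in NumPy. This is a generic single-pass sketch with randomly initialized projection matrices, not the trained encoder of the proposed model; the sequence length, embedding width, and head count are arbitrary illustrative choices.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(x, n_heads, rng):
    """One multi-head self-attention pass over a sequence x of shape (L, d)."""
    L, d = x.shape
    dh = d // n_heads  # per-head dimension
    Wq, Wk, Wv, Wo = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Split into heads: (n_heads, L, dh)
    split = lambda m: m.reshape(L, n_heads, dh).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    att = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (n_heads, L, L)
    out = (att @ v).transpose(1, 0, 2).reshape(L, d)       # concatenate heads
    return out @ Wo

rng = np.random.default_rng(1)
patches = rng.normal(size=(16, 32))  # 16 patch embeddings of width 32
y = mhsa(patches, n_heads=4, rng=rng)
print(y.shape)  # (16, 32)
```

In the full expert module this attention block would be wrapped with residual connections, layer normalization, and a feed-forward sublayer, then stacked into the multilayer encoder described above.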
In the proposed RFE-PLE framework, expert modules are categorized into two types:
Shared Experts (Es_1~Es_4): These handle global features common to all tasks, focusing on extracting generalizable patterns shared among different PV subsequences.
Task-specific Experts (Et1_1~Et3_2): These are dedicated to modeling each PV subsequence individually, emphasizing task-specific patterns and high-frequency details.
The gating networks act as dynamic controllers that connect expert modules to their corresponding task towers, computing adaptive weights for each expert on the basis of the current input:
Shared Expert Gates: These regulate the contribution of global shared information to each individual task.
Task-specific Gates (Gate1, Gate2, Gate3): These correspond to Cluster_1, Cluster_2, and Cluster_3, respectively, and dynamically select the most relevant combination of shared and task-specific experts for each prediction task.
The RFE-PLE model adopts a collaborative architecture of “feature screening, expert modeling, gated fusion, and aggregated output”, comprising five core components: an RFE-based feature selector, shared expert networks, task-specific expert networks, a gated fusion module, and task output towers. Its operational mechanism is as follows: meteorological and reconstructed PV power features filtered in the RFE stage are first processed by multiple shared and task-specific experts. The outputs of these experts are then enhanced by transformer encoders before entering the gating network, which adaptively fuses them according to task requirements. The resulting high-quality task representations are passed to independent task towers to predict the future power of each cluster, and the final total PV power prediction is obtained via a summation layer. This “hierarchical sharing + task-exclusive path” organization, coupled with the temporal modeling capability of multilayer transformers, allows the proposed framework to deliver superior forecasting accuracy under complex dynamic conditions such as seasonal variations and sudden fluctuations. Compared with the “single-pool, multi-gate” strategy of traditional MMoE, the RFE-PLE structure yields notable improvements in task balance, fine-grained fluctuation capture, and multi-task collaborative modeling.
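The gated fusion step for a single task can be sketched as a softmax gate over the stacked shared and task-specific expert outputs. This is a simplified NumPy illustration under assumed shapes (one feature vector per expert, a random gate projection `Wg`); it shows only the weighting mechanism, not the trained network.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ple_gate(x, shared_experts, task_experts, Wg):
    """Gated fusion for one task: weight shared + task-specific expert
    outputs by a softmax gate computed from the input x (illustrative)."""
    E = np.stack(shared_experts + task_experts)  # (n_experts, d)
    g = softmax(x @ Wg)                          # (n_experts,) gate weights
    return g @ E                                 # fused task representation

rng = np.random.default_rng(2)
d = 8
x = rng.normal(size=d)
shared = [rng.normal(size=d) for _ in range(4)]     # e.g., Es_1 ~ Es_4
task = [rng.normal(size=d) for _ in range(2)]       # e.g., Et1_1, Et1_2
Wg = rng.normal(size=(d, len(shared) + len(task)))  # gate projection
fused = ple_gate(x, shared, task, Wg)
print(fused.shape)  # (8,)
```

Each task tower would receive its own `fused` representation, with the task-specific gates (Gate1–Gate3) seeing only the shared experts plus that task's own experts.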
By advancing the architecture, feature representation, and information fusion strategies beyond those of MMoE, the proposed RFE-PLE model provides an effective solution for fine-grained, multi-frequency collaborative PV power forecasting, offering a new perspective for advancing research in this field.

4. Case Study

4.1. Numerical Example

The dataset used in the present study is obtained from the Alice Springs PV power generation dataset in central Australia, covering the period from 00:00 on 1 January 2018, to 23:55 on 30 September 2018 [33]. The data have a temporal resolution of 5 min, with a total length of 76,824 records. Since the PV power output is zero at night and does not require forecasting, all zero-power points are removed to ensure that the effectiveness of the proposed method is accurately evaluated. This eliminates the redundancy caused by the absence of solar radiation at night and avoids introducing noninformative samples into model training. After this preprocessing, the dataset is reduced to 36,633 records. The PV power series is then decomposed via CEEMDAN, producing seven IMF components. On the basis of the RCMSE metric, k-means clustering is applied to group IMFs with similar dynamic complexity into three clusters. Each of these three clustered power subseries is subsequently reconstructed via the PatchTST-BiLSTM coupling framework, yielding three reconstructed power subseries. The final dataset, consisting of a timestamp column, six meteorological feature columns, and three reconstructed power subseries, is divided into three seasonal subsets: January–March, April–June, and July–September. For each subset, the first two months are used as the training set, and the last month serves as the test set. This seasonal segmentation allows the model to learn and forecast PV power under different seasonal conditions. Model performance is evaluated on the test set, and the prediction accuracy is computed accordingly. Meteorological feature data, including Weather_Temperature_Celsius, Weather_Relative_Humidity, Global_Horizontal_Radiation, Diffuse_Horizontal_Radiation, Wind_Direction, and Weather_Daily_Rainfall, are obtained from the National Solar Radiation Database (NSRDB) [34].
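The two deterministic preprocessing steps, dropping zero-power night records and splitting each seasonal subset into two training months plus one test month, can be sketched with pandas. The toy series below stands in for the actual 5-min Alice Springs data; the column name and night-hour cutoffs are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy 5-min series standing in for the 2018 data (column name hypothetical).
idx = pd.date_range("2018-01-01 00:00", "2018-03-31 23:55", freq="5min")
rng = np.random.default_rng(3)
df = pd.DataFrame({"power": rng.uniform(0.1, 5.0, len(idx))}, index=idx)
df.loc[(df.index.hour < 6) | (df.index.hour >= 19), "power"] = 0.0  # night

# Step 1: remove zero-power (night) records, as in the preprocessing above.
df = df[df["power"] > 0]

# Step 2: seasonal subset Jan-Mar, first two months train, last month test.
train = df.loc["2018-01":"2018-02"]
test = df.loc["2018-03"]
print(len(train), len(test))
```

The same pattern repeats for the April–June and July–September subsets, giving one train/test pair per season.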
To assess the forecasting performance comprehensively, the root mean square error (RMSE) and mean absolute error (MAE) are employed as evaluation metrics for each power subseries.
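The two metrics reduce to one-line NumPy expressions; the worked values in the comment are for the small illustrative arrays only.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred))  # MAE = 0.5, RMSE ≈ 0.645
```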

4.2. Analysis of the Effectiveness of the Proposed Predictive Model

4.2.1. Comparative Analysis of the PatchTST Reconstruction Model

To verify the effectiveness of the proposed PatchTST–BiLSTM coupled reconstruction in PV power forecasting tasks, we compared the predictive performance of the original data and the reconstructed data across different power subseries prediction tasks. In this experiment, the three power subseries (Cluster_1, Cluster_2, Cluster_3) obtained from CEEMDAN decomposition and RCMSE–K-Means clustering in the case study dataset are selected as target variables. The forecasting results of the RFE–PLE MTL framework are compared when fed with either the unprocessed data or the PatchTST–BiLSTM reconstructed data. To ensure a fair comparison, all configurations other than the input data, such as the network architecture, hyperparameters, and number of training epochs, are kept identical. Furthermore, to maintain the credibility of the results, both the reconstructed and original datasets are evaluated under the same seasonal conditions. Specifically, the reconstruction process uses January–March and July–September as the spring and autumn reconstruction datasets, respectively, to capture the seasonal characteristics to the fullest extent. The subsequent training sets are January–February and July–August, whereas the corresponding test sets are 1–7 March and 1–7 September. For computational efficiency, all comparative experiments other than the subsequent RFE wrapper feature selection comparison include RFE-based feature selection. Figure 4 shows the comparison between the reconstructed and original datasets for the spring season.
Taking Figure 4 as an example, the reconstructed data closely follow the overall trend of the original data. However, owing to the integration of coupling information and seasonal characteristics, the reconstructed values differ slightly in magnitude and exhibit a denoising effect, particularly in the reduction in peak fluctuations.
To evaluate the capability of the PatchTST reconstruction model in capturing seasonal variations and coupling structures in PV power data, comparative experiments were conducted for both spring and autumn using a continuous one-week period as the prediction target. The experiments compared the performance of models using the original power subsequences directly as inputs with those employing the PatchTST-BiLSTM reconstruction module to preprocess the subsequences for enhanced coupling.
By comparing the predictive performance under these two configurations, the role of PatchTST in capturing latent coupling relationships and seasonal dynamics in time series data can be assessed. The comparison of the aggregated total power predictions before and after reconstruction provides a clear measure of the effectiveness of the reconstruction.
The spring and autumn prediction curves are shown in Figure 5.
Based on the comparison of data from 4, 5, and 7 March and 5 and 7 September, the reconstructed predictions outperform the raw (unreconstructed) predictions in terms of both MAE and RMSE. Over the five-day period, the average MAE of the reconstructed predictions is 0.249, compared with 0.271 for the raw predictions; the average RMSE is 0.305 for the reconstructed predictions, versus 0.314 for the raw predictions.
This indicates that the reconstruction of PatchTST-BiLSTM brings the predicted sequence closer to the actual PV power curve, capturing power fluctuations more accurately and thus significantly improving the prediction accuracy and stability.
Table 1 shows that data reconstruction based on PatchTST plays a critical role in decoupling PV power components from meteorological features. Specifically, it separates the inherent temporal patterns of power sequences from the interference of complex and variable meteorological factors, highlighting the characteristics of the power components, making them easier for the model to capture. In terms of quantitative results, compared with predictions using original (unreconstructed) data, the MAE of the reconstructed data decreased by an average of 6.27%, and the RMSE decreased by an average of 3.34%, reflecting an overall improvement in the prediction accuracy. It is worth noting that the improvement is more significant on volatile days (MAE decreased by 8.48%), indicating that PatchTST-based reconstruction effectively mitigates the impact of sudden meteorological changes (such as sudden cloud cover or fluctuations in short-term irradiance) on power prediction.

4.2.2. PLE Experiments Using Reconstructed Power Subsequences Without RFE Feature Selection

(1)
Experimental design
This experiment investigates the impact of RFE on the PLE model. The PatchTST reconstruction module is retained, but the selection of RFE-based features is omitted; instead, all the input features are used directly for model training. The objective is to evaluate model training efficiency and forecasting accuracy without feature selection and to compare the outcomes with those of the feature-selected model. This comparison reveals the role of feature selection when coping with high-dimensional inputs, particularly its ability to capture high-frequency fluctuations and improve fine-grained prediction performance.
(2)
Experimental results and effectiveness analysis
The results in Figure 6 show that omitting feature selection leads to a decline in prediction accuracy and an increase in training time. Without RFE-based selection, redundant features and noise remain in the input data, which interferes with the ability of the model to learn high-frequency fluctuations, causing larger prediction errors in certain regions. Compared with the feature-selected model, the non-selective model results in more redundancy interference and overfitting issues in predicting Clusters 1 to 3, leading to lower accuracy.
This comparison clearly demonstrates the importance of feature selection in improving model accuracy and mitigating overfitting.
Table 2 shows that RFE better decouples the power subsequences from the meteorological features: compared with the original data, the RFE-processed data achieve an average MAE approximately 2.19% lower and an average RMSE approximately 5.13% lower, leading to better overall forecasting performance and capturing finer-grained characteristics of the power subsequences.

4.2.3. Replacing the RFE-PLE with the Original MMoE Model

(1)
Experimental design
The objective of the experiment is to compare the performance differences between the proposed RFE-PLE model and the original MMoE model in MTL and time series modeling. In this setting, the PLE network in the proposed framework is replaced with a standard MMoE structure while keeping all other modules, training datasets, and hyperparameters unchanged. The comparison focuses on evaluating the two models in terms of their ability to capture high-frequency fluctuations, model temporal dependencies, and perform collaborative MTL. This experimental set-up is intended to reveal the relative strengths and weaknesses of the MMoE and PLE, especially with respect to their responsiveness to high-frequency disturbances in PV power forecasting.
(2)
Experimental results and effectiveness analysis
Figure 7 shows that although the MMoE model demonstrates strengths in MTL and intertask information sharing, its performance in modeling high-frequency fluctuations and temporal dependencies is inferior to that of the PLE model. MMoE mitigates negative transfer between tasks by leveraging a shared pool of experts, which indeed enhances collaborative learning. However, it still exhibits lag and excessive smoothing when capturing high-frequency variations and complex temporal patterns. In contrast, the PLE model, especially when combined with the PatchTST reconstruction module, better captures subtle oscillations and high-frequency disturbances while also improving long-range dependency modeling. Therefore, while the MMoE model may outperform PLE in certain MTL contexts, the RFE-PLE architecture shows superior performance when dealing with data characterized by complex temporal dependencies.
Table 3 shows the comparison between the common MMoE model and the proposed RFE-PLE framework. Overall, the RFE-PLE model achieves better forecasting performance, with average MAE reduced by 7.05% and RMSE reduced by 7.50% compared with MMoE.

4.2.4. Experimental Results Arising from Use of the Proposed Model

In this experiment, the main forecasting framework—integrating the PatchTST-BiLSTM reconstruction module with the RFE-PLE MTL structure—is employed for PV power prediction. To more effectively capture both high-frequency fluctuations and long-range temporal dependencies in PV power series, this architecture is designed to leverage PatchTST’s strong ability for sequence reconstruction and RFE-PLE’s balanced MTL.
As illustrated in Figure 8, the proposed model demonstrates outstanding performance across all the component predictions. Specifically, for Cluster 1, Cluster 2, and Cluster 3, the model achieves consistently high accuracy. Notably, it exhibits strong responsiveness to high-frequency disturbances, enabling timely capture and precise forecasting of power fluctuations while avoiding common prediction lag issues.
As illustrated in Figure 9, the aggregated power forecasts from the proposed model also show superior accuracy. An analysis using the MAE, RMSE, and coefficient of determination (R2) confirms that the model outperforms the comparison baselines in terms of all the metrics. In particular, the proposed approach yields lower MAE and RMSE values, especially for high-frequency variations and short-term abrupt changes, while achieving higher R2 scores, indicating a closer fit to the actual PV power trends. These results confirm that each component retained in the model contributes meaningfully to its predictive ability, as observed from the ablation comparisons.
In summary, the proposed model exhibits strong temporal modeling capacity and superior capture ability of high-frequency fluctuations in multicomponent PV power prediction. It maintains both accuracy and stability in the face of complex power dynamics, showing pronounced advantages in short-term forecasting and handling of rapid disturbances.

4.2.5. Summary of Vertical Comparative Experiments

Herein, three distinct experimental configurations were evaluated to assess the performance of different models in PV power forecasting. Specifically, the proposed PLE model is compared with three baseline models (No PatchTST PLE, No RFE PLE, and MMoE) using three evaluation metrics: the MAE, the RMSE, and the coefficient of determination (R2). The results are illustrated in Figure 10.
As shown in Figure 10, the proposed PLE model consistently outperforms the baseline models across all the metrics, underscoring its superior ability to capture both the trend and high-frequency fluctuations in PV power.
From the MAE perspective, the proposed PLE achieves 0.242 in spring, 0.199 in autumn, and an annual average of 0.220, all of which are the lowest among the compared models. This finding indicates a significant advantage in modeling both overall and local PV power variations. In contrast, the model without the PatchTST reconstruction module sees its average MAE increase to 0.237, with a notable rise to 0.271 in spring, reflecting a weaker capacity to capture high-frequency disturbances. The No-RFE model suffers from feature redundancy, resulting in the highest average MAE of 0.260 among the four models. While the MMoE model performs relatively well in autumn (MAE of 0.175), it records a much higher MAE of 0.293 in spring, with an overall average of 0.234, still above that of the proposed PLE.
In terms of the RMSE, the proposed PLE again demonstrates superior performance, with an average RMSE of 0.290, outperforming the No PatchTST (0.293), No RFE (0.303), and MMoE (0.304) models. Notably, the No RFE model records a spring RMSE of 0.351, and the MMoE model fares even worse in spring at 0.370, underscoring its limited ability to handle highly fluctuating periods and sudden load changes.
In summary, the proposed PLE model achieves the best performance across all key indicators (MAE, RMSE), confirming the crucial role of the PatchTST reconstruction module and RFE feature selection in improving the prediction accuracy, reducing high-frequency errors, and improving model generalizability. These findings also indicate that although the MMoE shows certain advantages in multi-task collaborative learning, it remains less effective than the more targeted PLE structure for forecasting PV power, which is characterized by strong nonlinearity and high temporal dependence.

4.3. Comparison Between the Proposed Model and Other Models

To further validate the effectiveness of the proposed model in PV power forecasting, several representative MTL architectures were selected as benchmark models. These include MMoE_LSTM_Attention, Shared_LSTM, Shared_Transformer, MIMO, MTL-CNN-LSTM, MTL-Attention-LSTM, and BiLSTM + Attention. Experiments were conducted under three mainstream signal decomposition methods (CEEMDAN, STL, and VMD) to assess the adaptability and robustness of each model to different data preprocessing approaches. Model performance is evaluated via the MAE and RMSE across both the spring and autumn datasets. The detailed results are presented in Table 4. Here, the Average denotes the mean value of the Spring and Autumn results, and the Reduction indicates the relative percentage decrease achieved by the proposed method compared with each baseline method in terms of the average MAE and RMSE.
As shown in Table 4, the proposed method demonstrates advantages in both accuracy and seasonal adaptability when compared with nine mainstream PV power forecasting methods (including traditional machine learning, single-task, and multi-task deep learning models). It achieves the lowest average MAE (0.22) and RMSE (0.29) among the compared methods. Specifically, compared with the MTL_Attention_LSTM method, the MAE is reduced by 45.9% and the RMSE by 44.6%, while compared with the Shared_Transformer method, the MAE is reduced by 24.1% and the RMSE by 14.7%. The proposed method maintains low errors in both spring (MAE = 0.24, RMSE = 0.31) and autumn (MAE = 0.20, RMSE = 0.27), with a seasonal error difference of only 0.04. This avoids the seasonal adaptability imbalance observed in most other methods (e.g., the BiLSTM_Attention method shows an MAE difference of 0.46 between spring and autumn), thereby achieving PV power forecasting with higher accuracy and improved seasonal adaptability.
In Table 5, a comparison of the predictive results of the proposed method at different sites and during different seasons is presented. By calculating the average values of MAE and RMSE, it can be concluded that the proposed method demonstrates good stability.
The predictions obtained under the CEEMDAN, STL, and VMD decompositions are also plotted, as shown in Figure 11.
From the spring experimental results in Table 4, it can be observed that traditional shared-structure models, such as Shared_LSTM and Shared_Transformer, generally exhibit lower prediction accuracy, with Shared_LSTM showing large errors under all decomposition methods. In contrast, the proposed model achieves the best or second-best MAE and RMSE values across all three decomposition methods, particularly under CEEMDAN decomposition, where the MAE and RMSE reach 0.242 and 0.312, respectively—significantly outperforming the other methods.
In the autumn experimental results (Table 4), the proposed model likewise demonstrates a consistent performance advantage. Notably, under VMD, the MAE and RMSE are 0.223 and 0.273, respectively, surpassing those of the MMoE, MTL-series, and attention-enhanced models. Although MIMO performs relatively well under certain decomposition settings, its overall stability is inferior to that of the proposed method.
In summary, the proposed PatchTST-BiLSTM-improved PLE model exhibits superior generalizability and modeling performance across different seasons and multiple decomposition scenarios. These results validate the effectiveness of the PatchTST structure in capturing temporal coupling relationships, as well as the adaptability and accuracy advantages of the RFE-PLE framework in multi-task decoupling modeling.

5. Conclusions

This study addresses the challenges of nonstationarity, multi-frequency disturbances, and high-frequency detail modeling in PV power series, and proposes a forecasting method that integrates CEEMDAN decomposition, RCMSE clustering, PatchTST-BiLSTM reconstruction, and an RFE-PLE MTL framework.
  • By employing CEEMDAN decomposition and RCMSE clustering, the power series can be initially decoupled, thereby avoiding the blind treatment of multiscale components in traditional approaches. On this basis, the PatchTST-BiLSTM reconstruction enhances the joint modeling of local disturbances and long-term dependencies, capturing the potential coupling relationships among multi-frequency components. Furthermore, the introduction of the RFE-PLE framework, which integrates feature selection with hierarchical expert collaboration, significantly alleviates the problem of negative transfer in MTL.
  • Empirical results on the Alice Springs PV power station dataset demonstrate a significant improvement in forecasting accuracy. Compared with the raw data, the introduction of PatchTST reconstruction reduces the average MAE from 0.23 to 0.22 (a reduction of 6.27%) and the RMSE from 0.30 to 0.29 (a reduction of 3.34%). With the addition of RFE-based feature selection, the average RMSE decreases from 0.31 to 0.29 (a reduction of 5.13%).
  • Comparative experiments further validate the superiority of the proposed method. Relative to the MMoE framework, the RFE-PLE structure reduces the average MAE by 7.05% and the RMSE by 7.50%. In the CEEMDAN decomposition scenario, the proposed method achieves an average MAE of 0.22 and RMSE of 0.29, which are 45.9% and 44.6% lower than those of the MTL-Attention-LSTM model, respectively. It also outperforms Random Forest (MAE 0.32, RMSE 0.40) and MIMO (MAE 0.38, RMSE 0.48), achieving superior predictive performance.

Funding

This research received no external funding.

Data Availability Statement

Data will be made available on request.

Acknowledgments

The authors thank Xinzi Han from the School of Electrical Engineering, Southwest Jiaotong University, for her help and support in checking the formatting and layout of this paper.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
PVPhotovoltaic
PLEProgressive Layered Extraction
MMoEMulti-gate Mixture-of-Experts
MTLMulti-task Learning
RFERecursive Feature Elimination
CEEMDANComplete Ensemble Empirical Mode Decomposition with Adaptive Noise
STLSeasonal-trend Decomposition Procedure based on Loess
VMDVariational Mode Decomposition
RCMSERefined Composite Multiscale Entropy
K-MeansK-Means Clustering Algorithm
PatchTSTPatch Time Series Transformer
BiLSTMBidirectional Long Short-Term Memory
MLPMulti-layer Perceptron
MAEMean Absolute Error
RMSERoot Mean Square Error
R2Coefficient of Determination
IMFIntrinsic Mode Function

References

  1. Ferkous, K.; Guermoui, M.; Menakh, S.; Bellaour, A.; Boulmaiz, T. A novel learning approach for short-term photovoltaic power forecasting—A review and case studies. Eng. Appl. Artif. Intell. 2024, 133, 108502. [Google Scholar] [CrossRef]
  2. Khouili, O.; Hanine, M.; Louzazni, M.; Flores, M.A.; Villena, E.G.; Ashraf, I. Evaluating the impact of deep learning approaches on solar and photovoltaic power forecasting: A systematic review. Energy Strategy Rev. 2025, 59, 101735. [Google Scholar] [CrossRef]
  3. Alcañiz, A.; Grzebyk, D.; Ziar, H.; Isabella, O. Trends and gaps in photovoltaic power forecasting with machine learning: A comprehensive review. Energy Rep. 2023, 9, 447–471. [Google Scholar] [CrossRef]
  4. Sarmas, E.; Spiliotis, E.; Stamatopoulos, E.; Marinakis, V.; Doukas, H. Short-term photovoltaic power forecasting using meta-learning and numerical weather prediction independent LSTM models. Renew. Energy 2023, 216, 118997. [Google Scholar] [CrossRef]
  5. Mayer, M.J.; Gróf, G. Extensive comparison of physical models for photovoltaic power forecasting. Appl. Energy 2021, 283, 116239. [Google Scholar] [CrossRef]
  6. Hu, Z.; Gao, Y.; Ji, S.; Mae, M.; Imaizumi, T. Improved multistep ahead photovoltaic power prediction model based on LSTM and self-attention with weather forecast data. Appl. Energy 2024, 359, 122709. [Google Scholar] [CrossRef]
  7. Zhu, R.; Li, T.; Tang, B. Research on short-term photovoltaic power generation forecasting model based on multi-strategy improved squirrel search algorithm and support vector machine. Sci. Rep. 2024, 14, 14348. [Google Scholar] [CrossRef]
  8. Yang, M.; Zhao, M.; Liu, D.; Ma, M.; Su, X. Improved random forest method for ultra-short-term prediction of the output power of a photovoltaic cluster. Front. Energy Res. 2021, 9, 749367. [Google Scholar] [CrossRef]
  9. Ağır, T.T. Estimation of daily photovoltaic power one day ahead with hybrid Deep Learning and Machine Learning models. Energy Sci. Eng. 2025, 13, 1478–1491. [Google Scholar] [CrossRef]
  10. Yu, J.; Li, X.; Yang, L.; Li, L.; Huang, Z.; Shen, K.; Yang, X.; Yang, X.; Xu, Z.; Zhang, D.; et al. Deep learning models for PV power forecasting: Review. Energies 2024, 17, 3973. [Google Scholar] [CrossRef]
  11. Dimitriadis, C.N.; Passalis, N.; Georgiadis, M.C. A deep learning framework for photovoltaic power forecasting in multiple interconnected countries. Sustain. Energy Technol. Assess. 2025, 77, 104330. [Google Scholar] [CrossRef]
  12. Mouloud, L.A.; Kheldoun, A.; Oussidhoum, S.; Alharbi, H.; Alotaibi, S.; Alzahrani, T.; Agajie, T.F. Seasonal quantile forecasting of solar photovoltaic power using Q-CNN-GRU. Sci. Rep. 2025, 15, 27270. [Google Scholar] [CrossRef] [PubMed]
  13. Wang, J.; Zhang, Z.; Xu, W.; Li, Y.; Niu, G. Short-term photovoltaic power forecasting using a Bi-LSTM neural network optimized by hybrid algorithms. Sustainability 2025, 17, 5277. [Google Scholar] [CrossRef]
  14. Zhu, H.; Wang, Y.; Wu, J.; Zhang, X. A regional distributed photovoltaic power generation forecasting method based on grid division and TCN-BiLSTM. Renew. Energy 2026, 256, 123935. [Google Scholar] [CrossRef]
  15. Ren, X.; Zhang, F.; Sun, Y.; Liu, Y. A Novel dual-channel temporal convolutional network for photovoltaic power forecasting. Energies 2024, 17, 698. [Google Scholar] [CrossRef]
  16. Kim, J.; Obregon, J.; Park, H.; Jung, J.-Y. Multi-step photovoltaic power forecasting using transformer and recurrent neural networks. Renew. Sustain. Energy Rev. 2024, 200, 114479. [Google Scholar] [CrossRef]
  17. Zhao, X. A novel digital-twin approach based on transformer for photovoltaic power prediction. Sci. Rep. 2024, 14, 26661. [Google Scholar] [CrossRef]
  18. Liu, Y.; Wang, J.; Song, L.; Liu, Y.; Shen, L. Enhanced short-term PV power forecasting via a hybrid modified CEEMDAN-jellyfish search-optimized BiLSTM model. Energies 2025, 18, 3581. [Google Scholar] [CrossRef]
  19. Zhai, C.; He, X.; Cao, Z.; Abdou-Tankari, M.; Wang, Y.; Zhang, M. Photovoltaic power forecasting based on VMD-SSA transformer: Multidimensional analysis of dataset length, weather mutation and forecast accuracy. Energy 2025, 324, 135971. [Google Scholar] [CrossRef]
  20. Guermoui, M.; Fezzani, A.; Mohamed, Z.; Rabehi, A.; Ferkous, K.; Bailek, N.; Bouallit, S.; Riche, A.; Bajaj, M.; Mohammadi, S.A.D.; et al. An analysis of case studies for advancing photovoltaic power forecasting through multi-scale fusion techniques. Sci. Rep. 2024, 14, 6653. [Google Scholar] [CrossRef]
  21. Han, S.; Qiao, Y.; Yan, J.; Liu, Y.; Li, L.; Wang, Z. Mid-to-long term wind and photovoltaic power generation prediction based on copula function and long short term memory network. Appl. Energy 2019, 239, 181–191. [Google Scholar] [CrossRef]
  22. Gao, X.; Guo, W.; Mei, C.; Sha, J.; Guo, Y.; Sun, H. Short-term wind power forecasting based on SSA-VMD-LSTM. Energy Rep. 2023, 9 (Suppl. S10), 335–344. [Google Scholar] [CrossRef]
  23. Agga, A.; Abbou, A.; Labbadi, M.; El Houm, Y.; Ou Ali, I.H. CNN-LSTM: An efficient hybrid deep learning architecture for predicting short-term photovoltaic power production. Electr. Power Syst. Res. 2022, 208, 107908. [Google Scholar] [CrossRef]
  24. Abdelkader, D.; Fouzi, H.; Khaldi, B.; Sun, Y. Graph neural network-based spatiotemporal prediction of photovoltaic power: A comparative study. Neural Comput. Appl. 2025, 37, 4769–4795. [Google Scholar] [CrossRef]
  25. Bashir, T.; Wang, H.; Tahir, M.; Zhang, Y. Wind and solar power forecasting based on hybrid CNN-ABiLSTM, CNN-Transformer-MLP models. Renew. Energy. 2025, 239, 122055. [Google Scholar] [CrossRef]
  26. Chai, M.; Xia, F.; Hao, S.; Peng, D.; Cui, C.; Liu, W. PV power prediction based on LSTM with adaptive hyperparameter adjustment. IEEE Access 2019, 7, 115473–115486. [Google Scholar] [CrossRef]
  27. Yu, Y.; Niu, T.; Wang, J.; Jiang, H. Intermittent solar power hybrid forecasting system based on pattern recognition and feature extraction. Energy Convers. Manag. 2023, 277, 116579. [Google Scholar] [CrossRef]
  28. Sun, S.; Wang, S.; Zhang, G.; Zheng, J. A decomposition-clustering-ensemble learning approach for solar radiation forecasting. Sol. Energy 2018, 163, 189–199. [Google Scholar] [CrossRef]
  29. Sun, H.; Cui, Q.; Wen, J.; Kou, L.; Ke, W. Short-term wind power prediction method based on CEEMDAN-GWO-Bi-LSTM. Energy Rep. 2024, 11, 1487–1502. [Google Scholar] [CrossRef]
  30. Khan, A.H.H.; Wang, Y.C. Drift-diffusion modeling-guided interface optimization in BaHfS3 chalcogenide perovskite solar cells. Sol. Energy Mater. Sol. Cells 2026, 294, 113889. [Google Scholar] [CrossRef]
  31. Hu, T.; Mo, Z.; Zhang, Z. Multi-task pointwise mutual information learning for bearing remaining useful life cross-domain imbalanced regression. IEEE Internet Things J. 2025, 15, 30415–30425. [Google Scholar] [CrossRef]
  32. Tang, H.; Liu, J.; Zhao, M.; Gong, X. Progressive layered extraction: A novel multi-task learning model for personalized recommendations. In Proceedings of the 14th ACM Conference on Recommender Systems, Online, 22–26 September 2020; pp. 269–278. [Google Scholar] [CrossRef]
  33. DKASC Solar Data Portal. Alice Springs Photovoltaic Generation Data. 2018. Available online: https://dkasolarcentre.com.au/download?Location=yulara (accessed on 12 August 2025).
  34. National Solar Radiation Database. Available online: https://nsrdb.nrel.gov/ (accessed on 12 August 2025).
  35. Xu, H.; Wu, Q.; Wen, J.; Yang, Z. Joint bidding and pricing for electricity retailers based on multi-task deep reinforcement learning. Int. J. Electr. Power Energy Syst. 2022, 138, 107897. [Google Scholar] [CrossRef]
  36. Wang, Y.; Zhang, D.; Wulamu, A. A multitask learning model with multiperspective attention and its application in recommendation. Comput. Intell. Neurosci. 2021, 2021, 8550270. [Google Scholar] [CrossRef]
  37. Zheng, R.; Chen, J.; Ma, M.; Huang, L. Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation. In Proceedings of the 38th International Conference on Machine Learning, PMLR. Online, 18–24 July 2021; Volume 139, pp. 12736–12746. Available online: https://proceedings.mlr.press/v139/zheng21a.html (accessed on 12 August 2025).
  38. Jiang, P.; Nie, Y.; Wang, J.; Huang, X. Multivariable short-term electricity price forecasting using artificial intelligence and multi-input multi-output scheme. Energy Econ. 2023, 117, 106471. [Google Scholar] [CrossRef]
  39. Lodhi, E.; Dahmani, N.; Bukhari, S.M.S.; Gyawali, S.; Thapa, S.; Qiu, L.; Zafar, M.H.; Akhtar, N. Enhancing microgrid forecasting accuracy with SAQ-MTCLSTM: A self-adjusting quantized multi-task ConvLSTM for optimized solar power and load demand predictions. Energy Convers. Manag. X 2024, 24, 100767. [Google Scholar] [CrossRef]
40. Liu, M.; Wang, X.; Zhong, Z. Ultra-short-term photovoltaic power prediction based on BiLSTM with wavelet decomposition and dual attention mechanism. Electronics 2025, 14, 306. [Google Scholar] [CrossRef]
41. Zhu, M.; Liu, J.; Ji, J. Electrocardiogram signal classification based on bidirectional LSTM and multi-task temporal attention. J. Comput. Sci. Technol. 2025, 40, 1401–1413. [Google Scholar] [CrossRef]
Figure 1. The PatchTST coupling network structure.
Figure 2. Framework of the proposed modeling methodology.
Figure 3. The PLE decoupling model for PV power subsequences across different frequency bands.
Figure 4. Comparative analysis of the reconstructed and original data for the spring season. (a) Comparison between the reconstructed data and original data for Component 1. (b) Comparison between the reconstructed data and original data for Component 2. (c) Comparison between the reconstructed data and original data for Component 3.
Figure 5. Comparison of total power prediction curves in spring and autumn under two schemes, with and without PatchTST processing. (a) Comparison of total power prediction curves in spring for Scheme a and Scheme b. (b) Comparison of total power prediction curves in autumn for Scheme a and Scheme b.
Figure 6. Comparison of total power prediction curves in spring and autumn via the PLE model without RFE feature selection. (a) Total power prediction comparison in spring using the PLE model without RFE. (b) Total power prediction comparison in autumn using the PLE model without RFE.
Figure 7. Comparison of total power prediction curves in spring and autumn via the MMoE model. (a) Total power prediction comparison in spring via the MMoE model. (b) Total power prediction comparison in autumn via the MMoE model.
Figure 8. Spring-season prediction comparison for each component via the proposed model.
Figure 9. Comparison of spring (a) and autumn (b) PV power predictions via the proposed model.
Figure 10. Performance comparison of different models in vertical comparative experiments. (a) MAEs and RMSEs of the predicted results. (b) Scatter plots for the predicted results.
Figure 11. Comparison of different algorithmic models under different decomposition methods. (a) Comparison of the MAEs for different models and decomposition methods in spring. (b) Comparison of the RMSEs for different models and decomposition methods in spring. (c) Comparison of the MAEs for different models and decomposition methods in autumn. (d) Comparison of the RMSEs for different models and decomposition methods in autumn.
Table 1. Comparison of total power prediction results between the reconstructed data and original data.

Prediction | Error | Without PatchTST | With PatchTST | Reduction Rate (%)
9 stable days (3.1, 3.2, 3.3, 3.6, 9.1, 9.2, 9.3, 9.4, 9.6) | MAE (kW) | 0.21 | 0.20 | 4.57
 | RMSE (kW) | 0.28 | 0.28 | 0
5 volatile days (3.4, 3.5, 3.7, 9.5, 9.7) | MAE (kW) | 0.27 | 0.25 | 8.48
 | RMSE (kW) | 0.31 | 0.31 | 0
Average | MAE (kW) | 0.23 | 0.22 | 6.27
 | RMSE (kW) | 0.30 | 0.29 | 3.34
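The reduction rates reported in Tables 1–3 follow the standard percentage-improvement formula applied to MAE and RMSE. The sketch below shows how such metrics can be computed; the numeric inputs are purely illustrative and are not the paper's measurements.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted power."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted power."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def reduction_rate(baseline_err, improved_err):
    """Percentage reduction of an error metric relative to the baseline model."""
    return 100.0 * (baseline_err - improved_err) / baseline_err

# Illustrative values only (hypothetical 4-sample series, not the paper's data)
actual   = [1.0, 2.0, 3.0, 4.0]
baseline = [1.2, 1.8, 3.3, 3.6]   # e.g., model without PatchTST reconstruction
improved = [1.1, 1.9, 3.1, 3.9]   # e.g., model with PatchTST reconstruction

print(round(reduction_rate(mae(actual, baseline), mae(actual, improved)), 2))  # prints 63.64
```

Note that the tables report rates computed from unrounded errors, which is why a pair such as 0.21 kW versus 0.20 kW can yield 4.57% rather than the 4.76% implied by the two-decimal values.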
Table 2. Comparison of total power prediction results between models trained with and without RFE feature selection.

Prediction | Error | Without RFE | With RFE | Reduction Rate (%)
9 stable days (3.1, 3.2, 3.3, 3.6, 9.1, 9.2, 9.3, 9.4, 9.6) | MAE (kW) | 0.20 | 0.20 | 0
 | RMSE (kW) | 0.28 | 0.28 | 0
5 volatile days (3.4, 3.5, 3.7, 9.5, 9.7) | MAE (kW) | 0.26 | 0.24 | 6.39
 | RMSE (kW) | 0.35 | 0.31 | 11.53
Average | MAE (kW) | 0.22 | 0.22 | 2.19
 | RMSE (kW) | 0.31 | 0.29 | 5.13
Table 3. Comparison of total power prediction results between the RFE-PLE and MMoE models.

Prediction | Error | MMoE | RFE-PLE | Reduction Rate (%)
9 stable days (3.1, 3.2, 3.3, 3.6, 9.1, 9.2, 9.3, 9.4, 9.6) | MAE (kW) | 0.21 | 0.20 | 4.12
 | RMSE (kW) | 0.29 | 0.28 | 3.73
5 volatile days (3.4, 3.5, 3.7, 9.5, 9.7) | MAE (kW) | 0.27 | 0.24 | 10.12
 | RMSE (kW) | 0.35 | 0.31 | 11.83
Average | MAE (kW) | 0.24 | 0.22 | 7.05
 | RMSE (kW) | 0.32 | 0.29 | 7.50
Table 4. Comparison of the predictions from different models under CEEMDAN decomposition.

Prediction Methods | MAE (Spring) | MAE (Autumn) | MAE (Avg.) | RMSE (Spring) | RMSE (Autumn) | RMSE (Avg.)
SVR | 0.63 | 0.43 | 0.53 | 0.89 | 0.59 | 0.74
Random Forest | 0.35 | 0.28 | 0.32 | 0.45 | 0.35 | 0.40
Shared_LSTM [35] (2022) | 1.24 | 1.01 | 1.13 | 1.39 | 1.14 | 1.27
MMoE_LSTM_Attention [36] (2021) | 0.61 | 0.21 | 0.41 | 0.69 | 0.26 | 0.48
Shared_Transformer [37] (2021) | 0.41 | 0.17 | 0.29 | 0.46 | 0.21 | 0.34
MIMO [38] (2023) | 0.39 | 0.36 | 0.38 | 0.47 | 0.48 | 0.48
MTL_CNN_LSTM [39] (2024) | 0.30 | 0.33 | 0.32 | 0.39 | 0.41 | 0.40
BiLSTM_Attention [40] (2025) | 1.11 | 0.65 | 0.88 | 1.23 | 0.78 | 1.01
MTL_Attention_LSTM [41] (2025) | 0.47 | 0.35 | 0.41 | 0.57 | 0.48 | 0.53
Proposed | 0.24 | 0.20 | 0.22 | 0.31 | 0.27 | 0.29
Table 5. Comparison of the proposed method across different stations and seasons.

Different Stations (Proposed) | MAE (Spring) | MAE (Autumn) | MAE (Avg.) | RMSE (Spring) | RMSE (Autumn) | RMSE (Avg.)
52-Site_33-REC | 0.16 | 0.17 | 0.16 | 0.20 | 0.20 | 0.20
56-Site_30-Q-CELLS | 0.18 | 0.17 | 0.18 | 0.22 | 0.21 | 0.22
93-Site_8-Kaneka | 0.28 | 0.13 | 0.21 | 0.34 | 0.18 | 0.26
91-Site_1A-Trina | 0.24 | 0.20 | 0.22 | 0.31 | 0.27 | 0.29

Share and Cite

MDPI and ACS Style

Qu, Y. PatchTST Coupled Reconstruction RFE-PLE Multitask Forecasting Method Based on RCMSE Clustering for Photovoltaic Power. Electronics 2025, 14, 4613. https://doi.org/10.3390/electronics14234613
