1. Introduction
The global energy landscape is undergoing a profound transition driven by the escalating challenges of climate change [1]. Photovoltaic (PV) power generation has emerged as a cornerstone of this transition, supported by its modularity and falling component costs [2]. However, large-scale PV integration brings significant instability to modern power grids. PV output is inherently random, volatile, and non-stationary due to uncontrollable factors such as solar irradiance and ambient temperature, which imposes significant stress on grid dispatch operations and often causes supply–demand imbalances and equipment overloading [3]. Consequently, developing high-precision and high-efficiency forecasting models is a critical prerequisite for the reliable operation of sustainable power systems.
PV power forecasting technology has evolved from fundamental physical mechanisms to advanced deep learning paradigms [4]. While recurrent neural networks (RNNs) like Long Short-Term Memory (LSTM) [5] and Gated Recurrent Unit (GRU) [6] can capture nonlinear temporal correlations, they are susceptible to gradient vanishing in long-sequence modeling. Transformer-based architectures, such as Informer and Autoformer, have utilized sparse attention to model global dependencies but suffer from a quadratic computational complexity of $O(L^2)$, which restricts their application in high-resolution tasks requiring ultra-long historical windows [7, 8]. Furthermore, convolutional frameworks like SCINet [9] and TimesNet [10] excel at local feature extraction but often utilize single-stream architectures that struggle to balance global efficiency with robust perception of sudden power fluctuations, such as those caused by cloud occlusion events.
Existing PV forecasting methods still have obvious limitations that our HDCF-Mamba aims to solve:
Transformer-based models (Informer [7] and Autoformer [11]) rely on self-attention mechanisms with $O(L^2)$ quadratic complexity, which limits their application in long-sequence, high-resolution PV forecasting tasks; although they can capture long-range dependencies, their inefficiency makes them unsuitable for practical grid dispatch scenarios.
Linear decomposition models (e.g., DLinear [12]) employ a simple trend-seasonal decomposition architecture using linear layers but lack the non-linear mapping capability to capture complex atmospheric dynamics, leading to significant errors where sudden weather-induced fluctuations override seasonal patterns.
Hybrid CNN-RNN models (e.g., CNN-GRU-LSTM [13]) integrate local feature extraction (CNN) and sequential modeling (RNN) but fail to achieve effective fusion of global trends and local fluctuations, leading to suboptimal performance in non-stationary PV power sequences with both high-frequency spikes and long-term trends.
Traditional RNNs (LSTM/GRU) [5, 6] suffer from gradient vanishing in long-sequence modeling, while CNN-based models (SCINet [9]) excel at local features but lack efficient global modeling capability.
Thus, we propose HDCF-Mamba, an innovative heterogeneous dual-branch framework that splits temporal modeling into global trend extraction and local detail detection. We integrate Mamba’s linear-complexity state space mechanism for global modeling, design a Multi-Kernel Filter Unit-based multi-scale module for local fluctuation detection, and fuse the two branches via a novel channel fusion module. The paper’s core contributions are the heterogeneous dual-branch architecture, linear-complexity global modeling, and multi-scale robust perception for PV-specific fluctuations.
To clarify the exact novelty of our approach relative to existing hybrid models, the main contributions of this work are summarized as follows:
(1) A Synergetic Heterogeneous Paradigm with Three Pillars: Unlike traditional CNN-Transformer hybrids that suffer from quadratic complexity or simple CNN-RNN concatenations that fail to fuse features effectively, we propose a tripartite synergistic framework. The core innovation lies not in any single module, but in their coordinated interaction:
Pillar 1, Global Modeling (Mamba SSM): Provides efficient long-range dependency capture, serving as the “macroscopic” observer of seasonal trends.
Pillar 2, Local Perception (MSFEM): Acts as the “microscopic” sensor, using multi-scale Multi-Kernel Filter Units to detect abrupt, high-frequency fluctuations caused by weather events.
Pillar 3, Heterogeneous Fusion (HDCF): Functions as the “integrator,” employing temporal channel concatenation to dynamically align and fuse the disparate feature spaces from the first two pillars.
This three-pillar design targets the feature distribution gap between long-term trends and short-term volatility that plagues existing models.
(2) MSFEM with Hierarchical Multi-Scale Resampling: We design the Multi-Scale Feature Extraction Module (MSFEM) based on Multi-Kernel Filter Unit-style multi-kernel convolutions. Its novelty lies in its hierarchical resampling pipeline (with incrementally increasing strides) that decouples temporal dynamics into fine, medium, and coarse granularities, specifically addressing the smoothing effect of global models and ensuring that weather-driven power ramps are preserved.
(3) Empirical Validation of Synergy: Through comprehensive ablation studies (Section 4.3), we empirically demonstrate that the synergistic integration of these three pillars is essential: removing any one pillar leads to significant performance degradation, confirming that the whole is greater than the sum of its parts.
The remainder of this paper is organized as follows:
Section 2 reviews related work in PV power forecasting, including physical models, statistical methods, and deep learning approaches, with a focus on their limitations in handling multi-scale dynamics.
Section 3 details the proposed HDCF-Mamba framework, including the Mamba global branch, the MSFEM local branch with TimeBlock, and the HDCF fusion mechanism.
Section 4 presents the experimental setup, datasets, evaluation metrics, and implementation details, followed by comprehensive quantitative results, ablation studies, and computational efficiency analysis.
Section 5 concludes the paper with a summary of findings, discussion of limitations, and directions for future work.
2. Related Work
Photovoltaic (PV) power forecasting is a challenging time-series task due to the strong intermittency, high variability, and non-stationarity of PV generation under changing weather conditions. PV output is driven by multiple stochastic meteorological factors, such as solar irradiance, ambient temperature, and humidity [14]. Accurate forecasts are critical for secure and stable power-grid operation and also support electricity-market participation and the optimal scheduling of energy storage systems. Therefore, precise and efficient PV power forecasting is a key enabler for large-scale integration of renewable energy into power systems [15]. Existing studies can be broadly grouped into three categories: physics-based methods, traditional statistical models [16], and deep learning approaches, which have become increasingly prevalent in recent years.
2.1. Physical Model
Physics-based PV power forecasting methods model the generation process using meteorological principles and the PV energy conversion mechanism. They estimate PV output by explicitly relating physical variables (e.g., irradiance and temperature) to power generation through analytical formulations or numerical simulations. A key advantage of this paradigm is strong interpretability, which provides a physically grounded basis for system design and fault diagnosis [17]. However, these models depend heavily on the quality of meteorological forecast inputs [18], and prediction errors in weather variables (particularly under rapidly changing cloud or aerosol conditions) can significantly degrade forecasting accuracy, limiting robustness in ultra-short-term scenarios. In addition, developing high-fidelity physical models often involves complex differential equations and computationally intensive numerical procedures, and the calibration process is typically cumbersome and lacks self-adaptation, which hinders generalization and direct deployment across different sites and operating conditions.
2.2. Statistical and Machine Learning Models
To overcome the strong dependence of physical models on explicit mechanisms and site-specific calibration, early PV power forecasting studies widely explored statistical and conventional machine learning approaches. Representative statistical models, such as AutoRegressive Integrated Moving Average (ARIMA) [19] and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) [20], estimate PV output by fitting temporal patterns in historical PV power and meteorological time series and extrapolating them to future horizons. Classic machine learning models (e.g., Support Vector Regression, Random Forest, and Gradient Boosting Decision Trees) further improve accuracy by learning non-linear mappings from meteorological inputs and time-related variables to PV power.
These methods typically offer fast inference and modest computational cost, which makes them practical for relatively simple short-term forecasting settings. However, their performance degrades on complex PV sequences for three main reasons. First, they have limited capacity to represent strong non-linearity and non-stationarity under rapidly changing weather. Second, they rely heavily on manual feature engineering (e.g., meteorological feature construction) and model-order/hyperparameter selection, and lack end-to-end representation learning ability [21]. Third, they often struggle to jointly capture short-term fluctuations and long-range, multi-scale temporal dependencies, resulting in reduced robustness under highly variable conditions such as cloudy or rainy days.
2.3. Deep Learning Models
Deep learning has become the dominant paradigm for PV power forecasting [3, 4] due to its strong non-linear modeling capacity and end-to-end representation learning. RNN-based models, such as Long Short-Term Memory (LSTM) [5] and Gated Recurrent Unit (GRU) [6], have been widely used to model temporal dependencies, but they often struggle with long-range sequences and may suffer from vanishing gradients.
The emergence of Transformer-based models, such as Informer [7], Autoformer [11], and iTransformer [8], introduced the self-attention mechanism to capture long-range global dependencies. Despite their success, the $O(L^2)$ quadratic complexity of attention remains a significant bottleneck for high-resolution forecasting. To challenge the dominance of Transformers, DLinear [12] was proposed as a competitive “simple” alternative. By employing a trend-seasonal decomposition architecture through moving average kernels and linear layers, DLinear achieves $O(L)$ efficiency and avoids the temporal confusion inherent in position embeddings. However, the purely linear mapping of DLinear is fundamentally limited in PV forecasting, as it cannot model the complex, high-order non-linear fluctuations caused by sudden weather-driven events like cloud occlusion.
Recently, Mamba-style sequence models have emerged as promising alternatives for long-sequence time-series forecasting, featuring linear complexity ($O(L)$) with selective state updates. Nevertheless, their applications to PV forecasting remain limited, and the literature has rarely examined how to integrate such global sequence modeling with explicit local feature extraction. In parallel, CNN-based forecasting models (e.g., SCINet [9] and TimesNet [10]) are effective at capturing local patterns and multi-scale structures, but may be less sensitive to abrupt PV power variations (e.g., extreme weather-induced drops) and often face a trade-off between global dependency modeling and local robustness. This context is further illustrated in Table 1, which summarizes the state-of-the-art numerical results (in terms of MAE and RMSE) from key literature using representative models on various PV power forecasting datasets.
Overall, many existing deep learning approaches adopt single-stream architectures, which makes it difficult to jointly achieve efficient long-range modeling, robust local fluctuation handling, and effective multi-scale representation learning, indicating the need for heterogeneous fusion frameworks. Motivated by this gap, HDCF-Mamba combines Mamba-based sequence modeling with multi-scale convolutional modules to better balance global efficiency and local robustness for PV power forecasting.
3. Methodology
The proposed HDCF-Mamba framework adopts a heterogeneous dual-branch architecture to concurrently address global trend modeling and local fluctuation perception in photovoltaic (PV) sequences. By decoupling sequence modeling into two parallel subtasks, the framework reconciles the conflict between computational efficiency and multi-scale feature robustness.
3.1. Design Philosophy of HDCF-Mamba
Existing hybrid forecasting models typically treat global and local features as additive components. However, in PV forecasting, global seasonal trends and local meteorological shocks belong to different feature domains.
The Mamba branch acts as the macroscopic observer, maintaining long-term state memory. The MSFEM branch acts as the microscopic sensor, capturing multi-scale local variances. The HDCF module serves as the integrator, which is the most critical novelty of this work. It dynamically weights the contribution of each branch based on the input volatility, ensuring the model can switch focus between stable clear-sky patterns and volatile cloudy-day fluctuations.
3.2. General Framework
The core innovation of HDCF-Mamba lies in its heterogeneous dual-branch architecture (as shown in Figure 1), which fundamentally differs from existing Transformer-based models (Informer [7] and Autoformer [11]), which adopt $O(L^2)$ self-attention for global modeling, and from hybrid CNN-RNN models. We integrate the Mamba Selective State Space mechanism to achieve $O(L)$ linear complexity, enabling efficient long-range dependency modeling without sacrificing accuracy. Compared with hybrid CNN-RNN models that simply combine CNN and RNN modules (but fail to fuse global/local features effectively), our framework splits temporal modeling into two parallel subtasks: global trend extraction (via Mamba) and multi-scale local fluctuation detection (via Multi-Kernel Filter Unit-based Multi-Scale TimeBlock). The two branches are fused via the novel High-Dimensional Channel Fusion (HDCF) module, unlike the simple concatenation or addition in existing hybrid models, to maximize heterogeneous feature complementarity. This design balances computational efficiency (from Mamba) and robust local perception (from TimeBlock), addressing the intrinsic trade-off that plagues existing approaches.
3.3. Data Preprocessing and Feature Embedding
This module provides the structural foundation for the numerical stability and feature perception capabilities of the HDCF-Mamba model. To mitigate the challenges posed by non-uniform physical dimensions (e.g., the discrepancy between irradiance and temperature scales) and the inherent lack of temporal context in raw sequences, we implement a unified preprocessing pipeline based on statistical normalization and high-dimensional DataEmbedding.
3.3.1. Statistical Normalization
PV power data typically exhibit diverse units and high-amplitude fluctuations, which can impede convergence and lead to gradient instability during training. To eliminate the influence of heterogeneous physical units, we apply feature-wise Z-score standardization. Each input variable (e.g., solar power, temperature, and horizontal irradiance) is normalized independently based on its specific mean and standard deviation calculated from the training set, ensuring that the model training is not biased by variable scales:
$$x'_{j} = \frac{x_{j} - \mu_{j}}{\sigma_{j}}$$
where $\mu_{j}$ and $\sigma_{j}$ are the mean and standard deviation calculated per feature dimension $j$ from the training set. This feature-wise normalization ensures that disparate meteorological variables, such as module temperature (in °C), humidity (in %), and ambient temperature (in °C), are each mapped to a consistent numerical scale individually, rather than being pooled together. This approach preserves the distinct statistical properties of each variable while facilitating faster model convergence.
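As a minimal sketch of the feature-wise standardization described above, the following plain-Python example fits the statistics on training rows only and reuses them for held-out rows. The function names, the toy values, and the use of the population standard deviation are assumptions made for illustration; they are not taken from the experimental code.

```python
from statistics import fmean, pstdev

def fit_zscore(train_rows):
    """Return per-feature means and standard deviations from training rows only."""
    columns = list(zip(*train_rows))          # one tuple of values per feature
    means = [fmean(col) for col in columns]
    # Assumption: population std; a constant feature falls back to 1.0
    # so that the division below never hits zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_zscore(rows, means, stds):
    """Standardize each feature with the statistics fitted on the training set."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy data (hypothetical): two features with very different scales.
train = [[0.0, 20.0], [2.0, 30.0], [4.0, 40.0]]
test = [[2.0, 50.0]]

means, stds = fit_zscore(train)
train_n = apply_zscore(train, means, stds)
test_n = apply_zscore(test, means, stds)      # no statistics are re-fitted here
```

Reusing `means` and `stds` for the test rows is what keeps test-set information out of the fitted statistics.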
In addition to Z-score standardization, a rigorous preprocessing pipeline is applied to ensure data quality and temporal continuity for PV power sequences. First, outlier removal is performed to eliminate abnormal values that violate physical constraints (e.g., negative PV power, irradiance values exceeding the theoretical maximum). Second, for missing values in the time series caused by occasional monitoring interruptions, linear interpolation is adopted based on adjacent valid temporal samples to maintain the integrity of sequential data. Third, a moving average filter is applied to the processed sequence to suppress random high-frequency noise induced by environmental interference, while preserving the intrinsic fluctuation characteristics of PV power. All preprocessing parameters are determined on the training set and then applied unchanged to the validation and test sets to avoid data leakage, laying a solid foundation for stable model training.
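The three cleaning steps can be sketched as follows. This is an illustrative plain-Python version under stated assumptions: the function name `clean_series`, the upper bound `p_max`, the window length of 3, and the centered window that shrinks at the sequence edges are all hypothetical choices, since the text does not specify them.

```python
def clean_series(power, p_max, window=3):
    """Outlier removal, linear interpolation, and moving-average smoothing."""
    # Step 1: treat physically impossible readings as missing values.
    vals = [None if (v is None or v < 0 or v > p_max) else float(v) for v in power]
    valid = [i for i, v in enumerate(vals) if v is not None]
    if not valid:
        raise ValueError("series contains no valid samples")

    # Step 2: fill gaps by linear interpolation between the nearest valid
    # neighbours; gaps at the edges copy the nearest valid value.
    filled = []
    for i, v in enumerate(vals):
        if v is not None:
            filled.append(v)
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled.append(vals[right])
        elif right is None:
            filled.append(vals[left])
        else:
            w = (i - left) / (right - left)
            filled.append(vals[left] * (1 - w) + vals[right] * w)

    # Step 3: centered moving average (the window shrinks near the edges).
    half = window // 2
    smoothed = []
    for i in range(len(filled)):
        segment = filled[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return filled, smoothed

# Toy series (hypothetical): one negative reading and one missing sample.
raw = [1.0, -5.0, 3.0, None, 5.0]
filled, smoothed = clean_series(raw, p_max=10.0)
```

A short window is used here because a long one would also flatten genuine cloud-induced ramps, which the text says should be preserved.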
3.3.2. High-Dimensional Feature Embedding
To facilitate the model’s perception of seasonality and periodicity, we utilize a multi-component DataEmbedding layer to project normalized inputs into a model-dimensional ($d_{model}$) latent manifold. This process integrates numerical power values with discrete temporal markers (e.g., hourly and daily timestamps) to capture deep contextual information. The embedded feature vector $E$ is constructed through the summation of the projected target sequence and positional encodings:
$$E = X W_{e} + b_{e} + \mathrm{PE}(X_{mark})$$
In this formulation, $X$ denotes the input target power sequence, while $W_{e}$ and $b_{e}$ represent the learnable weight matrix and bias vector of the projection layer, respectively. The function $\mathrm{PE}(\cdot)$ signifies the positional encoding utilized to capture temporal dependencies within the auxiliary temporal covariates $X_{mark}$. The resulting representation $E$ serves as the “contextual anchor” and the unified starting point for both the global Mamba-based and local MSFEM branches.
3.4. Multi-Scale Convolutional Feature Extraction Module (MSFEM)
A fundamental challenge in photovoltaic (PV) power forecasting is the capture of high-frequency local fluctuations, such as short-term power swings induced by sudden cloud occlusion events. Conventional single-kernel convolutional architectures often fail to perceive these multi-time-scale features effectively. To resolve this, we propose the MSFEM, which decouples temporal dynamics through multi-scale resampling and enhances local dependency modeling via the TimeBlock structure.
3.4.1. Multi-Scale Transformation and Resampling
The MSFEM processes (as shown in Figure 2) the embedded feature representation $E$ through a parallel resampling architecture designed to extract features across various temporal resolutions. As formalized in Algorithm 1, the module employs $N$ parallel branches utilizing one-dimensional Average Pooling (AvgPool1d) for downsampling and linear interpolation (Upsample Linear) for sequence reconstruction.
Specifically, while the kernel size $k$ remains constant, we utilize incrementally increasing strides $s_{1} < s_{2} < \dots < s_{N}$ to yield progressively coarser sequence representations. This hierarchical approach allows the model to analyze the PV sequence through fine, medium, and coarse granularities. The downsampled representations are restored to the original sequence length $L$ via linear interpolation, as defined by the following operations:
$$X_{i} = \mathrm{Upsample}_{\mathrm{linear}}\big(\mathrm{AvgPool1d}(E;\, k, s_{i}),\, L\big), \quad i = 1, \dots, N$$
To aggregate these multi-granularity views, we introduce the Multi-View Channel Aggregation Functional (KCh), which performs explicit tensor-level concatenation along the channel dimension $C$, resulting in a stacked high-dimensional feature map $X_{stack}$:
$$X_{stack} = \mathrm{KCh}(X_{1}, \dots, X_{N}) \in \mathbb{R}^{N \times L \times d_{model}}$$
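| Algorithm 1: Multi-Scale Convolutional Feature Extraction Module (MSFEM) |
Input: embedded features $E$
Output: local features $Y$
Initialization:
1: set the pooling kernel size $k$
2: set the stride set $S$ (incrementally increasing strides)
3: initialize an empty list of views
For $s$ in $S$:
4: downsample $E$ with AvgPool1d (kernel size $k$, stride $s$)
5: restore the result to length $L$ via linear interpolation
6: append the restored view to the list
End for
7: concatenate all views along the channel dimension (KCh) to obtain $X_{stack}$
8: apply the TimeBlock, reshape, and project to obtain $Y$
Return $Y$ |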
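The resampling loop of Algorithm 1 can be illustrated on a single univariate sequence as follows. This is a simplified sketch, not the module itself: the kernel size of 2, the strides (1, 2, 4), and the endpoint-aligned interpolation rule are assumptions chosen for illustration, and the interpolation convention may differ from the default of the deep learning framework used in the experiments.

```python
def avg_pool1d(x, kernel, stride):
    """Average pooling without padding over a 1-D list."""
    return [sum(x[i:i + kernel]) / kernel
            for i in range(0, len(x) - kernel + 1, stride)]

def upsample_linear(x, length):
    """Linear interpolation back to `length` points (endpoints aligned)."""
    if len(x) == 1 or length == 1:
        return [x[0]] * length
    out = []
    for j in range(length):
        pos = j * (len(x) - 1) / (length - 1)   # fractional index into x
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[hi] * frac)
    return out

def multi_scale_views(x, kernel=2, strides=(1, 2, 4)):
    """One restored view per stride; stacking them gives the channel axis."""
    return [upsample_linear(avg_pool1d(x, kernel, s), len(x)) for s in strides]

# Toy sequence (hypothetical): a linear ramp of 8 samples.
seq = [float(v) for v in range(8)]
views = multi_scale_views(seq)   # 3 views: fine, medium, coarse
```

Each view has the original length, so the three views can be stacked as channels before entering the TimeBlock.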
3.4.2. TimeBlock Feature Enhancement
The TimeBlock serves as the core processing engine of the MSFEM, illustrated in Figure 3. It is engineered to perform deep local dependency modeling while maintaining the integrity of the feature manifold. The TimeBlock processes the stacked multi-scale features $X_{stack}$ through a residual architecture featuring Inception-style 2D convolutions. It consists of two Multi-Kernel Filter Unit-style blocks with a GELU activation in between:
$$X_{out} = X_{stack} + \mathrm{MKFU}_{2}\big(\mathrm{GELU}(\mathrm{MKFU}_{1}(X_{stack}))\big)$$
where $X_{stack}, X_{out} \in \mathbb{R}^{3 \times L \times d_{model}}$.
Each Inception stage applies six parallel convolutions with a kernel size of $3 \times 3$ and a padding of 1. These parallel kernels capture diverse local features, which are then integrated via mean-pooling across the kernel dimension. The first block expands the channel dimension from 3 to a wider hidden dimension, while the second compresses it back to 3. The output $X_{out}$ maintains the same shape as the input $X_{stack}$.
To prepare the local features for fusion with the global Mamba branch, $X_{out}$ is first reshaped to collapse the channel and feature dimensions:
$$X_{flat} = \mathrm{Reshape}(X_{out}) \in \mathbb{R}^{L \times 3 d_{model}}$$
followed by a linear projection to reduce the dimensionality back to $d_{model}$:
$$Y = X_{flat} W_{p} + b_{p}$$
where $W_{p} \in \mathbb{R}^{3 d_{model} \times d_{model}}$ and $b_{p} \in \mathbb{R}^{d_{model}}$ are learnable parameters. This transformed representation $Y$ is then fused with the global Mamba branch output via the HDCF module (Section 3.6).
3.5. Mamba State Space Model
To address the efficiency bottleneck induced by the quadratic computational complexity ($O(L^2)$) of Transformer-based architectures when processing long-term photovoltaic (PV) data, this study incorporates the Mamba Selective State Space Model (SSM) as the core engine for global temporal modeling. The implementation is strictly based on the open-source mamba-ssm library (https://github.com/state-spaces/mamba (accessed on 2 November 2025)), the official implementation of Mamba with hardware-aware parallel computation support. The Mamba branch in our HDCF-Mamba framework adopts a lightweight single-block design (without stacked Mamba layers), as the Mamba branch is only responsible for capturing global long-range temporal dependencies, while local multi-scale feature extraction and heterogeneous feature fusion are undertaken by the MSFEM branch and HDCF module, respectively. A single Mamba block is sufficient to capture the macroscopic temporal trends of PV sequences while ensuring the overall computational efficiency of the framework. The structural advantage of Mamba lies in its Selective Scan Mechanism, which allows the system to dynamically adjust its information compression strategy, effectively remembering or forgetting specific states based on the input content. This capability is critical for capturing highly selective and unstructured long-term dependencies within non-stationary PV sequences.
3.5.1. Selective State Space Mechanism
The Selective State Space Mechanism (SSM) employed in this work is a recently advanced sequence modeling paradigm that efficiently captures long-range dependencies with linear complexity $O(L)$. Unlike traditional self-attention models (such as Informer and Autoformer) that rely on pairwise token interaction and suffer from quadratic complexity $O(L^2)$, SSM models temporal dependencies through a recursive state transition process, which avoids expensive full-sequence attention calculation.
By using a selective scan strategy, SSM adaptively focuses on meaningful long-range information while suppressing trivial noise, making it especially suitable for photovoltaic series with strong non-stationarity and multi-scale fluctuations. This linear complexity structure significantly reduces both computational cost and memory usage, enabling the model to process ultra-long input sequences efficiently without performance degradation.
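To make the scaling argument concrete, the short example below counts the pairwise token interactions of full self-attention against the number of recurrent state updates for a few input lengths. It is a back-of-the-envelope illustration only: constant factors, hidden dimensions, and hardware effects are ignored, and the lengths 96 and 336 are hypothetical values (720 is the ultra-long window mentioned in this paper).

```python
def attention_pairs(seq_len):
    """Number of query-key pairs evaluated by full self-attention."""
    return seq_len * seq_len

def scan_steps(seq_len):
    """Number of state updates performed by a recurrent state-space scan."""
    return seq_len

for seq_len in (96, 336, 720):
    ratio = attention_pairs(seq_len) // scan_steps(seq_len)
    print(f"L={seq_len}: pairs={attention_pairs(seq_len)}, "
          f"steps={scan_steps(seq_len)}, ratio={ratio}")
```

The ratio between the two counts equals the sequence length itself, which is why the gap widens as the historical window grows.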
The M-SSM branch operates as a parallel path for global feature extraction, processing the complete embedded representation $E$, which fuses normalized encoder inputs with comprehensive temporal markers. Unlike traditional SSMs with fixed parameters, the Mamba module introduces an input-dependent selection mechanism. This allows the SSM parameters ($B$, $C$) and the time step $\Delta$ to be functions of the input, facilitating a content-aware compression of the historical context.
Similar to the formulation in the original Mamba paper [23], the state evolution and sequence mapping are described by the discretized state-space equations:
$$h_{t} = \bar{A} h_{t-1} + \bar{B} x_{t}, \qquad y_{t} = C h_{t}$$
where $h_{t}$ represents the latent state manifold and $x_{t}$ denotes the input at time $t$. Through hardware-aware parallel computation, Mamba reduces the complexity of modeling long-range correlations to a strictly linear $O(L)$ paradigm. All key hyperparameters of the Mamba block (e.g., model width, state size, expansion factor) are optimized via the Optuna framework and unified across all PV dataset experiments to ensure consistency and reproducibility. The hyperparameter settings are quantitatively detailed in Section 4.1, and are fully consistent with the experimental code implementation.
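The recurrence can be illustrated with a single-channel scan over a diagonal state matrix. This sketch is a didactic simplification and not the mamba-ssm kernel: the scalar projections `w_delta`, `w_b`, and `w_c` are hypothetical stand-ins for the learned input-dependent projections, and the discretization (exponential for the state matrix, first-order for the input matrix) is one common choice assumed here.

```python
import math

def selective_scan(xs, a, w_delta, w_b, w_c):
    """Sequential selective state-space scan for one input channel.

    xs      : input sequence (list of floats)
    a       : diagonal of the continuous state matrix A (negative reals)
    w_delta : scalar weight producing the input-dependent step size
    w_b, w_c: per-state weights producing input-dependent B_t and C_t
    """
    h = [0.0] * len(a)                      # latent state h_0 = 0
    ys = []
    for x in xs:
        # Softplus keeps the step size positive.
        delta = math.log1p(math.exp(w_delta * x))
        b = [wb * x for wb in w_b]          # input-dependent B_t
        c = [wc * x for wc in w_c]          # input-dependent C_t
        for i in range(len(a)):
            a_bar = math.exp(delta * a[i])  # discretized state transition
            b_bar = delta * b[i]            # first-order discretized input map
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys

# One update per time step, so the cost grows linearly with len(xs).
ys = selective_scan([1.0, 0.0, 0.0], a=[-1.0], w_delta=1.0, w_b=[1.0], w_c=[1.0])
```

Because `delta`, `b`, and `c` depend on the current input, each step decides how much of the previous state to keep, which is the content-aware behaviour described above.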
3.5.2. Global Correlation Capture
By leveraging selective state updates, the M-SSM branch achieves an optimal balance between computational efficiency and predictive accuracy. This branch fundamentally overcomes the efficiency barriers inherent in sparse attention mechanisms, enabling high-precision forecasting even with ultra-long historical windows (e.g., 720-step inputs). The resulting output feature Z represents the macroscopic trend and global temporal correlation of the PV power generation, providing a stable backbone for the subsequent heterogeneous fusion stage.
3.6. Heterogeneous Feature Fusion and Prediction
This module fuses heterogeneous features from the dual-branch encoders. It aligns local dynamic details from the MSFEM with global macroscopic trends from the M-SSM, and constructs a comprehensive joint feature space to improve forecasting accuracy.
3.6.1. Heterogeneous Feature Alignment and Fusion
To fully leverage the complementary strengths of both branches, the HDCF mechanism employs temporal concatenation to integrate the global and local representations. We concatenate the local features $Y$ and the global features $Z$ along the temporal axis to form a composite sequence $H$ with length $2L$:
$$H = \mathrm{Concat}_{\mathrm{time}}(Y, Z) \in \mathbb{R}^{2L \times d_{model}}$$
This operation maximizes the complementarity between the two branches, ensuring that high-frequency local fluctuations and long-range dependencies are preserved in parallel.
3.6.2. Two-Step Linear Projection Decoder
The final prediction is generated via an efficient two-step decoder that maps the latent representation $H$ back to the physical output space. It is important to clarify that our framework employs a direct multi-step forecasting strategy rather than a recursive (roll-out) approach. The decoder directly maps the fused representation $H$ to the entire target horizon $T$ in a single forward pass (Equation (12)), producing $\hat{Y} \in \mathbb{R}^{T \times c_{out}}$, where each time step from 1 to $T$ is predicted simultaneously. This approach avoids the error accumulation that plagues recursive methods and enables the model to learn temporal dependencies across the entire forecast horizon holistically.
Step 1: Temporal Mapping. We apply a parametric temporal projection, $\mathcal{P}_{time}$, to transform the sequence length from $2L$ to the target prediction horizon $T$:
$$U = \mathcal{P}_{time}(H) = W_{time} H \in \mathbb{R}^{T \times d_{model}}$$
where $W_{time} \in \mathbb{R}^{T \times 2L}$ is the temporal projection kernel.
Step 2: Physical Space Reconstruction. Finally, the Target Manifold Reconstruction Operator, $\mathcal{R}$, maps the features from the latent $d_{model}$ dimension to the physical PV power dimension $c_{out}$:
$$\hat{Y} = \mathcal{R}(U) \in \mathbb{R}^{T \times c_{out}}$$
By utilizing this purely linear decoder, we maintain the efficiency gains achieved by the Mamba branch while ensuring high tracking capability across varying prediction horizons.
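The shape flow of the fusion and the two-step decoder can be traced with small nested lists. This is a sketch under simplifying assumptions: bias terms are omitted, the weight values are hypothetical, and the names `w_time` and `w_out` are introduced here for illustration rather than taken from the implementation.

```python
def matmul(a, b):
    """Plain matrix product for nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def fuse_and_decode(y_local, z_global, w_time, w_out):
    """Temporal concatenation followed by the two linear decoding steps."""
    h = y_local + z_global     # (L, d) ++ (L, d) along time -> (2L, d)
    u = matmul(w_time, h)      # (T, 2L) @ (2L, d) -> (T, d)
    return matmul(u, w_out)    # (T, d) @ (d, c_out) -> (T, c_out)

# Toy sizes (hypothetical): L = 2, d_model = 2, T = 3, c_out = 1.
y_local = [[1.0, 0.0], [0.0, 1.0]]
z_global = [[2.0, 0.0], [0.0, 2.0]]
w_time = [[0.25] * 4 for _ in range(3)]   # each horizon step averages all 2L rows
w_out = [[1.0], [1.0]]                    # sums the two latent features

pred = fuse_and_decode(y_local, z_global, w_time, w_out)
```

All $T$ horizon steps come out of a single pass through the two projections, which is the direct multi-step behaviour described above.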
3.7. Multi-Horizon Prediction and Loss Function
The HDCF-Mamba model adopts a direct multi-step (DMS) forecasting strategy. Formally, the fused features from the HDCF mechanism are mapped directly to the target output via a final linear projection layer, where $T$ denotes the entire prediction horizon. Unlike recursive strategies that feed the prediction of the previous step back into the model to generate the next, our direct approach predicts all $T$ steps in a single forward pass. This effectively eliminates cumulative error propagation, which is a critical advantage for long-term PV power forecasting.
The model is optimized by minimizing the Mean Squared Error (MSE) loss over the entire horizon:
$$\mathcal{L}_{MSE} = \frac{1}{N T} \sum_{n=1}^{N} \sum_{i=1}^{T} \left( y_{i}^{(n)} - \hat{y}_{i}^{(n)} \right)^{2}$$
where $N$ denotes the number of samples in a mini-batch, and $y_{i}^{(n)}$ and $\hat{y}_{i}^{(n)}$ represent the ground truth and the predicted PV power of sample $n$ at future step $i$, respectively.
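A minimal sketch of this loss, assuming the error is averaged over both the samples in the mini-batch and the steps of the forecast horizon, is given below. The function name and the toy values are illustrative only.

```python
def mse_loss(y_true, y_pred):
    """Mean squared error averaged over all samples and all horizon steps."""
    total = 0.0
    count = 0
    for true_row, pred_row in zip(y_true, y_pred):   # one row per sample
        for t, p in zip(true_row, pred_row):         # one entry per horizon step
            total += (t - p) ** 2
            count += 1
    return total / count

# Toy mini-batch (hypothetical): 2 samples, horizon of 2 steps.
batch_true = [[1.0, 2.0], [3.0, 4.0]]
batch_pred = [[1.0, 3.0], [3.0, 2.0]]
loss = mse_loss(batch_true, batch_pred)
```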
4. Experiment
4.1. Experimental Setup
To rigorously validate the performance and generalization of the proposed HDCF-Mamba framework, we designed a comprehensive experimental suite encompassing diverse geographical climates, standardized metrics, and automated optimization procedures.
The proposed HDCF-Mamba model and all baseline models were implemented using the PyTorch 2.0.1 library on a Linux server (Ubuntu 22.04 LTS). The hardware environment primarily consisted of an Intel Core i9-13900K CPU (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4090 GPU with 24GB VRAM (NVIDIA Corporation, Santa Clara, CA, USA). To ensure high-performance execution of the Mamba selective state space mechanism, the CUDA 11.8 toolkit and the mamba_ssm (v1.1.1) library were utilized.
4.1.1. Datasets and Environmental Context
To verify the real-world applicability and generalization of HDCF-Mamba, we use four real PV operational datasets (including publicly available benchmark datasets and actual monitoring data from photovoltaic power stations in different climatic regions). These datasets characterize different environmental stochasticities and operational conditions, ensuring the model’s robustness in practical applications. The detailed information of each real dataset is as follows:
Extreme Continental Desert PV (ECD-PV) [24]: Actual on-site monitoring data from a photovoltaic power station in the desert region of Northwestern China, collected by industrial-grade irradiance, temperature and power sensors with 15-minute time resolution. The dataset records 12 months of real PV power and meteorological data, with high irradiance and extreme diurnal temperature variations, capturing severe power fluctuations typical of desert climates and reflecting the actual operation of PV stations in extreme environmental conditions.
Laboratory Standard Performance PV (LSP-PV) [21]: Sourced from an international research laboratory, this dataset represents stable meteorological conditions and high-precision calibrated sensor monitoring.
Arid Plateau System PV (APS-PV) [25]: Derived from a plant in Northwestern China, reflecting the unique Loess Plateau geomorphology and semi-arid climatic impacts on power output.
Public Standard Benchmark PV (PSB-PV) [26]: A consolidated multi-station dataset from a World Scientific Intelligence Competition used to evaluate the model’s cross-site generality.
Key exogenous variables across these datasets include critical meteorological factors such as module temperature, air temperature, and humidity, as illustrated in Figure 4.
4.1.2. Evaluation Metrics
To quantify the discrepancy between the ground-truth target $Y$ and the model prediction $\hat{Y}$, we employ four standard statistical benchmarks [27].
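For reference, three error measures named elsewhere in this paper (MSE, MAE, and RMSE) can be computed as sketched below. This is an illustrative plain-Python version using their standard textbook definitions; it does not cover every metric of the evaluation protocol, and the toy values are hypothetical.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target."""
    return math.sqrt(mse(y_true, y_pred))

# Toy values (hypothetical).
target = [1.0, 2.0, 3.0, 4.0]
forecast = [1.0, 3.0, 3.0, 2.0]
scores = {"MSE": mse(target, forecast),
          "MAE": mae(target, forecast),
          "RMSE": rmse(target, forecast)}
```

MSE and RMSE penalize large ramp errors more strongly than MAE, so reporting them together separates occasional large misses from consistent small ones.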
The selected benchmark models cover five diverse architectural paradigms in PV power forecasting [28]: classic recurrent neural networks (LSTM, GRU) [5, 6], CNN-based multi-scale models (SCINet) [9], modern Transformer variants (iTransformer), recent hybrid deep learning models (WPMixer) [22], and traditional CNN-RNN hybrids (CNN-GRU-LSTM [13]). This ensures a comprehensive and diverse comparative evaluation across mainstream technical routes.
4.1.3. Implementation Details
The framework is implemented using the PyTorch library and optimized via the Adam optimizer with a starting learning rate of and hyperparameters . To ensure numerical stability and mitigate overfitting, we introduce a ReduceLROnPlateau scheduler with a decay factor of 0.5 and an early stopping mechanism with a patience of four epochs. Furthermore, we utilize the Optuna framework for automated hyperparameter optimization, including the model dimensionality (model width $d_{model}$), batch size, and all key hyperparameters of the Mamba branch in the HDCF-Mamba framework. In our experiments, this automated tuning yielded more stable and accurate performance than manual heuristic settings.
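The interaction between the plateau-based learning-rate decay and early stopping can be sketched without any framework dependency. The logic below is a simplified stand-in and does not reproduce the exact behaviour of the PyTorch scheduler; the decay factor of 0.5 and the stopping patience of 4 come from the text, while the scheduler patience of 2, the initial learning rate, and the validation losses are hypothetical.

```python
def train_control(val_losses, lr, factor=0.5, lr_patience=2, stop_patience=4):
    """Return (stop_epoch, final_lr, best_loss) for a validation-loss trace."""
    best = float("inf")
    since_best = 0    # epochs since the last improvement (early stopping)
    since_decay = 0   # non-improving epochs since the last learning-rate change
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_decay = loss, 0, 0
            continue
        since_best += 1
        since_decay += 1
        if since_decay >= lr_patience:     # plateau detected: decay the rate
            lr *= factor
            since_decay = 0
        if since_best >= stop_patience:    # no improvement for 4 epochs: stop
            return epoch, lr, best
    return len(val_losses), lr, best

# Hypothetical validation losses; the best value is reached at epoch 2.
trace = [1.0, 0.8, 0.9, 0.85, 0.81, 0.82, 0.7]
stop_epoch, final_lr, best_loss = train_control(trace, lr=1e-3)
```

In this trace the rate is halved twice before training stops at epoch 6, so the later improvement at epoch 7 is never reached, which illustrates the trade-off implied by a short stopping patience.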
Key hyperparameter configuration of the Mamba branch: The Mamba block adopts a unified optimal hyperparameter configuration across all four PV datasets (ECD-PV, LSP-PV, APS-PV, and PSB-PV), with the core settings as follows. (1) Model width (d_model): the input/output feature dimension of the Mamba block, consistent with the global feature dimension of the HDCF-Mamba framework to achieve feature alignment between the dual branches. (2) State size (d_state): 32 (the dimension of the latent state space in the Mamba SSM, determining the capability of capturing long-term temporal dependencies). (3) Expansion factor (expand): 2 (the dimension expansion ratio of the feed-forward network in the Mamba block, with the inner dimension d_inner = expand × d_model). (4) Convolution kernel size (d_conv): 4 (the kernel size of the depth-wise convolution in the Mamba block for local temporal correlation capture). (5) Dropout rate: 0.1 (consistent with the dropout configuration of the entire HDCF-Mamba framework to alleviate overfitting and improve generalization ability).
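The settings listed above can be collected in a small configuration object. This is a sketch under stated assumptions: the class name is hypothetical, the field names follow the constructor arguments of the reference `mamba_ssm` package (`d_model`, `d_state`, `d_conv`, `expand`), and the model width of 128 is a placeholder because its tuned value is not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MambaBranchConfig:
    """Hyperparameters of the Mamba branch as listed in the text."""

    d_model: int          # model width; tuned by Optuna, value not given in the text
    d_state: int = 32     # latent state size of the SSM
    expand: int = 2       # expansion factor
    d_conv: int = 4       # depth-wise convolution kernel size
    dropout: float = 0.1  # shared with the rest of the framework

    @property
    def d_inner(self) -> int:
        """Inner dimension of the block: expand * d_model."""
        return self.expand * self.d_model


cfg = MambaBranchConfig(d_model=128)  # 128 is a placeholder, not a reported value
print(cfg, cfg.d_inner)
```

Freezing the dataclass reflects the statement that a single configuration is reused unchanged across all four datasets.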
4.2. Quantitative Results and Analysis
The empirical validation of HDCF-Mamba is conducted through a multi-dimensional analysis of its forecasting accuracy and tracking capabilities [
29] across varying temporal horizons and historical windows.
4.2.1. Overall Performance and Temporal Sensitivity Analysis
The experimental results demonstrate the robust performance of our proposed framework across all evaluated datasets. As illustrated in
Figure 5, we observe a characteristic increase in prediction error (MSE and MAE) as the prediction horizon (
P) extends from 24 to 512 steps, which is consistent with the inherent uncertainty in long-term stochastic processes. A key finding is that HDCF-Mamba’s performance degrades much more slowly than all baseline models [
30].
Furthermore, our analysis reveals a vital correlation between historical sequence length (
L) and forecasting precision. When the input length is extended to
or
, the error metrics exhibit a consistent downward trend (
Figure 5), confirming that the framework effectively leverages the
O(L) linear complexity of the Mamba branch to extract valuable long-range contextual dependencies without the computational penalties of
O(L²) attention mechanisms [
31,
32]. The visual tracking results on the ECD-PV dataset [
24] further corroborate this, showing that even at high-resolution horizons, the model accurately aligns with the underlying periodic fluctuations and suppresses the impact of high-frequency noise.
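The scaling argument can be made concrete with a rough operation count. The sketch below is a cost model only: it ignores constant factors such as feature and state dimensions, and the window lengths are illustrative rather than the ones used in the experiments.

```python
def attention_pairs(seq_len: int) -> int:
    """Token-pair interactions scored by full self-attention: L * L."""
    return seq_len * seq_len


def scan_steps(seq_len: int) -> int:
    """Recurrent state updates in a state space scan: one per time step."""
    return seq_len


for L in (96, 192, 384, 768):  # illustrative look-back windows
    ratio = attention_pairs(L) // scan_steps(L)
    print(f"L={L:4d}  attention={attention_pairs(L):7d}  scan={scan_steps(L):4d}  ratio={ratio}")
```

Doubling the window doubles the scan cost but quadruples the attention cost, which is why longer histories remain affordable for the Mamba branch.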
4.2.2. Benchmarking Against State-of-the-Art (SOTA) Models
To rigorously evaluate the superiority of HDCF-Mamba, we compare it against diverse architectural paradigms [
33], including RNNs (GRU [
6]), linear decomposition models (DLinear [
12]), hybrid models (CNN-GRU-LSTM [
13]), and modern deep learning frameworks (SCINet [
9], iTransformer [
8]). As summarized in
Table 2 and
Table 3, our model achieves state-of-the-art performance across nearly all benchmarks.
Comparison with Linear Decomposition Baselines: Following recent trends in time-series forecasting, we included DLinear [
12] to verify whether complex non-linear modeling is necessary. While DLinear shows surprisingly competitive performance on stable seasonal trends by decomposing sequences into trend and seasonal components, it struggles to capture the high-frequency, non-linear fluctuations inherent in PV power data. HDCF-Mamba outperforms DLinear by an average of 12.2% and 16.5% in MAE and RMSE, respectively, demonstrating that the global–local feature synergy is essential for volatile energy datasets.
Comparison with Modern Transformers and CNNs: Against advanced models [
34] such as iTransformer [
8] and SCINet [
9], HDCF-Mamba maintains a consistent edge. While SCINet [
9] shows competitive results on the APS-PV dataset [
25] due to its localized interaction design, HDCF-Mamba’s dual-branch architecture proves more adaptable to the extreme volatility of the ECD-PV [
24] and PSB-PV datasets [
26]. This suggests that the synergetic integration of selective state space modeling and multi-scale feature decoupling [
35] successfully addresses the specific challenges of non-stationary PV data that single-stream models fail to capture.
4.2.3. Performance on Strong Non-Stationary/Multi-Scale Fluctuation Dataset
To verify the model’s adaptability to complex real-world scenarios, we conduct a targeted analysis on the ECD-PV dataset, which exhibits strong non-stationarity and multi-scale fluctuations caused by abrupt cloud occlusion and extreme temperature changes. Based on the results in
Table 3, HDCF-Mamba maintains significant superiority over all diverse benchmark paradigms on this dataset: it reduces MAE by 8% compared with SCINet [
9] (CNN paradigm), 11.4% compared with iTransformer [
8] (Transformer paradigm [
36]), and 7.2% compared with WPMixer [
22] (recent hybrid paradigm). This demonstrates that the heterogeneous dual-branch design of HDCF-Mamba effectively balances global trend modeling and local fluctuation capture, outperforming single-paradigm models in handling the core challenges of non-stationary PV power sequences.
4.2.4. Computational Efficiency Analysis
To verify the theoretical advantages of HDCF-Mamba regarding linear complexity in long-sequence forecasting, a comprehensive performance benchmark was conducted using an NVIDIA RTX 4090 GPU with a batch size of 32. As summarized in
Table 4, HDCF-Mamba demonstrates a superior balance between predictive performance and computational resource utilization across three key metrics: training duration, inference latency, and peak GPU memory consumption.
By leveraging the Selective State Space Mechanism (SSM), the proposed model successfully overcomes the
O(L²) computational bottleneck inherent in traditional self-attention mechanisms. Specifically, at an input sequence length of
, HDCF-Mamba requires only 1242 MB of GPU memory—representing a 56.5% reduction in memory overhead compared to iTransformer—while maintaining a consistent inference latency of 12.5 ms/batch [
37,
38]. These empirical results indicate that although the MSFEM module introduces additional multi-scale feature extraction operations [
39], the overall framework effectively mitigates the computational burden of long-range dependency modeling. Consequently, HDCF-Mamba proves to be highly scalable and robust for high-resolution PV power forecasting [
40], offering significant advantages for practical industrial deployment in modern power grids.
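The relationship between the two reported memory figures can be checked with simple arithmetic. In the sketch below, the baseline footprint is only what the 1242 MB and 56.5% figures imply together; it is not a value read from Table 4, and the function names are illustrative.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Relative saving of `value` with respect to `baseline`, in percent."""
    return (baseline - value) / baseline * 100.0


def implied_baseline(value: float, reduction_pct: float) -> float:
    """Baseline that would yield `value` after a `reduction_pct` percent saving."""
    return value / (1.0 - reduction_pct / 100.0)


hdcf_mb = 1242.0  # peak GPU memory reported for HDCF-Mamba
reduction = 56.5  # reported reduction relative to iTransformer, in percent

baseline_mb = implied_baseline(hdcf_mb, reduction)
print(f"implied iTransformer footprint: {baseline_mb:.1f} MB")
print(f"round-trip reduction: {percent_reduction(baseline_mb, hdcf_mb):.1f}%")
```

Under these two figures, the implied baseline is roughly 2.3 times the HDCF-Mamba footprint.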
In summary, the comparative analysis confirms that HDCF-Mamba establishes a new benchmark for multi-scale PV forecasting [
41], offering superior accuracy and robustness while maintaining high computational efficiency across diverse environmental conditions.
4.3. Ablation Study
To rigorously verify the contribution of each innovative component within the HDCF-Mamba framework, we conducted a systematic ablation study using a single-variable principle. This analysis aims to validate three core hypotheses: (1) the necessity of the Mamba Selective State Space Mechanism for efficient long-range dependency modeling; (2) the role of Multi-Scale Resampling in decoupling multi-frequency temporal features; and (3) the effectiveness of the TimeBlock in capturing localized, high-frequency fluctuations.
4.3.1. Experimental Design and Configurations
We constructed three ablated variants by systematically removing or replacing core modules, comparing their performance against the full HDCF-Mamba baseline across the LSP-PV [
21] and ECD-PV datasets [
24]. The configurations are defined as follows:
w/o Mamba: Removes the selective state space branch, relying solely on convolutional local modeling.
w/o Multi-Scale: Disables the parallel resampling architecture, forcing the local branch to operate at a single temporal resolution.
w/o TimeBlock: Replaces the multi-kernel filter units with standard linear layers to evaluate the impact of fine-grained local feature extraction.
4.3.2. Quantitative Results and Discussion
The results of the ablation study, summarized in
Table 5 and
Table 6, confirm that all three modules are indispensable for achieving state-of-the-art (SOTA) performance.
Impact of Mamba: Removing the Mamba module resulted in the most significant performance degradation. For instance, on the
ECD-PV dataset [
24], the MAE increased from
0.198 to
0.216, representing a substantial loss in predictive accuracy. This confirms that Mamba’s selective state space mechanism is vital for efficiently capturing the macroscopic trends and long-term dependencies inherent in solar cycles.
Impact of Multi-Scale Resampling: The w/o Multi-Scale variant showed a moderate decline in performance across both datasets (e.g., MAE increased to
0.215 on
ECD-PV [
24]). This underscores the necessity of multi-scale decoupling for processing the multi-frequency nature of PV data, where low-frequency seasonal trends and high-frequency weather patterns must be modeled simultaneously.
Impact of TimeBlock: The removal of the TimeBlock module led to a notable reduction in the model’s ability to track abrupt, instantaneous power variations, such as those caused by sudden cloud occlusion events. On the
LSP-PV dataset [
21], the MAE rose from
0.180 to
0.190, highlighting the TimeBlock’s critical role in enhancing local perception and structural robustness.
In conclusion, the ablation study empirically demonstrates that the synergetic integration of these three modules allows HDCF-Mamba to balance global modeling efficiency with local fluctuation robustness, a feat unattainable by any of the ablated variants.
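To compare the ablated variants on a common scale, the MAE values quoted above can be converted into relative increases over the full model. The sketch uses only the figures stated in the text; the function and variable names are illustrative.

```python
def relative_increase(full: float, ablated: float) -> float:
    """Percentage increase in error of an ablated variant over the full model."""
    return (ablated - full) / full * 100.0


# (dataset, variant) -> (full-model MAE, ablated MAE), as quoted in the text
ablation = {
    ("ECD-PV", "w/o Mamba"): (0.198, 0.216),
    ("ECD-PV", "w/o Multi-Scale"): (0.198, 0.215),
    ("LSP-PV", "w/o TimeBlock"): (0.180, 0.190),
}

results = {key: round(relative_increase(*pair), 2) for key, pair in ablation.items()}
for (dataset, variant), pct in results.items():
    print(f"{dataset:7s} {variant:16s} +{pct:.2f}% MAE")
```

On ECD-PV, removing Mamba raises MAE by about 9.1% and disabling multi-scale resampling by about 8.6%, so the two components contribute similar amounts on that dataset; the TimeBlock removal on LSP-PV corresponds to an increase of about 5.6%.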
4.4. Limitations and Applicability
Although HDCF-Mamba demonstrates superior performance in balancing efficiency and accuracy, several limitations should be noted for its practical deployment:
1. Impact of Data Completeness: The dual-branch fusion mechanism relies on the synchronization of historical power and meteorological features. In cases of severe sensor failure resulting in missing irradiance or temperature inputs, the local branch (MSFEM) may struggle to provide accurate multi-scale refinements, potentially affecting the overall forecasting stability.
2. Sensitivity to Sampling Resolution: The MSFEM module is optimized with fixed down-sampling strides (). While effective for high-resolution data (e.g., 5-min intervals), its multi-scale advantage may diminish for extremely low-resolution datasets (e.g., hourly averages), where the abrupt fluctuations are already smoothed out during the averaging process.
3. Short-Window Trade-offs: The core advantage of the Mamba branch is its linear complexity for long-sequence modeling. For scenarios with very short look-back windows (), the computational gains over traditional models are negligible, and the model’s structural complexity might lead to slight overfitting compared to simpler linear baselines.
Future research will focus on developing adaptive resampling mechanisms and integrating missing data imputation techniques to further enhance the model’s robustness.
5. Conclusions
This research proposed HDCF-Mamba, an innovative heterogeneous dual-branch framework designed to reconcile the intrinsic trade-off between global modeling efficiency and local perception robustness in multi-scale photovoltaic (PV) power forecasting. By synergetically integrating the Mamba Selective State Space mechanism with a Multi-Kernel Filter Unit-based Multi-Scale TimeBlock module, the architecture effectively bridges long-range macroscopic temporal dependencies and high-frequency non-stationary dynamics. Mathematically, our framework surmounts the quadratic computational bottleneck characteristic of traditional Transformer-based forecasters, achieving linear complexity, which facilitates the utilization of ultra-long historical sequences for high-resolution prediction tasks. Extensive empirical evaluations across diverse real-world datasets, including ECD-PV and APS-PV, demonstrate that HDCF-Mamba consistently establishes new state-of-the-art (SOTA) benchmarks. Specifically, compared to advanced models, HDCF-Mamba achieves a reduction in Mean Absolute Error (MAE) of up to 11.4% over iTransformer and 8% over SCINet, while significantly outperforming RNN-based and hybrid architectures such as CNN-GRU-LSTM and WPMixer. These results confirm that our model provides a computationally sustainable and highly precise solution for modern smart grid operations, contributing a vital technological cornerstone for the ongoing global energy transition.
Future work will focus on two directions: (1) optimizing the model’s lightweight design for edge computing in small-scale PV stations, and (2) extending the framework to multi-task PV forecasting (e.g., simultaneous prediction of power output and solar irradiance). We also plan to validate the model on more global PV datasets to further improve its cross-region generalization ability.