HDCF-Mamba: Bridging Global Dependencies and Local Dynamics for Multi-Scale PV Forecasting

Shi, Wenzhuo; Zhao, Hongtian; Deng, Siyin; Sun, Aojie

doi:10.3390/en19051315


by Wenzhuo Shi, Hongtian Zhao *, Siyin Deng and Aojie Sun

College of Mathematics and System Sciences, Xinjiang University, Urumqi 830046, China

* Author to whom correspondence should be addressed.

Energies 2026, 19(5), 1315; https://doi.org/10.3390/en19051315

Submission received: 5 February 2026 / Revised: 25 February 2026 / Accepted: 3 March 2026 / Published: 5 March 2026


Abstract

The inherent randomness, high volatility, and non-stationarity of photovoltaic (PV) power generation pose substantial threats to the stability of modern power grids. Developing high-precision forecasting models is essential for grid operation, yet conventional architectures often encounter a performance bottleneck: they struggle to simultaneously achieve high computational efficiency for long-range dependency modeling and robust perception for local, abrupt fluctuations. To address these limitations, this paper proposes HDCF-Mamba, a novel forecasting framework that resolves the feature distribution gap between long-range trends and short-term volatility. The core innovation lies in the Heterogeneous Dual-branch Cross-Fusion (HDCF) mechanism, which enables the synergetic integration of a Mamba-based global branch and a Multi-Kernel Filter Unit-based multi-scale local branch. Specifically, we integrate the Mamba Selective State Space Mechanism into the global branch to efficiently capture long-term dependencies with $O(L)$ linear complexity, fundamentally overcoming the quadratic computational bottleneck of Transformers. Meanwhile, the Multi-Scale Feature Extraction Module (MSFEM) acts as a local compensator to capture high-frequency power fluctuations caused by transient weather changes. Unlike simple hybrid models that rely on linear addition, our HDCF design utilizes a temporal concatenation mechanism to ensure non-linear alignment of these heterogeneous features. Extensive experiments on four real-world PV operational datasets (including publicly available benchmark datasets and actual photovoltaic power station monitoring data: ECD-PV, LSP-PV, APS-PV, and PSB-PV) demonstrate that HDCF-Mamba consistently outperforms state-of-the-art models, achieving a reduction in Mean Absolute Error (MAE) of up to 11.4% compared to iTransformer and 8% compared to SCINet, while maintaining superior computational efficiency.

Keywords: photovoltaic (PV) power forecasting; time series; Mamba; multi-scale features; TimeBlock

1. Introduction

The global energy landscape is undergoing a profound transition driven by the escalating challenges of climate change [1]. Photovoltaic (PV) power generation has emerged as a cornerstone of this transition, owing to its modularity and declining component costs [2]. However, large-scale PV integration brings significant instability to modern power grids. PV output is inherently random, volatile and non-stationary due to uncontrollable factors such as solar irradiance and ambient temperature, which imposes significant stress on grid dispatch operations and often causes supply–demand imbalances and equipment overloading [3]. Consequently, developing high-precision and high-efficiency forecasting models is a critical prerequisite for the reliable operation of sustainable power systems.

PV power forecasting technology has evolved from fundamental physical mechanisms to advanced deep learning paradigms [4]. While recurrent neural networks (RNNs) like Long Short-Term Memory (LSTM) [5] and Gated Recurrent Unit (GRU) [6] can capture nonlinear temporal correlations, they are susceptible to gradient vanishing in long-sequence modeling. Transformer-based architectures, such as Informer and Autoformer, have utilized sparse attention to model global dependencies but suffer from a quadratic computational complexity of $O(L^{2})$, which restricts their application in high-resolution tasks requiring ultra-long historical windows [7,8]. Furthermore, convolutional frameworks like SCINet [9] and TimesNet [10] excel at local feature extraction but often utilize single-stream architectures that struggle to balance global efficiency with robust perception of sudden power fluctuations, such as those caused by cloud occlusion events.

Existing PV forecasting methods still have obvious limitations that our HDCF-Mamba aims to solve:

Transformer-based models (Informer [7] and Autoformer [11]) rely on self-attention mechanisms with $O (L^{2})$ quadratic complexity, which limits their application in long-sequence, high-resolution PV forecasting tasks; although they can capture long-range dependencies, their inefficiency makes them unsuitable for practical grid dispatch scenarios.
Linear decomposition models (e.g., DLinear [12]) employ a simple trend-seasonal decomposition architecture using linear layers but lack the non-linear mapping capability to capture complex atmospheric dynamics, leading to significant errors where sudden weather-induced fluctuations override seasonal patterns.
Hybrid CNN-RNN models (e.g., CNN-GRU-LSTM [13]) integrate local feature extraction (CNN) and sequential modeling (RNN) but fail to achieve effective fusion of global trends and local fluctuations, leading to suboptimal performance in non-stationary PV power sequences with both high-frequency spikes and long-term trends.
Traditional RNNs (LSTM/GRU) [5,6] suffer from gradient vanishing in long-sequence modeling, while CNN-based models (SCINet [9]) excel at local features but lack efficient global modeling capability.

Thus, we propose HDCF-Mamba, an innovative heterogeneous dual-branch framework that splits temporal modeling into global trend extraction and local detail detection. We integrate Mamba’s linear-complexity state space mechanism for global modeling, design a Multi-Kernel Filter Unit-based multi-scale module for local fluctuation detection, and fuse the two branches via a novel channel fusion module. The paper’s core contributions are the heterogeneous dual-branch architecture, linear-complexity global modeling, and multi-scale robust perception for PV-specific fluctuations.

To clarify the exact novelty of our approach relative to existing hybrid models, the main contributions of this work are summarized as follows:

1. A Synergetic Heterogeneous Paradigm with Three Pillars: Unlike traditional CNN-Transformer hybrids that suffer from quadratic complexity or simple CNN-RNN concatenations that fail to fuse features effectively, we propose a tripartite synergistic framework. The core innovation lies not in any single module, but in their coordinated interaction:

Pillar 1, Global Modeling (Mamba SSM): Provides efficient $O(L)$ long-range dependency capture, serving as the “macroscopic” observer of seasonal trends.
Pillar 2, Local Perception (MSFEM): Acts as the “microscopic” sensor, using multi-scale Multi-Kernel Filter Units to detect abrupt, high-frequency fluctuations caused by weather events.
Pillar 3, Heterogeneous Fusion (HDCF): Functions as the “integrator,” employing temporal channel concatenation to dynamically align and fuse the disparate feature spaces from the first two pillars.

This three-pillar design fundamentally resolves the feature distribution gap between long-term trends and short-term volatility that plagues existing models.

2. MSFEM with Hierarchical Multi-Scale Resampling: We design the Multi-Scale Feature Extraction Module (MSFEM) based on Multi-Kernel Filter Unit-style convolutions. Its novelty lies in its hierarchical resampling pipeline (with strides $S \in \{2, 4, 8\}$) that decouples temporal dynamics into fine, medium, and coarse granularities, specifically counteracting the smoothing effect of global models and ensuring that weather-driven power ramps are preserved.

3. Empirical Validation of Synergy: Through comprehensive ablation studies (Section 4.3), we empirically demonstrate that the synergistic integration of these three pillars is essential: removing any one pillar leads to significant performance degradation, confirming that the whole is greater than the sum of its parts.

The remainder of this paper is organized as follows: Section 2 reviews related work in PV power forecasting, including physical models, statistical methods, and deep learning approaches, with a focus on their limitations in handling multi-scale dynamics. Section 3 details the proposed HDCF-Mamba framework, including the Mamba global branch, the MSFEM local branch with TimeBlock, and the HDCF fusion mechanism. Section 4 presents the experimental setup, datasets, evaluation metrics, and implementation details, followed by comprehensive quantitative results, ablation studies, and computational efficiency analysis. Section 5 concludes the paper with a summary of findings, discussion of limitations, and directions for future work.

2. Related Work

Photovoltaic (PV) power forecasting is a challenging time-series task due to the strong intermittency, high variability, and non-stationarity of PV generation under changing weather conditions. PV output is driven by multiple stochastic meteorological factors, such as solar irradiance, ambient temperature, and humidity [14]. Accurate forecasts are critical for secure and stable power-grid operation and also support electricity-market participation and the optimal scheduling of energy storage systems. Therefore, precise and efficient PV power forecasting is a key enabler for large-scale integration of renewable energy into power systems [15]. Existing studies can be broadly grouped into three categories: physics-based methods, traditional statistical models [16], and deep learning approaches, which have become increasingly prevalent in recent years.

2.1. Physical Model

Physics-based PV power forecasting methods model the generation process using meteorological principles and the PV energy conversion mechanism. They estimate PV output by explicitly relating physical variables (e.g., irradiance and temperature) to power generation through analytical formulations or numerical simulations. A key advantage of this paradigm is strong interpretability, which provides a physically grounded basis for system design and fault diagnosis [17]. However, these models depend heavily on the quality of meteorological forecast inputs [18], and prediction errors in weather variables (particularly under rapidly changing cloud or aerosol conditions) can significantly degrade forecasting accuracy, limiting robustness in ultra-short-term scenarios. In addition, developing high-fidelity physical models often involves complex differential equations and computationally intensive numerical procedures, and the calibration process is typically cumbersome and lacks self-adaptation, which hinders generalization and direct deployment across different sites and operating conditions.

2.2. Statistical and Machine Learning Models

To overcome the strong dependence of physical models on explicit mechanisms and site-specific calibration, early PV power forecasting studies widely explored statistical and conventional machine learning approaches. Representative statistical models, such as AutoRegressive Integrated Moving Average (ARIMA) [19] and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) [20], estimate PV output by fitting temporal patterns in historical PV power and meteorological time series and extrapolating them to future horizons. Classic machine learning models (e.g., Support Vector Regression, Random Forest, and Gradient Boosting Decision Trees) further improve accuracy by learning non-linear mappings from meteorological inputs and time-related variables to PV power.

These methods typically offer fast inference and modest computational cost, which makes them practical for relatively simple short-term forecasting settings. However, their performance degrades on complex PV sequences for three main reasons. First, they have limited capacity to represent strong non-linearity and non-stationarity under rapidly changing weather. Second, they rely heavily on manual feature engineering (e.g., meteorological feature construction) and model-order/hyperparameter selection, and lack end-to-end representation learning ability [21]. Third, they often struggle to jointly capture short-term fluctuations and long-range, multi-scale temporal dependencies, resulting in reduced robustness under highly variable conditions such as cloudy or rainy days.

2.3. Deep Learning Models

Deep learning has become the dominant paradigm for PV power forecasting [3,4] due to its strong non-linear modeling capacity and end-to-end representation learning. RNN-based models, such as Long Short-Term Memory (LSTM) [5] and Gated Recurrent Unit (GRU) [6], have been widely used to model temporal dependencies, but they often struggle with long-range sequences and may suffer from vanishing gradients.

The emergence of Transformer-based models, such as Informer [7], Autoformer [11], and iTransformer [8], introduced the self-attention mechanism to capture long-range global dependencies. Despite their success, the $O(L^{2})$ quadratic complexity of attention remains a significant bottleneck for high-resolution forecasting. To challenge the dominance of Transformers, DLinear [12] was proposed as a competitive “simple” alternative. By employing a trend-seasonal decomposition architecture through moving average kernels and linear layers, DLinear achieves $O(L)$ efficiency and avoids the temporal confusion inherent in position embeddings. However, the purely linear mapping of DLinear is fundamentally limited in PV forecasting, as it cannot model the complex, high-order non-linear fluctuations caused by sudden weather-driven events like cloud occlusion.

Recently, Mamba-style sequence models have emerged as promising alternatives for long-sequence time-series forecasting, featuring linear complexity ($O(L)$) with selective state updates. Nevertheless, their applications to PV forecasting remain limited, and the literature has rarely examined how to integrate such global sequence modeling with explicit local feature extraction. In parallel, CNN-based forecasting models (e.g., SCINet [9] and TimesNet [10]) are effective at capturing local patterns and multi-scale structures, but may be less sensitive to abrupt PV power variations (e.g., extreme weather-induced drops) and often face a trade-off between global dependency modeling and local robustness. This context is further illustrated in Table 1, which summarizes the state-of-the-art numerical results (in terms of MAE and RMSE) from key literature using representative models on various PV power forecasting datasets.

Overall, many existing deep learning approaches adopt single-stream architectures, which makes it difficult to jointly achieve efficient long-range modeling, robust local fluctuation handling, and effective multi-scale representation learning, indicating the need for heterogeneous fusion frameworks. Motivated by this gap, HDCF-Mamba combines Mamba-based sequence modeling with multi-scale convolutional modules to better balance global efficiency and local robustness for PV power forecasting.

3. Methodology

The proposed HDCF-Mamba framework adopts a heterogeneous dual-branch architecture to concurrently address global trend modeling and local fluctuation perception in photovoltaic (PV) sequences. By decoupling sequence modeling into two parallel subtasks, the framework reconciles the conflict between computational efficiency and multi-scale feature robustness.

3.1. Design Philosophy of HDCF-Mamba

Existing hybrid forecasting models typically treat global and local features as additive components. However, in PV forecasting, global seasonal trends and local meteorological shocks belong to different feature domains.

The Mamba branch acts as the macroscopic observer, maintaining long-term state memory. The MSFEM branch acts as the microscopic sensor, capturing multi-scale local variances. The HDCF module serves as the integrator, which is the most critical novelty of this work. It dynamically weights the contribution of each branch based on the input volatility, ensuring the model can switch focus between stable clear-sky patterns and volatile cloudy-day fluctuations.

3.2. General Framework

The core innovation of HDCF-Mamba lies in its heterogeneous dual-branch architecture (as shown in Figure 1). It differs fundamentally from existing Transformer-based models (Informer [7] and Autoformer [11]), which adopt $O(L^{2})$ self-attention for global modeling, and from hybrid CNN-RNN models, which simply combine CNN and RNN modules without fusing global and local features effectively. We integrate the Mamba Selective State Space mechanism to achieve $O(L)$ linear complexity, enabling efficient long-range dependency modeling without sacrificing accuracy. Our framework splits temporal modeling into two parallel subtasks: global trend extraction (via Mamba) and multi-scale local fluctuation detection (via the Multi-Kernel Filter Unit-based multi-scale TimeBlock). The two branches are fused via the Heterogeneous Dual-branch Cross-Fusion (HDCF) module, rather than the simple feature addition used in existing hybrid models, to maximize heterogeneous feature complementarity. This design balances computational efficiency (from Mamba) and robust local perception (from TimeBlock), addressing the intrinsic trade-off that plagues existing approaches.

3.3. Data Preprocessing and Feature Embedding

This module provides the structural foundation for the numerical stability and feature perception capabilities of the HDCF-Mamba model. To mitigate the challenges posed by non-uniform physical dimensions (e.g., the discrepancy between irradiance and temperature scales) and the inherent lack of temporal context in raw sequences, we implement a unified preprocessing pipeline based on statistical normalization and high-dimensional DataEmbedding.

3.3.1. Statistical Normalization

PV power data typically exhibit diverse units and high-amplitude fluctuations, which can impede convergence and lead to gradient instability during training. To eliminate the influence of heterogeneous physical units, we apply feature-wise Z-score standardization. Each input variable (e.g., solar power, temperature, and horizontal irradiance) is normalized independently based on its specific mean and standard deviation calculated from the training set, ensuring that the model training is not biased by variable scales.

$X_{enc}^{(d)}(t) = \frac{X_{enc}^{(d)}(t) - \mu^{(d)}}{\sigma^{(d)}}, \quad \text{for each feature dimension } d = 1, \dots, D,$

(1)

where $\mu^{(d)}$ and $\sigma^{(d)}$ are the mean and standard deviation calculated per feature dimension from the training set. This feature-wise normalization ensures that disparate meteorological variables (such as module temperature in °C, humidity in %, and ambient temperature in °C) are each mapped to a consistent numerical scale individually, rather than being pooled together. This approach preserves the distinct statistical properties of each variable while facilitating faster model convergence.
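As a minimal sketch of Equation (1), the snippet below fits per-feature statistics on a training split and reuses them for a held-out split. The feature names, the toy values, and the use of the population standard deviation are illustrative assumptions and are not taken from the experimental code.

```python
from statistics import mean, pstdev

def fit_zscore(train_columns):
    """Return {feature: (mean, std)} computed from the training split only."""
    stats = {}
    for name, values in train_columns.items():
        mu = mean(values)
        sigma = pstdev(values)  # population std; an assumption of this sketch
        # Guard against constant features to avoid division by zero.
        stats[name] = (mu, sigma if sigma > 0 else 1.0)
    return stats

def apply_zscore(columns, stats):
    """Standardize each feature independently with the stored training statistics."""
    return {
        name: [(v - stats[name][0]) / stats[name][1] for v in values]
        for name, values in columns.items()
    }

# Hypothetical toy data: two features with very different physical scales.
train = {"power_kw": [0.0, 2.0, 4.0], "temp_c": [10.0, 20.0, 30.0]}
test = {"power_kw": [2.0, 6.0], "temp_c": [20.0, 40.0]}

stats = fit_zscore(train)
train_norm = apply_zscore(train, stats)
test_norm = apply_zscore(test, stats)  # test data never influences mu or sigma
```

Reusing the training statistics for the test split is what keeps the normalization free of information from the evaluation period.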

In addition to Z-score standardization, a rigorous preprocessing pipeline is applied to ensure data quality and temporal continuity for PV power sequences. First, outlier removal is performed to eliminate abnormal values that violate physical constraints (e.g., negative PV power, irradiance values exceeding the theoretical maximum). Second, for missing values in the time series caused by occasional monitoring interruptions, linear interpolation is adopted based on adjacent valid temporal samples to maintain the integrity of sequential data. Third, a moving average filter is applied to the processed sequence to suppress random high-frequency noise induced by environmental interference, while preserving the intrinsic fluctuation characteristics of PV power. All preprocessing parameters are determined on the training set and then applied unchanged to the validation and test sets to avoid data leakage, laying a solid foundation for stable model training.
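The three cleaning steps can be sketched as follows. The physical bounds, the window length of 3, and the function names are hypothetical choices made for illustration; the text does not specify the actual thresholds or filter width.

```python
def mask_outliers(series, lower=0.0, upper=None):
    """Mark physically impossible readings as missing (None)."""
    cleaned = []
    for v in series:
        if v is None or v < lower or (upper is not None and v > upper):
            cleaned.append(None)
        else:
            cleaned.append(float(v))
    return cleaned

def interpolate_missing(series):
    """Fill None gaps linearly between the nearest valid neighbours."""
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        raise ValueError("series contains no valid samples")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            out[i] = series[right]      # leading gap: copy the first valid value
        elif right is None:
            out[i] = series[left]       # trailing gap: copy the last valid value
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out

def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the sequence boundaries."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Toy PV trace with one negative reading and one dropped sample.
raw = [0.0, 1.0, -5.0, 3.0, None, 5.0]
filled = interpolate_missing(mask_outliers(raw, lower=0.0, upper=10.0))
smooth = moving_average(filled, window=3)
```

A short window keeps the filter from flattening genuine ramps, which matters here because the local branch is meant to detect exactly those fluctuations.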

3.3.2. High-Dimensional Feature Embedding

To facilitate the model’s perception of seasonality and periodicity, we utilize a multi-component DataEmbedding layer to project normalized inputs into a model-dimensional ($d_{model}$) latent manifold. This process integrates numerical power values with discrete temporal markers (e.g., hourly and daily timestamps) to capture deep contextual information. The embedded feature vector $E$ is constructed through the summation of the projected target sequence and positional encodings:

$E = X_{target} W_{v} + b_{v} + f_{PE}(X_{mark}).$

(2)

In this formulation, $X_{target}$ denotes the input target power sequence, while $W_{v}$ and $b_{v}$ represent the learnable weight matrix and bias vector of the projection layer, respectively. The function $f_{PE}(\cdot)$ signifies the positional encoding utilized to capture temporal dependencies within the auxiliary temporal covariates $X_{mark}$. The resulting representation $E$ serves as the “contextual anchor” and the unified starting point for both the global Mamba-based and local MSFEM branches.

3.4. Multi-Scale Convolutional Feature Extraction Module (MSFEM)

A fundamental challenge in photovoltaic (PV) power forecasting is the capture of high-frequency local fluctuations, such as short-term power swings induced by sudden cloud occlusion events. Conventional single-kernel convolutional architectures often fail to perceive these multi-time-scale features effectively. To resolve this, we propose the MSFEM, which decouples temporal dynamics through multi-scale resampling and enhances local dependency modeling via the TimeBlock structure.

3.4.1. Multi-Scale Transformation and Resampling

The MSFEM (as shown in Figure 2) processes the embedded feature representation $E \in \mathbb{R}^{L \times d_{model}}$ through a parallel resampling architecture designed to extract features across various temporal resolutions. As formalized in Algorithm 1, the module employs $N$ parallel branches utilizing one-dimensional Average Pooling (AvgPool1d) for downsampling and linear interpolation (Upsample Linear) for sequence reconstruction.

Specifically, while the kernel size $K = 7$ remains constant, we utilize incrementally increasing strides $S_{i} \in \{2, 4, 8\}$ to yield progressively coarser sequence representations. This hierarchical approach allows the model to analyze the PV sequence through fine, medium, and coarse granularities. The downsampled representations are restored to the original sequence length $L$ via linear interpolation, as defined by the following operations:

$E_{agg,i}^{'} = \mathrm{AvgPool}_{K, S_{i}}(E) \in \mathbb{R}^{L^{'} \times d_{model}},$

(3)

$M_{i} = \mathrm{UpsampleLinear}(E_{agg,i}^{'}) \in \mathbb{R}^{L \times d_{model}}.$

(4)

To aggregate these multi-granularity views, we introduce the Multi-View Channel Aggregation Functional (KCh), which performs explicit tensor-level concatenation along the channel dimension $C$, resulting in a stacked high-dimensional feature map $M_{stack}$:

$M_{stack} = \mathrm{KCh}(\{M_{i}\}_{i=0}^{2}) \in \mathbb{R}^{B \times 3 \times L \times d_{model}}.$

(5)

Algorithm 1: Multi-Scale Convolutional Feature Extraction Module (MSFEM)

Input: $E \in \mathbb{R}^{B \times L \times d_{model}}$
Output: $Y \in \mathbb{R}^{B \times L \times d_{model}}$
Initialization:
1: $K \leftarrow 7$
2: $S \leftarrow \{2, 4, 8\}$
3: $M \leftarrow \emptyset$
For $S_{i}$ in $S$:
4: $E_{agg,i}^{'} \leftarrow \mathrm{AvgPool}_{K, S_{i}}(E)$
5: $M_{i} \leftarrow \mathrm{UpsampleLinear}(E_{agg,i}^{'})$
6: $M \leftarrow M \cup \{M_{i}\}$
End for
7: $M_{stack} \leftarrow \mathrm{ConcatChannel}(M)$
8: $Y \leftarrow \mathrm{TimeBlock}(M_{stack})$
Return $Y$
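Steps 1 to 7 of Algorithm 1 can be sketched for a single feature channel as shown below. This is an illustrative pure-Python version rather than the experimental implementation: it assumes no padding in the pooling step and endpoint-aligned linear interpolation, neither of which is stated in the text, and it omits the TimeBlock of step 8.

```python
def avg_pool1d(x, kernel, stride):
    """Average pooling without padding (padding is an assumption of this sketch)."""
    n_out = (len(x) - kernel) // stride + 1
    return [sum(x[i * stride: i * stride + kernel]) / kernel for i in range(n_out)]

def upsample_linear(x, length):
    """Linearly interpolate x back to `length` points with aligned endpoints."""
    if len(x) == 1 or length == 1:
        return [x[0]] * length
    out = []
    for t in range(length):
        pos = t * (len(x) - 1) / (length - 1)   # fractional index into x
        lo = int(pos)
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[hi] * frac)
    return out

def multi_scale_views(x, kernel=7, strides=(2, 4, 8)):
    """Return one length-restored view per stride, stacked as a list of channels."""
    return [upsample_linear(avg_pool1d(x, kernel, s), len(x)) for s in strides]

# Toy ramp of length L = 32 standing in for one embedded feature channel.
series = [float(t) for t in range(32)]
views = multi_scale_views(series)   # 3 views, each of length 32
```

Larger strides yield fewer pooled points before interpolation, so the third view is the smoothest and the first retains the most local detail.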

3.4.2. TimeBlock Feature Enhancement

The TimeBlock serves as the core processing engine of the MSFEM, illustrated in Figure 3. It is engineered to perform deep local dependency modeling while maintaining the integrity of the feature manifold. The TimeBlock processes the stacked multi-scale features $M_{stack} \in \mathbb{R}^{B \times 3 \times L \times d_{model}}$ through a residual architecture featuring Inception-style 2D convolutions. It consists of two Multi-Kernel Filter Unit-style blocks with a GELU activation in between:

$M_{out} = \mathrm{LayerNorm}(M_{stack} + \mathrm{ConvBlock}(M_{stack})),$

(6)

where $\mathrm{ConvBlock}(\cdot) = \text{Multi-Kernel Filter Unit}_{2}(\mathrm{GELU}(\text{Multi-Kernel Filter Unit}_{1}(\cdot)))$.

Each Inception-style stage applies six parallel 2D convolutions with a kernel size of $3 \times 3$ and a padding of 1. These parallel kernels capture diverse local features, which are then integrated via mean-pooling across the kernel dimension. The first block expands the channel dimension from 3 to $d_{ff}$, while the second compresses it back to 3. The output $M_{out}$ maintains the same shape as the input $M_{stack}$.

To prepare the local features for fusion with the global Mamba branch, $M_{out}$ is first reshaped to collapse the channel and feature dimensions:

$M_{fused}^{'} = \mathrm{Reshape}(M_{out}) \in \mathbb{R}^{B \times L \times (3 \cdot d_{model})},$

(7)

followed by a linear projection to reduce the dimensionality back to $d_{model}$:

$M_{fused} = \mathrm{ReLU}(\mathrm{LayerNorm}(M_{fused}^{'} \cdot W_{proj} + b_{proj})) \in \mathbb{R}^{B \times L \times d_{model}},$

(8)

where $W_{proj} \in \mathbb{R}^{(3 \cdot d_{model}) \times d_{model}}$ and $b_{proj} \in \mathbb{R}^{d_{model}}$ are learnable parameters. This transformed representation $M_{fused}$ (the output $Y$ of Algorithm 1) is then fused with the global Mamba branch output via the HDCF module (Section 3.6).
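Equations (7) and (8) can be illustrated for a single sample with nested lists. The sketch assumes that the scale axis is moved next to the feature axis before flattening, which is one natural reading of the reshape; the placeholder weights and the affine-free LayerNorm are simplifications introduced here.

```python
import math

def collapse_scales(m_out):
    """Turn m_out[scale][time][feature] into rows of length n_scales * d_model."""
    n_scales, length = len(m_out), len(m_out[0])
    return [
        [value for c in range(n_scales) for value in m_out[c][t]]
        for t in range(length)
    ]

def layer_norm(row, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance (no affine terms)."""
    mu = sum(row) / len(row)
    var = sum((v - mu) ** 2 for v in row) / len(row)
    return [(v - mu) / math.sqrt(var + eps) for v in row]

def project(rows, weight, bias):
    """Eq. (8): ReLU(LayerNorm(row @ weight + bias)) applied at every time step."""
    fused = []
    for row in rows:
        z = [sum(r * weight[k][j] for k, r in enumerate(row)) + bias[j]
             for j in range(len(bias))]
        fused.append([max(0.0, v) for v in layer_norm(z)])
    return fused

# Toy tensor for one sample: 3 scales, L = 2 time steps, d_model = 2.
m_out = [[[1.0, 2.0], [3.0, 4.0]],
         [[5.0, 6.0], [7.0, 8.0]],
         [[9.0, 10.0], [11.0, 12.0]]]
rows = collapse_scales(m_out)                                   # shape (L, 3 * d_model)
weight = [[0.1 * (k + j) for j in range(2)] for k in range(6)]  # placeholder weights
m_fused = project(rows, weight, bias=[0.0, 0.0])                # shape (L, d_model)
```

Each row of `rows` holds the three scale views of the same time step side by side, so the projection mixes granularities without mixing time steps.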

3.5. Mamba State Space Model

To address the efficiency bottleneck induced by the quadratic computational complexity ($O(L^{2})$) of Transformer-based architectures when processing long-term photovoltaic (PV) data, this study incorporates the Mamba Selective State Space Model (SSM) as the core engine for global temporal modeling. The implementation is based on the open-source mamba-ssm library (https://github.com/state-spaces/mamba (accessed on 2 November 2025)), the widely used reference implementation of the Mamba SSM with hardware-aware parallel computation support. The Mamba branch in our HDCF-Mamba framework adopts a lightweight single-block design (without stacked Mamba layers), because this branch is only responsible for capturing global long-range temporal dependencies, while local multi-scale feature extraction and heterogeneous feature fusion are undertaken by the MSFEM branch and the HDCF module, respectively. A single Mamba block is sufficient to capture the macroscopic temporal trends of PV sequences while ensuring the overall computational efficiency of the framework. The structural advantage of Mamba lies in its Selective Scan Mechanism, which allows the system to dynamically adjust its information compression strategy, effectively remembering or forgetting specific states based on the input content. This capability is critical for capturing highly selective and unstructured long-term dependencies within non-stationary PV sequences.

3.5.1. Selective State Space Mechanism

The Selective State Space Mechanism (SSM) employed in this work is a recently advanced sequence modeling paradigm that efficiently captures long-range dependencies with linear complexity $O(L)$. Unlike traditional self-attention models (such as Informer and Autoformer) that rely on pairwise token interaction and suffer from quadratic complexity $O(L^{2})$, SSM models temporal dependencies through a recursive state transition process, which avoids expensive full-sequence attention calculation.

By using a selective scan strategy, SSM adaptively focuses on meaningful long-range information while suppressing trivial noise, making it especially suitable for photovoltaic series with strong non-stationarity and multi-scale fluctuations. This linear complexity structure significantly reduces both computational cost and memory usage, enabling the model to process ultra-long input sequences efficiently without performance degradation.

The M-SSM branch operates as a parallel path for global feature extraction, processing the complete embedded representation $E_{mamba} \in \mathbb{R}^{B \times L \times d_{model}}$, which fuses normalized encoder inputs $X_{enc}$ with comprehensive temporal markers. Unlike traditional SSMs with fixed parameters, the Mamba module introduces an input-dependent selection mechanism. This allows the SSM parameters ($A, B, C$) and the time step $\Delta$ to be functions of the input, facilitating a content-aware compression of the historical context.

Similar to the formulation in the original Mamba paper [23], the state evolution and sequence mapping are described by the discretized state-space equations:

$h_{t} = A h_{t-1} + B x_{t},$

(9)

$y_{t} = C h_{t} + D x_{t},$

(10)

where $h_{t}$ represents the latent state, $x_{t}$ denotes the input at time $t$, $y_{t}$ is the corresponding output, and $D x_{t}$ is a direct skip (feedthrough) term. Through hardware-aware parallel computation, Mamba models long-range correlations with linear $O(L)$ complexity. All key hyperparameters of the Mamba block (e.g., model width, state size, expansion factor) are optimized via the Optuna framework and unified across all PV dataset experiments to ensure consistency and reproducibility. The hyperparameter settings are quantitatively detailed in Section 4.1, and are fully consistent with the experimental code implementation.
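The recurrence in Equations (9) and (10) can be sketched for a single input channel with a diagonal state matrix. This is a didactic sequential loop, not the hardware-aware kernel of the mamba-ssm library: the scalar weights `w_b`, `w_c`, and `w_delta`, the softplus step size, and the simplified discretization of $B$ are all assumptions made for illustration.

```python
import math

def softplus(v):
    """Smooth positive mapping used here to keep the step size above zero."""
    return math.log1p(math.exp(v))

def selective_scan(x, a, w_b, w_c, w_delta, d_skip=0.0):
    """Single-channel selective SSM recurrence evaluated in one O(L) pass.

    a        : negative diagonal entries of the continuous-time state matrix
    w_b, w_c : hypothetical per-state weights that make B and C depend on x_t
    w_delta  : hypothetical weight producing the input-dependent step size
    """
    h = [0.0] * len(a)
    ys = []
    for x_t in x:
        delta = softplus(w_delta * x_t)           # input-dependent step size
        y_t = d_skip * x_t                        # skip term D * x_t
        for n, a_n in enumerate(a):
            a_bar = math.exp(delta * a_n)         # discretized state transition
            b_bar = delta * (w_b[n] * x_t)        # simplified discretization of B
            h[n] = a_bar * h[n] + b_bar * x_t     # Eq. (9)
            y_t += (w_c[n] * x_t) * h[n]          # Eq. (10) with C_t = w_c * x_t
        ys.append(y_t)
    return ys

x = [0.5, 1.0, 0.0, -0.5]
y = selective_scan(x, a=[-1.0, -0.5], w_b=[1.0, 0.5], w_c=[1.0, 1.0],
                   w_delta=1.0, d_skip=0.1)
```

The loop visits each time step once and keeps only the current state, which is where the linear cost in sequence length comes from; a larger step size shrinks `a_bar` and lets the state forget faster.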

3.5.2. Global Correlation Capture

By leveraging selective state updates, the M-SSM branch achieves an optimal balance between computational efficiency and predictive accuracy. This branch fundamentally overcomes the efficiency barriers inherent in sparse attention mechanisms, enabling high-precision forecasting even with ultra-long historical windows (e.g., 720-step inputs). The resulting output feature Z represents the macroscopic trend and global temporal correlation of the PV power generation, providing a stable backbone for the subsequent heterogeneous fusion stage.

3.6. Heterogeneous Feature Fusion and Prediction

This module fuses heterogeneous features from the dual-branch encoders. It aligns local dynamic details from the MSFEM with global macroscopic trends from the M-SSM, and constructs a comprehensive joint feature space to improve forecasting accuracy.

3.6.1. Heterogeneous Feature Alignment and Fusion

To fully leverage the complementary strengths of both branches, the HDCF mechanism employs temporal concatenation to integrate the global and local representations. We concatenate the local features $M_{fused}$ and the global features $M_{ssm}$ (the output $Z$ of the M-SSM branch) along the temporal axis to form a composite sequence $H$ with length $2L$:

$H = J_{Temporal}(M_{fused}, M_{ssm}) \in \mathbb{R}^{2L \times d_{model}}.$

(11)

This operation maximizes the complementarity between the two branches, ensuring that high-frequency local fluctuations and long-range dependencies are preserved in parallel.

3.6.2. Two-Step Linear Projection Decoder

The final prediction is generated via an efficient two-step decoder that maps the latent representation $H$ back to the physical output space. It is important to clarify that our framework employs a direct multi-step forecasting strategy rather than a recursive (roll-out) approach. The decoder directly maps the fused representation $H \in \mathbb{R}^{2L \times d_{model}}$ to the entire target horizon $T$ in a single forward pass (Equation (12)), producing $Y_{pred} \in \mathbb{R}^{T \times D_{out}}$ where each time step from 1 to $T$ is predicted simultaneously. This approach avoids error accumulation that plagues recursive methods and enables the model to learn temporal dependencies across the entire forecast horizon holistically.

Step 1: Temporal Mapping. We apply a parametric temporal projection, $Ψ_{TimeMap}$, to transform the sequence length from $2L$ to the target prediction horizon $T$:

$H^{*} = Ψ_{TimeMap} (H; W) \in R^{T \times d_{model}},$

(12)

$Ψ_{TimeMap} (H; W) \equiv Trans [W_{proj} \cdot Trans (H) \oplus b_{proj}],$

(13)

where $W_{proj} \in R^{T \times 2 L}$ is the temporal projection kernel.

Step 2: Physical Space Reconstruction. Finally, the Target Manifold Reconstruction Operator, $R_{Target}$, maps the features from the latent $d_{model}$ dimension to the physical PV power dimension $D_{out}$:

$Y_{pred} = R_{Target} (H^{*}; Ω) \in R^{T \times D_{out}},$

(14)

$R_{Target} (H^{*}; Ω) \equiv H^{*} \cdot W_{out} \oplus b_{out} .$

(15)

By utilizing this purely linear decoder, we maintain the $O(L)$ efficiency gains achieved by the Mamba branch while ensuring high tracking capability across varying prediction horizons.
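As a minimal illustration of Equations (11) to (15), the sketch below reproduces the tensor shapes of the fusion and decoding steps in NumPy for a single sample (no batch axis). It assumes that the temporal projection acts as a linear layer over the time axis, so that $Trans[W_{proj} \cdot Trans(H) \oplus b_{proj}]$ is equivalent to $W_{proj} H$ plus a per-step bias. The function name `hdcf_decode`, the random weights, and the sizes $L = 96$, $T = 24$, $d_{model} = 16$ are illustrative choices, not values taken from the paper.

```python
import numpy as np

def hdcf_decode(m_fused, m_ssm, w_proj, b_proj, w_out, b_out):
    """Temporal concatenation (Eq. 11) and two-step linear decoding (Eqs. 12-15).

    m_fused, m_ssm : (L, d_model) local and global branch outputs
    w_proj, b_proj : (T, 2L) and (T,)          temporal projection
    w_out,  b_out  : (d_model, D_out), (D_out,) output projection
    """
    # Eq. (11): stack both sequences along the time axis -> (2L, d_model)
    h = np.concatenate([m_fused, m_ssm], axis=0)
    # Eqs. (12)-(13): transpose, project the time axis 2L -> T, transpose back
    # (assumed linear-layer reading of the Trans[...] notation)
    h_star = (h.T @ w_proj.T + b_proj).T          # (T, d_model)
    # Eqs. (14)-(15): project the feature axis d_model -> D_out
    y_pred = h_star @ w_out + b_out               # (T, D_out)
    return h, h_star, y_pred

# Illustrative sizes and random weights (assumptions, not from the paper)
rng = np.random.default_rng(0)
L, T, d_model, D_out = 96, 24, 16, 1
m_fused = rng.normal(size=(L, d_model))
m_ssm = rng.normal(size=(L, d_model))
w_proj = rng.normal(size=(T, 2 * L))
b_proj = rng.normal(size=(T,))
w_out = rng.normal(size=(d_model, D_out))
b_out = rng.normal(size=(D_out,))

h, h_star, y_pred = hdcf_decode(m_fused, m_ssm, w_proj, b_proj, w_out, b_out)
print(h.shape, h_star.shape, y_pred.shape)
```

Both steps are plain matrix products, so the decoder introduces no recurrence over the forecast horizon.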

3.7. Multi-Horizon Prediction and Loss Function

The HDCF-Mamba model adopts a direct multi-step (DMS) forecasting strategy. Formally, the fused features from the HDCF mechanism are mapped directly to the target output $Y_{pred} \in R^{T \times D_{out}}$ via a final linear projection layer, where $T$ denotes the entire prediction horizon. Unlike recursive strategies that feed the prediction of the previous step back into the model to generate the next, our direct approach predicts all $T$ steps in a single forward pass. This eliminates the cumulative error propagation of recursive roll-outs, which is a critical advantage for long-term PV power forecasting.
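The contrast between the two strategies can be sketched with toy linear forecasters. This is an illustration only: the weights are random, the function names are hypothetical, and neither function is the HDCF-Mamba decoder.

```python
import numpy as np

def direct_forecast(x, w, b):
    """Direct multi-step: one linear map from L inputs to all T outputs."""
    return w @ x + b                      # w: (T, L), b: (T,) -> (T,)

def recursive_forecast(x, w1, b1, horizon):
    """Recursive roll-out: a one-step model applied `horizon` times,
    with each prediction fed back into the input window."""
    window = x.copy()
    preds = []
    for _ in range(horizon):
        y_next = float(w1 @ window + b1)          # one-step-ahead prediction
        preds.append(y_next)
        window = np.append(window[1:], y_next)    # slide window, append own output
    return np.array(preds)

rng = np.random.default_rng(1)
L, T = 8, 4                                # toy look-back and horizon
x = rng.normal(size=L)
y_direct = direct_forecast(x, rng.normal(size=(T, L)), rng.normal(size=T))

w1, b1 = rng.normal(size=L), 0.1
y_rec = recursive_forecast(x, w1, b1, horizon=T)
```

In the recursive variant every step after the first consumes earlier predictions, so any error in those predictions enters later inputs; the direct variant only ever sees observed history.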

The model is optimized by minimizing the Mean Squared Error (MSE) loss over the entire horizon:

$L_{MSE} = \frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2},$

(16)

where $N$ denotes the number of samples in a mini-batch, and $y_{i}$ and ${\hat{y}}_{i}$ represent the ground truth and the predicted PV power at future step $i$, respectively.

4. Experiment

4.1. Experimental Setup

To rigorously validate the performance and generalization of the proposed HDCF-Mamba framework, we designed a comprehensive experimental suite encompassing diverse geographical climates, standardized metrics, and automated optimization procedures.

The proposed HDCF-Mamba model and all baseline models were implemented using the PyTorch 2.0.1 library on a Linux server (Ubuntu 22.04 LTS). The hardware environment primarily consisted of an Intel Core i9-13900K CPU (Intel Corporation, Santa Clara, CA, USA) and an NVIDIA GeForce RTX 4090 GPU with 24GB VRAM (NVIDIA Corporation, Santa Clara, CA, USA). To ensure high-performance execution of the Mamba selective state space mechanism, the CUDA 11.8 toolkit and the mamba_ssm (v1.1.1) library were utilized.

4.1.1. Datasets and Environmental Context

To verify the real-world applicability and generalization of HDCF-Mamba, we use four real PV operational datasets (including publicly available benchmark datasets and actual monitoring data from photovoltaic power stations in different climatic regions). These datasets characterize different environmental stochasticities and operational conditions, ensuring the model’s robustness in practical applications. The detailed information of each real dataset is as follows:

Extreme Continental Desert PV (ECD-PV) [24]: Actual on-site monitoring data from a photovoltaic power station in the desert region of Northwestern China, collected by industrial-grade irradiance, temperature and power sensors with 15-minute time resolution. The dataset records 12 months of real PV power and meteorological data, with high irradiance and extreme diurnal temperature variations, capturing severe power fluctuations typical of desert climates—reflecting the actual operation of PV stations in extreme environmental conditions.
Laboratory Standard Performance PV (LSP-PV) [21]: Sourced from an international research laboratory, this dataset represents stable meteorological conditions and high-precision calibrated sensor monitoring.
Arid Plateau System PV (APS-PV) [25]: Derived from a plant in Northwestern China, reflecting the unique Loess Plateau geomorphology and semi-arid climatic impacts on power output.
Public Standard Benchmark PV (PSB-PV) [26]: A consolidated multi-station dataset from a World Scientific Intelligence Competition used to evaluate the model’s cross-site generality.

Key exogenous variables across these datasets include critical meteorological factors such as module temperature, air temperature, and humidity, as illustrated in Figure 4.

4.1.2. Evaluation Metrics

To quantify the discrepancy between the ground-truth target $Y$ and the model prediction $\hat{Y}$, we employ four standard statistical metrics [27]:

$MSE = \frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2},$

(17)

$MAE = \frac{1}{N} \sum_{i = 1}^{N} |y_{i} - {\hat{y}}_{i}|,$

(18)

$RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}},$

(19)

$R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}} .$

(20)

Here $N$ is the number of evaluated points and $\bar{y}$ is the mean of the ground-truth values.
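A minimal NumPy sketch of Equations (17) to (20) is given below. The function name `regression_metrics` and the four-point toy series are illustrative assumptions; the sketch flattens its inputs, so it treats every predicted value as one evaluation point.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, MAE, RMSE and R^2 as in Eqs. (17)-(20)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # undefined if y_true is constant
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}

# Toy values chosen for easy hand-checking (not data from the paper)
y_true = np.array([0.0, 0.5, 1.0, 0.5])
y_pred = np.array([0.1, 0.4, 0.9, 0.6])
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

For this toy series every absolute error is 0.1, which gives MSE = 0.01, MAE = RMSE = 0.1 and $R^{2}$ = 0.92.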

The selected benchmark models cover five diverse architectural paradigms in PV power forecasting [28]: classic recurrent neural networks (LSTM, GRU) [5,6], CNN-based multi-scale models (SCINet) [9], modern Transformer variants (iTransformer), recent hybrid deep learning models (WPMixer) [22], and traditional CNN-RNN hybrids (CNN-GRU-LSTM [13]). This ensures a comprehensive and diverse comparative evaluation across mainstream technical routes.

4.1.3. Implementation Details

The framework is implemented using the PyTorch library and optimized via the Adam optimizer with an initial learning rate of $1 \times 10^{-4}$ and hyperparameters $β_{1} = 0.9$, $β_{2} = 0.999$. To ensure numerical stability and mitigate overfitting, we introduce a ReduceLROnPlateau scheduler with a decay factor of 0.5 and an early stopping mechanism with a patience of four epochs. Furthermore, we utilize the Optuna framework for automated hyperparameter optimization, including the model dimensionality (model width, $d_{model}$), batch size, and all key hyperparameters of the Mamba branch in the HDCF-Mamba framework. This automated tuning has been empirically verified to yield more stable and accurate performance compared to manual heuristic settings.
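The optimizer, scheduler and early-stopping settings described above might be wired together as in the following sketch. The placeholder `nn.Linear` model, the `EarlyStopping` helper class, the made-up validation curve, and the scheduler patience of 1 are all assumptions for illustration; the text only specifies the learning rate, the Adam betas, the decay factor of 0.5 and the early-stopping patience of four epochs.

```python
import torch
from torch import nn

model = nn.Linear(8, 1)  # placeholder network standing in for HDCF-Mamba

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5,
    patience=1,  # assumption: the scheduler patience is not stated in the text
)

class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 4):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=4)
val_losses = [1.00, 0.90, 0.95, 0.96, 0.97, 0.98, 0.99]  # made-up validation curve
stopped_epoch = None
for epoch, val_loss in enumerate(val_losses):
    # (the training pass for this epoch is omitted in this sketch)
    scheduler.step(val_loss)          # halve the LR when the loss plateaus
    if stopper.step(val_loss):        # stop after 4 epochs without improvement
        stopped_epoch = epoch
        break

final_lr = optimizer.param_groups[0]["lr"]
print(stopped_epoch, final_lr)
```

With this curve the best loss occurs at epoch 1, so the fourth consecutive non-improving epoch is epoch 5, where training stops; by then the scheduler has already lowered the learning rate below its initial value.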

Key hyperparameter configuration of the Mamba branch: The Mamba block adopts a unified optimal hyperparameter configuration across all four PV datasets (ECD-PV, LSP-PV, APS-PV, and PSB-PV), with the core settings as follows. (1) Model width ($d_{model}$): the input/output feature dimension of the Mamba block, consistent with the global feature dimension of the HDCF-Mamba framework to achieve feature alignment between dual branches. (2) State size ($d_{state}$): 32 (the dimension of the latent state space in Mamba SSM, determining the capability of capturing long-term temporal dependencies). (3) Expansion factor (expand): 2 (the dimension expansion ratio of the feed-forward network in the Mamba block, with the inner dimension $d_{inner} = d_{model} \times expand = 256$). (4) Convolution kernel size ($d_{conv}$): 4 (the kernel size of the depth-wise convolution in the Mamba block for local temporal correlation capture). (5) Dropout rate: 0.1 (consistent with the dropout configuration of the entire HDCF-Mamba framework to alleviate overfitting and improve generalization ability).
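For reference, the settings listed above can be collected in a small configuration object. The class name `MambaBranchConfig` is hypothetical, and the default $d_{model} = 128$ is an inference from the stated relation $d_{inner} = d_{model} \times expand = 256$ with expand = 2, not a value given explicitly in the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MambaBranchConfig:
    """Illustrative container for the Mamba-branch hyperparameters."""
    d_model: int = 128    # assumption: implied by d_inner = 256 and expand = 2
    d_state: int = 32     # latent state size of the SSM
    d_conv: int = 4       # depth-wise convolution kernel size
    expand: int = 2       # expansion factor of the inner dimension
    dropout: float = 0.1  # shared with the rest of the framework

    @property
    def d_inner(self) -> int:
        # Inner dimension of the block: d_model * expand
        return self.d_model * self.expand

cfg = MambaBranchConfig()
print(cfg, cfg.d_inner)
```

The field names `d_model`, `d_state`, `d_conv` and `expand` are chosen to match the constructor keywords of the `Mamba` block in the `mamba_ssm` package, so such a config could plausibly be unpacked into that constructor; dropout would have to be applied separately.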

4.2. Quantitative Results and Analysis

The empirical validation of HDCF-Mamba is conducted through a multi-dimensional analysis of its forecasting accuracy and tracking capabilities [29] across varying temporal horizons and historical windows.

4.2.1. Overall Performance and Temporal Sensitivity Analysis

The experimental results demonstrate the robust performance of our proposed framework across all evaluated datasets. As illustrated in Figure 5, we observe a characteristic increase in prediction error (MSE and MAE) as the prediction horizon (P) extends from 24 to 512 steps, which is consistent with the inherent uncertainty in long-term stochastic processes. A key finding is that HDCF-Mamba’s performance degrades much more slowly than all baseline models [30].

Furthermore, our analysis reveals a vital correlation between historical sequence length (L) and forecasting precision. When the input length is extended to $L = 336$ or $L = 720$, the error metrics exhibit a consistent downward trend (Figure 5), confirming that the framework effectively leverages the $O(L)$ linear complexity of the Mamba branch to extract valuable long-range contextual dependencies without the computational penalties of $O(L^{2})$ attention mechanisms [31,32]. The visual tracking results on the ECD-PV dataset [24] further corroborate this, showing that even at high-resolution horizons, the model accurately aligns with the underlying periodic fluctuations and suppresses the impact of high-frequency noise.
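The practical meaning of the two complexity classes can be made concrete with a short calculation. The sketch below compares only asymptotic growth when the look-back window goes from 336 to 720 steps; it ignores constant factors, the model width, and implementation details, so it is not a prediction of measured runtime or memory.

```python
def cost_growth(l_from: int, l_to: int, exponent: int) -> float:
    """Relative cost increase for a method whose cost scales as L**exponent."""
    return (l_to / l_from) ** exponent

linear_growth = cost_growth(336, 720, exponent=1)     # O(L), e.g. a selective SSM scan
quadratic_growth = cost_growth(336, 720, exponent=2)  # O(L^2), e.g. full self-attention

print(f"O(L):   x{linear_growth:.2f}")
print(f"O(L^2): x{quadratic_growth:.2f}")
```

Extending the window by a factor of about 2.14 therefore raises the asymptotic cost of a quadratic method by about 4.59, against 2.14 for a linear one.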

4.2.2. Benchmarking Against State-of-the-Art (SOTA) Models

To rigorously evaluate the superiority of HDCF-Mamba, we compare it against diverse architectural [33] paradigms, including RNNs (GRU [6]), linear decomposition models (DLinear [12]), hybrid models (CNN-GRU-LSTM [13]), and modern deep learning frameworks (SCINet [9], iTransformer [8]). As summarized in Table 2 and Table 3, our model achieves state-of-the-art performance across nearly all benchmarks.

Comparison with Linear Decomposition Baselines: Following recent trends in time-series forecasting, we included DLinear [12] to verify whether complex non-linear modeling is necessary. While DLinear shows surprisingly competitive performance on stable seasonal trends by decomposing sequences into trend and seasonal components, it struggles to capture the high-frequency, non-linear fluctuations inherent in PV power data. HDCF-Mamba outperforms DLinear by an average of 12.2% and 16.5% in MAE and RMSE, respectively, demonstrating that the global–local feature synergy is essential for volatile energy datasets.
Comparison with Modern Transformers and CNNs: Against advanced models [34] such as iTransformer [8] and SCINet [9], HDCF-Mamba maintains a consistent edge. While SCINet [9] shows competitive results on the APS-PV dataset [25] due to its localized interaction design, HDCF-Mamba’s dual-branch architecture proves more adaptable to the extreme volatility of the ECD-PV [24] and PSB-PV datasets [26]. This suggests that the synergetic integration of Selective State Space Modeling and Multi-Scale Feature decoupling [35] successfully addresses the specific challenges of non-stationary PV data that single-stream models fail to capture.

4.2.3. Performance on Strong Non-Stationary/Multi-Scale Fluctuation Dataset

To verify the model’s adaptability to complex real-world scenarios, we conduct a targeted analysis on the ECD-PV dataset, which exhibits strong non-stationarity and multi-scale fluctuations caused by abrupt cloud occlusion and extreme temperature changes. Based on the results in Table 3, HDCF-Mamba maintains significant superiority over all diverse benchmark paradigms on this dataset: it reduces MAE by 8% compared with SCINet [9] (CNN paradigm), 11.4% compared with iTransformer [8] (Transformer paradigm [36]), and 7.2% compared with WPMixer [22] (recent hybrid paradigm). This demonstrates that the heterogeneous dual-branch design of HDCF-Mamba effectively balances global trend modeling and local fluctuation capture, outperforming single-paradigm models in handling the core challenges of non-stationary PV power sequences.

4.2.4. Computational Efficiency Analysis

To verify the theoretical advantages of HDCF-Mamba regarding linear complexity in long-sequence forecasting, a comprehensive performance benchmark was conducted using an NVIDIA RTX 4090 GPU with a batch size of 32. As summarized in Table 4, HDCF-Mamba demonstrates a superior balance between predictive performance and computational resource utilization across three key metrics: training duration, inference latency, and peak GPU memory consumption.

By leveraging the Selective State Space Mechanism (SSM), the proposed model successfully overcomes the $O(L^{2})$ computational bottleneck inherent in traditional self-attention mechanisms. Specifically, at an input sequence length of $L = 720$, HDCF-Mamba requires only 1242 MB of GPU memory, representing a 56.5% reduction in memory overhead compared to iTransformer, while maintaining a consistent inference latency of 12.5 ms/batch [37,38]. These empirical results indicate that although the MSFEM module introduces additional multi-scale feature extraction operations [39], the overall framework effectively mitigates the computational burden of long-range dependency modeling. Consequently, HDCF-Mamba proves to be highly scalable and robust for high-resolution PV power forecasting [40], offering significant advantages for practical industrial deployment in modern power grids.
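The text does not describe the measurement harness behind these figures. As a generic sketch of how a per-batch latency could be obtained, the following helper times a callable after a few warm-up runs; `measure_latency_ms` and `dummy_forward` are hypothetical names, and the dummy workload merely stands in for a forward pass on one batch.

```python
import time

def measure_latency_ms(fn, n_warmup: int = 3, n_runs: int = 20) -> float:
    """Average wall-clock time of `fn()` in milliseconds, after warm-up runs."""
    for _ in range(n_warmup):
        fn()                      # warm-up: caches, lazy initialisation
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / n_runs

def dummy_forward():
    # Stand-in workload; in practice this would be a model forward pass
    # on one batch (the paper uses a batch size of 32).
    return sum(i * i for i in range(10_000))

latency_ms = measure_latency_ms(dummy_forward)
print(f"{latency_ms:.3f} ms per call")
```

On a GPU, the clock should only be read after `torch.cuda.synchronize()`, because kernels are launched asynchronously, and peak memory can be read with `torch.cuda.max_memory_allocated()`.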

In summary, the comparative analysis confirms that HDCF-Mamba establishes a new benchmark for multi-scale PV forecasting [41], offering superior accuracy and robustness while maintaining high computational efficiency across diverse environmental conditions.

4.3. Ablation Study

To rigorously verify the contribution of each innovative component within the HDCF-Mamba framework, we conducted a systematic ablation study using a single-variable principle. This analysis aims to validate three core hypotheses: (1) the necessity of the Mamba Selective State Space Mechanism for efficient long-range dependency modeling; (2) the role of Multi-Scale Resampling in decoupling multi-frequency temporal features; and (3) the effectiveness of the TimeBlock in capturing localized, high-frequency fluctuations.

4.3.1. Experimental Design and Configurations

We constructed three ablated variants by systematically removing or replacing core modules, comparing their performance against the full HDCF-Mamba baseline across the LSP-PV [21] and ECD-PV datasets [24]. The configurations are defined as follows:

w/o Mamba: Removes the selective state space branch, relying solely on convolutional local modeling.
w/o Multi-Scale: Disables the parallel resampling architecture, forcing the local branch to operate at a single temporal resolution.
w/o TimeBlock: Replaces the multi-kernel filter units with standard linear layers to evaluate the impact of fine-grained local feature extraction.

4.3.2. Quantitative Results and Discussion

The results of the ablation study, summarized in Table 5 and Table 6, confirm that all three modules are indispensable for achieving state-of-the-art (SOTA) performance.

Impact of Mamba: Removing the Mamba module resulted in the most significant performance degradation. For instance, on the ECD-PV dataset [24], the MAE increased from 0.198 to 0.216, representing a substantial loss in predictive accuracy. This confirms that Mamba’s selective state space mechanism is vital for efficiently capturing the macroscopic trends and long-term dependencies inherent in solar cycles.
Impact of Multi-Scale Resampling: The w/o Multi-Scale variant showed a moderate decline in performance across both datasets (e.g., MAE increased to 0.215 on ECD-PV [24]). This underscores the necessity of multi-scale decoupling for processing the multi-frequency nature of PV data, where low-frequency seasonal trends and high-frequency weather patterns must be modeled simultaneously.
Impact of TimeBlock: The removal of the TimeBlock module led to a notable reduction in the model’s ability to track abrupt, instantaneous power variations, such as those caused by sudden cloud occlusion events. On the LSP-PV dataset [21], the MAE rose from 0.180 to 0.190, highlighting the TimeBlock’s critical role in enhancing local perception and structural robustness.

In conclusion, the ablation study empirically demonstrates that the synergetic integration of these three modules allows HDCF-Mamba to balance global modeling efficiency with local fluctuation robustness, a feat unattainable by any of the ablated variants.
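To put the MAE values quoted above on a common scale, the relative degradation of each ablated variant can be computed from the reported numbers. Only the figures stated in the text are used; the helper name `relative_increase` is illustrative.

```python
def relative_increase(full: float, ablated: float) -> float:
    """Percentage increase in MAE of an ablated variant over the full model."""
    return 100.0 * (ablated - full) / full

# (full-model MAE, ablated MAE) as quoted in the discussion above
ablation_mae = {
    ("ECD-PV", "w/o Mamba"): (0.198, 0.216),
    ("ECD-PV", "w/o Multi-Scale"): (0.198, 0.215),
    ("LSP-PV", "w/o TimeBlock"): (0.180, 0.190),
}

degradation = {
    key: relative_increase(full, ablated)
    for key, (full, ablated) in ablation_mae.items()
}
for (dataset, variant), pct in degradation.items():
    print(f"{dataset:7s} {variant:16s} +{pct:.2f}% MAE")
```

This gives increases of roughly 9.1% (w/o Mamba) and 8.6% (w/o Multi-Scale) on ECD-PV, and 5.6% (w/o TimeBlock) on LSP-PV.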

4.4. Limitations and Applicability

Although HDCF-Mamba demonstrates superior performance in balancing efficiency and accuracy, several limitations should be noted for its practical deployment:

1. Impact of Data Completeness: The dual-branch fusion mechanism relies on the synchronization of historical power and meteorological features. In cases of severe sensor failure resulting in missing irradiance or temperature inputs, the local branch (MSFEM) may struggle to provide accurate multi-scale refinements, potentially affecting the overall forecasting stability.
2. Sensitivity to Sampling Resolution: The MSFEM module is optimized with fixed down-sampling strides ($S = \{2, 4, 8\}$). While effective for high-resolution data (e.g., 5-min intervals), its multi-scale advantage may diminish for extremely low-resolution datasets (e.g., hourly averages), where the abrupt fluctuations are already smoothed out during the averaging process.
3. Short-Window Trade-offs: The core advantage of the Mamba branch is its $O(L)$ linear complexity for long-sequence modeling. For scenarios with very short look-back windows ($L < 48$), the computational gains over traditional models are negligible, and the model’s structural complexity might lead to slight overfitting compared to simpler linear baselines.
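The smoothing effect mentioned in the second limitation can be illustrated with average pooling at the three strides $S = \{2, 4, 8\}$. The sketch below is a toy example under stated assumptions: the one-sample spike is synthetic, the helper names are hypothetical, and nearest-neighbour repetition is used for upsampling only as a stand-in, since the exact upsampling operator is not specified here.

```python
import numpy as np

def avg_pool_1d(x: np.ndarray, stride: int) -> np.ndarray:
    """Non-overlapping average pooling; an incomplete trailing window is dropped."""
    n = (len(x) // stride) * stride
    return x[:n].reshape(-1, stride).mean(axis=1)

def upsample_repeat(x: np.ndarray, stride: int) -> np.ndarray:
    """Nearest-neighbour upsampling by repetition (assumed operator)."""
    return np.repeat(x, stride)

L = 48
signal = np.zeros(L)
signal[20] = 1.0                      # synthetic one-sample spike

pooled = {s: avg_pool_1d(signal, s) for s in (2, 4, 8)}
restored = {s: upsample_repeat(p, s) for s, p in pooled.items()}
peaks = {s: float(p.max()) for s, p in pooled.items()}

for s in (2, 4, 8):
    print(f"stride {s}: length {len(pooled[s])}, peak {peaks[s]:.3f}")
```

The spike amplitude falls to 0.5, 0.25 and 0.125 at strides 2, 4 and 8: the coarser the scale, the less of an abrupt event survives, which is the same mechanism by which hourly averaging removes fast fluctuations before the model sees them.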

Future research will focus on developing adaptive resampling mechanisms and integrating missing data imputation techniques to further enhance the model’s robustness.

5. Conclusions

This research proposed HDCF-Mamba, an innovative heterogeneous dual-branch framework designed to reconcile the intrinsic trade-off between global modeling efficiency and local perception robustness in multi-scale photovoltaic (PV) power forecasting. By synergetically integrating the Mamba Selective State Space mechanism with a Multi-Kernel Filter Unit-based Multi-Scale TimeBlock module, the architecture effectively bridges long-range macroscopic temporal dependencies and high-frequency non-stationary dynamics. Mathematically, our framework surmounts the quadratic $O(L^{2})$ computational bottleneck characteristic of traditional Transformer-based forecasters, achieving $O(L)$ linear complexity, which facilitates the utilization of ultra-long historical sequences for high-resolution prediction tasks. Extensive empirical evaluations across diverse real-world datasets, including ECD-PV and APS-PV, demonstrate that HDCF-Mamba consistently establishes new state-of-the-art (SOTA) benchmarks. Specifically, compared to advanced models, HDCF-Mamba achieves a reduction in Mean Absolute Error (MAE) of up to 11.4% over iTransformer and 8% over SCINet, while significantly outperforming RNN-based and hybrid architectures such as CNN-GRU-LSTM and WPMixer. These results confirm that our model provides a computationally sustainable and highly precise solution for modern smart grid operations, contributing a vital technological cornerstone for the ongoing global energy transition.

Future work will focus on two directions: (1) optimizing the model’s lightweight design for edge computing in small-scale PV stations, and (2) extending the framework to multi-task PV forecasting (e.g., simultaneous prediction of power output and solar irradiance). We also plan to validate the model on more global PV datasets to further improve its cross-region generalization ability.

Author Contributions

Conceptualization, W.S. and H.Z.; methodology, W.S.; software, W.S.; validation, W.S., H.Z., S.D. and A.S.; formal analysis, W.S.; investigation, W.S.; resources, W.S.; data curation, W.S.; writing—original draft preparation, W.S.; writing—review and editing, H.Z.; visualization, W.S.; supervision, H.Z.; project administration, W.S.; and funding acquisition, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Xinjiang Uygur Autonomous Region Natural Science Foundation Youth Project (No. 2025D01C276), by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2025D14015), and the Xinjiang Tianchi Talent Program “Robust Perception and Restoration of Low-Quality Visual Content” (No. 51052501848), and the university-level computing platform of Xinjiang University’s Computing and Data Center.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this study are not publicly available due to confidentiality agreements with the data provider. However, data can be made available by the corresponding author (H.Z.) upon reasonable request and with permission from the relevant authorities.

Acknowledgments

The authors would like to thank the reviewers for their insightful comments and useful suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Azrour, M.; Guezzaz, A.; Jabbour, S. Smart Technologies for a Sustainable Environment, 1st ed.; CRC Press: Boca Raton, FL, USA, 2026.
2. Cong, Z.; Chen, Y.T.; Wang, Z.Z.; Deng, J.C.; Chen, H.; Zhan, S.; Zhang, D.X. Greedy compensatory agent based day-ahead photovoltaic power forecasting with a simplified deep Q-network. Renew. Energy 2026, 256, 124616.
3. Wang, X.Y.; Ma, W.P. A hybrid deep learning model with an optimal strategy based on improved VMD and transformer for short-term photovoltaic power forecasting. Energy 2024, 295, 131071.
4. Mellit, A.; Massi Pavan, A.; Ogliari, E.; Leva, S.; Lughi, V. Advanced Methods for Photovoltaic Output Power Forecasting: A Review. Appl. Sci. 2020, 10, 487.
5. Al-Selwi, S.M.; Hassan, M.F.; Abdulkadir, S.J.; Muneer, A.; Sumiea, E.H.; Alqushaibi, A.; Ragab, M.G. RNN-LSTM: From applications to modeling techniques and beyond-Systematic review. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102068.
6. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
7. Zhou, H.Y.; Zhang, S.H.; Peng, J.Q.; Zhang, S.; Li, J.X.; Xiong, H.; Zhang, W.C. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. arXiv 2020, arXiv:2012.07436.
8. Liu, Y.; Hu, T.G.; Zhang, H.R.; Wu, H.X.; Wang, S.Y.; Ma, L.T.; Long, M.S. iTransformer: Inverted Transformers Are Effective for Time Series Forecasting. In Proceedings of the 2024 International Conference on Learning Representations (ICLR 2024), Vienna, Austria, 7–11 May 2024.
9. Liu, M.H.; Zeng, A.L.; Chen, M.X.; Xu, Z.J.; Lai, Q.X.; Ma, L.N.; Xu, Q. SCINet: Time Series Modeling and Forecasting with Sample Convolution and Interaction. Adv. Neural Inf. Process. Syst. 2022, 35, 5816–5828.
10. Wu, H.X.; Hu, T.G.; Liu, Y.; Zhou, H.; Wang, J.M.; Long, M.S. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. In Proceedings of the 2023 International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, 1–5 May 2023.
11. Wu, H.X.; Xu, J.; Wang, J.H.; Wang, J.M.; Long, M.S. Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430.
12. Zeng, A.L.; Chen, M.X.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? arXiv 2022, arXiv:2205.13504.
13. Ahmed, M.R.; Islam, S.; Islam, A.K.M.; Shatabda, S. An ensemble 1D-CNN-LSTM-GRU model with data augmentation for speech emotion recognition. Expert Syst. Appl. 2023, 218, 119633.
14. Hashemi, B. Computational Intelligence-Based Photovoltaic System Performance Modeling in Snow Conditions. Ph.D. Thesis, Département D’informatique et D’ingénierie, Université du Québec en Outaouais (UQO), Gatineau, QC, Canada, 2023.
15. Zhao, Y.M.; Zhang, C.J.; Wan, C.S.; Du, D.; Huang, J.; Li, W.T. Power Dispatch Stability Technology Based on Multi-Energy Complementary Alliances. Mathematics 2025, 13, 2091.
16. Guermoui, M.; Bouchouicha, K.; Bailek, N.; Bol, J.W. Forecasting intra-hour variance of photovoltaic power using a new integrated model. Energy Convers. Manag. 2021, 245, 114569.
17. Qian, K.; Deng, Y.; Li, Z.Y.; Wen, X.L. A liquid neural network with physical evolution for variable continuous time series prediction. J. Comput. Sci. 2026, 94, 102757.
18. Blanc, P.; Remund, J.; Vallance, L. Short-term solar power forecasting based on satellite images. In Renewable Energy Forecasting: From Models to Applications; Woodhead Publishing Series in Energy; Woodhead Publishing: Cambridge, UK, 2017; pp. 179–198.
19. Chodakowska, E.; Nazarko, J.; Nazarko, L.; Rabayah, H.S.; Abendeh, R.; Alawneh, R. ARIMA Models in Solar Radiation Forecasting in Different Geographic Locations. Energies 2023, 16, 5029.
20. Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econom. 1986, 31, 307–327.
21. Bright, J.M.; Killinger, S.; Engerer, N.A. Data article: Distributed PV power data for three cities in Australia. J. Renew. Sustain. Energy 2019, 11, 035504.
22. Murad, M.M.N.; Aktukmak, M.; Yilmaz, Y. WPMixer: Efficient Multi-Resolution Mixing for Long-Term Time Series Forecasting. In Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI 2025), Philadelphia, PA, USA, 25 February–4 March 2025.
23. Liu, Y.; Ji, Y.; Ren, Q.; Shi, B.; Liu, N.; Lu, M.; Wu, N. An Innovative Small-Target Detection Approach Against Information Attenuation: Fusing Enhanced Programmable Gradient Information and a Novel Mamba Module. Sensors 2025, 25, 2117.
24. Qiao, J.J.; Yan, M.; Liu, Y.Q.; Zhang, L.; Wu, Y.; Chen, Y.Y.; Shao, W. Deep Learning for photovoltaic station identification and its effect on vegetation spatial aggregation. Natl. Remote Sens. Bull. 2025, 29, 3312–3326.
25. Huang, Z.W.; Huang, J.; Min, J.T. SSA-LSTM: Short-Term Photovoltaic Power Prediction Based on Feature Matching. Energies 2022, 15, 7806.
26. Multi-Site Photovoltaic Power Forecasting Dataset for the 3rd World Scientific Intelligence Competition. Available online: http://competition.sais.com.cn/competitionDetail/532315 (accessed on 2 March 2026).
27. Jung, S.G.; Jung, G.; Cole, J.M. Gradient boosted and statistical feature selection workflow for materials property predictions. J. Chem. Phys. 2023, 159, 194106.
28. Sun, Q.H.; Yan, F.; Sun, W.Q.; Zhou, T.Q. DWT-Former: Fusing wavelet-based multi-scale features and transformer-based temporal representations for photovoltaic power forecasting. Energy 2025, 341, 139283.
29. Chen, Q.F.; Li, Z.; Li, W.J.; Guo, Y.P.; An, J.Q.; She, J.H. Multi-step prediction of blast furnace permeability index based on multi-time-scale analysis. ISA Trans. 2025, in press.
30. Xu, D.M.; Zeng, Q.Q.; Wang, W.C.; Zhang, X.T.; Zang, H.F. RMC: Advancing daily runoff forecasting with a unified cross-scale deep learning approach. J. Hydrol. 2026, 665, 134722.
31. Linguraru, M.G.; Dou, Q.; Feragen, A.; Giannarou, S.; Glocker, B.; Lekadir, K.; Schnabel, J.A. (Eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2024: 27th International Conference, Marrakesh, Morocco, October 6–10, 2024, Proceedings, Part III; Springer Nature: Cham, Switzerland, 2024.
32. Luo, Q.; Wang, J.F.; Gao, M.Y.; He, Z.W.; Yang, Y.X.; Zhou, H.T. Multiple Mechanisms to Strengthen the Ability of YOLOv5s for Real-Time Identification of Vehicle Type. Electronics 2022, 11, 2586.
33. Teng, Y.K.; Shan, G.C. Interpretable machine learning for materials discovery: Predicting CO₂ adsorption properties of metal-organic frameworks. APL Mater. 2024, 12, 081115.
34. Zhou, H.M.; Chen, J.G.; Niu, X.L.; Dai, Z.G.; Qin, L.; Ma, L.S.; Li, J.C.; Su, Y.M.; Wu, Q. Identification of leaf diseases in field crops based on improved ShuffleNetV2. Front. Plant Sci. 2024, 15, 1342123.
35. Xu, Y.B.; Liu, D.S.; Wu, T.; Lin, J.J.; Chen, Y.H. AMSP-Net: Adaptive multi-scale patch network for long time-series forecasting. Knowl.-Based Syst. 2026, 334, 115083.
36. Zhang, Y.; Sun, J.X.; Sun, Y.M.; Liu, S.D.; Wang, C.Q. Lightweight object detection based on split attention and linear transformation. J. Zhejiang Univ. (Eng. Sci.) 2023, 67, 1195–1204.
37. Grover, S. Time-Series Representation Learning through Dynamic Temporal Reordering and Test-Time Adaptation. Master’s Thesis, Queen’s University, Kingston, ON, Canada, 2025.
38. Varshney, D.; Nagrath, P.; Vashishtha, S.; de Albuquerque, V.H.C. (Eds.) Generative Artificial Intelligence: Technology and Applications, 1st ed.; CRC Press: Boca Raton, FL, USA, 2025.
39. Wang, D.Z.; Wang, Y.L.; Yang, F.; Xu, L.Y.; Zhang, Y.N.; Chen, Y.R.; Liao, N. A Soft Sensor with Light and Efficient Multi-scale Feature Method for Multiple Sampling Rates in Industrial Processing. Mach. Intell. Res. 2024, 21, 400–410.
40. Ding, F.Q.; Xu, C.; Liu, H.; Lyu, C.D.; Yang, G.Q.; Xiong, H.L.; Zhou, H.C. Global-local coherency contrastive learning for context-aware time series forecasting. Knowl.-Based Syst. 2026, 331, 114745.
41. Shao, X.R.; Huang, R.Z.; Qin, Y.B. EnergyFormer: Dual-supervised adaptive multi-scale transformer for multistep energy forecasting. Expert Syst. Appl. 2025, 305, 130750.

Figure 1. Overview of the HDCF-Mamba architecture. A dual-branch design combining the Mamba global branch ($O(L)$ complexity) and MSFEM local branch with TimeBlocks. Input: $X \in R^{B \times L \times D}$. Output: $\hat{Y} \in R^{B \times T \times 1}$. Inputs are normalized per feature.

Figure 2. Operational flow of the Multi-Scale Feature Extraction Module (MSFEM). Illustrates the hierarchical resampling pipeline—comprising Permutation, Downsampling (AvgPool), and Upsampling—designed to extract temporal features across fine, medium, and coarse granularities to capture multi-frequency dynamics.

Figure 3. Detailed structure of the TimeBlock within the MSFEM branch. The TimeBlock (left) employs multiple parallel 2D convolutional kernels with kernel size 3 × 3 for deep local dependency modeling. The right side shows the feature transformation after the TimeBlock: the 4D output $M_{out} \in R^{B \times 3 \times L \times d_{model}}$ is reshaped to $Y \in R^{B \times L \times d_{model}}$, which will later be fused with the Mamba global branch output via the HDCF module.

Figure 4. Characterization of stochastic meteorological inputs. Temporal variation of module temperature, humidity, and ambient temperature from one of the PV datasets, reflecting the environmental volatility and non-stationarity that drive inherent PV power intermittency.

Figure 5. Performance sensitivity of HDCF-Mamba to historical input length (L) on four real-world PV datasets. MSE, RMSE, and $R^{2}$ improve consistently as L increases from 48 to 720 steps, supporting the efficiency of Mamba’s $O(L)$ mechanism in leveraging long-range contextual dependencies. The results also illustrate that HDCF-Mamba outperforms diverse benchmark models in tracking multi-scale fluctuations, especially in the strongly non-stationary scenarios of the ECD-PV dataset.
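
To make the scaling argument concrete, the sketch below compares how three idealized cost models grow when the input length rises from 48 to 720 steps. These are asymptotic growth factors only: constant factors and hardware effects are ignored, the numbers are not measured runtimes, and the names `growth_factor` and `cost_models` are assumptions for this example.

```python
import math

def growth_factor(cost, l_short=48, l_long=720):
    """Ratio cost(l_long) / cost(l_short) for a given cost model."""
    return cost(l_long) / cost(l_short)

# Idealized cost models as functions of the sequence length L.
cost_models = {
    "O(L)": lambda L: L,
    "O(L log L)": lambda L: L * math.log2(L),
    "O(L^2)": lambda L: L ** 2,
}

factors = {name: growth_factor(cost) for name, cost in cost_models.items()}
for name, factor in factors.items():
    print(f"{name:<11} cost grows about {factor:.1f}x from L=48 to L=720")
```

Under these idealized models, a 15-fold longer input costs 15 times more for a linear method but 225 times more for a quadratic one.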

Table 1. Summary of numerical results from key literature in PV power forecasting. Where multiple results were reported, we present the best-performing configuration on similar datasets for fair comparison.

Reference	Model	MAE	RMSE	Dataset
[5]	LSTM	0.272–0.320	0.475–0.594	Various PV sites
[6]	GRU	0.221–0.376	0.395–0.597	Various PV sites
[9]	SCINet	0.115–0.278	0.216–0.431	APS-PV, ECD-PV
[8]	iTransformer	0.143–0.312	0.259–0.464	APS-PV, ECD-PV
[22]	WPMixer	0.132–0.270	0.252–0.424	APS-PV, ECD-PV
[7]	Informer	0.143–0.399	0.259–0.538	Various benchmarks

Table 2. Performance benchmarking against RNN and hybrid architectures across four real-world PV datasets. Best results are marked with * and second-best with †. HDCF-Mamba consistently yields the lowest error metrics (MAE, RMSE) and the highest $R^{2}$, demonstrating superior accuracy in reconciling randomness and non-stationarity across diverse geographical climates.

Model	HDCF-Mamba (Ours)			CNN-GRU-LSTM [13]			WPMixer [22]			DLinear [12]
Dataset	MAE	RMSE	$R^{2}$	MAE	RMSE	$R^{2}$	MAE	RMSE	$R^{2}$	MAE	RMSE	$R^{2}$
ECD-PV [24]	0.198*	0.367*	0.867*	0.219†	0.397†	0.845†	0.270	0.424	0.822	0.320	0.532	0.653
LSP-PV [21]	0.180*	0.354*	0.818*	0.196†	0.355†	0.816†	0.225	0.430	0.731	0.272	0.475	0.671
APS-PV [25]	0.118*	0.219*	0.953*	0.133	0.264	0.931	0.132†	0.252†	0.938†	0.187	0.371	0.865
PSB-PV [26]	0.287*	0.452*	0.815*	0.399	0.593	0.688	0.288†	0.470†	0.799†	0.400	0.652	0.624

Table 3. Comparative analysis with state-of-the-art (SOTA) forecasting frameworks. Best results are marked with * and second-best with †. The dual-branch design adapts to extreme volatility and outperforms the $O(L^{2})$ attention-based models. Note that SCINet achieves a slight edge on the APS-PV dataset due to its specialized localized interaction design.

Model	HDCF-Mamba (Ours)			GRU [6]			SCINet (2022) [9]			iTransformer [8]
Dataset	MAE	RMSE	$R^{2}$	MAE	RMSE	$R^{2}$	MAE	RMSE	$R^{2}$	MAE	RMSE	$R^{2}$
ECD-PV [24]	0.198*	0.367*	0.867*	0.376	0.597	0.649	0.278†	0.431†	0.817†	0.312	0.464	0.787
LSP-PV [21]	0.180*	0.354*	0.818*	0.221	0.395	0.772	0.201†	0.386†	0.783†	0.254	0.467	0.683
APS-PV [25]	0.118†	0.219†	0.953†	0.144	0.268	0.929	0.115*	0.216*	0.956*	0.143	0.259	0.934
PSB-PV [26]	0.287*	0.452*	0.815*	0.300†	0.499†	0.780†	0.432	0.593	0.681	0.455	0.639	0.629
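
The gaps in Table 3 can be read either as absolute MAE differences or as relative reductions, and the two views give quite different numbers. The sketch below computes both from the MAE values copied from the table; the helper name `mae_gap` is an assumption for this example.

```python
# MAE values copied from Table 3 (one entry per dataset and model).
mae = {
    "ECD-PV": {"HDCF-Mamba": 0.198, "GRU": 0.376, "SCINet": 0.278, "iTransformer": 0.312},
    "LSP-PV": {"HDCF-Mamba": 0.180, "GRU": 0.221, "SCINet": 0.201, "iTransformer": 0.254},
    "APS-PV": {"HDCF-Mamba": 0.118, "GRU": 0.144, "SCINet": 0.115, "iTransformer": 0.143},
    "PSB-PV": {"HDCF-Mamba": 0.287, "GRU": 0.300, "SCINet": 0.432, "iTransformer": 0.455},
}

def mae_gap(dataset, baseline, model="HDCF-Mamba"):
    """Return (absolute, relative %) MAE reduction of `model` vs `baseline`."""
    base, ours = mae[dataset][baseline], mae[dataset][model]
    absolute = base - ours
    return round(absolute, 3), round(100 * absolute / base, 1)

for dataset in mae:
    for baseline in ("SCINet", "iTransformer"):
        absolute, relative = mae_gap(dataset, baseline)
        print(f"{dataset} vs {baseline}: {absolute:+.3f} MAE ({relative:+.1f}%)")
```

A negative value means the baseline has the lower MAE, as happens for SCINet on APS-PV.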

Table 4. Computational efficiency benchmark (Input Length $L = 720$, Prediction Horizon $H = 96$).

Model	Training (s/Epoch)	Inference (ms/Batch)	GPU Memory (MB)	Complexity
HDCF-Mamba (Ours)	18.4	12.5	1242	$O(L)$
iTransformer	26.8	18.2	2860	$O(L^{2})$
Autoformer	32.1	24.5	3410	$O(L \log L)$
SCINet	15.2	10.8	1150	$O(L)$
Informer	29.4	21.3	3120	$O(L \log L)$

Table 5. Ablation study on the stable LSP-PV dataset. Best results are marked with * and second-best with †. Removing any component degrades every metric, which supports the structural necessity of the Mamba branch, multi-scale resampling, and TimeBlock. The “Full Model” achieves the best results through the synergetic integration of global and local features.

Model	MAE	MSE	RMSE	$R^{2}$
Full Model	0.180*	0.125*	0.354*	0.818*
w/o Mamba	0.216	0.156	0.395	0.773
w/o Multi-scale	0.187†	0.129†	0.360†	0.797
w/o TimeBlock	0.190	0.132	0.363	0.807†

Table 6. Ablation analysis for the high-volatility ECD-PV dataset. Best results are marked with * and second-best with †. The performance degradation in the “w/o Mamba” variant (MAE rising from 0.198 to 0.216) supports the importance of the selective state space mechanism for capturing macroscopic trends in extreme desert climates.

Model	MAE	MSE	RMSE	$R^{2}$
Full Model	0.198*	0.134*	0.367*	0.867*
w/o Mamba	0.216	0.140†	0.368†	0.858†
w/o Multi-scale	0.215†	0.149	0.386	0.853
w/o TimeBlock	0.217	0.144	0.380	0.857
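
One way to compare the ablation results across the two datasets is to express each variant's MAE as a relative increase over the full model. The sketch below does this with the MAE columns copied from Tables 5 and 6; the helper name `mae_increase_pct` is an assumption for this example.

```python
# MAE values copied from Table 5 (LSP-PV) and Table 6 (ECD-PV).
ablation_mae = {
    "LSP-PV": {"Full Model": 0.180, "w/o Mamba": 0.216,
               "w/o Multi-scale": 0.187, "w/o TimeBlock": 0.190},
    "ECD-PV": {"Full Model": 0.198, "w/o Mamba": 0.216,
               "w/o Multi-scale": 0.215, "w/o TimeBlock": 0.217},
}

def mae_increase_pct(dataset, variant):
    """Relative MAE increase (%) of an ablated variant over the full model."""
    full = ablation_mae[dataset]["Full Model"]
    return round(100 * (ablation_mae[dataset][variant] - full) / full, 1)

for dataset, rows in ablation_mae.items():
    for variant in rows:
        if variant != "Full Model":
            print(f"{dataset} {variant}: +{mae_increase_pct(dataset, variant)}% MAE")
```

Read this way, removing the Mamba branch has by far the largest effect on LSP-PV, while on ECD-PV the three ablations raise MAE by similar amounts.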
