1. Introduction
Against the backdrop of profound restructuring of the global energy mix [1] and large-scale integration of renewable energy [2], Integrated Energy Systems (IESs) [3] have emerged as a critical technical pathway for achieving efficient, low-carbon, and resilient energy operation. By organically coupling multiple energy carriers (electricity, heating, and cooling) and increasingly extending to freshwater production within the water–energy nexus paradigm [4,5], IESs mitigate short-term volatility and supply–demand uncertainty in power [6], heat [7], and cooling [8] subsystems, while dynamic routing and energy storage improve overall utilization efficiency [9] and operational economic performance. In modern energy infrastructure, accurate load forecasting is indispensable for stochastic dispatching and multi-timescale scheduling optimization [10,11], data-driven grid security assessment [10], and flexibility resource allocation [11]. How to extract high-resolution decision support from large-scale, multi-source heterogeneous information has, therefore, become a key research topic in power and energy engineering.
In practice, IES load forecasting faces several intertwined challenges. First, IESs encompass multiple load types (e.g., electricity, heating, cooling) and involve coordinated data acquisition from diverse channels, including sensor networks [12], distributed energy monitoring systems, and building energy management platforms. The resulting multi-source data exhibit pronounced heterogeneity in granularity, timeliness, and credibility, demanding effective information fusion strategies (as demonstrated even in electromechanical diagnostics [13]) and robust data quality assurance techniques [14] as a reliable foundation. Second, along the temporal dimension, load time series display intra-day periodic fluctuations, weekly cycles, seasonal patterns, and long-term trends, inducing time-shift characteristics and inertia-driven delays [15] that resemble the complex volatility patterns observed in environmental time-series forecasting [16]. Third, along the spatial dimension, different load types form multi-level coupling through the energy network topology, giving rise to cross-carrier coupled transfer effects [3]. These intertwined temporal and spatial complexities raise fundamental questions about multi-scale feature extraction, coupling-mechanism modeling, and robust handling of data quality, all of which require deeper exploration and innovative solutions.
In recent years, IES load forecasting has evolved from foundational architectural innovations to increasingly sophisticated system-level applications. Early methods largely relied on traditional machine learning (ML). Idowu et al. [17] employed support vector machines, regression trees, and feedforward neural networks for building heat load forecasting. While effective for simpler tasks, these methods degrade markedly in accuracy and generalization when confronted with complex spatiotemporal dependencies and multi-energy coupling. To address this, Rodrigues et al. [18] introduced a method combining functional clustering with ensemble learning; however, computational efficiency remains a challenge as data scale grows. Tan et al. [19] proposed a joint forecasting model based on multi-task learning and least squares support vector machines (LSSVM), improving accuracy by sharing weights across electricity, heat, cooling, and gas. Subsequently, Alsharekh et al. [20] presented an approach combining evolutionary algorithms with data decomposition and wavelet transforms for short-term load forecasting (STLF), though limitations persist for high-dimensional complex data. Overall, both traditional statistical models [21] and classical ML approaches [22] often struggle to simultaneously capture multimodal features and long-range dependencies, leading to insufficient feature exploitation and poor generalization in high-dimensional nonlinear settings [23].
Deep learning, with its powerful nonlinear fitting and hierarchical feature extraction capabilities, has become the mainstream paradigm for IES load forecasting. A critical line of research focuses on spatiotemporal feature modeling. Chen et al. [24] proposed a multi-scale convolutional neural network (CNN) combined with long short-term memory (LSTM) networks, achieving strong performance in multi-energy load forecasting but leaving practical issues of computational efficiency and scalability unresolved. Zhao et al. [25] introduced a multi-step residential load forecasting method based on a graph attention mechanism and a Transformer model, which improved multi-step accuracy by fusing spatiotemporal graph information. Banerjee et al. [26] proposed a Spatial–Temporal Synchronous Graph Transformer network (STSGT), which synchronously captures spatial and temporal dependencies via multi-head self-attention operating on a synchronous spatiotemporal graph; however, this framework was originally designed for epidemiological forecasting and does not address the unique multi-energy coupling present in IESs.
Another important direction addresses multi-scale and multi-task learning for capturing cross-energy-carrier correlations. Song et al. [27] proposed a multi-stage LSTM federated forecasting method that jointly models multi-load interactions under multi-time-scale settings, markedly improving accuracy. More recently, Song et al. [28] transformed multi-energy load forecasting into a hierarchical multi-task learning problem with spatiotemporal attention, designing a gated temporal convolutional network to analyze coupling relationships among energy sources. While this approach effectively improves forecasting accuracy through task hierarchy, it processes temporal and spatial dependencies separately, potentially missing the fine-grained interactions that arise from the topological structure of energy consumption units. Although multi-scale feature extraction techniques from adjacent domains, such as dynamic trend fusion for traffic prediction [29] and wavelet-guided frequency decomposition for fault prediction [30], offer transferable methodological insights, they have not been specifically tailored to the multi-energy coupling inherent in IESs. More critically, the recurrent memory mechanisms underlying most existing IES forecasting models face fundamental limitations. Recent advances in extended recurrent architectures, such as xLSTM [31] with exponential gating and stabilized memory mixing, have demonstrated improved sequence modeling in language tasks; neural ordinary differential equation (ODE) approaches [32] and log-domain gating stabilization [31] have also been explored for gradient stabilization in deep recurrent networks. However, none of these innovations has been specifically designed for the multi-source heterogeneous data fusion, spatiotemporal topology encoding, and multi-energy coupling requirements unique to IES load forecasting.
Despite significant progress, several critical limitations persist in the state of the art. (1) Existing data representation methods for IESs typically flatten multi-source heterogeneous inputs into simple vector concatenations, failing to encode the functional heterogeneity of energy consumption units and their geospatial distribution topology. (2) While multi-scale convolutional and Transformer-based architectures have shown promise, they have not been jointly optimized to exploit both fine-grained local structures and broad global associations within a unified framework. (3) Recurrent models such as LSTM and GRU suffer from gradient vanishing/explosion in long-horizon settings. Although recent advances, including xLSTM [31] with exponential gating and novel memory structures, neural ODE approaches [32] with continuous-time dynamics, and log-domain gating stabilization, have partially addressed these issues in general sequence modeling, they have not been specifically designed for the multi-energy coupling, heterogeneous data fusion, and numerical stability requirements particular to long-horizon IES load forecasting, leaving a critical methodological gap. (4) Ablation studies in prior work are often incomplete, and the contribution of individual components (e.g., topology encoding, multi-scale fusion, and memory mechanisms) is rarely quantified independently.
To address these limitations, we propose a Multi-Scale Spatiotemporal Fusion with Steady-State Memory-Driven Network (MSTF-SMDN) tailored for real-world IES load forecasting tasks. Considering the functional heterogeneity of energy consumption units and the geospatial distribution topology, we design a Spatiotemporal Topology Encoder (STG) that maps multi-source heterogeneous time series into a tensorized multi-energy spatiotemporal topological representation via fuzzy classification and multi-scale proximity ranking, enabling unified embedding across temporal and spatial dimensions. We then construct a MultiScale Hybrid Convolver (MSHC) that combines depthwise separable convolutions with dynamic channel fusion to simultaneously capture fine-grained local structures and broad global associations at multiple scales. Furthermore, we develop a Temporal Segmentation Transformer (TST) and a Steady-State Exponentially Gated Memory Unit (SEGM), proposing a global–local collaborative forecasting algorithm that captures long-range temporal dependencies through Transformer-based attention and models local steady-state dynamics through numerically stabilized exponential gating. Extensive experiments on a public real-world dataset demonstrate that MSTF-SMDN achieves consistent and substantial improvements over both classical and state-of-the-art baselines: compared to the strongest baseline (TimesNet), our method reduces cooling load RMSE by 16.09%, heating load RMSE by 12.97%, and electric load RMSE by 6.14%, while achieving R² values of 0.99435, 0.98701, and 0.96722 for the three load types, respectively.
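To make the encoding idea more concrete, the following minimal sketch assigns each energy consumption unit a fuzzy membership over functional classes and arranges the units by spatial proximity before stacking their loads into a tensor. It is an illustration under stated assumptions, not the STG itself: the membership rule is the standard fuzzy c-means formula, only a single proximity scale is used, and the prototypes, feature vectors, coordinates, load values, and function names are all hypothetical.

```python
import math

def fuzzy_memberships(features, prototypes, m=2.0):
    """Fuzzy c-means style membership of one unit in each functional class."""
    dists = [math.dist(features, p) for p in prototypes]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # A unit lying exactly on a prototype belongs fully to that class.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

def proximity_order(coords, reference):
    """Unit indices sorted from nearest to farthest from a reference point."""
    return sorted(range(len(coords)), key=lambda i: math.dist(coords[i], reference))

def tensorize(series, order):
    """Build tensor[t][rank] = (electricity, heating, cooling) of the ranked unit.

    series[u][t] is the load tuple of unit u at time step t.
    """
    n_steps = len(series[0])
    return [[series[u][t] for u in order] for t in range(n_steps)]

# Hypothetical toy inputs (not taken from any dataset).
prototypes = [(1.0, 0.0), (0.0, 1.0)]             # two assumed functional classes
features = [(0.8, 0.2), (0.1, 0.9), (0.5, 0.5)]   # one feature vector per unit
coords = [(3.0, 4.0), (0.0, 1.0), (6.0, 8.0)]     # planar unit coordinates
series = [
    [(10.0, 2.0, 5.0), (11.0, 2.1, 5.2)],
    [(4.0, 3.0, 1.0), (4.2, 3.1, 1.1)],
    [(7.0, 1.0, 6.0), (7.5, 1.2, 6.3)],
]

memberships = [fuzzy_memberships(f, prototypes) for f in features]
order = proximity_order(coords, reference=(0.0, 0.0))
tensor = tensorize(series, order)
```

In this toy layout the time axis comes first, the ranked spatial axis second, and the energy carriers last; the memberships could be appended as extra channels, which is again an assumption rather than a detail stated in the text.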
The main contributions are summarized as follows:
We design a Spatiotemporal Topology Encoder that overcomes the limitations of conventional flat data representations by reconstructing multi-source heterogeneous data into a tensorized spatiotemporal topological representation. Through fuzzy functional classification and multi-scale spatial proximity ranking, the encoder preserves both the functional attributes and geospatial distribution of energy consumption units, capturing the dynamic evolution of load data across temporal and spatial dimensions.
We propose a MultiScale Hybrid Convolver that integrates depthwise separable convolutions with dynamic channel fusion to synchronously extract deep interactions between local details and global patterns across multiple scales (3 × 3, 5 × 5, 7 × 7), significantly enhancing the multidimensionality and discriminative power of load feature representations.
We integrate a Temporal Segmentation Transformer with a Steady-State Exponentially Gated Memory Unit, where the former captures global temporal dependencies via multi-head self-attention on time-series segments and the latter models local steady-state dynamics via log-domain exponential gating that provably stabilizes gradient propagation. Their synergy yields a global–local jointly optimized forecasting model with notable improvements in accuracy and stability.
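The log-domain stabilization mentioned in the last contribution can be illustrated with a scalar recurrent cell. The sketch below follows the stabilizer-state formulation published for xLSTM [31] and is only assumed to resemble the SEGM; the output gate and all learned projections are omitted, and the function names and input values are hypothetical. It demonstrates forward-pass numerical stability only, not the gradient property claimed for the SEGM.

```python
import math

def stabilized_step(c_prev, n_prev, m_prev, z, i_tilde, f_tilde):
    """One scalar step of exponential gating with a log-domain stabilizer.

    i_tilde and f_tilde are gate pre-activations, i.e. the logarithms of the
    exponential input and forget gates.
    """
    m = max(f_tilde + m_prev, i_tilde)         # running maximum in log space
    i_gate = math.exp(i_tilde - m)             # rescaled input gate, at most 1
    f_gate = math.exp(f_tilde + m_prev - m)    # rescaled forget gate, at most 1
    c = f_gate * c_prev + i_gate * z           # cell state
    n = f_gate * n_prev + i_gate               # normalizer state
    return c, n, m, c / n                      # last value: normalized output

def run(zs, i_tildes, f_tildes):
    """Unroll the cell over a sequence, starting from an empty memory."""
    c, n, m = 0.0, 0.0, float("-inf")
    outputs = []
    for z, i_t, f_t in zip(zs, i_tildes, f_tildes):
        c, n, m, h = stabilized_step(c, n, m, z, i_t, f_t)
        outputs.append(h)
    return outputs

# Hypothetical inputs with very large input-gate pre-activations.
zs = [0.5, -0.2, 0.9, 0.1]
hs = run(zs, i_tildes=[800.0, 801.0, 799.0, 802.0], f_tildes=[-0.1] * 4)

# Evaluating the exponential gate directly overflows double precision.
try:
    math.exp(800.0)
    naive_overflows = False
except OverflowError:
    naive_overflows = True
```

Because at least one of the two rescaled gates equals 1 at every step, the normalizer stays positive and each output is a weighted average of the inputs seen so far, which keeps the recurrence bounded even when the raw gates would overflow.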
4. Conclusions
We propose a multi-scale spatiotemporal fusion and steady-state memory-driven IES load forecasting method (MSTF-SMDN) by designing four tightly coupled modules: a Spatiotemporal Topology Encoder (STG), a MultiScale Hybrid Convolver (MSHC), a Temporal Segmentation Transformer (TST), and a Steady-State Exponentially Gated Memory Unit (SEGM). The STG addresses the challenge of unified spatiotemporal embedding of multi-source heterogeneous data through fuzzy functional classification and multi-scale proximity ranking. The MSHC extracts fine-grained local structures and broad global associations at multiple scales via parallel heterogeneous kernels (3 × 3, 5 × 5, 7 × 7) with dynamic channel fusion. The TST captures long-range temporal dependencies through segment-level multi-head self-attention, while the SEGM models local steady-state dynamics via log-domain exponential gating that provably stabilizes gradient propagation.
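The parallel-kernel idea summarized above can be sketched in one dimension. This is a deliberately simplified illustration, assuming a single input channel, 1D kernels of sizes 3, 5, and 7 in place of the 2D depthwise separable convolutions, and fixed box filters and fusion weights in place of learned parameters; none of the names or values come from the MSHC implementation.

```python
def box_kernel(size):
    """Normalized averaging kernel, a stand-in for a learned kernel."""
    return [1.0 / size] * size

def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation that preserves the input length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[t + j] for j, k in enumerate(kernel))
        for t in range(len(signal))
    ]

def multiscale_fuse(signal, kernels, fusion_weights, bias=0.0):
    """Run parallel branches at several scales, then fuse them per position.

    The fusion is a 1x1 convolution over the branch axis, i.e. a weighted
    sum of the branch outputs at each time step.
    """
    branches = [conv1d_same(signal, k) for k in kernels]
    return [
        bias + sum(w * branch[t] for w, branch in zip(fusion_weights, branches))
        for t in range(len(signal))
    ]

# Hypothetical load-like series with a short spike.
signal = [1.0, 1.1, 1.0, 1.2, 3.0, 1.1, 1.0, 0.9, 1.0, 1.1, 1.0, 1.0]
kernels = [box_kernel(3), box_kernel(5), box_kernel(7)]

# Adaptive fusion with assumed weights versus plain equal-weight averaging.
adaptive = multiscale_fuse(signal, kernels, fusion_weights=[0.6, 0.3, 0.1])
equal = multiscale_fuse(signal, kernels, fusion_weights=[1 / 3, 1 / 3, 1 / 3])
```

Weighting the small kernel more heavily preserves more of the spike, while equal weights smooth it further; in a trained model these weights would be learned rather than set by hand.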
Extensive experiments on the Arizona State University Campus Metabolism dataset demonstrate that MSTF-SMDN achieves consistent and substantial improvements over five representative baselines. Compared to the strongest baseline (TimesNet), MSTF-SMDN reduces cooling load RMSE by 16.09%, heating load RMSE by 12.97%, and electric load RMSE by 6.14%, while achieving R² values of 0.99435, 0.98701, and 0.96722, respectively. Ablation studies confirm that each module contributes meaningfully: the MSHC provides the largest single-module improvement (↓20.75% cooling RMSE), while the SEGM outperforms standard LSTM by ↓18.08% on heating load. Furthermore, replacing the learned adaptive fusion with equal-weight averaging increases electric and cooling RMSE by and , respectively, validating the necessity of the 1 × 1 convolution-based adaptive channel fusion mechanism. A sensitivity analysis of the TST segment length reveals that achieves the best cooling load RMSE (290.04) while maintaining near-optimal performance on electric and heating loads, confirming it as the optimal multi-energy trade-off.
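For readers who want to reproduce comparisons of this kind, the sketch below computes RMSE, the coefficient of determination, and a relative RMSE reduction using their standard definitions. It assumes that the reported percentages are reductions relative to the baseline RMSE; the toy numbers are illustrative only and are not results from the experiments.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers an error metric versus `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

# Illustrative toy values, not measurements from any dataset.
y_true = [100.0, 120.0, 140.0, 160.0]
baseline_pred = [110.0, 110.0, 150.0, 150.0]
proposed_pred = [105.0, 115.0, 145.0, 155.0]

baseline_rmse = rmse(y_true, baseline_pred)
proposed_rmse = rmse(y_true, proposed_pred)
reduction = relative_reduction(baseline_rmse, proposed_rmse)
proposed_r2 = r2(y_true, proposed_pred)
```

Here every baseline error is 10 and every proposed error is 5, so the toy RMSE drops from 10 to 5, a 50% relative reduction, and the proposed predictions reach a coefficient of determination of 0.95.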
Despite these promising results, several limitations remain. First, the current evaluation is limited to a single campus-scale dataset located in the hot-arid climate zone of Tempe, Arizona; the claim of broad applicability should be tempered by the recognition that location-specific factors—including climate zone, building stock composition, occupancy culture, and energy pricing structures—can significantly affect load patterns and model transferability. Generalization to larger-scale, multi-district IES across diverse climatic and socioeconomic contexts requires further validation on geographically varied datasets. Second, the model currently adopts a single-step forecasting strategy with a 15 min horizon; extension to multi-step and multi-horizon forecasting (e.g., 1 h, day-ahead) warrants investigation, as many practical IES scheduling and dispatch tasks rely on longer lead times. Third, the computational cost of the full pipeline may limit deployment on resource-constrained edge devices. Fourth, the current uncertainty quantification relies on post-hoc MC-Dropout rather than a probabilistic framework integrated into training.
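The post-hoc MC-Dropout procedure named in the fourth limitation can be sketched as repeated stochastic forward passes with dropout left active at inference time. The example below is a minimal illustration on a tiny hand-written network; the weights, input, dropout rate, and sample count are hypothetical and unrelated to the trained model.

```python
import random
import statistics

def forward(x, w_hidden, w_out, drop_p, rng):
    """One stochastic pass through a one-hidden-layer ReLU network."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    keep = 1.0 - drop_p
    # Inverted dropout, deliberately kept active at inference time.
    dropped = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    return sum(w * h for w, h in zip(w_out, dropped))

def mc_dropout_predict(x, w_hidden, w_out, drop_p=0.2, n_samples=200, seed=0):
    """Return the mean and sample standard deviation over stochastic passes."""
    rng = random.Random(seed)
    samples = [forward(x, w_hidden, w_out, drop_p, rng) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.stdev(samples)

# Hypothetical weights and input.
w_hidden = [
    [0.4, 0.2, 0.1],
    [0.3, -0.1, 0.5],
    [0.6, 0.3, 0.0],
    [0.1, 0.5, -0.2],
]
w_out = [1.0, 0.5, -0.7, 1.2]
x = [0.5, 1.0, -0.3]

mean_pred, std_pred = mc_dropout_predict(x, w_hidden, w_out)
det_pred, det_std = mc_dropout_predict(x, w_hidden, w_out, drop_p=0.0)
```

The spread of the samples serves as the uncertainty estimate; because it is obtained after training, it is not calibrated by the training objective, which is the shortcoming the limitation points out.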
Future work will pursue the following directions: (1) validating the framework on multiple geographically and climatically diverse datasets to establish robustness across different IES configurations and weather regimes; (2) incorporating federated or privacy-preserving learning frameworks to enable multi-site model training under data-security constraints; (3) enhancing responsiveness to sudden load fluctuations and extreme events through online adaptation mechanisms; (4) optimizing computational efficiency via model compression and knowledge distillation to support real-time deployment; (5) extending the framework to multi-horizon forecasting with integrated probabilistic training (e.g., deep ensembles, Bayesian layers) for principled uncertainty quantification in risk-aware energy dispatch; (6) broadening the scope of the integrated energy system to include freshwater production under the water–energy nexus, thereby capturing desalination and water-treatment demands that influence overall system sizing and optimization.