
A Novel Multi-Scale and Adaptive Multi-Period Deep Learning with Compression-Fusion Attention for Cold Storage Load Prediction

1 College of Mathematics and Computer Science, Shantou University, Shantou 515063, China
2 College of Law, Shantou University, Shantou 515063, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(1), 160; https://doi.org/10.3390/electronics15010160
Submission received: 9 December 2025 / Revised: 25 December 2025 / Accepted: 27 December 2025 / Published: 29 December 2025

Abstract

Accurate load forecasting is essential for energy-efficient scheduling in cold storage facilities, where cooling demand is shaped by strong periodicity, nonlinear temporal dynamics, and irregular operational disturbances. Traditional statistical and machine-learning models struggle with these multi-scale variations, and existing deep learning approaches often rely on fixed receptive fields or fail to extract adaptive periodic structures. This study introduces MA-CFAN, a multi-scale and adaptive multi-period forecasting framework that integrates temporal decomposition, dynamic frequency-period identification, and a newly designed Compression-Fusion Attention Block (CFABlock) for cross-period representation learning. The architecture leverages FFT-derived adaptive periods to capture seasonal-trend components and employs compression-fusion attention to enhance feature discrimination across temporal scales. Furthermore, this work provides the first systematic evaluation of state-of-the-art forecasting models, including Informer, Autoformer, iTransformer, TimesNet, DLinear, and TimeMixer, in the domain of cold storage load prediction. Experiments on real operational data from a logistics center in Jinan, China, demonstrate that MA-CFAN consistently outperforms all baselines, reducing average MSE and MAE by up to 19.3% and 14.8%, respectively.

1. Introduction

With the accelerating penetration of renewable energy and the growing importance of demand-side flexibility in modern power systems, the controllability and responsiveness of load-side resources have become critical to maintaining secure and efficient grid operation [1]. As one of the most energy-intensive components of the cold-chain logistics sector, cold storage facilities exhibit considerable flexibility potential. By regulating refrigeration cycles and switching between freezing and insulation modes, cold storage can provide short-term peak shaving and load shifting [2], enabling participation in demand response programs, load aggregation, and virtual power plant applications. The accuracy of cold storage load forecasting directly affects the feasibility of scheduling strategies, operational safety margins, and overall response performance, thereby determining whether demand-side resources can reliably contribute to system regulation [3]. However, due to the diversity and uncertainty inherent in cold storage operations, load profiles often display strong intermittency, multi-timescale coupling, and substantial operational noise, making accurate forecasting a highly challenging task [4].
The intermittency of cold storage loads originates from compressor on-off cycling [4,5], chamber-specific temperature regulation policies, and irregular logistics activities. Operational peaks such as inbound, outbound, and picking processes can induce sharp short-term fluctuations [6,7], whereas nighttime and insulation phases exhibit sustained low-load periods. Such instability prevents models relying on stationarity assumptions from capturing critical temporal patterns. Moreover, cold storage loads encompass multiple superimposed cycles—from minute-level compressor switching [8], to hourly logistics rhythms, daily business routines, and long-term seasonal variations—making single-timescale models insufficient for learning complex periodic dependencies [9,10]. Compounding these challenges is the prevalence of high-frequency noise generated by temperature deadbands, equipment wear, ambient temperature variations, and frequent door openings. These factors cause effective temporal patterns to become sparse, while noisy and redundant components dominate the sequence, ultimately degrading model generalization and robustness. Therefore, developing forecasting models capable of capturing multi-period structures while effectively suppressing noise is essential for practical cold storage scheduling.
Time-series forecasting has long served as an important methodology in power systems, industrial control, finance, and meteorology, progressing through traditional statistical models, machine learning models, and deep learning approaches [11]. Classical statistical models such as ARIMA and SARIMA have been widely employed in load forecasting [12,13,14] due to their clear mathematical formulation of trends, cycles, and autocorrelations. Nonetheless, their reliance on linearity and stationarity assumptions limits their ability to handle non-stationary, nonlinear, and multi-scale industrial load patterns, while requiring heavy feature engineering. Machine learning models such as support vector machines (SVMs) [15], random forests (RF) [16], and gradient boosting decision trees (GBDT) [17] enhance nonlinear modeling capacity through kernel functions and ensemble techniques, but still depend on manually constructed window features and struggle to capture long-range dependencies or multi-cycle interactions. Furthermore, in high-noise industrial scenarios, their lack of sequence context modeling reduces predictive robustness. Deep learning has advanced time-series forecasting by enabling automatic pattern extraction from raw sequences [18,19,20,21]. Recurrent neural networks (RNNs) and their variants, LSTM and GRU, address gradient issues through gating mechanisms and perform recursive temporal modeling. Nevertheless, their sequential computation limits parallelization and hampers their ability to model long-range and multi-scale dependencies, especially under noisy and irregular load conditions. Temporal Convolutional Networks (TCNs) [22] introduce dilated convolutions to expand the receptive field efficiently, achieving strong stability and training efficiency in industrial applications. Yet, convolutional architectures lack explicit mechanisms for modeling cross-scale periodic interactions, limiting their applicability in scenarios with stacked multi-period patterns.
Transformer-based forecasters have recently advanced long-sequence prediction through innovations such as sparse attention (Informer [23]), decomposition-based trend–season separation (Autoformer [24]), enhanced temporal encoding (iTransformer [25]), and patch/channel decoupling (PatchTST [26]). Frequency-structured models like TimesNet [27] and TimeMixer [28], as well as lightweight linear models like DLinear [29], further boost multi-scale pattern extraction. In contrast, the cold-storage load exhibits domain-specific complexities that fundamentally violate the assumptions underlying these models. Compressor on–off switching produces nonstationary micro-cycles that drift in both amplitude and duration; logistics-driven operations introduce abrupt, high-magnitude impulses that obscure stable periodic patterns; and chamber-level heterogeneity generates asynchronous cycles with inconsistent phase alignment. These factors result in aliasing and period fragmentation, causing existing multi-scale or multi-period modules to either detect incorrect periodicities, overfit to noise-induced pseudo-cycles, or misalign key temporal dependencies across different scales. Consequently, methods that rely on fixed decomposition schemes, static kernel periods, or rigid inter-scale interactions often fail to construct a coherent representation of the true load dynamics.
Motivated by these observations, this work proposes the multi-scale and Adaptive Multi-Period with Compression–Fusion Attention Network (MA-CFAN), which tackles the above challenges at their root. The MA-CFAN begins with a multi-resolution frequency-domain probing module that adaptively identifies latent and drifting periodicities rather than assuming fixed or global periods, enabling faithful extraction of compressor-level micro-cycles, operational mid-range rhythms, and long-term thermal inertia trends. To address noise-induced pattern dilution, a Compression–Fusion Attention (CFA) mechanism is introduced to compress redundant high-frequency activations within the attention space and selectively amplify time steps that truly govern future loads, substantially improving robustness in the presence of impulsive logistics events and switching noise. Moreover, MA-CFAN incorporates a period-weight aggregation module that dynamically evaluates the reliability of different periodic subspaces and performs cross-period fusion only when consistent dependencies exist, thereby mitigating phase misalignment and aliasing effects. Together, these components form a domain-tailored architecture capable of reconstructing stable, interpretable multi-scale structures from the intrinsically unstable and intermittently perturbed cold-storage load signals. The main contributions of this study are threefold.
  • We propose MA-CFAN, a forecasting model tailored to cold storage load characteristics, capable of jointly modeling multi-scale dependencies and suppressing operational noise through adaptive period mining and compression-fusion attention.
  • We establish a systematic benchmark of state-of-the-art deep time-series models under real cold storage conditions, providing methodological references for future research.
  • Using real operational data, we demonstrate the model’s superior forecasting performance and its potential applications in demand response and coordinated cold storage cluster scheduling. Overall, this work offers a new pathway for cold storage load forecasting and contributes to the broader development of intelligent control strategies for industrial flexible loads.
The remainder of this paper is organized as follows. Section 2 describes the overall architecture of the proposed MA-CFAN model, including its multi-scale design, adaptive period extraction, and compression-fusion attention mechanism. Section 3 introduces the dataset, baseline models, and experimental setup. Section 4 presents the experimental results, comparative analysis, and ablation studies. Finally, Section 5 concludes the paper and discusses limitations and directions for future research.

2. Materials and Methods

This section introduces the overall architecture and technical pathway of the proposed Multi-Scale and Adaptive Multi-Period Compression-Fusion Attention Network (MA-CFAN). Unlike existing forecasting models that struggle with drifting micro-cycles, noisy pseudo-periods, and misaligned multi-scale dependencies, MA-CFAN is explicitly designed to overcome these domain-specific challenges and reconstruct stable temporal structures from highly volatile cold-storage load sequences. The method targets several inherent difficulties in cold storage load forecasting, including multi-scale temporal patterns, multi-period structures, trend-seasonality entanglement, and substantial operational noise. As illustrated in Figure 1, the framework integrates multi-resolution temporal processing, adaptive period extraction in the frequency domain, and unified trend-season modeling within a coherent architecture. The model comprises three major components: (1) a multi-scale input projection module composed of a multi-scale processing layer and an embedding layer; (2) the CFABlock layer, which performs parallel modeling of temporal dependencies across different temporal scales; and (3) an output prediction head that aggregates features from all scales to generate the final forecasting results.

2.1. Problem Definition

The task of cold storage load forecasting investigated in this study can be formalized as follows. Given a historical window of length T, containing past cold storage power measurements, equipment operational states (e.g., compressor status, pump pressure), cold storage operation indicators (e.g., door-opening events), ambient temperature, internal sensor measurements, and temperature setpoint boundaries, the goal is to train a predictive function $F(\cdot)$ that estimates the cold storage power over the next $T'$ time steps. This can be expressed as:

$$[X_{t-T+1}, \ldots, X_t] \xrightarrow{F(\cdot)} [P_{t+1}, \ldots, P_{t+T'}],$$

where $X_i \in \mathbb{R}^{1 \times N}$ represents the multivariate input at time step i, and N denotes the number of input features. In our dataset, N = 57 (details are provided in Section 3.1). The output $P_j \in \mathbb{R}$ denotes the predicted cold storage power at future time step j.

2.2. Multi-Scale and Multi-Period Design

Cold storage load exhibits inherent multi-scale temporal characteristics, wherein different temporal resolutions emphasize distinct structural properties. Fine-grained sequences capture rapid fluctuations and localized patterns, whereas coarse-grained sequences reveal long-term trends and smooth variations [30]. As illustrated in Figure 2, hourly load data manifest strong high-frequency oscillations, daily sequences highlight clearer trend components, and weekly sequences show that most high-frequency variations have been smoothed out. These behaviors arise from the combined influence of multiple interacting factors—such as compressor on-off cycling, product loading and unloading operations, ambient temperature variation, and circadian rhythms—which collectively induce multi-scale structures and rich multi-periodicity in cold storage loads. Moreover, real-world data are further perturbed by stochastic disturbances and dynamic control strategies, making robust modeling even more challenging.
To address these characteristics, we design a multi-scale parallel adaptive period segmentation module. Specifically, the raw load sequence is first downsampled to obtain representations at different temporal resolutions, capturing both fine and coarse patterns. Although downsampling may smooth short-term fluctuations at coarser resolutions, this effect is mitigated by the parallel multi-scale architecture of MA-CFAN: fine-grained sequences remain fully preserved and are processed independently, ensuring that short-term dynamics are not discarded, and the final prediction head aggregates representations from all scales, allowing both high-frequency fluctuations and long-term trends to jointly inform the forecasting result. Then, instead of retaining all Fourier components, we apply the FFT to each scale and extract only the top-k frequency components with the highest amplitudes. Alternative time-frequency analysis methods, such as wavelet transforms or learnable spectral filters, were considered; however, FFT-based period identification was adopted for its computational efficiency, robustness to noise, and clear physical interpretability, which are particularly advantageous in industrial cold storage scenarios. Exploring learnable spectral representations is identified as a promising direction for future work. In this study, the number of dominant frequencies k is treated as a small hyperparameter and is selected from the range 3–5 based on validation performance and the physical interpretability of cold storage operations. Empirically, this range is sufficient to capture compressor micro-cycles, mid-term operational rhythms, and long-term thermal trends. Sensitivity analysis shows that MA-CFAN is robust to moderate variations in k, as the subsequent amplitude-normalized aggregation mechanism assigns lower weights to less informative frequency components, thereby reducing the risk of overfitting or information dilution. This strategy focuses the model on the most informative periodic signals while suppressing noise-dominated frequencies that may cause overfitting; at the same time, selecting multiple dominant frequencies avoids the information loss that would arise from oversimplified period assumptions, enabling the model to effectively capture the complex multi-period temporal behaviors inherent in cold storage load data.
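For concreteness, the following minimal sketch illustrates the FFT-based top-k period identification described above; the function name, the use of NumPy, and the amplitude handling are illustrative assumptions rather than the exact MA-CFAN implementation.

```python
import numpy as np

def extract_topk_periods(x, k=3):
    """Return the k dominant periods of a 1D series via the FFT.
    Sketch only: amplitudes come from the real FFT, and the DC
    component is excluded before selecting the top-k bins."""
    T = len(x)
    amplitude = np.abs(np.fft.rfft(x))
    amplitude[0] = 0.0                      # ignore the zero-frequency (DC) bin
    top = np.argsort(amplitude)[-k:][::-1]  # indices of the k largest amplitudes
    periods = np.ceil(T / top).astype(int)  # p_k = ceil(T / f_k)
    return periods, amplitude[top]

# Example: a daily cycle (period 24) buried in noise is recovered.
t = np.arange(24 * 30)
load = np.sin(2 * np.pi * t / 24) + 0.3 * np.random.randn(t.size)
periods, amps = extract_topk_periods(load, k=3)
print(periods)  # the dominant detected period should be close to 24
```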

2.3. MA-CFAN

2.3.1. Multi-Scale Input Projection Layer

Prior studies such as PatchTST and DLinear adopt channel-independent processing to better model intra-variable temporal relations [26,29,30]. However, cold storage load dynamics fundamentally arise from coupled interactions among multiple variables (e.g., compressor cycling, door operations, ambient temperature fluctuations). Therefore, we adopt a channel-mixing strategy to explicitly capture these cross-variable temporal dependencies, which is crucial for accurate cold-load modeling. The multi-scale input projection layer consists of two modules: (i) multi-scale processing and (ii) embedding. Given the original input sequence $X \in \mathbb{R}^{L \times N}$, we apply average pooling to construct M + 1 scales:

$$\mathcal{X}_M = \{x_0, \ldots, x_M\}, \quad x_m = \mathrm{AvgPooling}(x_{m-1}), \quad x_m \in \mathbb{R}^{(L/2^m) \times N},$$

where L is the sequence length and N is the feature dimension.
Each scale then undergoes channel-mixed value embedding and temporal positional encoding:

$$\mathcal{X}_M^0 = \mathrm{ValueEmbedding}(\mathcal{X}_M) + \mathrm{PositionalEmbedding}(\mathcal{X}_M),$$

producing the embedded sequence set $x_m^0 \in \mathbb{R}^{(L/2^m) \times d_{\mathrm{model}}}$, where $d_{\mathrm{model}}$ denotes the embedding dimension.
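A minimal PyTorch sketch of this projection layer is given below; the module name, the learnable positional embedding, and the pooling arrangement are illustrative assumptions consistent with the description above.

```python
import torch
import torch.nn as nn

class MultiScaleProjection(nn.Module):
    """Sketch: average-pool the input to M+1 resolutions, then apply
    channel-mixed value embedding plus positional embedding per scale."""

    def __init__(self, n_features, d_model, num_scales, max_len=512):
        super().__init__()
        self.num_scales = num_scales
        self.value_embedding = nn.Linear(n_features, d_model)  # mixes channels
        self.pos_embedding = nn.Embedding(max_len, d_model)

    def forward(self, x):                        # x: (batch, L, N)
        scales = [x]
        for _ in range(self.num_scales):         # x_m = AvgPooling(x_{m-1})
            x = nn.functional.avg_pool1d(
                x.transpose(1, 2), kernel_size=2).transpose(1, 2)
            scales.append(x)
        out = []
        for s in scales:                         # embed each resolution
            pos = torch.arange(s.size(1), device=s.device)
            out.append(self.value_embedding(s) + self.pos_embedding(pos))
        return out                               # list of (batch, L/2^m, d_model)
```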

2.3.2. CFABlocks

After projection, the M + 1 multi-scale sequences are processed independently by M + 1 parallel CFABlocks. For the m-th scale, the forward computation is:
$$x_m^1 = \mathrm{CFABlock}_m(x_m^0).$$
Each CFABlock consists of three major components:
(a) Multi-Period Reshape
For a given scale, the input $x_m^0 \in \mathbb{R}^{(L/2^m) \times d_{\mathrm{model}}}$ is adaptively transformed into K higher-dimensional 2D representations. The FFT is applied independently to each sample on the channel-averaged embedded sequence. The amplitude spectrum is obtained by averaging magnitudes across embedding dimensions, and the top-K frequencies with the largest averaged amplitudes are selected:

$$\{A_{f_1}, \ldots, A_{f_K}\}, \{f_1, \ldots, f_K\}, \{p_1, \ldots, p_K\} = \mathrm{FFT}(x_m^0),$$

where $A_{f_k}$ denotes the unnormalized amplitude and

$$p_k = \left\lceil \frac{T}{f_k} \right\rceil, \quad k \in \{1, \ldots, K\},$$

represents the estimated period length. Based on the detected frequencies, the 1D input is padded and reshaped into 2D tensors:

$$x_{m,2D}^{0,i} = \mathrm{Reshape}_{p_i, f_i}\left(\mathrm{Padding}(x_m^0)\right), \quad i \in \{1, \ldots, K\}.$$

Each tensor $x_{m,2D}^{0,i} \in \mathbb{R}^{p_i \times f_i \times d_{\mathrm{model}}}$ captures intra-period variations along its columns and inter-period variations along its rows. This dual locality allows attention to extract structural patterns along both directions.
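The padding-and-reshape step can be sketched as follows; zero-padding and the (batch, periods, period length, channels) tensor layout are assumptions consistent with the description above.

```python
import torch

def multi_period_reshape(x, period):
    """Pad a (batch, L, d_model) sequence so its length is divisible by
    `period`, then fold it into (batch, n_periods, period, d_model):
    rows index inter-period steps, columns intra-period steps."""
    batch, length, d_model = x.shape
    n_periods = -(-length // period)             # ceiling division
    pad = n_periods * period - length
    if pad > 0:                                  # zero-pad the tail (assumption)
        x = torch.cat([x, x.new_zeros(batch, pad, d_model)], dim=1)
    return x.reshape(batch, n_periods, period, d_model)
```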
(b) Compression-Fusion Attention
The CFABlock is designed to simultaneously capture seasonal repetition and trend evolution across multiple periods, while suppressing redundant noise through directional compression. The structure of the compressing attention is shown in Figure 1. Once reshaped by period, each 2D representation naturally separates into a seasonal tensor sensitive to intra-period repetition and a trend tensor sensitive to inter-period evolution. To focus on meaningful periodic structures, compression-fusion attention does not apply attention directly to the full 2D tensor. Instead, it performs directional adaptive compression: the seasonal branch compresses along the trend dimension, and the trend branch compresses along the seasonal dimension. This produces compact vectors that preserve representative periodic and trend-related patterns while discarding high-frequency noise caused by dead zones, device wear, or door openings.
Let the multi-period reshaped feature tensor be $X \in \mathbb{R}^{P \times L \times d}$, where P denotes the number of extracted periods, L is the intra-period length, and d is the embedding dimension. The seasonal branch compresses information along the inter-period dimension using average pooling:

$$S(l, :) = \frac{1}{P} \sum_{p=1}^{P} X(p, l, :), \quad l = 1, \ldots, L,$$

where $S \in \mathbb{R}^{L \times d}$ represents the compressed seasonal representation. Similarly, the trend branch compresses information along the intra-period dimension:

$$T(p, :) = \frac{1}{L} \sum_{l=1}^{L} X(p, l, :), \quad p = 1, \ldots, P,$$

where $T \in \mathbb{R}^{P \times d}$ denotes the compressed trend representation.
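A compact sketch of this directional compression, assuming the batched (batch, P, L, d) layout corresponding to the definitions above:

```python
import torch

def compress_branches(x2d):
    """Directional compression of a reshaped tensor
    x2d: (batch, P, L, d), with P periods of intra-period length L.
    Seasonal branch averages over periods  -> (batch, L, d);
    trend branch averages over intra-period steps -> (batch, P, d)."""
    seasonal = x2d.mean(dim=1)   # S(l,:) = (1/P) * sum_p X(p, l, :)
    trend = x2d.mean(dim=2)      # T(p,:) = (1/L) * sum_l X(p, l, :)
    return seasonal, trend
```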
Within the Compression-Fusion Attention (CFA) module, Full Attention (Figure 3) serves as the fundamental operation applied to each compressed 2D tensor representation. After multi-period decomposition and structural reshaping, each transformed 2D tensor $X_{l,D_i}^{2D}$ is projected into query, key, and value spaces:

$$Q_i = X_{l,D_i}^{2D} W^Q, \quad K_i = X_{l,D_i}^{2D} W^K, \quad V_i = X_{l,D_i}^{2D} W^V,$$

where $W^Q, W^K, W^V \in \mathbb{R}^{d \times d_k}$ are learnable matrices. CFA applies the standard scaled dot-product attention to each decomposed period component:

$$\mathrm{Attn}_i = \mathrm{Softmax}\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i.$$
This operation computes dense token-to-token interactions within each compressed temporal-frequency block, allowing the model to extract both intra-period structure and localized multi-frequency patterns. Full Attention measures the similarity between all token pairs in the compressed representation through the dot product $Q_i K_i^{\top}$. The Softmax function then assigns adaptive weights to each position, enabling the model to:
  • focus on salient temporal regions within a specific period,
  • capture high-resolution relationships preserved during compression,
  • maintain full expressiveness despite operating on shorter sequences.
Because the input to Full Attention has already been compressed through multi-period restructuring, CFA retains Full Attention’s modeling power while dramatically reducing computational cost. The attention outputs Attn i are later fused across periods using amplitude-normalized adaptive weights, completing the CFA pipeline.
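For reference, a minimal single-head sketch of the Full Attention operation applied to a compressed branch (e.g., the seasonal tensor S) is shown below; the number of heads is not specified in the text, so a single head is assumed.

```python
import torch
import torch.nn as nn

class FullAttention(nn.Module):
    """Scaled dot-product attention over a compressed branch of shape
    (batch, tokens, d). Single-head sketch; the paper's module may
    use multiple heads."""

    def __init__(self, d, d_k):
        super().__init__()
        self.W_q = nn.Linear(d, d_k, bias=False)
        self.W_k = nn.Linear(d, d_k, bias=False)
        self.W_v = nn.Linear(d, d_k, bias=False)

    def forward(self, x):
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5  # QK^T / sqrt(d_k)
        return torch.softmax(scores, dim=-1) @ v
```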
The seasonal branch is processed using Full Attention, enabling it to highlight repetitive structures and phase shifts, while the trend branch preserves long-range evolution patterns without distortion. Both branches integrate time-frequency joint modeling, allowing attention to consider temporal locality and energy distribution across frequencies. After attention, the two branches are reshaped back to their 3D forms and fused multiplicatively: not by simple concatenation, but through element-wise interaction, allowing the model to dynamically emphasize trend-enhanced seasonality or seasonally modulated trend shifts. This asymmetric compression-fusion strategy enables the CFABlock to explicitly model the interaction between long-term trend evolution and intra-period seasonal repetition while maintaining low computational complexity. The mechanism explicitly encodes the real-world phenomenon whereby repeated behaviors (e.g., compressor cycling) vary with long-term operating levels (e.g., daytime vs. nighttime load), making the CFABlock well aligned with the coupled structures in cold storage operations.
(c) Periodic Weight Aggregation
Finally, to generate the input for the next layer, the CFABlock fuses the k extracted 1D representations $\{\hat{X}_{l,D_1}^{1D}, \ldots, \hat{X}_{l,D_k}^{1D}\}$ through an adaptive, amplitude-informed weighting mechanism. Inspired by the Auto-Correlation principle in [24], the amplitude values A associated with each selected frequency quantify the relative significance of the corresponding periodic component. These amplitudes naturally reflect the contribution of each transformed 2D tensor, enabling a principled fusion strategy. To this end, we first normalize the amplitudes using a Softmax function:

$$\hat{A}_{f_1}^{l-1}, \ldots, \hat{A}_{f_k}^{l-1} = \mathrm{Softmax}\left(A_{f_1}^{l-1}, \ldots, A_{f_k}^{l-1}\right),$$

and then compute the aggregated representation as

$$X_l^{1D} = \sum_{i=1}^{k} \hat{A}_{f_i}^{l-1} \cdot \hat{X}_{l,D_i}^{1D}.$$
Since both intra-period and inter-period variations have already been encoded within the set of structurally enriched 2D tensors, this amplitude-guided fusion enables the CFABlock to effectively capture diverse multi-scale temporal patterns. Consequently, the CFABlock provides a more expressive and robust representation than directly modeling the raw 1D input sequence, ensuring stronger temporal modeling capability across heterogeneous periodicities.
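The amplitude-guided fusion can be sketched as follows, assuming the k period-specific representations have already been mapped back to a common (batch, length, d) shape:

```python
import torch

def aggregate_periods(reps, amplitudes):
    """Fuse k period-specific 1D representations with softmax-normalized
    FFT amplitudes, mirroring the two equations above.
    reps: list of k tensors, each (batch, L, d);
    amplitudes: (batch, k) unnormalized amplitudes A_{f_i}."""
    weights = torch.softmax(amplitudes, dim=-1)            # \hat{A}_{f_i}
    stacked = torch.stack(reps, dim=-1)                    # (batch, L, d, k)
    return (stacked * weights[:, None, None, :]).sum(-1)   # weighted sum over k
```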

2.4. Output Prediction Head

After processing all scales, we obtain the multi-scale feature set. Since each scale captures distinct temporal patterns, we assign an independent prediction head to each scale:
$$\mathrm{output} = \mathrm{Averaging}\left(\{\mathrm{Head}_m(x_m^1)\}_{m=0}^{M}\right),$$

where $\mathrm{Head}_m(\cdot)$ is a linear layer used for the m-th scale.
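A minimal sketch of the per-scale heads and their averaging; flattening each scale's (length, d_model) features before the linear map is an assumption, as the exact head layout is not specified.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """One linear head per scale, each mapping that scale's features to
    the forecast horizon; the outputs are averaged across scales."""

    def __init__(self, scale_lengths, d_model, pred_len):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(L * d_model, pred_len) for L in scale_lengths)

    def forward(self, feats):   # feats[m]: (batch, L_m, d_model)
        outs = [head(f.flatten(1)) for head, f in zip(self.heads, feats)]
        return torch.stack(outs, dim=0).mean(dim=0)  # (batch, pred_len)
```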

3. Baseline Models and Experimental Setup

3.1. Dataset

This study utilizes historical operational data from a cold storage facility located in Jinan, China. The warehouse occupies approximately 4658 m² and consists of three freezer chambers and a loading bay. Its cooling capacity is jointly supplied by a compressor system with an input power of 265.2 kW and an air-cooling system rated at 149.7 kW.
The dataset spans the period from 26 June 2023 to 11 June 2024, covering nearly one full year of operation. The raw operational data are recorded at a 10-min resolution, yielding a total of 50,550 time steps, each containing 57 features (see Table 1). For model training and evaluation, the data are aggregated to a 1-hour resolution using mean aggregation to reduce high-frequency noise and align with practical scheduling requirements. The features are closely related to the cold-storage load, including:
  • Historical load power, which serves as the prediction target.
  • Equipment operational data, characterizing the states of compressors, evaporators, and fans, which directly influence load variations.
  • Operational status signals, such as door-opening records, reflecting cargo inflow/outflow that alters thermal disturbances.
  • Temperature setpoints (upper and lower bounds), which govern the regulation behavior of the cooling system. For example, after high-temperature goods enter the chamber, compressors and fans compensate by increasing power output.
  • Outdoor ambient temperature, which impacts overall energy consumption: higher loads in summer due to heat ingress and reduced loads in winter.
  • Chamber temperatures of the three freezers, included to enrich input dimensionality and improve forecasting precision.
All features are normalized using Z-score standardization. Occasional missing values are handled using forward filling, while abnormal sensor readings are implicitly mitigated through Z-score normalization and the model’s robustness to noise. Extreme operational events are retained in the dataset, as they reflect realistic cold storage operating conditions. The dataset is split chronologically into 70% training, 10% validation, and 20% testing subsets to prevent information leakage between past and future observations.
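A sketch of this preprocessing pipeline is given below; fitting the Z-score statistics on the training split only is a standard leakage-avoidance choice that the text does not state explicitly, and is therefore an assumption.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, train_frac=0.7, val_frac=0.1):
    """Forward-fill occasional gaps, split chronologically into
    70/10/20 train/validation/test, and Z-score-normalize all features
    using statistics computed on the training portion (assumption)."""
    df = df.ffill()                                   # fill missing values
    n = len(df)
    i_tr, i_va = int(n * train_frac), int(n * (train_frac + val_frac))
    train, val, test = df[:i_tr], df[i_tr:i_va], df[i_va:]
    mean, std = train.mean(), train.std()
    z = lambda d: (d - mean) / (std + 1e-8)           # Z-score normalization
    return z(train), z(val), z(test)
```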

3.2. Baseline Models

To comprehensively evaluate the effectiveness of the proposed MA-CFAN model, we compared it against a diverse set of state-of-the-art baselines covering recurrent architectures, linear models, decomposition-based networks, and Transformer-family models. These baselines represent the mainstream paradigms in contemporary time-series forecasting. In addition, we include a simple persistent (seasonal naive) baseline (SN) as a reference. This baseline generates predictions by directly copying historical observations, without any parameter learning: for a given forecasting horizon pred_len, it copies the historical load values from pred_len time steps earlier, so the forecasted sequence is identical to the historical segment immediately preceding the prediction window. Although conceptually simple, such a seasonal naive model serves as a meaningful lower-bound benchmark that anchors the intrinsic difficulty of the forecasting task and helps contextualize the performance gains achieved by more sophisticated models. This formulation provides a clear and reproducible reference that reflects purely seasonal repetition without any model training.
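The seasonal naive baseline admits a one-line implementation; the sketch below follows the copy-the-preceding-segment formulation described above.

```python
import numpy as np

def seasonal_naive(history, pred_len):
    """Persistent (seasonal naive) forecast: copy the last pred_len
    observed values, i.e., the historical segment immediately
    preceding the prediction window."""
    history = np.asarray(history)
    return history[-pred_len:].copy()

# Example: a 24-step forecast repeats the most recent 24 observations.
load = np.arange(200, dtype=float)
print(seasonal_naive(load, 24))   # values 176..199
```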
Long Short-Term Memory (LSTM) networks are classical recurrent architectures that capture temporal dependencies through gated memory units. LSTM serves as a strong traditional baseline for load forecasting due to its ability to model sequential patterns, though its limited capacity to extract long-range dependencies and multi-periodic structures constrains its performance on complex cold-storage signals.
DLinear [29] is a simple yet competitive linear model that decomposes time series into trend and seasonal components via channel-wise linear projections. Its minimal architectural assumptions enable efficient training and strong performance on many benchmark datasets, making it a popular baseline for testing whether complex models truly outperform linear structures.
TimeMixer [28] leverages temporal token mixing and feature mixing to capture cross-channel interactions and temporal patterns in a lightweight architecture. Its design enables efficient modeling of multi-frequency structures, offering a strong baseline for comparing multi-period extraction capabilities.
TimesNet [27] utilizes multi-period based convolutional encoders to capture 2D temporal patterns by transforming 1D signals into structured tensors. Due to its use of period-aware feature extraction, TimesNet is particularly relevant for cold-storage load forecasting, which is dominated by strong periodic behaviors.
Informer [23] employs ProbSparse self-attention and a generative decoder to handle long sequence forecasting efficiently. Its architecture is designed for scalability and large-range dependency modeling, providing an important reference for Transformer-based long-term prediction tasks.
Autoformer [24] integrates an auto-correlation mechanism to explicitly model period-based dependencies and decomposes time series into trend and seasonal components. As one of the first decomposition-enhanced Transformers, Autoformer is a key baseline for evaluating the periodic extraction ability of our proposed CFABlock.
iTransformer [25] introduces an inverted attention mechanism that swaps the roles of the feature and temporal dimensions, enabling cross-variable dependency modeling with improved computational efficiency. Its ability to learn channel-wise correlations makes it suitable for multivariate load forecasting tasks.
PatchTST [26] is a patch-level Transformer model that divides the input sequence into non-overlapping temporal patches and applies self-attention on patch embeddings. This patch representation enhances local pattern extraction and improves generalization on long sequences, making it a strong benchmark for high-frequency operational data.

3.3. Experimental Setup

For fair comparison, all baseline models and MA-CFAN are trained using the same experimental configuration, including identical input length, prediction horizons, data splits, optimizer settings, early stopping criteria, and training epochs. The look-back window is fixed at 96 time steps, corresponding to four days. Forecasting horizons are set to 24, 48, 96, and 192 steps, representing 1-day, 2-day, 4-day, and 8-day forecasts, respectively, thus covering short-, medium-, and long-term prediction ranges. Each model is trained for 50 epochs with early stopping (patience = 20). The Adam optimizer is used for stable and efficient training, with an initial learning rate of 0.001 that decays exponentially. Traditional TSF models commonly adopt the Mean Squared Error (MSE) loss:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2.$$
However, MSE is insufficient for capturing structural patterns in cold-storage load sequences. To address this limitation, we introduce a hybrid objective combining MSE with a Patch-wise Structural (PS) loss, defined below. This method first performs Fourier-based Adaptive Patching (FAP), where a dominant frequency f yields an initial period $p = \lceil T/f \rceil$. The patch length is then:

$$P = \min\left(\left\lceil \frac{p}{2} \right\rceil, \delta\right).$$
On the patched sequences $Y_p^{(i)}$ and $\hat{Y}_p^{(i)}$, the PS loss is defined by three components. The correlation loss is

$$L_{\mathrm{Corr}} = \frac{1}{N} \sum_{i=0}^{N-1} \left(1 - \rho\left(Y_p^{(i)}, \hat{Y}_p^{(i)}\right)\right),$$

the variance loss is

$$L_{\mathrm{Var}} = \frac{1}{N} \sum_{i=0}^{N-1} \mathrm{KL}\left(\phi\left(Y_p^{(i)}\right) \,\middle\|\, \phi\left(\hat{Y}_p^{(i)}\right)\right),$$

and the mean-alignment loss is

$$L_{\mathrm{Mean}} = \frac{1}{N} \sum_{i=0}^{N-1} \left| \mu^{(i)} - \hat{\mu}^{(i)} \right|.$$
A gradient-based dynamic weighting strategy assigns adaptive weights:

$$\alpha(t) = \frac{G(t)}{G_{\mathrm{Corr}}(t)}, \quad \beta(t) = \frac{G(t)}{G_{\mathrm{Var}}(t)}, \quad \gamma(t) = c\,v\,\frac{G(t)}{G_{\mathrm{Mean}}(t)},$$

where $G(t) = \lVert \nabla_W L(t) \rVert_2$ and $c, v \in [0, 1]$ reflect covariance and variance consistency.
The PS loss is then

$$L_{\mathrm{PS}} = \alpha L_{\mathrm{Corr}} + \beta L_{\mathrm{Var}} + \gamma L_{\mathrm{Mean}},$$

and the final training objective becomes

$$L = L_{\mathrm{MSE}} + \lambda L_{\mathrm{PS}}.$$
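The following simplified sketch illustrates the hybrid objective. It substitutes a fixed patch length for Fourier-based Adaptive Patching, uses fixed weights in place of the gradient-based dynamic weighting, and interprets $\phi(\cdot)$ as a softmax normalization of patch values; all three choices are assumptions for illustration, not the exact formulation above.

```python
import torch
import torch.nn.functional as F

def ps_loss(y, y_hat, patch_len, alpha=1.0, beta=1.0, gamma=1.0, eps=1e-8):
    """Simplified patch-wise structural loss on non-overlapping patches.
    y, y_hat: (batch, T) targets and predictions."""
    n = y.size(1) // patch_len
    Yp = y[:, :n * patch_len].reshape(-1, patch_len)       # true patches
    Yh = y_hat[:, :n * patch_len].reshape(-1, patch_len)   # predicted patches
    # Correlation term: 1 - Pearson rho per patch.
    yc = Yp - Yp.mean(1, keepdim=True)
    hc = Yh - Yh.mean(1, keepdim=True)
    rho = (yc * hc).sum(1) / (yc.norm(dim=1) * hc.norm(dim=1) + eps)
    l_corr = (1 - rho).mean()
    # Variance term: KL(phi(Y) || phi(Y_hat)) with phi = softmax (assumption).
    l_var = F.kl_div(F.log_softmax(Yh, dim=1),
                     F.softmax(Yp, dim=1), reduction='batchmean')
    # Mean-alignment term: |mu - mu_hat| per patch.
    l_mean = (Yp.mean(1) - Yh.mean(1)).abs().mean()
    return alpha * l_corr + beta * l_var + gamma * l_mean

def total_loss(y, y_hat, patch_len=12, lam=0.5):
    """Hybrid objective L = L_MSE + lambda * L_PS; lambda is illustrative."""
    return F.mse_loss(y_hat, y) + lam * ps_loss(y, y_hat, patch_len)
```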
The evaluation metrics used in this study are the Mean Squared Error (MSE), formulated above, the Mean Absolute Error (MAE), and the Mean Absolute Percentage Error (MAPE). MAE is calculated in the normalized space, while MAPE is computed after inverse transformation back to the original kilowatt (kW) scale. The MAE is defined as:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|.$$
The MAPE is defined as:
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|.$$
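A sketch of this evaluation protocol, assuming the Z-score statistics (mean and standard deviation of the target in kW) are available for the inverse transformation:

```python
import numpy as np

def metrics(y, y_hat, target_mean, target_std, eps=1e-8):
    """MSE/MAE on the normalized scale; MAPE after inverse-transforming
    back to kW, following the protocol above. The eps guard against
    division by zero is an implementation assumption."""
    mse = np.mean((y - y_hat) ** 2)
    mae = np.mean(np.abs(y - y_hat))
    y_kw = y * target_std + target_mean           # back to the kW scale
    yh_kw = y_hat * target_std + target_mean
    mape = np.mean(np.abs((y_kw - yh_kw) / (y_kw + eps)))
    return mse, mae, mape
```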

4. Results and Discussion

4.1. Main Results

Using the experimental configuration described above, we conducted extensive empirical evaluations of MA-CFAN against six state-of-the-art time series forecasting models on the cold-storage dataset. As shown in Table 2 and Table 3, MA-CFAN consistently achieved the best performance across all forecasting horizons, demonstrating superior robustness from short-term (24 h) to long-term (192 h) load prediction. Although evaluated on a single facility, MA-CFAN is designed to learn structural temporal dependencies driven by refrigeration mechanics and operational cycles rather than site-specific parameters. This design enables potential transferability to cold storage facilities with different sizes, equipment configurations, and climatic conditions.
From the perspective of forecasting difficulty, the persistent (seasonal naive) baseline provides a meaningful lower-bound reference. Across the majority of evaluation metrics and forecasting horizons, deep learning-based models consistently outperform this naive baseline, indicating that the prediction task cannot be solved by simple historical repetition alone. This performance gap becomes more pronounced as the forecasting horizon increases, reflecting the growing difficulty of capturing long-term temporal dependencies and non-stationary operational patterns in cold storage load data.
Compared with the traditional recurrent model LSTM, MA-CFAN reduces the average MSE by 19.3%, MAE by 14.8%, and MAPE by 23.8%. This improvement can be attributed to the fact that LSTM's gated architecture struggles to capture the long-range dependencies and cross-variable interactions inherent in multi-periodic cold-storage load patterns. Among Transformer-based baselines, both Informer and Autoformer perform relatively poorly: Autoformer relies heavily on periodic decomposition and thus becomes unstable under multi-period, multi-frequency data, while Informer's ProbSparse attention handles long sequences efficiently but tends to lose critical temporal fluctuations due to sparsification.
DLinear, TimeMixer, and TimesNet outperform the other baselines but still fall short of MA-CFAN. Compared with the strongest baseline, DLinear, MA-CFAN improves average MSE by 4.25%, MAE by 2.8%, and MAPE by 3.6%. DLinear shows stable performance, especially at longer forecasting horizons (48, 96, and 192 h), where it achieves 7 out of 9 second-best results, yet its purely linear mapping limits its ability to model the nonlinear multi-frequency dynamics of cold-storage loads. TimesNet exhibits strong short-term prediction capability due to its multi-scale convolutional architecture but deteriorates in long-term forecasting because convolution kernels have inherently restricted receptive fields. TimeMixer performs more consistently, benefiting from its coarse-to-fine multi-scale interaction design, yet still fails to fully uncover intricate temporal dependencies. In contrast, MA-CFAN's multi-scale representation and multi-period CFA mechanism more effectively capture both global and local temporal structures.
iTransformer and PatchTST form the second-tier group. Both rely on patch-level modeling rather than point-wise attention: iTransformer treats each variable as a token and focuses on cross-variable relations while ignoring within-series temporal structure; PatchTST splits the series via fixed-length patches but applies channel-independent processing, limiting its ability to capture cross-channel interactions. Thanks to channel-mixing and adaptive multi-scale period decomposition, MA-CFAN surpasses both models by a considerable margin:
  • Compared with iTransformer: MSE −9.9%, MAE −6.4%, MAPE −5.2%;
  • Compared with PatchTST: MSE −10.6%, MAE −6.4%, MAPE −8.2%.
Figure 4 visualizes the prediction curves of MA-CFAN, DLinear, and TimesNet for 24-, 48-, 96-, and 192-step forecasting. MA-CFAN maintains consistently superior alignment with the ground truth across all forecasting horizons. Notably, the real-world dataset includes abrupt logistics activities, compressor switching events, and dynamic control strategy changes. The proposed CFA module enhances robustness under such disturbances by compressing redundant high-frequency activations and selectively emphasizing structurally consistent temporal patterns. As evidenced by the stable performance gains across all horizons, MA-CFAN maintains strong predictive accuracy even under highly stochastic operational conditions. In the short 24-step prediction, both DLinear and TimesNet show a large offset at the starting position of the prediction and fail to capture the sudden change in trend, whereas the proposed MA-CFAN better captures this departure from past patterns. At longer prediction horizons, especially the 192-step-ahead forecast (8 days), TimesNet captures general trends but misses high-frequency oscillations due to its limited receptive field, while DLinear exhibits significant deviations from the actual curve and tends to repeat similar temporal patterns. MA-CFAN more accurately reproduces both the trend and fluctuation structures, which is crucial for cold-storage scheduling, although minor deviations from the raw data remain.
Furthermore, Figure 5 summarizes the average MSE across forecasting windows. As expected, prediction errors increase with horizon length for all models. Nevertheless, MA-CFAN achieves the lowest MSE at every horizon, demonstrating strong robustness and generalization. DLinear remains the closest competitor, yet its error remains 1.1–1.8% higher than MA-CFAN across windows.
The results reported in the main text are obtained using a fixed random seed of 2025. To further assess the robustness of the proposed method with respect to random initialization, MA-CFAN and several representative baseline models are additionally evaluated across multiple random seeds (2021, 2022, 2023, 2024, and 2025). For each selected seed, the MSE and MAE scores are computed, and the mean and standard deviation of the results are summarized in Table 4. Among all compared methods, DLinear exhibits the most stable performance, followed closely by MA-CFAN, while the remaining models show relatively larger variations. Overall, the variances across different random seeds are consistently small, indicating that MA-CFAN demonstrates strong robustness against the choice of random seed.

4.2. Ablation Study

To evaluate the contribution of each architectural component, we conduct a systematic ablation study on MA-CFAN, with DLinear, PatchTST, TimeMixer and TimesNet included as competitive baselines for intuitive comparison. Three variants of MA-CFAN are designed:
  • MA-FAN—Replace the proposed Compression-Fusion Attention (CFA) with the full attention mechanism of the vanilla Transformer.
  • M-MLP—Replace the CFA module with a multilayer perceptron.
  • Patch-CFAN—Replace the multi-scale and multi-period processing with the fixed-length patching strategy of PatchTST.
As shown in Figure 6, all three variants exhibit different degrees of performance degradation, demonstrating the indispensable role of each architectural element.
Among the three variants, MA-FAN replaces the proposed CFA with the full attention mechanism of the vanilla Transformer. The average MSE of MA-FAN increases by 4.8%, and the MAE increases by 3.0% compared with MA-CFAN. This performance drop provides strong evidence that full attention is less effective in modeling the structured seasonal-trend interactions that CFA explicitly compresses and fuses across multiple periods. Full attention processes all pairwise dependencies uniformly, causing it to dilute the periodic and trend-aligned patterns that are crucial for cold storage load forecasting. Even so, MA-FAN still shows performance comparable to DLinear and remains superior to several other baselines, suggesting that the multi-scale and multi-period representation design retains its predictive strength even when CFA is removed.
For the M-MLP variant, replacing CFA with a multilayer perceptron results in a further decline in accuracy. The average MSE of M-MLP increases by 8.8%, and the MAE increases by 5.1% compared with MA-CFAN. This degradation suggests that simple nonlinear transformations cannot compensate for the loss of the structured feature extraction and cross-period fusion capabilities embedded in CFA. The MLP fails to capture temporal dependencies with explicit periodicity, leading to weaker representations and diminished forecasting accuracy.
Patch-CFAN, which substitutes the adaptive multi-scale and multi-period module with the fixed-length patching strategy of PatchTST, shows the smallest performance drop among the three variants yet still degrades substantially. The inferior results indicate that fixed patches cannot effectively align with variable-length seasonal patterns or capture cross-scale temporal dependencies. Cold storage load exhibits multi-periodicity influenced by operational cycles, refrigeration mechanics, and environmental temperature fluctuations, making the patch-based representation insufficient for learning such dynamic structures.
Overall, the consistent performance decreases across all variants provide clear evidence that both the Compression-Fusion Attention module and the multi-scale, multi-period representation design are essential for MA-CFAN's superior forecasting ability. The ablation results validate that CFA plays a central role in capturing interpretable seasonal-trend interactions, while the multi-scale/multi-period processing framework ensures comprehensive temporal representation. These components jointly enable MA-CFAN to outperform existing baselines and maintain robust predictive performance under complex operating conditions.
To further justify the design choice of multiplicative fusion in the proposed compression-fusion attention module, we conducted a dedicated ablation study comparing it with several commonly used fusion strategies, including additive fusion, MLP-based fusion, and gated fusion. The goal of this experiment was to examine how different interaction mechanisms between trend and seasonal components affect the overall forecasting performance. Due to the FFT-based adaptive period decomposition, the number of extracted periods varied across samples and, thus, the tensor shapes after period segmentation are not fixed in advance.
After the compression-attention operation, the seasonal component is reduced to a compact 2D representation $S \in \mathbb{R}^{L \times d}$, while the trend component is reduced to $T \in \mathbb{R}^{P \times d}$, where L denotes the intra-period length, P denotes the number of detected periods, and d is the feature dimension, consistent with Section 2.3.2. These intermediate representations serve as the inputs to the different fusion strategies described below.
For additive fusion, the compressed trend and seasonal representations are first expanded to compatible 3D tensors and then summed element-wise. This strategy assumes equal and independent contributions from trend and seasonal components without explicitly modeling their interactions. Direct interaction via an MLP is not feasible due to the adaptive and sample-dependent tensor shapes produced by the FFT-based period decomposition. Therefore, both the trend and seasonal components are first expanded to 3D tensors, then reshaped into 2D representations consistent with the Multi-Period Reshape operation. The resulting vectors are concatenated and passed through a fully connected MLP layer to model cross-component interactions. The gated fusion strategy follows the same preprocessing steps as the MLP-based fusion. After concatenation, a gating unit is applied to adaptively control the information flow between the trend and seasonal components, allowing the model to selectively emphasize or suppress each component. In contrast, the proposed multiplicative fusion directly models the interaction between trend and seasonal components via element-wise multiplication after compression, without introducing additional parameters or requiring tensor reshaping. This design enables dynamic modulation between the two components while preserving structural simplicity.
The ablation results are summarized in Figure 7. Among all compared fusion strategies, the proposed multiplicative fusion consistently achieves the best forecasting performance across all evaluation metrics. This indicates that explicitly modeling cross-component modulation through multiplicative interaction is more effective than additive or parameter-heavy fusion mechanisms in capturing the complex temporal dynamics of cold storage load data.

5. Conclusions and Outlooks

In this study, we proposed MA-CFAN, a novel neural forecasting framework specifically designed for cold-storage load prediction. To address the inherently complex characteristics of cold-storage load data—including multi-scale structures, multi-periodicity, and rich high-frequency variations—we first constructed multi-resolution representations through hierarchical downsampling. Based on these representations, we introduced an adaptive period extraction mechanism together with a Compression-Fusion Attention (CFA) module, enabling effective modeling of multi-period dependencies while suppressing noise and redundant temporal patterns. To comprehensively evaluate the performance of MA-CFAN, we benchmarked it against a diverse suite of state-of-the-art baselines spanning Transformer-based, CNN-based, MLP-based, and RNN-based forecasting paradigms. Extensive experiments on a real-world cold-storage dataset yield the following conclusions:
  • MA-CFAN demonstrates superior forecasting performance across short-, medium-, and long-term horizons. In short-term forecasting (24 and 48 steps), MA-CFAN substantially reduces both MSE and MAE compared with traditional models such as LSTM. In medium- and long-term forecasting (96 and 192 steps), MA-CFAN further outperforms the second-best baseline with notably lower MSE and MAE, confirming its robustness and stability under longer prediction spans.
  • Ablation studies strongly validate the effectiveness of CFA and the multi-scale multi-period strategy. Removing either component leads to consistent performance drops, highlighting their crucial roles in capturing complex multi-period dependencies and extracting discriminative temporal structures from noisy cold-storage load sequences.
  • MA-CFAN provides a reliable and powerful forecasting framework for cold-storage clusters, offering improved accuracy, enhanced robustness, and stronger adaptability to multi-period, multi-frequency temporal dynamics. These advantages make MA-CFAN well-suited for practical deployment in cold-storage scheduling, energy management, and grid-interactive demand response applications.
Despite the strong performance of MA-CFAN, this study still has several limitations. First, the model is trained and evaluated on a specific cold storage dataset, and its generalizability to other industrial load scenarios remains to be validated. Second, although the proposed CFA effectively captures multi-scale and multi-period patterns, it introduces additional computational overhead compared with purely linear models. Future work will focus on validating MA-CFAN across multiple cold storage facilities and other industrial load types to further assess generalizability. To reduce computational complexity, strategies such as attention pruning, knowledge distillation, and lightweight approximation of CFA will be explored. In addition, interpreting cross-period interactions through attention visualization and frequency-weight analysis may provide actionable insights for operators and energy managers, supporting more transparent and informed decision-making.

Author Contributions

Conceptualization, H.C. and Y.Z.; methodology, H.C. and Y.Z.; software, Y.Z. and J.L.; validation, J.Z.; formal analysis, Y.Z. and J.C.; investigation, J.Z. and J.X.; resources, H.C.; data curation, Y.Z.; writing—original draft preparation, Y.Z.; writing—review and editing, H.C. and Y.Z.; visualization, Y.Z.; supervision, J.C., J.L. and J.X.; project administration, H.C.; funding acquisition, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Guangdong Province Special Fund for Science and Technology (“major special projects + task list”) Project (No. STKJ202209017), Guangdong Science and Technology Plan Projects (No. STKJ2023012), and Guangdong Provincial Key Laboratory of Frontier Mathematics and Large-Scale Model Computing in Higher Education Institutions.

Data Availability Statement

The data presented in this study are available on request from the corresponding author; they contain commercially sensitive operational information and are subject to confidentiality agreements with the data provider.

Acknowledgments

We wish to thank “the Key Laboratory of Frontier Mathematics and Large Model Computing, Guangdong Provincial Universities” for its generous support.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
MA: Multi-scale and adaptive multi-period
CFA: Compression-Fusion Attention
CFAN: Compression-Fusion Attention Network
FA: Full Attention
FFT: Fast Fourier Transform
SVM: Support Vector Machine
LSTM: Long Short-Term Memory
GRU: Gated Recurrent Unit
MLP: Multilayer Perceptron
MSE: Mean Squared Error
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error

References

  1. Luo, S.; Hu, W.; Liu, W.; Liu, Z.; Huang, Q.; Chen, Z. Flexibility enhancement measures under the COVID-19 pandemic—A preliminary comparative analysis in Denmark, The Netherlands, and Sichuan of China. Energy 2022, 239, 122166. [Google Scholar] [CrossRef] [PubMed]
  2. Fida, K.; Abbasi, U.; Adnan, M.; Iqbal, S.; Mohamed, S.E.G. A comprehensive survey on load forecasting hybrid models: Navigating the Futuristic demand response patterns through experts and intelligent systems. Results Eng. 2024, 23, 102773. [Google Scholar] [CrossRef]
  3. Uz Zaman, M.; Islam, A.; Sultana, N. Short Term Load Forecasting Based on Internet of Things (IoT). Ph.D. Thesis, BRAC University, Dhaka, Bangladesh, 2018. [Google Scholar]
  4. Han, G.; Hua, Z.; Xi, C.; Ju, Z. Research on Dynamic Energy Consumption of Front Warehouse Assembly Cold Storage and Analysis of Influencing Factors. J. Refrig. 2024, 42, 146–153. [Google Scholar]
  5. Forbicini, F.; Pinciroli Vago, N.O.; Fraternali, P. Time series analysis in compressor-based machines: A survey. Neural Comput. Appl. 2025, 37, 11001–11038. [Google Scholar] [CrossRef]
  6. Tian, S.; Gao, Y.; Shao, S.; Xu, H.; Tian, C. An experimental investigation of the single-sided infiltration through doorways of the cold store. Int. J. Refrig. 2017, 73, 175–182. [Google Scholar] [CrossRef]
  7. Lança, M.; Garcia, J.; Gomes, J. Heat Transfer Mechanisms in Refrigerated Spaces: A Comparative Study of Experiments, CFD Predictions and Heat Load Software Accuracy. Energies 2025, 18, 6280. [Google Scholar] [CrossRef]
  8. Zheng, H.; Yuan, J.; Chen, L. Short-term load forecasting using EMD-LSTM neural networks with a Xgboost algorithm for feature importance evaluation. Energies 2017, 10, 1168. [Google Scholar] [CrossRef]
  9. Nhat, N.N.V.; Huu, D.N.; Hoai, T.N.T. Evaluating the EEMD-LSTM model for short-term forecasting of industrial power load: A case study in Vietnam. Int. J. Renew. Energy Dev. 2023, 12, 881. [Google Scholar] [CrossRef]
  10. Pełka, P.; Dudek, G. Pattern-based long short-term memory for mid-term electrical load forecasting. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  11. Kim, J.; Kim, H.; Kim, H.; Lee, D.; Yoon, S. A comprehensive survey of time series forecasting: Architectural diversity and open challenges. arXiv 2024, arXiv:2411.05793. [Google Scholar] [CrossRef]
  12. Stathopoulos, C.; Kaperoni, A.; Galanis, G.; Kallos, G. Wind power prediction based on numerical and statistical models. J. Wind Eng. Ind. Aerodyn. 2013, 112, 25–38. [Google Scholar] [CrossRef]
  13. Kochetkova, I.; Kushchazli, A.; Burtseva, S.; Gorshenin, A. Short-term mobile network traffic forecasting using seasonal ARIMA and holt-winters models. Future Internet 2023, 15, 290. [Google Scholar] [CrossRef]
  14. Mohamed, A.O. Modeling and forecasting Somali economic growth using ARIMA models. Forecasting 2022, 4, 1038–1050. [Google Scholar] [CrossRef]
  15. Izadyar, N.; Ghadamian, H.; Ong, H.C.; Tong, C.W.; Shamshirband, S. Appraisal of the support vector machine to forecast residential heating demand for the District Heating System based on the monthly overall natural gas consumption. Energy 2015, 93, 1558–1567. [Google Scholar] [CrossRef]
  16. Valecha, H.; Varma, A.; Khare, I.; Sachdeva, A.; Goyal, M. Prediction of consumer behaviour using random forest algorithm. In Proceedings of the 2018 5th IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), Gorakhpur, India, 2–4 November 2018; pp. 1–6. [Google Scholar]
  17. Yang, J.; Zhao, C.; Yu, H.; Chen, H. Use GBDT to predict the stock market. Procedia Comput. Sci. 2020, 174, 161–171. [Google Scholar] [CrossRef]
  18. Picozzi, M.; Iaccarino, A.G. Forecasting the preparatory phase of induced earthquakes by recurrent neural network. Forecasting 2021, 3, 17–36. [Google Scholar] [CrossRef]
  19. Vermaak, J.; Botha, E. Recurrent neural networks for short-term load forecasting. IEEE Trans. Power Syst. 2002, 13, 126–132. [Google Scholar] [CrossRef]
  20. Liu, J.; Wang, X.; Zhao, Y.; Dong, B.; Lu, K.; Wang, R. Heating load forecasting for combined heat and power plants via strand-based LSTM. IEEE Access 2020, 8, 33360–33369. [Google Scholar] [CrossRef]
  21. Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018, 8, 6085. [Google Scholar] [CrossRef] [PubMed]
  22. Huang, J.; Zhang, X.; Jiang, X. Short-term power load forecasting based on the CEEMDAN-TCN-ESN model. PLoS ONE 2023, 18, e0284604. [Google Scholar] [CrossRef]
  23. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; AAAI: Washington, DC, USA, 2021; Volume 35, pp. 11106–11115. [Google Scholar]
  24. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
  25. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625. [Google Scholar]
  26. Nie, Y. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv 2022, arXiv:2211.14730. [Google Scholar]
  27. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-variation modeling for general time series analysis. arXiv 2022, arXiv:2210.02186. [Google Scholar]
  28. Wang, S.; Wu, H.; Shi, X.; Hu, T.; Luo, H.; Ma, L.; Zhang, J.Y.; Zhou, J. TimeMixer: Decomposable multiscale mixing for time series forecasting. arXiv 2024, arXiv:2405.14616. [Google Scholar] [CrossRef]
  29. Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; AAAI: Washington, DC, USA, 2023; Volume 37, pp. 11121–11128. [Google Scholar]
  30. Mozer, M.C. Induction of multiscale temporal structure. Adv. Neural Inf. Process. Syst. 1991, 4. Available online: https://proceedings.neurips.cc/paper_files/paper/1991/file/53fde96fcc4b4ce72d7739202324cd49-Paper.pdf (accessed on 1 December 2025).
Figure 1. The framework of MA-CFAN. Cold storage power signals are first downsampled to multiple granularities to obtain sequences of different temporal resolutions. Each resolution-specific sequence is then processed by a dedicated CFABlock to extract dominant periodicities, underlying trends, and seasonality patterns while effectively suppressing high-frequency noise. Finally, the prediction head fuses these multi-scale representations to produce the future load estimates. The detailed architecture of each module is provided in the following subsections.
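Since each CFABlock begins by extracting the dominant periodicities of its input sequence, a minimal sketch of how such periods are typically identified from the amplitude spectrum of an FFT is given below. This is an illustrative reconstruction rather than the authors' implementation; the helper name `dominant_periods` and the top-k selection details are assumptions.

```python
import torch

def dominant_periods(x: torch.Tensor, k: int = 2) -> list:
    """Estimate the k dominant periods of a batch of series via the real FFT.

    x: tensor of shape [batch, length, channels].
    Returns k candidate period lengths (in time steps).
    """
    # Amplitude spectrum, averaged over batch and channel dimensions.
    spec = torch.fft.rfft(x, dim=1).abs().mean(dim=(0, 2))
    spec[0] = 0.0  # suppress the DC component (the series mean)
    top = torch.topk(spec, k).indices  # indices of the strongest frequencies
    # A frequency index f corresponds to a period of length/f time steps.
    return [x.shape[1] // max(int(f), 1) for f in top]
```

For example, on a 96-step hourly input, a dominant frequency index of 4 would correspond to a 24-step (daily) period.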
Figure 2. Cold storage load profiles at different temporal scales: (a) 10-min resolution, showing strong high-frequency fluctuations; (b) hourly resolution, where daily trends become more pronounced; (c) weekly resolution, where high-frequency variations largely disappear and long-term periodicity dominates.
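To make the three views in Figure 2 concrete, the raw 10-min log can be resampled to coarser granularities, for instance with pandas. The file and column names below (`cold_storage.csv`, `timestamp`, `P`) are hypothetical placeholders, not the authors' data layout:

```python
import pandas as pd

# Hypothetical file/column names; readings are assumed to arrive every 10 min.
df = pd.read_csv("cold_storage.csv", parse_dates=["timestamp"], index_col="timestamp")

load_10min = df["P"]                            # native resolution (Figure 2a)
load_hourly = load_10min.resample("1h").mean()  # hourly view (Figure 2b)
load_weekly = load_10min.resample("7D").mean()  # weekly view (Figure 2c)
```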
Figure 3. The structure of full attention.
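Full attention here refers to the standard scaled dot-product attention, in which every query position attends to every key position at O(L²) cost in the sequence length. A minimal PyTorch sketch (tensor shapes and the helper name are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def full_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product ("full") attention.

    q, k, v: tensors of shape [batch, heads, length, d_head].
    """
    # Pairwise query-key similarities, scaled by sqrt(d_head).
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = F.softmax(scores, dim=-1)  # each query's distribution over keys
    return weights @ v
```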
Figure 4. Prediction cases from the Cold dataset by MA-CFAN, DLinear, and TimesNet under the input-96-predict-24/48/96/192 settings. The blue lines are the ground truth and the orange lines are the model predictions.
Figure 5. Average MSE of models across different prediction steps.
Figure 6. The ablation study results.
Figure 7. Ablation study on different fusion strategies. Add: additive fusion; MLP: concatenation followed by an MLP; Gate: concatenation followed by a gating unit; Mul (Ours): proposed multiplicative fusion.
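Read literally, the four variants in Figure 7 can be sketched as follows for two feature streams of equal width. This is an interpretive sketch of the caption's labels only; the hidden sizes, activations, and the exact gating form are assumptions rather than the authors' modules.

```python
import torch
import torch.nn as nn

class Fusion(nn.Module):
    """Sketch of the four fusion strategies compared in Figure 7."""

    def __init__(self, d: int, mode: str = "mul"):
        super().__init__()
        self.mode = mode
        self.mlp = nn.Sequential(nn.Linear(2 * d, d), nn.GELU(), nn.Linear(d, d))
        self.gate = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: feature streams of shape [batch, d].
        if self.mode == "add":    # Add: additive fusion
            return a + b
        if self.mode == "mlp":    # MLP: concatenation followed by an MLP
            return self.mlp(torch.cat([a, b], dim=-1))
        if self.mode == "gate":   # Gate: concatenation followed by a gating unit
            g = self.gate(torch.cat([a, b], dim=-1))
            return g * a + (1 - g) * b
        return a * b              # Mul (ours): multiplicative fusion
```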
Table 1. Summary of Dataset Features.
No. | Variable | Description | Unit
1 | cpr1 | Compressor #1 capacity level state | gear (0–100)
2 | cpr2 | Compressor #2 capacity level state | gear (0–100)
3 | cpr3 | Compressor #3 capacity level state | gear (0–100)
4 | cdr_pump | Evaporative cooling water pump state | gear (0–1)
5 | cdr_fan1 | Evaporative cooling fan #1 state | gear (0–1)
6 | cdr_fan2 | Evaporative cooling fan #2 state | gear (0–1)
7 | cdr_fan3 | Evaporative cooling fan #3 state | gear (0–1)
8 | cp_in_pa | Suction pressure | MPa
9 | cp_in_temp | Suction temperature | °C
10 | cp_ex_pa | Discharge pressure | MPa
11 | cp_liq_temp | Liquid supply temperature | °C
12 | cp_ex_temp | Discharge temperature | °C
13 | env_temp | Ambient temperature | °C
14 | fre1_fan1 | Freezer #1 fan #1 state | gear (0–1)
15 | fre1_fan2 | Freezer #1 fan #2 state | gear (0–1)
16 | fre1_fan3 | Freezer #1 fan #3 state | gear (0–1)
17 | fre1_fan4 | Freezer #1 fan #4 state | gear (0–1)
18 | fre1_fan5 | Freezer #1 fan #5 state | gear (0–1)
19 | fre1_fan6 | Freezer #1 fan #6 state | gear (0–1)
20 | fre1_temp1 | Freezer #1 sensor #1 temperature | °C
21 | fre1_temp2 | Freezer #1 sensor #2 temperature | °C
22 | fre1_temp3 | Freezer #1 sensor #3 temperature | °C
23 | fre1_temp4 | Freezer #1 sensor #4 temperature | °C
24 | fre1_temp5 | Freezer #1 sensor #5 temperature | °C
25 | fre1_temp6 | Freezer #1 sensor #6 temperature | °C
26 | fre1_temp_avg | Freezer #1 average temperature | °C
27 | fre1_temp_up | Freezer #1 upper temperature setpoint | °C
28 | fre1_temp_low | Freezer #1 lower temperature setpoint | °C
29 | fre2_fan1 | Freezer #2 fan #1 state | gear (0–1)
30 | fre2_fan2 | Freezer #2 fan #2 state | gear (0–1)
31 | fre2_fan3 | Freezer #2 fan #3 state | gear (0–1)
32 | fre2_fan4 | Freezer #2 fan #4 state | gear (0–1)
33 | fre2_door1 | Freezer #2 door #1 state | gear (0–1)
34 | fre2_door2 | Freezer #2 door #2 state | gear (0–1)
35 | fre2_door3 | Freezer #2 door #3 state | gear (0–1)
36 | fre2_temp1 | Freezer #2 sensor #1 temperature | °C
37 | fre2_temp2 | Freezer #2 sensor #2 temperature | °C
38 | fre2_temp3 | Freezer #2 sensor #3 temperature | °C
39 | fre2_temp4 | Freezer #2 sensor #4 temperature | °C
40 | fre2_temp_avg | Freezer #2 average temperature | °C
41 | fre2_temp_up | Freezer #2 upper temperature setpoint | °C
42 | fre2_temp_low | Freezer #2 lower temperature setpoint | °C
43 | fre3_fan1 | Freezer #3 fan #1 state | gear (0–1)
44 | fre3_fan2 | Freezer #3 fan #2 state | gear (0–1)
45 | fre3_fan3 | Freezer #3 fan #3 state | gear (0–1)
46 | fre3_fan4 | Freezer #3 fan #4 state | gear (0–1)
47 | fre3_temp1 | Freezer #3 sensor #1 temperature | °C
48 | fre3_temp2 | Freezer #3 sensor #2 temperature | °C
49 | fre3_temp3 | Freezer #3 sensor #3 temperature | °C
50 | fre3_temp4 | Freezer #3 sensor #4 temperature | °C
51 | fre3_temp_avg | Freezer #3 average temperature | °C
52 | fre3_temp_up | Freezer #3 upper temperature setpoint | °C
53 | fre3_temp_low | Freezer #3 lower temperature setpoint | °C
54 | pf_fan1&2 | Platform fans #1–2 state | gear (0–1)
55 | pf_fan3&4 | Platform fans #3–4 state | gear (0–1)
56 | pf_fan5&6 | Platform fans #5–6 state | gear (0–1)
57 | P | Total system power (Psum) | kW
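The total system power P (variable 57) serves as the forecasting target, with the remaining 56 variables as operational covariates. As a rough illustration only, the sliding-window pairs behind the input-96-predict-N settings reported below could be built as follows (`make_windows` is a hypothetical helper, not the authors' pipeline):

```python
import numpy as np

def make_windows(series: np.ndarray, input_len: int = 96, pred_len: int = 24):
    """Slice a 1-D load series into (input, target) pairs, e.g., for the
    input-96-predict-24 setting."""
    X, Y = [], []
    for i in range(len(series) - input_len - pred_len + 1):
        X.append(series[i : i + input_len])                         # model input
        Y.append(series[i + input_len : i + input_len + pred_len])  # horizon
    return np.stack(X), np.stack(Y)
```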
Table 2. Forecasting results on the Cold dataset when all models are trained using the same MSE + PS loss objective (Part I: 24- and 48-step horizons). The best results are highlighted in bold, while the second-best results are underlined.
Model | MSE (96-24) | MAE (96-24) | MAPE (96-24) | MSE (96-48) | MAE (96-48) | MAPE (96-48)
MA-CFAN (ours) | 0.449 | 0.487 | 0.262 | 0.480 | 0.506 | 0.268
DLinear | 0.472 | 0.505 | 0.275 | 0.498 | 0.517 | 0.278
LSTM | 0.515 | 0.540 | 0.323 | 0.587 | 0.571 | 0.329
TimesNet | 0.458 | 0.499 | 0.272 | 0.527 | 0.532 | 0.281
PatchTST | 0.491 | 0.521 | 0.293 | 0.515 | 0.528 | 0.287
TimeMixer | 0.472 | 0.503 | 0.274 | 0.506 | 0.520 | 0.278
iTransformer | 0.495 | 0.519 | 0.278 | 0.536 | 0.543 | 0.291
Autoformer | 0.514 | 0.537 | 0.312 | 0.584 | 0.569 | 0.327
Informer | 0.519 | 0.543 | 0.328 | 0.591 | 0.573 | 0.332
SN | 0.757 | 0.632 | 0.335 | 0.790 | 0.644 | 0.341
Table 3. Forecasting results on the Cold dataset when all models are trained using the same MSE + PS loss objective (Part II: 96- and 192-step horizons). The best results are highlighted in bold, while the second-best results are underlined.
Model | MSE (96-96) | MAE (96-96) | MAPE (96-96) | MSE (96-192) | MAE (96-192) | MAPE (96-192)
MA-CFAN (ours) | 0.508 | 0.513 | 0.271 | 0.547 | 0.534 | 0.276
DLinear | 0.534 | 0.531 | 0.281 | 0.567 | 0.545 | 0.281
LSTM | 0.612 | 0.599 | 0.357 | 0.745 | 0.684 | 0.401
TimesNet | 0.543 | 0.541 | 0.282 | 0.626 | 0.565 | 0.287
PatchTST | 0.555 | 0.545 | 0.298 | 0.657 | 0.587 | 0.293
TimeMixer | 0.546 | 0.535 | 0.277 | 0.608 | 0.564 | 0.278
iTransformer | 0.550 | 0.543 | 0.287 | 0.621 | 0.574 | 0.279
Autoformer | 0.592 | 0.556 | 0.307 | 0.672 | 0.565 | 0.312
Informer | 0.605 | 0.598 | 0.349 | 0.721 | 0.654 | 0.379
SN | 0.866 | 0.672 | 0.354 | 0.865 | 0.694 | 0.354
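For reference, the three error metrics reported in Tables 2 and 3 can be computed as below; the `eps` guard in MAPE is an assumption added to avoid division by zero, and the metrics are presumably evaluated on normalized series:

```python
import numpy as np

def mse(y: np.ndarray, yhat: np.ndarray) -> float:
    return float(np.mean((y - yhat) ** 2))

def mae(y: np.ndarray, yhat: np.ndarray) -> float:
    return float(np.mean(np.abs(y - yhat)))

def mape(y: np.ndarray, yhat: np.ndarray, eps: float = 1e-8) -> float:
    return float(np.mean(np.abs(y - yhat) / (np.abs(y) + eps)))
```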
Table 4. Forecasting results (mean ± std over different random seeds) for MA-CFAN, DLinear, TimesNet, PatchTST, and TimeMixer when all models are trained using the same MSE + PS loss objective. The best standard deviations are highlighted in bold, while the second-best are underlined.
Setting | Metric | MA-CFAN (ours) | DLinear | TimesNet | PatchTST | TimeMixer
96-24 | MSE | 0.4495 ± 0.0018 | 0.4725 ± 0.0013 | 0.4582 ± 0.0021 | 0.4914 ± 0.0020 | 0.4729 ± 0.0022
96-24 | MAE | 0.4873 ± 0.0014 | 0.5051 ± 0.0011 | 0.4995 ± 0.0017 | 0.5217 ± 0.0018 | 0.5032 ± 0.0021
96-48 | MSE | 0.4807 ± 0.0015 | 0.4984 ± 0.0012 | 0.5278 ± 0.0024 | 0.5152 ± 0.0025 | 0.5065 ± 0.0021
96-48 | MAE | 0.5062 ± 0.0013 | 0.5176 ± 0.0014 | 0.5323 ± 0.0026 | 0.5289 ± 0.0027 | 0.5208 ± 0.0023
96-96 | MSE | 0.5089 ± 0.0018 | 0.5342 ± 0.0016 | 0.5431 ± 0.0018 | 0.6578 ± 0.0030 | 0.5463 ± 0.0019
96-96 | MAE | 0.5134 ± 0.0016 | 0.5317 ± 0.0015 | 0.5416 ± 0.0023 | 0.5871 ± 0.0022 | 0.5351 ± 0.0020
96-192 | MSE | 0.5471 ± 0.0022 | 0.5673 ± 0.0017 | 0.6264 ± 0.0019 | 0.6086 ± 0.0026 | 0.5348 ± 0.0018
96-192 | MAE | 0.5348 ± 0.0019 | 0.5459 ± 0.0014 | 0.5657 ± 0.0025 | 0.5644 ± 0.0021 | 0.5348 ± 0.0023
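A seed-robustness study of the kind summarized in Table 4 can be scripted as below. The seed values and the `train_and_eval` callable are placeholders; the exact protocol is not specified here.

```python
import random
import numpy as np
import torch

def run_with_seeds(train_and_eval, seeds=(0, 1, 2, 3, 4)):
    """Repeat training/evaluation under several random seeds and report
    mean ± std per metric, as in Table 4.

    train_and_eval: callable returning a dict such as {"MSE": ..., "MAE": ...}.
    """
    results = []
    for s in seeds:
        random.seed(s)
        np.random.seed(s)
        torch.manual_seed(s)
        results.append(train_and_eval())
    return {m: (float(np.mean([r[m] for r in results])),
                float(np.std([r[m] for r in results])))
            for m in results[0]}
```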