A Patch-Based State-Space Hybrid Network for Container Resource Usage Forecasting
Abstract
1. Introduction
- This paper proposes a dual-branch hybrid architecture that integrates Transformer and Mamba, enabling simultaneous modeling of local fine-grained patterns and global long-range trends in container resource time series.
- We design a Patch-based Encoder that reduces the quadratic complexity of self-attention, together with a Cross-Branch Fusion module that enables adaptive information exchange between the two branches, balancing accuracy and efficiency.
- For the Alibaba Cluster Trace 2018 dataset, this study develops a tailored preprocessing pipeline: it addresses key challenges like irregular sampling intervals and massive data volume, and generates a representative subset to facilitate efficient model prototyping.
- We conduct extensive experiments on the large-scale, real-world Alibaba Cluster Trace 2018 dataset, demonstrating that PSH significantly outperforms existing state-of-the-art forecasting models, achieving the lowest CPU MAE of 0.0931 and a near-perfect Memory R² of 0.9957.
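To illustrate why patching reduces the cost of self-attention, here is a minimal sketch assuming a simple non-overlapping patching scheme (the paper's exact encoder may differ; the function name `patchify` is ours):

```python
import numpy as np

def patchify(series: np.ndarray, p: int) -> np.ndarray:
    """Split a (L, C) multivariate series into non-overlapping patches
    of length p, yielding (L // p, p * C) patch tokens."""
    L, C = series.shape
    n = L // p
    return series[: n * p].reshape(n, p * C)

# A window of 96 steps with 4 features (CPU, MEM, Net_in, Net_out)
window = np.random.rand(96, 4)
tokens = patchify(window, p=12)
print(tokens.shape)  # (8, 48): 96 time steps become 8 patch tokens
# Self-attention now compares 8 x 8 token pairs instead of 96 x 96.
```

Because attention cost grows with the square of the token count, shrinking the sequence from L steps to L/p patches cuts the attention term by roughly a factor of p².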
2. Related Work
2.1. Traditional Statistical and Machine Learning Methods
2.2. Single-Branch Deep Learning Methods
2.2.1. RNN-Based Models
2.2.2. Transformer-Based Models
2.3. Hybrid Deep Learning Models
3. Problem Analysis and Formulation
3.1. Problem Analysis
3.2. Problem Formulation
4. Methodology
4.1. Data Preprocessing
4.2. PSH Model
1. The Local Transformer Path utilizes multi-head self-attention to model short- to mid-range dependencies and capture fine-grained, local patterns within the data.
2. The Global Mamba Path employs an SSM to efficiently capture long-range dependencies and global trends across the entire sequence. While this branch scales linearly, the overall network cost remains dominated by the quadratic attention term of the Transformer path.
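The two paths and their fusion can be sketched in a toy, dependency-free form. This is not the paper's implementation: self-attention stands in for the Transformer branch, a fixed (non-selective) linear recurrence stands in for the Mamba SSM, and fusion is plain cross-attention with local tokens as queries:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention over patch tokens."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def local_path(tokens):
    """Stand-in for the Transformer branch: token self-attention."""
    return attention(tokens, tokens, tokens)

def global_path(tokens):
    """Stand-in for the Mamba branch: a linear state recurrence
    h_t = a * h_{t-1} + b * x_t with fixed a, b (a real selective
    SSM makes these input-dependent)."""
    a, b = 0.9, 0.1
    h = np.zeros_like(tokens[0])
    out = []
    for x in tokens:
        h = a * h + b * x
        out.append(h)
    return np.stack(out)

def cross_branch_fusion(tokens):
    """Local representations query the global branch's outputs."""
    h_local, h_global = local_path(tokens), global_path(tokens)
    return attention(h_local, h_global, h_global)

tokens = np.random.rand(8, 48)   # 8 patch tokens of dimension 48
fused = cross_branch_fusion(tokens)
print(fused.shape)               # (8, 48)
```

The recurrence in `global_path` is what gives the SSM branch its linear cost in sequence length: each token is processed once against a fixed-size hidden state.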
4.2.1. Patching Encoder
4.2.2. Local Transformer Path
4.2.3. Global Mamba Path
4.2.4. Cross-Branch Fusion
4.2.5. Multi-Step Head
4.3. Training and Optimization
| Algorithm 1 Patch-based State-space Hybrid Network (PSH). |
| Require: Data, features, targets; windows; patch size p; epochs E; learning rate; patience P |
| Ensure: Best checkpoint selected by validation average MAE |
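The checkpoint-selection and patience logic stated in Algorithm 1's Require/Ensure lines follows the standard early-stopping pattern. A minimal sketch of that control flow (the `train_step` and `validate_mae` callables are placeholders for the paper's actual training and validation routines):

```python
def train_with_early_stopping(train_step, validate_mae, epochs, patience):
    """Keep the checkpoint with the lowest validation average MAE;
    stop after `patience` consecutive epochs without improvement."""
    best_mae, best_state, wait = float("inf"), None, 0
    for epoch in range(epochs):
        state = train_step(epoch)      # one epoch of optimization
        mae = validate_mae(state)      # average MAE on validation set
        if mae < best_mae:
            best_mae, best_state, wait = mae, state, 0
        else:
            wait += 1
            if wait >= patience:
                break                  # early stop
    return best_state, best_mae

# Toy run: validation MAE improves until epoch 1, then degrades.
maes = iter([0.5, 0.4, 0.45, 0.46, 0.47])
state, mae = train_with_early_stopping(
    train_step=lambda e: e, validate_mae=lambda s: next(maes),
    epochs=10, patience=2)
print(state, mae)  # 1 0.4 (epoch 1's checkpoint, stopped after 2 stale epochs)
```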
5. Experiments
5.1. Experimental Setup
5.1.1. Datasets
5.1.2. Evaluation Metrics
5.1.3. Configuration
5.2. Comparison Methods
5.2.1. ARIMA
5.2.2. CNN-LSTM
5.2.3. Autoformer
5.2.4. Informer
5.2.5. PatchTST
5.2.6. DLinear
5.2.7. TimeMachine
5.3. Discussion of Prediction Results
5.4. Ablation Studies
5.4.1. Effectiveness of Architectural Components
- PSH: The complete prediction model.
- PSH-T: Remove Mamba branch and fusion module.
- PSH-M: Remove Transformer branch and fusion module.
- PSH-A: Replace cross-attention fusion with addition.
- PSH-G: Replace cross-attention fusion with gating.
5.4.2. Parameter Sensitivity Analysis
5.5. Computational Efficiency Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dogani, J.; Khunjush, F.; Seydali, M. Host load prediction in cloud computing with discrete wavelet transformation (dwt) and bidirectional gated recurrent unit (bigru) network. Comput. Commun. 2023, 198, 157–174. [Google Scholar] [CrossRef]
- Bi, J.; Li, S.; Yuan, H.; Zhou, M. Integrated deep learning method for workload and resource prediction in cloud systems. Neurocomputing 2021, 424, 35–48. [Google Scholar] [CrossRef]
- Zhong, W.; Zhuang, Y.; Sun, J.; Gu, J. A load prediction model for cloud computing using PSO-based weighted wavelet support vector machine. Appl. Intell. 2018, 48, 4072–4083. [Google Scholar] [CrossRef]
- Qi, S.; Chen, J.; Chen, P.; Wen, P.; Niu, X.; Xu, L. An efficient GAN-based predictive framework for multivariate time series anomaly prediction in cloud data centers. J. Supercomput. 2024, 80, 1268–1293. [Google Scholar] [CrossRef]
- Calheiros, R.N.; Masoumi, E.; Ranjan, R.; Buyya, R. Workload prediction using ARIMA model and its impact on cloud applications’ QoS. IEEE Trans. Cloud Comput. 2014, 3, 449–458. [Google Scholar] [CrossRef]
- Barati, M.; Sharifian, S. A hybrid heuristic-based tuned support vector regression model for cloud load prediction. J. Supercomput. 2015, 71, 4235–4259. [Google Scholar] [CrossRef]
- Wang, L.; Zeng, Y.; Chen, T. Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst. Appl. 2015, 42, 855–863. [Google Scholar] [CrossRef]
- Song, B.; Yu, Y.; Zhou, Y.; Wang, Z.; Du, S. Host load prediction with long short-term memory in cloud computing. J. Supercomput. 2018, 74, 6554–6568. [Google Scholar] [CrossRef]
- Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 11106–11115. [Google Scholar]
- Zhang, R.; Chen, J.; Song, Y.; Shan, W.; Chen, P.; Xia, Y. An effective transformation-encoding-attention framework for multivariate time series anomaly detection in iot environment. Mob. Netw. Appl. 2024, 29, 1551–1563. [Google Scholar] [CrossRef]
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. In Proceedings of the First Conference on Language Modeling, Philadelphia, PA, USA, 7–9 October 2024. [Google Scholar]
- Alibaba Inc. Alibaba Production Cluster Data v2018. Available online: https://github.com/alibaba/clusterdata/tree/v2018 (accessed on 2 June 2025).
- Amiri, M.; Mohammad-Khanli, L. Survey on prediction models of applications for resources provisioning in cloud. J. Netw. Comput. Appl. 2017, 82, 93–113. [Google Scholar] [CrossRef]
- Lackinger, A.; Morichetta, A.; Dustdar, S. Time series predictions for cloud workloads: A comprehensive evaluation. In Proceedings of the 2024 IEEE International Conference on Service-Oriented System Engineering (SOSE), Shanghai, China, 15–18 July 2024; pp. 36–45. [Google Scholar]
- Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125. [Google Scholar]
- Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv 2022, arXiv:2211.14730. [Google Scholar]
- Leka, H.L.; Fengli, Z.; Kenea, A.T.; Tegene, A.T.; Atandoh, P.; Hundera, N.W. A hybrid CNN-LSTM model for virtual machine workload forecasting in cloud data center. In Proceedings of the 2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 17–19 December 2021; pp. 474–478. [Google Scholar]
- Yuan, H.; Bi, J.; Zhou, M. Geography-aware task scheduling for profit maximization in distributed green data centers. IEEE Trans. Cloud Comput. 2020, 10, 1864–1874. [Google Scholar] [CrossRef]
- Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
- Zeng, A.; Chen, M.; Zhang, L.; Xu, Q. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 11121–11128. [Google Scholar]
- Ahamed, M.A.; Cheng, Q. Timemachine: A time series is worth 4 mambas for long-term forecasting. In Proceedings of the 27th European Conference on Artificial Intelligence, Santiago de Compostela, Spain, 19–24 October 2024; Volume 392, pp. 1688–1695. [Google Scholar]



| Parameters | Values | Description |
|---|---|---|
| Patch Size (p) | 12 | Input patch size |
| Hidden Dimension (d) | 640 | Model’s feature dimension |
| FFN Dimension | 1536 | FFN inner dimension |
| Attention Heads | 10 | Number of attention heads |
| Transformer Layers | 5 | Local path depth |
| Mamba Layers | 3 | Global path depth |
| Dropout Rate | 0.12 | Dropout probability |
| Optimizer | AdamW | Optimizer for training |
| Batch Size | 2048 | Batch size for training |
| Learning Rate | | Initial learning rate |
| Weight Decay | | L2 regularization |
| Gradient Clip Norm | 0.1 | Max gradient norm |
| EMA Momentum () | 0.999 | EMA decay factor |
| Random Seed | 42 | Seed for experimental reproducibility |
| Methods | CPU | | | MEM | | | Net_in | | | Net_out | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | MAE | RMSE | R² | MAE | RMSE | R² | MAE | RMSE | R² | MAE | RMSE | R² |
| ARIMA | 0.1641 | 0.3595 | 0.8734 | 0.0154 | 0.0804 | 0.9938 | 0.0036 | 0.0674 | 0.9955 | 0.0035 | 0.0669 | 0.9956 |
| CNN-LSTM | 0.0947 | 0.2227 | 0.9514 | 0.0184 | 0.0760 | 0.9945 | 0.0062 | 0.0662 | 0.9957 | 0.0063 | 0.0661 | 0.9957 |
| Autoformer | 0.0981 | 0.2240 | 0.9509 | 0.0277 | 0.0801 | 0.9939 | 0.0109 | 0.0698 | 0.9952 | 0.0109 | 0.0694 | 0.9952 |
| Informer | 0.0971 | 0.2231 | 0.9512 | 0.0222 | 0.0776 | 0.9942 | 0.0076 | 0.0662 | 0.9957 | 0.0073 | 0.0659 | 0.9957 |
| PatchTST | 0.1310 | 0.2377 | 0.9447 | 0.0209 | 0.1226 | 0.9856 | 0.0107 | 0.1163 | 0.9866 | 0.0120 | 0.1161 | 0.9867 |
| DLinear | 0.1028 | 0.2218 | 0.9518 | 0.0197 | 0.0804 | 0.9938 | 0.0077 | 0.0705 | 0.9951 | 0.0079 | 0.0704 | 0.9951 |
| TimeMachine | 0.0928 | 0.2231 | 0.9513 | 0.0141 | 0.0748 | 0.9946 | 0.0050 | 0.0706 | 0.9956 | 0.0054 | 0.0705 | 0.9956 |
| PSH | 0.0931 | 0.2123 | 0.9525 | 0.0174 | 0.0712 | 0.9957 | 0.0034 | 0.0662 | 0.9957 | 0.0034 | 0.0659 | 0.9957 |
| Methods | CPU | | | MEM | | | Net_in | | | Net_out | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | MAE | RMSE | R² | MAE | RMSE | R² | MAE | RMSE | R² | MAE | RMSE | R² |
| PSH | 0.0931 | 0.2123 | 0.9525 | 0.0174 | 0.0712 | 0.9957 | 0.0034 | 0.0662 | 0.9957 | 0.0034 | 0.0659 | 0.9957 |
| PSH-T | 0.1032 | 0.2431 | 0.9421 | 0.0190 | 0.0836 | 0.9933 | 0.0037 | 0.0674 | 0.9955 | 0.0037 | 0.0672 | 0.9955 |
| PSH-M | 0.1408 | 0.2911 | 0.9170 | 0.0402 | 0.1564 | 0.9766 | 0.0155 | 0.1430 | 0.9798 | 0.0156 | 0.1414 | 0.9802 |
| PSH-A | 0.1025 | 0.2400 | 0.9435 | 0.0185 | 0.0820 | 0.9936 | 0.0036 | 0.0676 | 0.9955 | 0.0036 | 0.0674 | 0.9955 |
| PSH-G | 0.1033 | 0.2422 | 0.9425 | 0.0186 | 0.0824 | 0.9935 | 0.0033 | 0.0673 | 0.9955 | 0.0033 | 0.0673 | 0.9955 |
| Patch Size (p) | CPU | | | MEM | | | Net_in | | | Net_out | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | MAE | RMSE | R² | MAE | RMSE | R² | MAE | RMSE | R² | MAE | RMSE | R² |
| 8 | 0.1012 | 0.2315 | 0.9458 | 0.0192 | 0.0784 | 0.9942 | 0.0037 | 0.0674 | 0.9955 | 0.0037 | 0.0672 | 0.9955 |
| 12 | 0.0931 | 0.2123 | 0.9525 | 0.0174 | 0.0712 | 0.9957 | 0.0034 | 0.0662 | 0.9957 | 0.0034 | 0.0659 | 0.9957 |
| 16 | 0.1154 | 0.2568 | 0.9382 | 0.0225 | 0.0891 | 0.9921 | 0.0041 | 0.0692 | 0.9948 | 0.0040 | 0.0688 | 0.9949 |
| 24 | 0.1421 | 0.2984 | 0.9145 | 0.0312 | 0.1156 | 0.9874 | 0.0052 | 0.0754 | 0.9930 | 0.0051 | 0.0748 | 0.9931 |
| Model | Parameters (M) | Latency (ms/Sample) | Peak Memory (MB) |
|---|---|---|---|
| PSH (Ours) | 28.69 | 4.88 | 120.25 |
| TimeMachine | 0.27 | 1.16 | 10.66 |
| PatchTST | 0.28 | 0.57 | 10.83 |
| DLinear | <0.01 | 0.20 | 8.46 |
| Autoformer | 4.94 | 3.46 | 35.30 |
| Informer | 5.56 | 3.18 | 37.42 |
| CNN-LSTM | 1.16 | 1.96 | 17.07 |
| ARIMA | <0.01 | 4.24 | N/A (CPU) |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Song, Z.; Yin, X.; Li, C.; Ba, H.; Li, L. A Patch-Based State-Space Hybrid Network for Container Resource Usage Forecasting. Algorithms 2026, 19, 148. https://doi.org/10.3390/a19020148

