PatchConvFormer: A Patch-Based and Convolution-Augmented Transformer for Periodic Metro Energy Consumption Forecasting
Abstract
1. Introduction
- Channel-Independent Modeling: To handle weak correlations among different metro lines, PCformer adopts an independent modeling strategy with parameter sharing, ensuring inter-line consistency while preserving individual line dynamics. This design eliminates non-causal interference and enhances the model’s interpretability.
- Patch-based Modeling Mechanism: A sliding-window convolutional patching operation is introduced at the input layer to capture localized fluctuations within weekly and holiday cycles, effectively balancing short-term variability and long-term trend modeling.
- Multi-scale Convolution-Augmented Self-Attention: In the attention layer, multi-scale one-dimensional convolutions replace the linear projections used for the key and value computations, strengthening local temporal dependency modeling and enabling joint feature extraction across multiple time scales (a minimal sketch of this mechanism follows this list).
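As a concrete illustration of the third contribution, a minimal single-head PyTorch sketch in which the key/value projections are multi-scale 1-D convolutions is given below; the kernel sizes (3, 5, 7), the averaging across scales, and the class name are assumptions for illustration rather than the exact PCformer configuration.

```python
import torch
import torch.nn as nn

class MultiScaleConvAttention(nn.Module):
    """Single-head self-attention in which the Key/Value projections are
    multi-scale 1-D convolutions instead of linear layers (illustrative sketch)."""

    def __init__(self, d_model: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        # One Conv1d per temporal scale; their outputs are averaged.
        self.k_convs = nn.ModuleList(
            [nn.Conv1d(d_model, d_model, k, padding=k // 2) for k in kernel_sizes])
        self.v_convs = nn.ModuleList(
            [nn.Conv1d(d_model, d_model, k, padding=k // 2) for k in kernel_sizes])
        self.scale = d_model ** -0.5

    def _multi_scale(self, x, convs):
        # x: (batch, seq_len, d_model); Conv1d expects (batch, d_model, seq_len).
        x = x.transpose(1, 2)
        out = torch.stack([conv(x) for conv in convs], dim=0).mean(dim=0)
        return out.transpose(1, 2)

    def forward(self, x):
        q = self.q_proj(x)                        # (B, L, D)
        k = self._multi_scale(x, self.k_convs)    # (B, L, D), local context mixed in
        v = self._multi_scale(x, self.v_convs)    # (B, L, D)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v
```

Replacing the linear key/value projections with convolutions lets each attention score aggregate a small temporal neighbourhood around every time step, which is the local-dependency effect this contribution describes.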
2. Related Work
2.1. Patch in Transformer Architecture
2.2. Convolution in Transformer Architecture
2.3. Deep Learning in Energy Consumption Forecasting
3. Methods
3.1. Introduction to Informer
- ProbSparse Self-Attention Mechanism: The authors of Informer observe that the probability distribution of self-attention scores exhibits a long-tail property. Based on this observation, they reduce the attention complexity by retaining only the most important query–key interactions, leading to ProbSparse Self-Attention, whose core formulation is

  $$\mathcal{A}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{\bar{Q}K^{\top}}{\sqrt{d}}\right)V, \quad (1)$$

  where Q, K, and V denote the Query, Key, and Value matrices, respectively, and d is the feature dimension. $\bar{Q}$ represents a sparse query matrix formed by selecting the Top-u dominant queries according to Informer’s sparsity measurement, so that only the most critical query–key interactions are preserved. The value of u is determined jointly by the sampling factor and the input sequence length. More details can be found in the original Informer paper.
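As an illustration of the selection step described above, a simplified, self-contained sketch follows; the function name `probsparse_attention` is illustrative, and unlike the official Informer implementation it scores every key rather than a random subsample.

```python
import math
import torch

def probsparse_attention(Q, K, V, factor=5):
    """Simplified ProbSparse self-attention sketch (single head).

    Only the Top-u queries, ranked by Informer's max-minus-mean sparsity
    score, attend normally; the remaining "lazy" queries fall back to the
    mean of V. Key subsampling from the original paper is omitted here.
    """
    B, L_Q, D = Q.shape
    L_K = K.shape[1]

    scores = torch.einsum("bqd,bkd->bqk", Q, K) / math.sqrt(D)   # (B, L_Q, L_K)
    # Sparsity measurement M(q_i, K) = max_j s_ij - mean_j s_ij.
    sparsity = scores.max(dim=-1).values - scores.mean(dim=-1)   # (B, L_Q)
    u = min(L_Q, factor * math.ceil(math.log(L_Q)))
    top_idx = sparsity.topk(u, dim=-1).indices                   # (B, u)

    # Initialise every query's output with the mean of V (the "lazy" default).
    out = V.mean(dim=1, keepdim=True).expand(B, L_Q, D).clone()
    # Overwrite the dominant queries with full softmax attention.
    top_scores = torch.gather(scores, 1,
                              top_idx.unsqueeze(-1).expand(-1, -1, L_K))
    out.scatter_(1, top_idx.unsqueeze(-1).expand(-1, -1, D),
                 torch.softmax(top_scores, dim=-1) @ V)
    return out
```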
- Layer-wise Distilling Operation: To mitigate redundancy in the encoder, Informer introduces a layer-wise distilling mechanism composed of a one-dimensional convolution (Conv1D), an ELU (Exponential Linear Unit) activation, and a max-pooling layer with stride 2. This process extracts the dominant temporal features while progressively reducing the sequence length. The transformation from the j-th to the (j+1)-th encoder layer can be formulated as

  $$X_{j+1} = \mathrm{MaxPool}\!\left(\mathrm{ELU}\!\left(\mathrm{Conv1D}\!\left(X_{j}\right)\right)\right), \quad (2)$$

  which effectively halves the computational cost per layer while preserving the essential temporal information. Here, $X_{j}$ denotes the output feature of the attention block at the j-th encoder layer, and the operations compress the sequence step by step in the order Conv1D → ELU → MaxPool. Since the MaxPool stride is set to 2, the sequence length is reduced from L to approximately L/2 after each distilling step. Equation (2) is the key mechanism that enables Informer to handle long sequences efficiently. More detailed implementations and derivations can be found in the original Informer paper.
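A minimal PyTorch sketch of this distilling step is shown below; the convolution kernel size and pooling padding are common defaults assumed here, not hyperparameters confirmed by this paper.

```python
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    """Conv1D -> ELU -> MaxPool(stride=2) distilling step, as in Informer.
    Roughly halves the sequence length between encoder layers (sketch)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):                  # x: (batch, seq_len, d_model)
        x = self.conv(x.transpose(1, 2))   # (batch, d_model, seq_len)
        x = self.pool(self.act(x))         # seq_len -> about seq_len / 2
        return x.transpose(1, 2)

# Example: a length-28 encoder input is compressed to length 14 in one step.
y = DistillingLayer(d_model=16)(torch.randn(2, 28, 16))
print(y.shape)  # torch.Size([2, 14, 16])
```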
3.2. Channel-Independent Modeling
- Dilution of local features: The mixing process tends to obscure fine-grained temporal variations within individual lines, such as sharp energy peaks during rush hours.
- Increased overfitting risk: The model may capture spurious correlations among lines, thereby reducing its generalization capability.
- Reduced computational efficiency: As the number of lines increases, the input dimensionality grows linearly while the attention computation complexity grows quadratically, resulting in higher GPU memory usage and longer training time. The channel-independent alternative with parameter sharing adopted by PCformer is sketched after this list.
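In contrast, PCformer's channel-independent strategy treats each metro line as a separate sample processed by one shared backbone. A minimal sketch of this wrapper, under the assumption that the backbone maps a single-line window to a single-line forecast (class and variable names are illustrative), is:

```python
import torch
import torch.nn as nn

class ChannelIndependentForecaster(nn.Module):
    """Channel-independent wrapper: every metro line is forecast by the same
    (parameter-shared) backbone, so no cross-line attention is computed.
    Illustrative sketch; `backbone` stands in for the PCformer encoder."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # maps (batch, seq_len, 1) -> (batch, pred_len, 1)

    def forward(self, x):                      # x: (batch, seq_len, n_lines)
        b, l, c = x.shape
        # Fold the line dimension into the batch: each line becomes its own sample.
        x = x.permute(0, 2, 1).reshape(b * c, l, 1)
        y = self.backbone(x)                   # shared weights across all lines
        # Unfold back to (batch, pred_len, n_lines).
        return y.reshape(b, c, -1, 1).squeeze(-1).permute(0, 2, 1)

# Toy usage: a 28-step history for 16 lines mapped to a 14-step forecast per line.
toy_backbone = nn.Sequential(nn.Flatten(1), nn.Linear(28, 14), nn.Unflatten(1, (14, 1)))
out = ChannelIndependentForecaster(toy_backbone)(torch.randn(8, 28, 16))
print(out.shape)  # torch.Size([8, 14, 16])
```

Because cross-line attention is never computed, cost and memory grow linearly rather than quadratically with the number of lines, which addresses the third drawback above.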
3.3. Patch-Based Modeling
3.4. Convolution-Augmented Self-Attention
3.5. Loss Function
4. Experiment
4.1. Implementation Details
4.2. Comparison with the State of the Art
4.3. Ablation Studies
4.4. Transfer Learning
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| PCformer | PatchConvFormer, Patch-Based and Convolution-Augmented Transformer |
| MSE | Mean Squared Error |
| MAE | Mean Absolute Error |
| IEA | International Energy Agency |
| ARIMA | Autoregressive Integrated Moving Average model |
| SARIMA | Seasonal Autoregressive Integrated Moving Average model |
| SVR | Support Vector Regression |
| RF | Random Forests |
| CNN | Convolutional Neural Networks |
| RNN | Recurrent Neural Networks |
| LSTM | Long Short-Term Memory |
| GRU | Gated Recurrent Unit |
| FEDformer | Frequency Enhanced Decomposed Transformer |
| ELU | Exponential Linear Units |
| GPU | Graphics Processing Unit |
| DWConv | Depthwise Convolution |
| CUDA | Compute Unified Device Architecture |
| CPU | Central Processing Unit |
References
- Han, D.; Wu, S. The capitalization and urbanization effect of subway stations: A network centrality perspective. Transp. Res. Part A Policy Pract. 2023, 76, 103815. [Google Scholar] [CrossRef]
- Su, W.; Li, X.; Zhang, Y.; Zhang, Q.; Wang, T.; Magdziarczyk, M.; Smolinski, A. High-speed rail, technological improvement, and carbon emission efficiency. Transp. Res. Part D Transp. Environ. 2025, 142, 104685. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhao, N.; Li, M.; Xu, Z.; Wu, D.; Hillmansen, S.; Tsolakis, A.; Blacktop, K.; Roberts, C. A techno-economic analysis of ammonia-fuelled powertrain systems for rail freight. Transp. Res. Part D Transp. Environ. 2023, 119, 103739. [Google Scholar] [CrossRef]
- Feng, Z.; Chen, W.; Liu, Y.; Chen, H.; Skibniewski, M.J. Long-term equilibrium relationship analysis and energy-saving measures of metro energy consumption and its influencing factors based on cointegration theory and an ARDL model. Energy 2023, 263, 125965. [Google Scholar] [CrossRef]
- Zheng, S.; Liu, Y.; Xia, W.; Cai, W.; Liu, H. Energy consumption optimization through prediction models in buildings using deep belief networks and a modified version of big bang-big crunch theory. Build. Environ. 2025, 279, 112973. [Google Scholar] [CrossRef]
- Singh, S.K.; Das, A.K.; Singh, S.R.; Racherla, V. Prediction of rail-wheel contact parameters for a metro coach using machine learning. Expert Syst. Appl. 2023, 215, 119343. [Google Scholar] [CrossRef]
- Domala, V.; Kim, T. Application of Empirical Mode Decomposition and Hodrick Prescot filter for the prediction single step and multistep significant wave height with LSTM. Ocean. Eng. 2023, 285, 115229. [Google Scholar] [CrossRef]
- Cao, W.; Yu, J.; Chao, M.; Wang, J.; Yang, S.; Zhou, M.; Wang, M. Short-term energy consumption prediction method for educational buildings based on model integration. Energy 2023, 283, 128580. [Google Scholar] [CrossRef]
- Li, M.; Zhang, P.; Xing, W.; Zheng, Y.; Zaporojets, K.; Chen, J.; Zhang, R.; Zhang, Y.; Gong, S.; Hu, J.; et al. A Survey of Large Language Models for Data Challenges in Graphs. Expert Syst. Appl. 2025, 225, 129643. [Google Scholar] [CrossRef]
- Zhang, R.; Zou, R.; Zhao, Y.; Zhang, Z.; Chen, J.; Cao, Y.; Hu, C.; Song, H. BA-Net: Bridge Attention in Deep Neural Networks. Expert Syst. Appl. 2025, 292, 128525. [Google Scholar] [CrossRef]
- Zhao, Y.; Chen, J.; Zhang, Z.; Zhang, R. BA-Net: Bridge attention for deep convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 297–312. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 3–6 December 2012; Volume 25. [Google Scholar]
- Jordan, M.I. Serial order: A parallel distributed processing approach. Adv. Psychol. 1997, 121, 471–495. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–6 December 2017; Volume 30. [Google Scholar]
- Kalyan, K.S.; Rajasekharan, A.; Sangeetha, S. Ammus: A survey of transformer-based pretrained models in natural language processing. arXiv 2021, arXiv:2108.05542. [Google Scholar]
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Comput. Surv. (CSUR) 2022, 54, 1–41. [Google Scholar] [CrossRef]
- Wen, Q.; Zhou, T.; Zhang, C.; Chen, W.; Ma, Z.; Yan, J.; Sun, L. Transformers in time series: A survey. arXiv 2022, arXiv:2202.07125. [Google Scholar]
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Virtual, 2–9 February 2021; Volume 35, pp. 11106–11115. [Google Scholar] [CrossRef]
- Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtually, 6–14 December 2021; Volume 34, pp. 22419–22430. [Google Scholar]
- Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning (ICML), Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286. [Google Scholar]
- Xu, J.; Chen, T.; Yuan, J.; Fan, Y.; Li, L.; Gong, X. Ultra-Short-Term wind power prediction based on spatiotemporal contrastive learning. Electronics 2025, 14, 3373. [Google Scholar] [CrossRef]
- Zi, X.; Liu, F.; Liu, M.; Wang, Y. Transformer with Adaptive Sparse Self-Attention for Short-Term Photovoltaic Power Generation Forecasting. Electronics 2025, 14, 3981. [Google Scholar] [CrossRef]
- Pavlatos, C.; Makris, E.; Fotis, G.; Vita, V.; Mladenov, V. Enhancing electrical load prediction using a bidirectional LSTM neural network. Electronics 2023, 12, 4652. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (long and short papers), pp. 4171–4186. [Google Scholar] [CrossRef]
- Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Tang, P.; Zhang, W. Unlocking the Power of Patch: Patch-Based MLP for Long-Term Time Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 12640–12648. [Google Scholar] [CrossRef]
- Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations (ICLR), Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual Event, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Beyer, L.; Izmailov, P.; Kolesnikov, A.; Caron, M.; Kornblith, S.; Zhai, X.; Minderer, M.; Tschannen, M.; Alabdulmohsin, I.; Pavetic, F. Flexivit: One model for all patch sizes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14496–14506. [Google Scholar]
- Ronen, T.; Levy, O.; Golbert, A. Vision transformers with mixed-resolution tokenization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 4613–4622. [Google Scholar]
- Chen, M.; Lin, M.; Li, K.; Shen, Y.; Wu, Y.; Chao, F.; Ji, R. Cf-vit: A general coarse-to-fine method for vision transformer. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 7042–7052. [Google Scholar] [CrossRef]
- Wang, Y.; Du, B.; Wang, W.; Xu, C. Multi-tailed vision transformer for efficient inference. Neural Netw. 2024, 174, 106235. [Google Scholar] [CrossRef]
- Hu, Y.; Cheng, Y.; Lu, A.; Cao, Z.; Wei, D.; Liu, J.; Li, Z. LF-ViT: Reducing spatial redundancy in vision transformer for efficient image recognition. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 2274–2284. [Google Scholar] [CrossRef]
- Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. Cvt: Introducing convolutions to vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual Event, 11–17 October 2021; pp. 22–31. [Google Scholar]
- Yu, W.; Luo, M.; Zhou, P.; Si, C.; Zhou, Y.; Wang, X.; Feng, J.; Yan, S. Metaformer is actually what you need for vision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10819–10829. [Google Scholar]
- Yuan, K.; Guo, S.; Liu, Z.; Zhou, A.; Yu, F.; Wu, W. Incorporating convolution designs into visual transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual Event, 11–17 October 2021; pp. 579–588. [Google Scholar]
- Yuan, Y.; Fu, R.; Huang, L.; Lin, W.; Zhang, C.; Chen, X.; Wang, J. Hrformer: High-resolution transformer for dense prediction. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021; Volume 34, pp. 7281–7293. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Dai, Z.; Liu, H.; Le, Q.V.; Tan, M. Coatnet: Marrying convolution and attention for all data sizes. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021; Volume 34, pp. 3965–3977. [Google Scholar]
- Li, K.; Wang, Y.; Zhang, J.; Gao, P.; Song, G.; Liu, Y.; Li, H.; Qiao, Y. Uniformer: Unifying convolution and self-attention for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 12581–12600. [Google Scholar] [CrossRef] [PubMed]
- Yao, Z.; Liu, X. A cnn-transformer deep learning model for real-time sleep stage classification in an energy-constrained wireless device. In Proceedings of the 2023 11th International IEEE/EMBS Conference on Neural Engineering (NER), Baltimore, MD, USA, 25–27 April 2023; pp. 1–4. [Google Scholar] [CrossRef]
- Liu, M.; Zeng, A.; Chen, M.; Xu, Z.; Lai, Q.; Ma, L.; Xu, Q. Scinet: Time series modeling and forecasting with sample convolution and interaction. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 5816–5828. [Google Scholar]
- Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. Timesnet: Temporal 2d-variation modeling for general time series analysis. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Tian, Z.; Liu, W.; Jiang, W.; Wu, C. Cnns-transformer based day-ahead probabilistic load forecasting for weekends with limited data availability. Energy 2024, 293, 130666. [Google Scholar] [CrossRef]
- Zhang, E.; Yuan, W.; Liu, X. ChannelMixer: A Hybrid CNN-Transformer Framework for Enhanced Multivariate Long-Term Time Series Forecasting. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; pp. 1–5. [Google Scholar] [CrossRef]
- Zheng, P.; Zhou, H.; Liu, J.; Nakanishi, Y. Interpretable building energy consumption forecasting using spectral clustering algorithm and temporal fusion transformers architecture. Appl. Energy 2023, 349, 121607. [Google Scholar] [CrossRef]
- Liu, J.; Yang, F.; Yan, K.; Jiang, L. Household energy consumption forecasting based on adaptive signal decomposition enhanced iTransformer network. Energy Build. 2024, 324, 114894. [Google Scholar] [CrossRef]
- Wang, C.F.; Liu, K.X.; Peng, J.; Li, X.; Liu, X.F.; Zhang, J.W.; Niu, Z.B. High-precision energy consumption forecasting for large office building using a signal decomposition-based deep learning approach. Energy 2025, 314, 133964. [Google Scholar] [CrossRef]
- Sreekumar, G.; Martin, J.P.; Raghavan, S.; Joseph, C.T.; Raja, S.P. Transformer-based forecasting for sustainable energy consumption toward improving socioeconomic living: AI-enabled energy consumption forecasting. IEEE Syst. Man Cybern. Mag. 2024, 10, 52–60. [Google Scholar] [CrossRef]
- Xi, Y.; Gan, X.; Zhan, Z.; Deng, K. Energy Data Forecasting Based on the STRLM Time Series Prediction Model. In Proceedings of the 2025 10th International Conference on Electronic Technology and Information Science (ICETIS), Hangzhou, China, 27–29 June 2025; pp. 489–494. [Google Scholar] [CrossRef]
- Rahn, K.; Bode, C.; Albrecht, T. Energy-efficient driving in the context of a communications-based train control system (CBTC). In Proceedings of the 2013 IEEE International Conference on Intelligent Rail Transportation Proceedings, Beijing, China, 30 August–1 September 2013; pp. 19–24. [Google Scholar] [CrossRef]
- Sanchis, I.V.; Zuriaga, P.S. An energy-efficient metro speed profiles for energy savings: Application to the Valencia metro. Transp. Res. Procedia 2016, 18, 226–233. [Google Scholar] [CrossRef]
- Peng, J.; Kimmig, A.; Wang, D.; Niu, Z.; Liu, X.; Tao, X.; Ovtcharova, J. Energy consumption forecasting based on spatio-temporal behavioral analysis for demand-side management. Appl. Energy 2024, 374, 124027. [Google Scholar] [CrossRef]
- Moon, J. A multi-step-ahead photovoltaic power forecasting approach using one-dimensional convolutional neural networks and transformer. Electronics 2024, 13, 2007. [Google Scholar] [CrossRef]
- Fu, H.; Zhang, J.; Xie, S. A novel improved variational mode decomposition-temporal convolutional network-gated recurrent unit with multi-head attention mechanism for enhanced photovoltaic power forecasting. Electronics 2024, 13, 1837. [Google Scholar] [CrossRef]
- Wang, H.; Guo, M.; Tian, L. A deep learning model with signal decomposition and informer network for equipment vibration trend prediction. Sensors 2023, 23, 5819. [Google Scholar] [CrossRef]
- He, K.; Yang, Q.; Ji, L.; Pan, J.; Zou, Y. Financial time series forecasting with the deep learning ensemble model. Mathematics 2023, 11, 1054. [Google Scholar] [CrossRef]
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. In Proceedings of the First Conference on Language Modeling, Philadelphia, PA, USA, 7–9 October 2024. [Google Scholar]
- Mohammadi, M.; Jamshidi, S.; Rezvanian, A.; Gheisari, M.; Kumar, A. Advanced fusion of MTM-LSTM and MLP models for time series forecasting: An application for forecasting the solar radiation. Meas. Sens. 2024, 33, 101179. [Google Scholar] [CrossRef]
| Method | Backbone | Channel Modeling | Patch-Based Modeling | Multi-Scale Convolution |
|---|---|---|---|---|
| PatchTST [27] | Transformer | Independent | ✔ | × |
| PatchMLP [28] | MLP | Mixed | ✔ | × |
| Informer [20] | Transformer | Mixed | × | × |
| TimesNet [45] | CNN-based | Mixed | × | ✔ |
| SCINet [44] | CNN | Mixed | × | ✔ |
| DECPE-TFT [49] | Transformer | Mixed | × | × |
| ASSA-iTransformer [50] | Transformer | Mixed | × | × |
| SPAformer [51] | Transformer | Mixed | ✔ | × |
| Rahn et al. [54] | Regression models | - | × | × |
| Sanchis et al. [55] | Regression models | - | × | × |
| PCformer (Ours) | Transformer | Independent (param. sharing) | ✔ | ✔ |
| Models | PCformer (Ours) | MTM-LSTM (2024) | Mamba (2024) | ARMA-CNNLSTM (2023) | Informer (2021) | Transformer (2017) | BiGRU (2014) | GRU (2014) | BiLSTM (1997) | LSTM (1997) |
|---|---|---|---|---|---|---|---|---|---|---|
| Metric | MSE MAE | MSE MAE | MSE MAE | MSE MAE | MSE MAE | MSE MAE | MSE MAE | MSE MAE | MSE MAE | MSE MAE |
| Line 1 | 0.017 0.096 | 0.055 0.186 | 0.031 0.167 | 0.041 0.174 | 0.028 0.131 | 0.044 0.246 | 0.051 0.157 | 0.082 0.297 | 0.070 0.271 | 0.089 0.337 |
| Line 2 | 0.015 0.112 | 0.058 0.195 | 0.032 0.174 | 0.043 0.182 | 0.037 0.142 | 0.044 0.235 | 0.054 0.164 | 0.086 0.310 | 0.073 0.283 | 0.093 0.352 |
| Line 3 | 0.010 0.091 | 0.029 0.164 | 0.016 0.147 | 0.021 0.153 | 0.011 0.077 | 0.027 0.254 | 0.027 0.139 | 0.043 0.262 | 0.036 0.239 | 0.046 0.297 |
| Line 4 | 0.058 0.195 | 0.190 0.257 | 0.107 0.230 | 0.141 0.240 | 0.120 0.262 | 0.124 0.152 | 0.177 0.217 | 0.284 0.410 | 0.242 0.374 | 0.308 0.465 |
| Line 5 | 0.041 0.223 | 0.089 0.232 | 0.050 0.207 | 0.066 0.216 | 0.041 0.160 | 0.046 0.230 | 0.082 0.195 | 0.132 0.369 | 0.113 0.337 | 0.143 0.419 |
| Line 6 | 0.060 0.223 | 0.146 0.241 | 0.082 0.216 | 0.108 0.225 | 0.053 0.202 | 0.115 0.134 | 0.136 0.203 | 0.219 0.384 | 0.186 0.350 | 0.237 0.436 |
| Line 7 | 0.129 0.201 | 0.307 0.274 | 0.173 0.245 | 0.228 0.255 | 0.165 0.227 | 0.157 0.231 | 0.285 0.231 | 0.459 0.436 | 0.391 0.398 | 0.497 0.495 |
| Line 8 | 0.113 0.272 | 0.525 0.516 | 0.296 0.462 | 0.390 0.481 | 0.313 0.437 | 0.485 0.584 | 0.488 0.436 | 0.786 0.822 | 0.668 0.751 | 0.850 0.934 |
| Line 9 | 0.006 0.067 | 0.026 0.118 | 0.015 0.106 | 0.019 0.110 | 0.013 0.093 | 0.028 0.136 | 0.024 0.100 | 0.039 0.189 | 0.033 0.172 | 0.042 0.214 |
| Line 10 | 0.013 0.077 | 0.035 0.133 | 0.020 0.119 | 0.026 0.124 | 0.019 0.123 | 0.020 0.128 | 0.032 0.112 | 0.052 0.212 | 0.044 0.193 | 0.056 0.241 |
| Line 11 | 0.057 0.183 | 0.141 0.249 | 0.079 0.222 | 0.104 0.232 | 0.076 0.199 | 0.077 0.218 | 0.131 0.210 | 0.210 0.396 | 0.179 0.362 | 0.228 0.450 |
| Line 12 | 0.045 0.140 | 0.118 0.197 | 0.067 0.176 | 0.088 0.183 | 0.061 0.162 | 0.076 0.174 | 0.110 0.166 | 0.177 0.313 | 0.151 0.286 | 0.192 0.356 |
| Line 13 | 0.007 0.065 | 0.021 0.090 | 0.012 0.081 | 0.016 0.084 | 0.012 0.073 | 0.016 0.081 | 0.020 0.076 | 0.032 0.143 | 0.027 0.131 | 0.034 0.163 |
| Line 14 | 0.009 0.062 | 0.028 0.098 | 0.016 0.088 | 0.021 0.091 | 0.012 0.062 | 0.026 0.120 | 0.026 0.083 | 0.042 0.156 | 0.036 0.142 | 0.045 0.177 |
| Line 15 | 0.094 0.267 | 0.294 0.416 | 0.165 0.372 | 0.218 0.388 | 0.205 0.407 | 0.156 0.336 | 0.273 0.351 | 0.439 0.662 | 0.373 0.605 | 0.475 0.752 |
| Line 16 | 0.016 0.101 | 0.039 0.144 | 0.022 0.129 | 0.029 0.134 | 0.024 0.133 | 0.016 0.113 | 0.036 0.121 | 0.058 0.229 | 0.049 0.209 | 0.063 0.260 |
| Avg. | 0.043 0.145 | 0.131 0.219 | 0.074 0.196 | 0.097 0.204 | 0.074 0.181 | 0.104 0.214 | 0.122 0.185 | 0.196 0.349 | 0.167 0.319 | 0.212 0.397 |
| Std. | 0.039 0.070 | 0.135 0.110 | 0.076 0.098 | 0.100 0.102 | 0.083 0.106 | 0.112 0.117 | 0.126 0.093 | 0.202 0.175 | 0.172 0.160 | 0.219 0.199 |
| Prediction Length | 14 | 28 | 42 | 56 |
|---|---|---|---|---|
| Metric | MSE MAE | MSE MAE | MSE MAE | MSE MAE |
| Transformer | 0.0782 0.1703 | 0.1042 0.2142 | 0.1588 0.2517 | 0.1674 0.2991 |
| Informer | 0.0622 0.1592 | 0.0743 0.1806 | 0.1446 0.2425 | 0.1527 0.2517 |
| Mamba | 0.0409 0.1336 | 0.0738 0.1962 | 0.1227 0.2238 | 0.1741 0.3122 |
| MTM-LSTM | 0.0961 0.1912 | 0.1312 0.2193 | 0.1293 0.2342 | 0.1634 0.2548 |
| PCformer (Ours) | 0.0335 0.1168 | 0.0430 0.1452 | 0.0955 0.2021 | 0.1308 0.2223 |
| Patch Length (P) | Stride (S) | Patched Length (N) | MSE | MAE |
|---|---|---|---|---|
| 3 | 1 | 26 | 0.1441 | 0.2374 |
| 5 | 1 | 24 | 0.0821 | 0.1951 |
| 7 | 1 | 22 | 0.0430 | 0.1452 |
| 7 | 3 | 8 | 0.1456 | 0.2410 |
| 7 | 5 | 5 | 0.2574 | 0.3286 |
| 7 | 7 | 4 | 0.2954 | 0.3459 |
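For reference, the "Patched Length (N)" column follows the standard sliding-window relation between the look-back length L, patch length P, and stride S; every row above is consistent with an input length of L = 28 (this value is inferred from the table rather than stated in it):

```latex
% Number of patches from a window of length P slid with stride S over L points
% (L = 28 assumed here, consistent with every row of the table above):
N = \left\lfloor \frac{L - P}{S} \right\rfloor + 1,
\qquad \text{e.g., } P = 7,\ S = 1:\ N = \frac{28 - 7}{1} + 1 = 22,
\qquad P = 7,\ S = 3:\ N = \left\lfloor \frac{28 - 7}{3} \right\rfloor + 1 = 8.
```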
| Method | PCformer (Ours) | Mamba | Informer | Transformer |
|---|---|---|---|---|
| Metric | MSE MAE | MSE MAE | MSE MAE | MSE MAE |
| Line 5 | 0.0192 0.1165 | 0.0459 0.1628 | 0.0777 0.2293 | 0.0871 0.2356 |
| Line 9 | 0.0409 0.1884 | 0.0523 0.1685 | 0.0729 0.2160 | 0.0963 0.2383 |
| Line 12 | 0.3094 0.3637 | 0.3761 0.4268 | 0.2563 0.3435 | 0.3677 0.4147 |
| Line 15 | 0.0309 0.1451 | 0.0677 0.2185 | 0.0589 0.2029 | 0.0754 0.2252 |
| Avg. | 0.1001 0.2034 | 0.1355 0.2433 | 0.1165 0.2479 | 0.1566 0.2785 |
| Std. | 0.1211 0.0960 | 0.1391 0.1077 | 0.0810 0.0560 | 0.1221 0.0788 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.