KAN+Transformer: An Explainable and Efficient Approach for Electric Load Forecasting
Abstract
1. Introduction
- (1) Adaptive Feature Aggregation Mechanism for Multi-KAN Collaboration: We aggregate the features learned by multiple KANs using weights that are computed dynamically by a dedicated strategy rather than fixed in advance. Unlike conventional static fusion, this allows the model to adaptively emphasize the most informative feature patterns in the data, improving the representativeness of the integrated features.
- (2) MoK-Based Pre-Transformer Feature Enhancement: A mixture-of-KAN-experts (MoK) layer is introduced before the standard Transformer architecture as a nonlinear feature-enhancement layer. The MoK module adaptively combines the outputs of multiple KAN experts through a gating mechanism, enriching the input representation passed to the Transformer encoder and strengthening the model's ability to capture nonlinear patterns in time-series data; a minimal sketch of this gating is given after this list.
- (3) Novel Multi-Objective Loss with Load Balancing and Coefficient of Variation: We optimize a specialized loss function during training that includes a dedicated load-balancing term. The term is built from the standard deviation (σ) and mean (μ) of the expert contributions, i.e., their coefficient of variation (σ/μ), and encourages more balanced feature contributions across the KAN experts. Because the coefficient of variation is scale-free, it also permits robust comparison of relative variability across datasets with very different means, addressing a limitation of absolute dispersion metrics in cross-dataset optimization. An illustrative sketch of this term follows the MoK example below.
- (4) Empirical Validation: Comprehensive experiments on four benchmark datasets (ETTh1, ETTh2, ETTm1, ETTm2) show that the proposed model outperforms state-of-the-art baselines in most settings and remains competitive in the rest. These results support the effectiveness of the proposed designs and the framework's practical applicability to real-world power load forecasting.
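To make contribution (2) concrete, the following is a minimal PyTorch sketch, not the published implementation, of an MoK-style layer: a gating network produces per-sample, per-time-step weights that adaptively combine the expert outputs before they reach the Transformer encoder. The expert definition here is a placeholder two-layer network; in the proposed model each expert is a KAN with spline-based activations, and the dimensions and expert count are illustrative assumptions.

```python
# Minimal sketch of a mixture-of-KAN-experts (MoK) layer (illustrative, not the
# authors' code): a gate computes per-sample, per-time-step weights that
# adaptively aggregate the expert outputs before the Transformer encoder.
import torch
import torch.nn as nn


class MoKLayer(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        # Placeholder experts: in the paper each expert is a KAN; any module
        # mapping (B, L, d_model) -> (B, L, d_model) can be dropped in here.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model), nn.SiLU(),
                          nn.Linear(d_model, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)  # gating network

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model)
        weights = torch.softmax(self.gate(x), dim=-1)                   # (B, L, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (B, L, D, E)
        fused = (expert_out * weights.unsqueeze(2)).sum(dim=-1)         # (B, L, D)
        return fused, weights  # the gate weights also feed the balancing loss


if __name__ == "__main__":
    layer = MoKLayer(d_model=64, n_experts=4)
    x = torch.randn(8, 96, 64)   # 8 windows, 96 time steps, 64 features
    y, w = layer(x)
    print(y.shape, w.shape)      # torch.Size([8, 96, 64]) torch.Size([8, 96, 4])
```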
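Contribution (3) can be sketched in the same spirit. The snippet below shows one way a coefficient-of-variation load-balancing term could be combined with the forecasting objective; the weighting factor `lambda_balance` and the use of the squared CV are assumptions for illustration, and the paper's full objective also includes periodicity and trend terms (Section 3.5) that are omitted here.

```python
# Illustrative coefficient-of-variation (CV) load-balancing term (a sketch under
# assumptions, not the published loss): CV = sigma / mu of the average per-expert
# gate weight, so the penalty is scale-free and is zero when usage is uniform.
import torch


def cv_load_balance(gate_weights: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # gate_weights: (batch, seq_len, n_experts), softmax outputs of the MoK gate.
    usage = gate_weights.mean(dim=(0, 1))    # mean utilisation per expert, shape (E,)
    cv = usage.std() / (usage.mean() + eps)  # coefficient of variation sigma / mu
    return cv ** 2


def training_loss(pred, target, gate_weights, lambda_balance: float = 0.01):
    mse = torch.mean((pred - target) ** 2)   # base forecasting objective
    return mse + lambda_balance * cv_load_balance(gate_weights)
```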
2. Related Work
2.1. Time-Series Forecasting Models
2.2. Transformer + KAN
3. Method
3.1. Overview
3.2. Kolmogorov–Arnold Network
3.3. Transformer Block
3.4. Mixture of KAN Experts Layer
3.5. Loss Function
4. Experiment
4.1. Datasets
4.2. Experimental Implementation and Details
4.2.1. Data Preprocessing
4.2.2. Model Configuration
4.2.3. Training and Evaluation
4.3. Comparison with State-of-the-Art Methods
4.4. Ablation Study
4.5. Computational Efficiency
4.6. Visual Analysis
4.6.1. Visual Result of ETTh1 Dataset
4.6.2. Visual Result of ETTh2 Dataset
4.6.3. Visual Result of ETTm1 Dataset
4.6.4. Visual Result of ETTm2 Dataset
4.6.5. Interpretability Experiment
5. Discussion
- (1) Data Scope and Generalizability: The current evaluation relies primarily on the ETT benchmark datasets, which consist of endogenous load and temperature variables, to validate the proposed framework for electrical load forecasting. We verified that removing informative load-related variables noticeably degrades forecasting accuracy, particularly over long horizons. This indicates that the model depends on multi-variable interactions to capture complex load dynamics: jointly modeling load and temperature provides essential context for both short- and long-range predictions. However, because the ETT datasets contain no exogenous covariates such as calendar information or detailed meteorological conditions, performance may degrade in scenarios that are highly sensitive to external drivers, for example regional residential load forecasting under extreme weather events. To broaden real-world applicability, future work will integrate multi-source heterogeneous data (meteorological, calendar, and demographic information) and incorporate domain adaptation techniques, which should improve robustness across scenarios and mitigate degradation caused by missing endogenous variables or insufficient exogenous information.
- (2) Algorithmic Optimization and Robustness: On the algorithmic side, although the KAN layer offers strong nonlinear approximation, its B-spline basis functions can be sensitive to hyperparameter settings such as grid size. The Transformer encoder, despite its global modeling capacity, has computational complexity that grows quadratically with the look-back window, which limits how far the input sequence can be extended to capture broader temporal context. To address these constraints, future research will explore replacing the Transformer backbone with linear-complexity architectures such as Mamba and incorporating online learning strategies, allowing the model to adapt dynamically to evolving load patterns and reducing sensitivity to spline hyperparameters without frequent full retraining.
- (3) Challenges in Peak Forecasting: As the visualization results show (Figures 2–5), the proposed model occasionally underestimates extreme peak loads. This behavior is likely tied to the current multi-objective loss, which combines mean squared error (MSE) with periodicity and trend terms. MSE emphasizes overall reconstruction accuracy, and the periodicity- and trend-aware terms help the model capture regular cyclic patterns and long-term temporal evolution, but none of these objectives is explicitly tailored to rare, abrupt extreme events. The periodicity and trend constraints in particular encourage smooth, regular temporal structure, so sharp, short-lived peaks may be treated as secondary signals during optimization; without a dedicated extreme-value term, the model has little incentive to heavily penalize large deviations caused by sudden demand surges. From a deployment perspective, promising remedies include peak-sensitive weighting schemes or focal-loss-inspired formulations that place greater emphasis on hard-to-predict extreme load events; a small illustrative sketch of such a weighting follows this list.
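As an illustration of the peak-sensitive direction mentioned above (a future-work idea, not part of the proposed model), a focal-style weighted MSE can up-weight time steps with large residuals so that rare demand peaks contribute more strongly to the gradient; the normalisation scheme and the exponent `gamma` below are assumptions.

```python
# Sketch of a focal-loss-inspired, peak-sensitive MSE (illustrative only):
# squared residuals are normalised within the batch and used as per-step
# weights, so hard-to-fit points, typically sharp peaks, dominate the average.
import torch


def focal_mse(pred: torch.Tensor, target: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    sq_err = (pred - target) ** 2
    weight = (sq_err / (sq_err.max().detach() + 1e-8)) ** gamma  # in [0, 1]; largest error -> 1
    return (weight * sq_err).mean()
```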
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Description |
|---|---|
| ETTh1 | Electricity Transformer Temperature from transformer 1, hourly data. |
| ETTh2 | Electricity Transformer Temperature from transformer 2, hourly data. |
| ETTm1 | Electricity Transformer Temperature from transformer 1, 15 min data. |
| ETTm2 | Electricity Transformer Temperature from transformer 2, 15 min data. |
References






Forecasting accuracy (MSE / MAE, lower is better) of the proposed model and baselines on the ETT datasets (Section 4.3):

| Dataset | Horizon | Ours MSE | Ours MAE | Autoformer MSE | Autoformer MAE | Informer MSE | Informer MAE | Transformer MSE | Transformer MAE | TCN MSE | TCN MAE | LSTNet MSE | LSTNet MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ETTm1 | 48 | 0.434 | 0.438 | 0.493 | 0.466 | 0.370 | 0.400 | 0.479 | 0.461 | 0.562 | 0.596 | 1.999 | 1.215 |
| | 96 | 0.495 | 0.469 | 0.557 | 0.499 | 0.548 | 0.517 | 0.563 | 0.486 | 0.641 | 0.652 | 2.762 | 1.542 |
| | 168 | 0.528 | 0.485 | 0.613 | 0.516 | 0.557 | 0.510 | 0.558 | 0.512 | 0.998 | 0.801 | 2.344 | 1.320 |
| | 336 | 0.568 | 0.505 | 0.578 | 0.512 | 0.846 | 0.680 | 0.780 | 0.609 | 1.094 | 0.868 | 1.513 | 2.355 |
| | avg | 0.506 | 0.474 | 0.560 | 0.498 | 0.580 | 0.527 | 0.595 | 0.517 | 0.824 | 0.729 | 2.154 | 1.608 |
| ETTm2 | 48 | 0.209 | 0.305 | 0.210 | 0.305 | 0.256 | 0.367 | 0.454 | 0.509 | 0.690 | 0.684 | 2.513 | 1.023 |
| | 96 | 0.230 | 0.306 | 0.252 | 0.336 | 0.500 | 0.520 | 0.435 | 0.480 | 0.665 | 0.662 | 3.142 | 1.365 |
| | 168 | 0.281 | 0.334 | 0.285 | 0.342 | 0.732 | 0.653 | 0.609 | 0.604 | 0.985 | 0.837 | 3.183 | 1.440 |
| | 336 | 0.344 | 0.373 | 0.583 | 0.465 | 1.405 | 0.907 | 1.304 | 0.865 | 1.023 | 0.855 | 3.160 | 1.369 |
| | avg | 0.267 | 0.331 | 0.333 | 0.362 | 0.723 | 0.612 | 0.701 | 0.615 | 0.841 | 0.760 | 2.999 | 1.299 |
| ETTh1 | 48 | 0.465 | 0.462 | 0.456 | 0.444 | 0.591 | 0.541 | 0.792 | 0.677 | 0.465 | 0.536 | 1.456 | 0.959 |
| | 96 | 0.494 | 0.474 | 0.446 | 0.448 | 1.012 | 0.765 | 1.700 | 1.015 | 1.262 | 0.952 | 1.514 | 1.043 |
| | 168 | 0.523 | 0.491 | 0.495 | 0.475 | 0.985 | 0.754 | 0.996 | 0.773 | 0.953 | 0.798 | 1.997 | 1.214 |
| | 336 | 0.521 | 0.493 | 0.522 | 0.499 | 1.962 | 1.086 | 1.210 | 0.832 | 1.170 | 0.898 | 2.655 | 1.370 |
| | avg | 0.501 | 0.480 | 0.480 | 0.467 | 1.138 | 0.787 | 1.175 | 0.824 | 0.963 | 0.796 | 1.906 | 1.147 |
| ETTh2 | 48 | 0.301 | 0.366 | 0.295 | 0.359 | 1.227 | 0.913 | 1.565 | 1.099 | 0.677 | 0.670 | 3.567 | 1.687 |
| | 96 | 0.360 | 0.392 | 0.378 | 0.411 | 2.636 | 1.204 | 1.484 | 0.968 | 0.931 | 0.809 | 3.142 | 1.433 |
| | 168 | 0.418 | 0.425 | 0.420 | 0.434 | 5.586 | 1.941 | 5.807 | 1.846 | 0.997 | 0.843 | 3.242 | 2.513 |
| | 336 | 0.470 | 0.465 | 0.459 | 0.473 | 5.625 | 1.949 | 6.155 | 2.034 | 1.240 | 0.917 | 2.544 | 2.591 |
| | avg | 0.387 | 0.412 | 0.388 | 0.419 | 3.769 | 1.502 | 3.753 | 1.487 | 0.961 | 0.810 | 3.124 | 2.056 |
Ablation study (Section 4.4). Variant (a) is the plain Transformer, components are added progressively, and variant (d) is the full model:

| Component | (a) | (b) | (c) | (d) |
|---|---|---|---|---|
| Transformer | √ | √ | √ | √ |
| MoK Layer | | √ | √ | √ |
| MoK Residual | | | | √ |
| Loss | | | √ | √ |

| Dataset | Horizon | (a) MSE | (a) MAE | (b) MSE | (b) MAE | (c) MSE | (c) MAE | (d) MSE | (d) MAE |
|---|---|---|---|---|---|---|---|---|---|
| ETTm1 | 48 | 0.495 | 0.489 | 0.357 | 0.383 | 0.467 | 0.475 | 0.434 | 0.438 |
| | 96 | 0.601 | 0.512 | 0.518 | 0.488 | 0.497 | 0.487 | 0.495 | 0.469 |
| | 168 | 0.656 | 0.599 | 0.553 | 0.504 | 0.539 | 0.487 | 0.528 | 0.485 |
| | 336 | 0.798 | 0.732 | 0.592 | 0.524 | 0.578 | 0.524 | 0.568 | 0.505 |
| | avg | 0.6375 | 0.583 | 0.505 | 0.47475 | 0.52025 | 0.49325 | 0.50625 | 0.47425 |
| ETTm2 | 48 | 0.557 | 0.591 | 0.213 | 0.312 | 0.219 | 0.308 | 0.209 | 0.305 |
| | 96 | 0.401 | 0.443 | 0.230 | 0.306 | 0.226 | 0.306 | 0.231 | 0.313 |
| | 168 | 0.498 | 0.554 | 0.293 | 0.334 | 0.303 | 0.341 | 0.281 | 0.339 |
| | 336 | 0.892 | 0.768 | 0.344 | 0.373 | 0.345 | 0.373 | 0.345 | 0.377 |
| | avg | 0.762 | 0.618 | 0.270 | 0.33125 | 0.27325 | 0.332 | 0.2665 | 0.3335 |
| ETTh1 | 48 | 0.501 | 0.616 | 0.476 | 0.472 | 0.475 | 0.463 | 0.465 | 0.462 |
| | 96 | 1.722 | 1.435 | 0.512 | 0.487 | 0.503 | 0.479 | 0.494 | 0.474 |
| | 168 | 1.118 | 0.921 | 0.537 | 0.501 | 0.523 | 0.498 | 0.523 | 0.491 |
| | 336 | 1.332 | 0.154 | 0.575 | 0.521 | 0.556 | 0.503 | 0.521 | 0.493 |
| | avg | 1.16825 | 0.7815 | 0.525 | 0.49525 | 0.51425 | 0.48575 | 0.50075 | 0.480 |
| ETTh2 | 48 | 1.318 | 1.206 | 0.338 | 0.382 | 0.321 | 0.366 | 0.301 | 0.366 |
| | 96 | 1.701 | 1.294 | 0.383 | 0.405 | 0.360 | 0.392 | 0.360 | 0.392 |
| | 168 | 3.665 | 2.260 | 0.438 | 0.435 | 0.432 | 0.435 | 0.418 | 0.425 |
| | 336 | 4.120 | 3.112 | 0.483 | 0.473 | 0.470 | 0.465 | 0.470 | 0.465 |
| | avg | 2.701 | 1.968 | 0.4105 | 0.42375 | 0.39575 | 0.4145 | 0.38725 | 0.412 |
Computational efficiency (Section 4.5): parameter count, training time, and inference time for the Transformer baseline (a), the variant with the MoK layer (b), and the variant with both the MoK layer and the MoK residual connection (c):

| Component | (a) | (b) | (c) |
|---|---|---|---|
| Transformer | √ | √ | √ |
| MoK Layer | | √ | √ |
| MoK Residual | | | √ |

| Dataset | Horizon | (a) Params (K) | (a) Train Time (s) | (a) Infer Time (s) | (b) Params (K) | (b) Train Time (s) | (b) Infer Time (s) | (c) Params (K) | (c) Train Time (s) | (c) Infer Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| ETTm1 | 48 | 55.607 | 100.3353 | 16.4057 | 92.677 | 129.5712 | 19.4351 | 92.679 | 182.4870 | 19.9823 |
| | 96 | 60.263 | 100.4389 | 16.8862 | 143.509 | 100.5479 | 18.5128 | 143.518 | 106.5039 | 18.8903 |
| | 168 | 67.247 | 99.9876 | 16.3470 | 254.317 | 105.1728 | 19.8446 | 254.334 | 129.4222 | 19.6151 |
| | 336 | 83.543 | 101.6427 | 16.7577 | 674.149 | 113.3706 | 19.1442 | 674.172 | 114.7070 | 18.7029 |
| ETTm2 | 48 | 55.607 | 153.2247 | 16.0978 | 92.677 | 181.9950 | 19.5962 | 92.679 | 127.5592 | 20.0813 |
| | 96 | 60.263 | 177.5038 | 16.4092 | 143.509 | 103.7785 | 18.9556 | 143.518 | 105.7998 | 19.8466 |
| | 168 | 67.247 | 96.1211 | 16.2430 | 254.317 | 137.6505 | 19.0147 | 254.334 | 103.4803 | 19.4713 |
| | 336 | 83.543 | 100.4475 | 16.2375 | 674.149 | 115.2007 | 18.2542 | 674.172 | 114.9110 | 18.8254 |
| ETTh1 | 48 | 55.607 | 31.9809 | 4.2406 | 92.677 | 41.0232 | 4.7795 | 92.679 | 39.5874 | 4.9688 |
| | 96 | 60.263 | 26.3584 | 4.1642 | 143.509 | 33.3917 | 4.6764 | 143.518 | 32.7113 | 4.8728 |
| | 168 | 67.247 | 25.1203 | 3.8040 | 254.317 | 46.0878 | 4.5608 | 254.334 | 46.2840 | 4.5593 |
| | 336 | 83.543 | 30.3644 | 3.6458 | 674.149 | 29.0643 | 4.2770 | 674.172 | 28.2728 | 4.1489 |
| ETTh2 | 48 | 55.607 | 53.0575 | 3.9185 | 92.677 | 26.7017 | 4.7748 | 92.679 | 26.0733 | 4.7652 |
| | 96 | 60.263 | 25.2872 | 3.8582 | 143.509 | 33.0026 | 4.5649 | 143.518 | 34.6736 | 4.7381 |
| | 168 | 67.247 | 25.1023 | 3.7895 | 254.317 | 53.4380 | 4.6053 | 254.334 | 51.9874 | 4.6088 |
| | 336 | 83.543 | 24.5331 | 3.6359 | 674.149 | 28.5350 | 4.3008 | 674.172 | 28.2535 | 4.0403 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

