Article

Integrating Kolmogorov–Arnold Networks with Time Series Prediction Framework in Electricity Demand Forecasting

College of Intelligent Manufacturing and Control Engineering, Shanghai Polytechnic University, Shanghai 201209, China
* Author to whom correspondence should be addressed.
Energies 2025, 18(6), 1365; https://doi.org/10.3390/en18061365
Submission received: 10 December 2024 / Revised: 31 January 2025 / Accepted: 3 February 2025 / Published: 11 March 2025
(This article belongs to the Section F1: Electrical Power System)

Abstract

Electricity demand is driven by a diverse set of factors, including fluctuations in business cycles, interregional dynamics, and the effects of climate change. Accurately quantifying the impact of these factors remains challenging, as existing methods often fail to address the complexities inherent in these influences. This study introduces a time series forecasting model based on Kolmogorov–Arnold Networks (KANs), integrated with three advanced neural network architectures, Temporal Convolutional Network (TCN), Bidirectional Long Short-Term Memory (BiLSTM), and Transformer, to forecast UK electricity demand. The analysis utilizes real-world datasets from a leading utility company and publicly available sources. Experimental findings reveal that the integration of KANs significantly improves forecasting accuracy, robustness, and adaptability, particularly in modeling intricate sequential patterns in electricity demand time series. The proposed approach addresses the limitations of traditional time series models, underscoring the potential of KANs as a transformative tool for predictive analytics.

1. Introduction

Electricity demand forecasting plays a critical role in modern energy management. With the increasing strain on energy resources, enhancing forecasting accuracy through advanced models is essential for optimizing energy allocation and utilization. This is not only pivotal for the advancement of the power industry but also provides indispensable technical support for addressing global energy challenges. Numerous methods have been developed for electricity demand forecasting, ranging from traditional statistical techniques, such as regression analysis and autoregressive models, to more advanced approaches. Traditional methods, which rely on historical data, are well-suited for short-term and medium-term forecasting [1], but they exhibit significant limitations in long-term predictions. Statistical approaches such as ARIMA, ETS, and Prophet effectively address seasonality, long-term trends, and noise; however, their reliance on linearity assumptions, extensive historical data requirements, and restricted capacity to model complex non-linear relationships hinder their adaptability to abrupt structural changes in the data [2]. In contrast, machine learning models, such as support vector machines (SVMs) and artificial neural networks (ANNs), have been extensively employed for forecasting monthly electricity demand across different countries and regions [3]. While these models demonstrate strong nonlinear modeling capabilities, they are often susceptible to overfitting and struggle to effectively capture long-term trends. Grey prediction models, although capable of enhancing forecasting accuracy through data sequence transformation and background value optimization, are constrained by their sensitivity to initial conditions and reliance on small sample sizes. These limitations make it difficult for grey models to manage highly volatile or complex datasets effectively [4]. Although machine learning techniques are widely utilized in electricity demand forecasting [5,6,7,8,9,10,11], their inherent shortcomings have motivated researchers to explore alternative approaches.
Time series forecasting plays a vital role in various fields, influencing key decision-making processes in finance, medicine, economics, biology, and meteorology, highlighting its broad applicability and significance across disciplines [12,13,14,15]. In recent years, the application of time series models to forecast trends, seasonal variations, and cyclical patterns in electricity demand has garnered significant interest [16,17,18,19,20]. Ahmed et al. [21] addressed the complex multi-source power load forecasting problem by employing an ensemble learning method, integrating multiple predictive factors into the model, and achieving high prediction accuracy. Shi et al. [22] proposed a deep neural network model based on long short-term memory (LSTM) and recurrent neural networks (RNNs) for short-term electricity demand forecasting, although factors such as weather were not considered. To enhance forecasting performance, techniques such as optimization algorithms, integration methods, and time series decomposition have emerged as key technologies [23]. Kenneth [24] examined terminal energy consumption in Canada, the Asia-Pacific region, the United States, and European countries, incorporating socio-economic development factors. The findings revealed a nonlinear growth relationship between per capita income and electricity consumption. Similarly, Jovanovic [25] identified national-level factors, such as economic governance and population size, as critical determinants of electricity demand alongside economic variables.
Building on advancements in time series forecasting and the identification of key determinants of electricity consumption, this study seeks to improve the accuracy and reliability of electricity demand predictions by integrating Kolmogorov–Arnold Networks (KANs) with three established time series models: Transformer, Bidirectional Long Short-Term Memory (BiLSTM), and Temporal Convolutional Network (TCN).
The Transformer model, celebrated for its ability to capture long-range dependencies through self-attention mechanisms, offers substantial advantages in modeling intricate temporal relationships [26,27]. However, optimizing sparse attention for diverse data structures remains a challenge, as it may lead to the loss of critical information in highly dynamic sequences [28]. Long Short-Term Memory (LSTM) and its bidirectional variant, BiLSTM, are widely utilized for their proficiency in capturing long-term dependencies within sequential data [29]. BiLSTM extends this capability by processing input sequences in both forward and backward directions, enabling it to capture dependencies spanning past to present as well as present to future. This dual processing enhances prediction accuracy, often outperforming unidirectional LSTM in tasks such as stock price prediction [30]. TCN is another powerful deep learning architecture specifically designed for time series data, excelling at capturing local dependencies through convolutional operations. TCNs are highly adaptable and have been effectively applied to various forecasting tasks, including weather prediction [31] and short-term load forecasting for industrial users [32]. However, the increased complexity of BiLSTM and TCN makes them more prone to overfitting, especially when trained on smaller datasets, and they typically require larger amounts of data to achieve optimal performance [33].
Recent studies have introduced Kolmogorov–Arnold Networks (KANs) [34] as an innovative neural network architecture grounded in the Kolmogorov–Arnold representation theorem [35,36]. This theorem asserts that any multivariate continuous function can be expressed as a composition of univariate functions and addition operations. This foundational principle sets KANs apart from traditional Multi-Layer Perceptrons (MLPs) [37]. By leveraging the representation theorem [38], KANs enhance both accuracy and interpretability, enabling them to achieve comparable or even superior performance with more compact architectures in tasks such as data fitting and solving partial differential equations. Kolmogorov–Arnold Networks (KANs) have gained significant attention as a promising method in electrical engineering. For instance, Cabral et al. [39] utilized KANs for fault diagnosis in oil-immersed power transformers, highlighting their reliability in handling real-world imbalanced datasets. Jiang et al. [40] proposed the D_KAN model, integrating KANs with DLinear to enhance power load forecasting through improved accuracy and adaptability. Shuai et al. [41] developed physics-informed KANs for dynamic modeling of power systems, showcasing their ability to integrate physical constraints into data-driven models for improved interpretability. Dao et al. [42] employed KANs for estimating the state of charge in lithium-ion batteries and demonstrated that KANs provide more accurate predictions in complex scenarios. These studies highlight the significant potential of KANs in advancing electrical engineering applications.
To achieve more accurate power demand forecasting, this study integrates KAN with three established models. By utilizing these methods, the complex relationships and nonlinearities inherent in power demand are effectively captured, enabling a more comprehensive and precise analysis of the factors influencing power demand.
This paper is organized as follows. Section 2 outlines the materials and methods, detailing the data collection, preprocessing techniques, and feature selection process for power demand forecasting. Section 3 provides the theoretical background, covering time series models and the Kolmogorov–Arnold representation theorem, while also explaining the experimental framework and process. Section 4 presents the forecasting results and evaluates the performance of KAN integrated with time series models. Finally, Section 5 summarizes the study’s contributions and discusses potential future research directions.

2. Materials and Methods

To establish a UK electricity demand forecasting model, this study collected monthly electricity consumption data, along with various related datasets, from 2015 to April 2024 (Table 1). These datasets include, but are not limited to, economic and meteorological data. In this analysis, the total electricity demand in the UK (national demand, nd) is modeled as the dependent variable. The collected monthly data form time series, some of which exhibit periodicity over time, and comprise eight distinct factors (Figure 1).
Electricity demand in the UK is influenced by a variety of economic, environmental, and regional factors. Industrial electricity consumption (x2) is a significant component of total electricity demand, reflecting the monthly consumption of users nationwide. According to the UK's regional electricity mobility policy, England and Wales exhibit relatively concentrated patterns of electricity production and consumption. The majority of the electricity supply in these regions is managed and distributed through a unified power transmission network, resulting in a strong relationship between regional power generation (e.g., England x3, Wales x4) and electricity consumption. Meteorological factors, including precipitation (x5), temperature (x6), wind speed (x7), and others, are key variables that influence electricity demand. Particularly during cold winters and hot summers, temperature fluctuations significantly affect both household and industrial electricity demand. Wind speed and precipitation may further influence electricity consumption by affecting climate comfort and the frequency of air conditioning and heating system use. Furthermore, the Consumer Price Index (CPI, x8), as an economic variable, may negatively impact electricity demand. Research indicates that when the CPI rises [43,44], higher consumer goods prices typically reduce purchasing power, thereby decreasing electricity demand. Transmission system demand (x9) is another critical factor, representing the additional generation needed to accommodate regional load changes or demand fluctuations via inter-regional electricity transmission. By analyzing the Pearson correlation coefficient heat map (Figure 2) and mutual information plot (Figure 3) of these factors, the relationship between each variable and national demand can be quantified. The Pearson coefficients and mutual information values are shown in Table 1.
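These two relevance measures can be reproduced with standard tooling; the sketch below is a minimal example assuming a pandas DataFrame `df` with the target column `nd` and feature columns, where the synthetic data and column names are illustrative stand-ins for the collected dataset.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Minimal sketch of the two relevance measures above; the synthetic data and
# column names stand in for the real monthly dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(112, 3)), columns=['nd', 'x2', 'x6'])

pearson = df.corr(method='pearson')['nd']                # heat-map input
mi = mutual_info_regression(df[['x2', 'x6']], df['nd'])  # MI with the target
print(pearson.round(2), mi.round(2))
```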

3. Theory and Calculation

3.1. KANs Background

Kolmogorov–Arnold Networks (KANs) [34] represent a novel neural network architecture inspired by the Kolmogorov–Arnold representation theorem, designed to serve as an alternative to traditional Multi-Layer Perceptrons (MLPs). The Kolmogorov–Arnold representation theorem [35] posits that any multivariate continuous function $f$ that depends on $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ within a bounded domain can be represented as a finite composition of simpler continuous functions of a single variable. Formally, a real, smooth, and continuous multivariate function $f(\mathbf{x}): [0,1]^n \to \mathbb{R}$ can be expressed through the superposition of univariate functions:
$$f(\mathbf{x}) = f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right) \tag{1}$$
where $\phi_{q,p}: [0,1] \to \mathbb{R}$ are the inner functions and $\Phi_q: \mathbb{R} \to \mathbb{R}$ are the outer functions.
The architecture of a KAN is defined by a KAN layer $\Phi = \{\phi_{q,p}\}$, where $p = 1, 2, \ldots, n_{\mathrm{in}}$ and $q = 1, 2, \ldots, n_{\mathrm{out}}$, and the functions $\phi_{q,p}$ are parameterized with learnable parameters. This structure allows a KAN to capture complex nonlinear relationships in the data more effectively. A KAN of depth $L$ is constructed by stacking $L$ KAN layers, with the configuration of this deeper KAN specified by an integer array $[n_0, n_1, \ldots, n_L]$, where $n_l$ indicates the number of neurons in the $l$-th layer. The $l$-th KAN layer takes an input of dimension $n_l$ and produces an output of dimension $n_{l+1}$, transforming the input vector $x_l \in \mathbb{R}^{n_l}$ to $x_{l+1} \in \mathbb{R}^{n_{l+1}}$ [34], as shown in Equation (2).
$$x_{l+1} = \underbrace{\begin{pmatrix} \phi_{l,1,1}(\cdot) & \phi_{l,1,2}(\cdot) & \cdots & \phi_{l,1,n_l}(\cdot) \\ \phi_{l,2,1}(\cdot) & \phi_{l,2,2}(\cdot) & \cdots & \phi_{l,2,n_l}(\cdot) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{l,n_{l+1},1}(\cdot) & \phi_{l,n_{l+1},2}(\cdot) & \cdots & \phi_{l,n_{l+1},n_l}(\cdot) \end{pmatrix}}_{\Phi_l} x_l \tag{2}$$
The function matrix $\Phi_l$ corresponds to the $l$-th KAN layer, and the KAN is essentially formed by composing multiple KAN layers:
$$\mathrm{KAN}(\mathbf{x}) = (\Phi_{L-1} \circ \Phi_{L-2} \circ \cdots \circ \Phi_0)(\mathbf{x}) \tag{3}$$
The network depth, determined by the number of layers, enables the model to capture more complex patterns and relationships within the data. Each KAN layer processes the input $\mathbf{x}$ through a series of learnable functions $\phi_{q,p}$, enabling the network to be highly adaptable and resilient.
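As an illustration of Equations (2) and (3), the following minimal Python sketch treats a KAN layer as a matrix of univariate functions; the edge functions here are arbitrary smooth closures standing in for the trained B-splines described next, and all sizes are illustrative.

```python
import numpy as np

# Minimal sketch of one KAN layer (Equation (2)): every edge (q, p) carries
# its own univariate function phi[q][p], and output q sums the edge outputs
# over p. Edge functions here are smooth closures, not trained B-splines.

def kan_layer(x, phi):
    """Map x in R^{n_l} to R^{n_{l+1}} via a matrix of univariate functions."""
    return np.array([sum(phi_qp(x_p) for phi_qp, x_p in zip(row, x))
                     for row in phi])

# Example: a 2 -> 3 layer; a deeper KAN (Equation (3)) composes such layers.
rng = np.random.default_rng(0)
phi = [[(lambda a: (lambda t: np.tanh(a * t)))(rng.normal()) for _ in range(2)]
       for _ in range(3)]
x = np.array([0.5, -1.2])
print(kan_layer(x, phi))  # -> vector in R^3
```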
The Kolmogorov–Arnold representation theorem facilitates the development of novel neural network architectures by replacing conventional linear weights with univariate B-spline-based functions, which serve as learnable activation functions. These B-splines are represented through the basis functions $N_{i,j}(t)$. Integrating the network's multilayer structure with dynamic adjustments to the activation function through a grid expansion technique allows the network to adapt to varying data resolutions. This design optimizes performance by fine-tuning the accuracy of the activation function. The flexibility of the spline enables adaptive modeling of complex relationships by reshaping the spline, minimizing approximation errors, and enhancing the network's capability to learn intricate patterns from high-dimensional datasets. The 0th-order basis function $N_{i,0}(t)$ is defined as follows:
$$N_{i,0}(t) = \begin{cases} 1 & \text{if } t_i \le t < t_{i+1} \text{ and } t_i < t_{i+1}, \\ 0 & \text{otherwise} \end{cases} \tag{4}$$
Higher-order basis functions $N_{i,j}(t)$ are calculated using the recursive formula:
$$N_{i,j}(t) = \frac{t - t_i}{t_{i+j} - t_i} N_{i,j-1}(t) + \frac{t_{i+j+1} - t}{t_{i+j+1} - t_{i+1}} N_{i+1,j-1}(t) \tag{5}$$
where $j = 1, 2, \ldots, p$, and $i$ is the index of the basis function, corresponding to the position of the first knot $t_i$ that influences $N_{i,j}(t)$, with $i = 0, 1, \ldots, m - j - 1$, where $m$ denotes the total number of knots, ensuring the proper definition of the basis functions within the knot sequence. The B-spline curve is defined by the following equation:
$$C(t) = \sum_{i=0}^{n} P_i N_{i,p}(t) \tag{6}$$
where the $P_i$ are the control points; the resulting curve $C(t)$ is called a B-spline. This method provides greater adaptability in designing the architecture of the neural network and improves the KAN model's ability to learn and represent data, allowing it to better handle nonlinear relationships within intricate datasets.
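The recursion in Equations (4)-(6) translates directly into code. The sketch below assumes an illustrative clamped knot vector and control points and favors readability over efficiency; a production implementation would typically use scipy.interpolate.BSpline instead.

```python
import numpy as np

# Sketch of the Cox-de Boor recursion (Equations (4)-(5)) and B-spline curve
# evaluation (Equation (6)); 0/0 terms are treated as zero, as is conventional.

def bspline_basis(i, j, t, knots):
    """Evaluate the basis function N_{i,j}(t) over the knot vector `knots`."""
    if j == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + j] > knots[i]:
        left = ((t - knots[i]) / (knots[i + j] - knots[i])
                * bspline_basis(i, j - 1, t, knots))
    if knots[i + j + 1] > knots[i + 1]:
        right = ((knots[i + j + 1] - t) / (knots[i + j + 1] - knots[i + 1])
                 * bspline_basis(i + 1, j - 1, t, knots))
    return left + right

def bspline_curve(t, ctrl, knots, p):
    """C(t) = sum_i P_i * N_{i,p}(t)."""
    return sum(P * bspline_basis(i, p, t, knots) for i, P in enumerate(ctrl))

knots = [0, 0, 0, 1, 2, 3, 3, 3]            # clamped knot vector, degree p = 2
ctrl = np.array([0.0, 1.0, 0.5, 2.0, 1.5])  # control points P_i
print(bspline_curve(1.5, ctrl, knots, p=2))
```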
During the training process, a grid search method is used, where the parameters of the spline function (i.e., the coefficients of the basis functions) are adjusted to minimize the loss function. The shape of the spline function is dynamically refined to best fit the training data. This optimization process usually utilizes techniques such as gradient descent, and the spline function parameters are updated in each iteration to reduce the prediction error. Since KANs are similar to multi-layer perceptrons (MLPs) on the outside, they can not only extract features but also optimize features with high accuracy by leveraging their internal similarities to spline functions. KANs exhibit strong function approximation capabilities, high interpretability, smoothness, and robustness while providing flexible parameter settings and adaptive characteristics for time series forecasting. These advantages enable KANs to effectively handle a variety of complex time series data, thereby improving prediction accuracy and model stability.

3.2. BiLSTM-KANs

With its bidirectional structure, long short-term memory units, and strong context understanding capabilities, BiLSTM effectively captures long-term dependencies, thereby enhancing prediction accuracy and reliability in time series forecasting. The BiLSTM-KANs model leverages the bidirectional long short-term memory network's ability to comprehensively capture temporal dependencies and understand the context of time series data [30]. Unlike LSTM, which typically processes information in only a forward direction, BiLSTM incorporates two layers of LSTM: one processes the input sequence in the forward direction, while the other processes it in reverse. The forward LSTM captures information about past data within the input sequence, whereas the reverse LSTM retrieves information about future data. The outputs of these two hidden layers are then combined to provide a more comprehensive representation of the input sequence [45]. The hidden state $h_t$ of the BiLSTM at the current time $t$ combines the forward state $\overrightarrow{h}_t$ and the backward state $\overleftarrow{h}_t$, where $\oplus$ denotes component-wise summation, used to sum the forward and backward output components:
$$h_t = \overrightarrow{h}_t \oplus \overleftarrow{h}_t \tag{7}$$
Replacing the fully connected layer of BiLSTM with KANs enhances the model’s function approximation and feature extraction abilities, while also improving smoothness and robustness, thus reducing the risk of overfitting and providing higher interpretability. The integrated equation is as follows:
$$z_t = \Phi(h_t) \tag{8}$$
where $\Phi(\cdot)$ denotes the KAN layer applied to the BiLSTM hidden state, and $z_t$ represents the refined feature output of the KAN layer. These advantages make KANs highly promising for time series forecasting tasks. BiLSTM-KAN is well-suited for modeling short- to medium-length time series and demonstrates strong performance in capturing bidirectional dependencies, though its efficiency and the constraints on sequence length require further optimization.
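As a concrete illustration of Equations (7) and (8), the following PyTorch sketch wires a bidirectional LSTM to a KAN-style head. The KANHead class is a deliberately simplified stand-in built from a fixed SiLU basis rather than the paper's full B-spline KAN, reading both directions at the last time step is a further simplification, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KANHead(nn.Module):
    """Simplified KAN-style head: one learnable univariate function per edge,
    built from a fixed SiLU basis (a stand-in for trained B-splines)."""
    def __init__(self, d_in, d_out, n_basis=8):
        super().__init__()
        self.register_buffer('shifts', torch.linspace(-2.0, 2.0, n_basis))
        self.coef = nn.Parameter(0.1 * torch.randn(d_out, d_in, n_basis))

    def forward(self, h):                              # h: (batch, d_in)
        basis = F.silu(h.unsqueeze(-1) - self.shifts)  # (batch, d_in, n_basis)
        return torch.einsum('bik,oik->bo', basis, self.coef)

class BiLSTMKAN(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = KANHead(hidden, 1)

    def forward(self, x):                  # x: (batch, window, n_features)
        out, _ = self.lstm(x)              # (batch, window, 2 * hidden)
        half = out.size(-1) // 2
        h = out[:, -1, :half] + out[:, -1, half:]  # Eq. (7): h_t = fwd + bwd
        return self.head(h)                        # Eq. (8): z_t = Phi(h_t)

model = BiLSTMKAN(n_features=8)
print(model(torch.randn(4, 20, 8)).shape)  # -> torch.Size([4, 1])
```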

3.3. TCN-KANs

Temporal Convolutional Networks (TCNs) are a type of convolutional neural network (CNN) specifically designed for sequence modeling tasks under causal constraints. A TCN comprises multiple stacked residual blocks, each containing two layers of dilated causal convolutions with ReLU as the activation function [46]. The dilated convolution operation $F$ applied to an element $s$ of a 1-D sequence $\mathbf{x} \in \mathbb{R}^n$ using a filter $f: \{0, 1, \ldots, k-1\} \to \mathbb{R}$ can be expressed as follows, where $k$ refers to the size of the filter, $d$ represents the dilation factor, and $*$ represents the convolution operation:
$$F(s) = (\mathbf{x} *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \cdot i} \tag{9}$$
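Equation (9) can be translated directly into Python as follows; the filter taps and the input sequence are illustrative, and positions before the start of the sequence are treated as zero padding.

```python
import numpy as np

# Direct translation of Equation (9): a dilated causal convolution where the
# output at step s depends only on inputs at s, s-d, s-2d, ...

def dilated_causal_conv(x, f, d):
    k = len(f)
    y = np.zeros(len(x))
    for s in range(len(x)):
        y[s] = sum(f[i] * x[s - d * i] for i in range(k) if s - d * i >= 0)
    return y

x = np.arange(10, dtype=float)  # toy input sequence
print(dilated_causal_conv(x, f=[0.5, 0.3, 0.2], d=2))
```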
TCN is another powerful deep learning architecture tailored for time series data, which excels at capturing local dependencies through convolution operations. After stacking multiple layers of residual structures, the final low-level features of TCN can be expressed as:
$$h_{\mathrm{TCN}} = \mathrm{TCN}(\mathbf{x}) \tag{10}$$
where $h_{\mathrm{TCN}}$ is the feature obtained by stacking the dilated convolutions $F(s)$. The integrated equation is as follows:
$$z_t = \sum_{q=1}^{2D+1} g_q\left(\sum_{p=1}^{D} \phi_{q,p}(h_p)\right) \tag{11}$$
where $h_p$ is the $p$-th dimension of the low-level features $h_{\mathrm{TCN}}$ extracted by the TCN; $\phi_{q,p}$ is the one-dimensional continuous function used for the nonlinear transformation of the feature $h_p$; $g_q$ is the one-dimensional output function that processes the high-order combined features; $D$ is the feature dimension of $h_{\mathrm{TCN}}$; and $q$ indexes the feature combinations.
The TCN has several advantages in time series forecasting, such as capturing long-term dependencies, processing long sequence data, supporting data parallelism, and ensuring fast training speed. TCN-KANs can fully leverage the strengths of TCN in handling long-term dependencies and multi-scale feature extraction. The convolutional layer of TCN extracts low-level features from the original data, while KANs further extract higher-level feature representations from these low-level features. KANs replace the fully connected layer of TCN. This hierarchical feature abstraction enhances the model’s understanding of complex structures and patterns in the input data, thereby enhancing feature representation, forecasting performance, and interpretability. Additionally, it reduces the risk of overfitting and allows the model to better adapt to time series forecasting tasks. TCN-KAN exhibits notable strengths in computational efficiency, particularly for large-scale parallel tasks, but its performance in handling complex dependency patterns may be limited.

3.4. Transformer-KANs

The core concept of the Transformer model lies in its self-attention mechanism, which enables the model to capture dependencies between any two positions in a sequence, regardless of their distance. This mechanism significantly enhances the model’s capability to manage long-range dependencies [23]. The self-attention mechanism is defined by the formula:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \tag{12}$$
where $Q$, $K$, and $V$ represent the query, key, and value matrices, and $d_k$ is the dimensionality of the keys. To capture information from various representation subspaces, the Transformer utilizes multi-head attention, which is mathematically formulated as:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O \tag{13}$$
Here, $\mathrm{head}_i$ denotes an individual attention head, and $W^O$ is the output weight matrix. Since the Transformer does not inherently consider the order of sequence elements, positional encoding is incorporated to provide sequence order information:
$$PE_{(pos,\,2i)} = \sin\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right), \quad PE_{(pos,\,2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{\mathrm{model}}}}\right) \tag{14}$$
In these equations, $pos$ indicates the position in the sequence, $i$ is the dimension index, and $d_{\mathrm{model}}$ is the model's dimensionality. The Transformer's architecture, consisting of an encoder and a decoder with self-attention and feed-forward networks in each layer, has proven highly effective for sequence-to-sequence tasks, offering parallel processing advantages and improved training efficiency.
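The attention and positional encoding formulas above can be sketched in a few lines of NumPy for a single head; all shapes and values are illustrative.

```python
import numpy as np

# Sketch of scaled dot-product attention (Equation (12)) and the sinusoidal
# positional encoding (Equation (14)) for a single attention head.

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)  # row-wise softmax
    return w @ V

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angle = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2], pe[:, 1::2] = np.sin(angle), np.cos(angle)
    return pe

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(20, 16))  # seq_len = 20, d_k = 16
print(attention(Q, K, V).shape, positional_encoding(20, 16).shape)
```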
The Transformer provides several advantages, including global dependency modeling, strong parallel processing capabilities, a multi-head attention mechanism to capture complex patterns, and high training efficiency in time series forecasting. Transformer-KANs fully utilize the Transformer’s strengths in global dependency modeling and parallel processing, with KANs replacing the Transformer’s fully connected layer. The mapping equation for passing the output of the self-attention mechanism to KAN is:
$$z_t = \sum_{q=1}^{2D+1} g_q\left(\sum_{p=1}^{D} \phi_{q,p}(h_p)\right) \tag{15}$$
where $h_p$ is the $p$-th dimension of the encoded input $h$, the superposition of the input features and the positional encoding $PE_{pos}$, which contains both contextual and sequence position information; $\phi_{q,p}$ is the one-dimensional continuous function used for the nonlinear transformation of the feature $h_p$; $g_q$ is the one-dimensional output function that processes the high-order combined features; $D$ is the feature dimension of $h$; and $q$ indexes the feature combinations.
KANs take advantage of their strengths in the final feature mapping stage to generate more representative feature vectors. Compared to a simple fully connected layer, KANs more effectively extract important features from the input data, thereby improving the model’s performance. Transformer-KAN excels in global modeling of long time series; however, it demands significant computational resources.

3.5. Model Building Steps Using KANs

In this study, the electricity demand forecasting problem is formulated as a time series observed at time $t$, denoted by $y_t$. The goal is to predict the future values of this series
$$Y_{t_0:T} = \{y_{t_0}, y_{t_0+1}, \ldots, y_{T}\} \tag{16}$$
based solely on its historical values
$$X_{t_0-k:t_0-1} = \{X_{t_0-k}, \ldots, X_{t_0-2}, X_{t_0-1}\} \tag{17}$$
where $t_0$ denotes the starting point from which future values $y_t$, $t = t_0, \ldots, T$, are to be predicted. The historical range $[t_0-k, t_0-1]$ and the forecast range $[t_0, T]$ are the context and prediction lengths, respectively. The focus is on generating point forecasts for each time step in the prediction length, aiming to achieve accurate and reliable forecasts.
The electricity demand forecasting problem is formulated as a supervised learning task, where input–output pairs $(X_{t_0-k:t_0-1}, Y_{t_0:T})$ represent the context and prediction lengths. The goal is to find a function $g$ that approximates $Y_{t_0:T}$. In this framework, three different time series models are employed: Transformer, TCN, and BiLSTM. Despite differences in their architectures, the models share a common structure. The input and output layers comprise $N_{\mathrm{in}}$ and $N_{\mathrm{out}}$ nodes, respectively, while the hidden layers comprise $n$ nodes. Each model then processes the input sequence through its core structure. Following these processing steps, the output is passed to a KAN layer. The framework can be expressed as:
$$y = \Phi(f_{\mathrm{time}}(\mathbf{x})) \tag{18}$$
where $\Phi$ denotes the KAN layer and $f_{\mathrm{time}}$ denotes the time series model, whose output is passed through $\Phi$. The schematic diagram of the network structure is depicted in Figure 4.

3.6. The Experimental Framework

The research process is divided into four consecutive steps (Figure 5). In step 1, data is collected from large organizations and public sources. In step 2, the data is initially processed, including handling missing values, outliers, data scaling, etc. In step 3, the prediction model is trained and optimized. In step 4, cross-validation is performed to evaluate the performance of the prediction model.
Step 1: Drawing from multiple information sources, this study comprehensively analyzes the key factors influencing regional electricity demand to establish a robust feature selection framework. First, common feature variables were identified based on insights from classic literature in the field of electricity demand forecasting and related research. These variables include economic indicators, weather conditions, and historical power load data, all of which are widely recognized as having a significant impact on electricity demand. Additionally, the expertise of senior professionals from the power industry—spanning power generation companies, power dispatching departments, and power grid operation units—was leveraged. These experts, with extensive experience in the industry, provided valuable insights into the primary drivers of electricity demand through long-term practice. Using this background information, a preliminary candidate feature set was constructed. Next, the correlation between each feature and the target variable was evaluated using statistical and analytical techniques, such as mutual information and heatmap analysis, to eliminate low correlation or redundant features. Ultimately, an optimal feature set was determined for model construction and subsequent analysis.
Electricity-related data such as national demand, monthly electricity consumption of users, and regional power generation are downloaded and collated from the public website of the UK National Grid ESO. Financial and economic historical data are obtained from the large database of the data service provider Wind. Considering the impact of weather conditions on electricity demand, regional meteorological data are collected from the public website of the UK meteorological station, covering nine indicators from 2015 to 2024. The heat map and mutual information of the factors are described in Section 2.
Step 2: The model was developed and executed using Python in the PyCharm (Version Number 2024.3.2) environment. Preliminary descriptive statistical analysis of the dataset was conducted to address issues such as missing values and outliers. Basic visualization techniques were applied to gain a clear understanding of the data distribution, while the relationships between variables were examined using heatmaps and mutual information analysis. Outliers were identified during the preliminary analysis using the interquartile range method and were replaced through median interpolation. The missing rates for the dataset are detailed in Table 2. To handle missing values efficiently, feature engineering techniques were employed alongside the trend extrapolation method. The trend extrapolation method was specifically chosen for its ability to preserve inherent trends and temporal dependencies within the dataset.
The data was standardized using z-score standardization (Equation (19)), where $x_{ij}$ represents the $j$-th observation of the $i$-th variable in the dataset, and $\mathrm{Mean}_i$ and $\mathrm{Std}_i$ represent the mean and standard deviation of the $i$-th variable, respectively. The standardized value is denoted $\bar{x}_{ij}$.
$$\bar{x}_{ij} = \frac{x_{ij} - \mathrm{Mean}_i}{\mathrm{Std}_i} \tag{19}$$
Step 3: A sliding window approach was applied to structure the input data for the model. During the experiments, different window sizes, including 10, 15, 20, and 30, were evaluated via grid search to determine the best value for capturing temporal dependencies in the time series data. A window size of 20 was chosen because it consistently achieved the best performance in both RMSE and MAE. This approach ensures that temporal dependencies within the time series data are preserved, allowing the model to learn sequential patterns effectively. All input and target features are scaled to the range of [−1, 1] to ensure data processing on a unified scale, thereby enhancing training efficiency and prediction accuracy.
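The windowing, scaling, and train/test split of Step 3 might look like the following sketch; the function names and the synthetic series are illustrative stand-ins for the actual pipeline.

```python
import numpy as np

# Sketch of the sliding-window construction: a window of the previous k months
# predicts the next value. The synthetic series stands in for the real nd data.

def make_windows(series, k=20):
    X = np.stack([series[i:i + k] for i in range(len(series) - k)])
    y = series[k:]
    return X, y

def scale_minus1_1(a):
    """Scale features to [-1, 1], as in the preprocessing step."""
    return 2.0 * (a - a.min()) / (a.max() - a.min()) - 1.0

demand = scale_minus1_1(np.sin(np.linspace(0, 12, 120)))  # stand-in series
X, y = make_windows(demand, k=20)
n_train = int(0.8 * len(X))                               # 80/20 split
X_train, X_test = X[:n_train], X[n_train:]
print(X_train.shape, X_test.shape)  # -> (80, 20) (20, 20)
```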
Training and tuning the regional electricity demand forecasting models were conducted using three time series forecasting models, namely, the Transformer, BiLSTM, and TCN models, integrated with Kolmogorov–Arnold Networks (KANs). The forecasting indicators are compared with those of traditional models such as LSTM and GRU. Based on the existing literature, these models were selected to capture the complex dependencies in the data. The preprocessed dataset from Step 2 was used to train the models. To improve predictive performance, the models were carefully tuned, and the best combination of parameters was identified using a grid search method. The dataset is divided into two parts: 80% for training and the remaining 20% for testing. Early stopping was also employed to prevent overfitting and further enhance model generalizability. In the KAN structure, the activation function employed was SiLU, with a grid size of 200. The weights between the adaptive and uniform grids were set to 0.02, and the noise scale of the spline weights was set to 0.1.
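The grid search over model configurations can be skeletonized as below; the hyperparameter grids are illustrative, and validation_rmse is a hypothetical placeholder for a full training-and-evaluation run with early stopping.

```python
from itertools import product

# Hedged sketch of the grid search described above; in practice each candidate
# configuration would be trained with early stopping and scored on a held-out
# set, rather than with the placeholder function below.

def validation_rmse(lr, hidden, window):
    # Placeholder score; a real run would train the model and return its RMSE.
    return abs(lr - 1e-3) + abs(hidden - 64) / 1000 + abs(window - 20) / 1000

grid = {'lr': [1e-3, 5e-4], 'hidden': [32, 64], 'window': [10, 15, 20, 30]}
best_cfg = min(product(*grid.values()), key=lambda c: validation_rmse(*c))
print(dict(zip(grid, best_cfg)))  # -> {'lr': 0.001, 'hidden': 64, 'window': 20}
```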
Step 4: To assess the predictive performance and accuracy of the models, we used the following indicators: root mean squared error (RMSE), mean absolute error (MAE), the coefficient of determination (R²), and the Akaike Information Criterion (AIC).
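A sketch of the four indicators follows. Note that AIC has several variants; the common least-squares form n·ln(SSE/n) + 2k is assumed here, with k the number of model parameters, since the paper does not state which form it uses.

```python
import numpy as np

# Sketch of the four evaluation metrics; the AIC variant below is an assumed
# least-squares form, n*ln(SSE/n) + 2*k_params.

def evaluate(y_true, y_pred, k_params):
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    n = len(y_true)
    aic = n * np.log(np.sum(err ** 2) / n) + 2 * k_params
    return rmse, mae, r2, aic

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(evaluate(y_true, y_pred, k_params=3))
```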

4. Experimental Results and Discussion

The results demonstrate that these methods and strategies significantly improve the model's forecasting accuracy and generalization capacity, providing robust support for future electricity demand forecasting. Throughout the training process, the MAE, RMSE, R², and AIC values for each epoch were recorded and compared with those of the traditional LSTM and GRU models. Radar diagrams of the various metrics are shown in Figure 6. These metrics are used to assess the model's goodness of fit and forecasting performance on the training set. For demonstration purposes, data from 2024 to the present were sampled. Figure 7, Figure 8 and Figure 9 depict the actual input values (red) and test forecasted values (green). Comparing the actual values with the forecasted values reveals that the KAN-based models generate smoother forecasts and exhibit smaller errors across various time periods, particularly when addressing erratic fluctuations in the data. This advantage is particularly crucial in long-term time series forecasting tasks, as it delivers more reliable forecasts, offering valuable insights for practical applications.
As shown in Table 3, TCN-KANs, Transformer-KANs, and BiLSTM-KANs exhibit improved RMSE and $R^2$ compared to their original models (TCN, Transformer, and BiLSTM). Specifically, when comparing TCN-KANs with TCN, the introduction of the KANs network reduces the TCN model's RMSE and MAE by 0.011 and 0.006, respectively, and improves $R^2$ by 0.031. This signifies improved accuracy in electricity demand forecasting and effective prediction error reduction, indicating that the addition of the KANs network enhances the TCN model's feature extraction and data pattern capturing capabilities. The AIC indicator is employed for model selection, as it balances model fit and complexity while evaluating the relative quality of statistical models by penalizing overfitting. A lower AIC value indicates better model prediction performance [47]. TCN-KANs achieves an AIC value 125.05 lower than that of TCN, demonstrating its superior ability to balance accuracy and complexity. Compared to the indexes of the traditional LSTM and GRU models, the proposed approach demonstrates significant improvements in performance metrics.
Similarly, comparing Transformer-KANs with Transformer, the introduction of the KANs network reduces the Transformer model's RMSE and MAE by 0.012 and 0.009, respectively, and improves $R^2$ by 0.0299. Transformer-KANs achieves an AIC value 99.10 lower than that of Transformer, and its indicators are better than those of the traditional LSTM and GRU models. This demonstrates improved accuracy through better capture of complex patterns and global dependencies, highlighting the KANs network's role in enhancing feature representation and reducing overfitting.
When comparing BiLSTM-KANs with BiLSTM, the introduction of the KANs network reduces the BiLSTM model's RMSE and MAE by 0.002 each and improves $R^2$ by 0.0251. BiLSTM-KANs achieves an AIC value 39.78 lower than that of BiLSTM. All indicators are better than those of LSTM and GRU. This improvement demonstrates the KANs network's effectiveness in enhancing the BiLSTM model's ability to capture and interpret time series data, thereby increasing prediction accuracy. Furthermore, this enhancement confirms the advantages of the KANs network in feature extraction and pattern recognition within the BiLSTM model.
To verify the performance of the proposed model, the widely recognized ETTh1 and ETTm1 datasets were used for evaluation, as shown in Table 4. The results indicate that the proposed model performs exceptionally well with respect to RMSE and MAE metrics. These findings validate the model’s effectiveness and generalizability for achieving accurate and dependable forecasting in practical scenarios.
In summary, it can be concluded that the introduction of the KANs network into the TCN, Transformer, and BiLSTM models has resulted in varying degrees of error reduction across all models, significantly improving forecasting performance. This demonstrates that the KANs network can further optimize feature extraction and pattern recognition capabilities within existing time series forecasting models, thereby enhancing overall forecasting performance. The findings of this study hold great significance for further refining time series forecasting models and improving the accuracy of electricity demand forecasting.

5. Conclusions

This work tackles the electricity demand forecasting problem by considering both common influencing factors and region-specific predictor variables. The proposed method integrates Kolmogorov–Arnold Networks (KANs) with prediction networks, demonstrating the potential and robustness of KANs in forecasting applications. Compared to traditional methods, the modified architecture significantly improves multi-step forecasting performance and stability, effectively addressing the long-term dependency issues often encountered in time series forecasting.
When compared to models such as LSTM, GRU, BiLSTM, TCN, and Transformer, the proposed architecture excels in long-term forecasting, showcasing its capacity to manage extended time horizons. By incorporating factors such as climate and economic variables, the model confirms the superior stability and effectiveness of KANs in processing real-world historical market data. These findings further validate the relevance and practicality of the Kolmogorov–Arnold Network framework for time series applications. The combination of the KAN network with other neural networks is driven by the complementary capabilities of these networks. While KAN excels at nonlinear feature extraction, it lacks the inherent ability to model temporal dependencies or global relationships, which are crucial for tasks such as time-series forecasting. Neural networks like BiLSTM, TCN, and Transformer specialize in capturing temporal and global patterns but often face challenges in capturing intricate nonlinear representations. Integrating KAN enhances these architectures by refining feature representation, thus improving predictive accuracy and robustness.
While the results of this study demonstrate the effectiveness of the proposed models, several limitations should be acknowledged. The findings are based on specific datasets and application scenarios. While the proposed models demonstrated strong generalizability in this context, and have been evaluated on the widely used ETTm1 and ETTh1 benchmark datasets, their applicability to other domains and data types requires further validation.
This study can be extended by incorporating additional machine learning techniques to further enhance the predictive framework. Balancing prediction accuracy, computational complexity, and practical applicability remains a critical aspect of future research. Considering data availability, it is important to focus on accumulating sufficient relevant industrial data to improve the robustness and real-world applicability of the proposed models. Additionally, future research should aim to quantify the impact of low-carbon transitions and economic development on regional electricity demand. This will help address emerging challenges in sustainable energy planning and provide more actionable insights for decision-makers.

Author Contributions

Y.Z.: Writing—original draft; Data curation; Methodology; Writing—review and editing. L.C.: Formal analysis; Supervision; Writing—review and editing. W.Y.: Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Klyuev, R.V.; Morgoev, I.D.; Morgoeva, A.D.; Gavrina, O.A.; Martyushev, N.V.; Efremenkov, E.A.; Mengxu, Q. Methods of Forecasting Electric Energy Consumption: A Literature Review. Energies 2022, 15, 8919. [Google Scholar] [CrossRef]
  2. Pełka, P. Analysis and Forecasting of Monthly Electricity Demand Time Series Using Pattern-Based Statistical Methods. Energies 2023, 16, 827. [Google Scholar] [CrossRef]
  3. Lee, M.H.L.; Ser, Y.C.; Selvachandran, G.; Thong, P.H.; Cuong, L.; Son, L.H.; Tuan, N.T.; Gerogiannis, V.C. A Comparative Study of Forecasting Electricity Consumption Using Machine Learning Models. Mathematics 2022, 10, 1329. [Google Scholar] [CrossRef]
  4. Li, K.; Zhang, T. Forecasting Electricity Consumption Using an Improved Grey Prediction Model. Information 2018, 9, 204. [Google Scholar] [CrossRef]
  5. Shiwakoti, R.K.; Charoenlarpnopparut, C.; Chapagain, K. A Deep Learning Approach for Short-Term Electricity Demand Forecasting: Analysis of Thailand Data. Appl. Sci. 2024, 14, 3971. [Google Scholar] [CrossRef]
  6. Yaprakdal, F. An Ensemble Deep-Learning-Based Model for Hour-Ahead Load Forecasting with a Feature Selection Approach: A Comparative Study with State-of-the-Art Methods. Energies 2023, 16, 57. [Google Scholar] [CrossRef]
  7. Abumohsen, M.; Owda, A.Y.; Owda, M. Electrical Load Forecasting Using LSTM, GRU, and RNN Algorithms. Energies 2023, 16, 2283. [Google Scholar] [CrossRef]
  8. Wang, Z.; Chen, Z.; Yang, Y.; Liu, C.; Li, X.; Wu, J. A hybrid autoformer framework for electricity demand forecasting. Energy Rep. 2023, 9, 3800–3812. [Google Scholar] [CrossRef]
  9. Li, X.; Jiang, M.; Cai, D.; Song, W.; Sun, Y. A Hybrid Forecasting Model for Electricity Demand in Sustainable Power Systems Based on Support Vector Machine. Energies 2024, 17, 4377. [Google Scholar] [CrossRef]
  10. Chen, Y.; Liu, C.; Ge, J.; Wu, J.; Zhao, X.; Gao, Z. Deep learning models for forecasting electricity demand in green low-carbon supply chains. Int. J. Low-Carbon Technol. 2024, 19, 2375–2382. [Google Scholar] [CrossRef]
  11. Jnr, E.O.N.; Ziggah, Y.Y. Electricity demand forecasting based on feature extraction and optimized backpropagation neural network. E-Prime-Adv. Electr. Eng. Electron. Energy 2023, 6, 100293. [Google Scholar] [CrossRef]
  12. Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Appl. Soft Comput. 2020, 90, 106181. [Google Scholar] [CrossRef]
  13. Chen, Z.; Ma, M.; Li, T.; Wang, H.; Li, C. Long sequence time-series forecasting with deep learning: A survey. Inf. Fusion 2023, 97, 101819. [Google Scholar] [CrossRef]
  14. Zhu, X.; Xiong, Y.; Wu, M.; Nie, G.; Zhang, B.; Yang, Z. Weather2K: A multivariate spatio-temporal benchmark dataset for meteorological forecasting based on real-time observation data from ground weather stations. arXiv 2023, arXiv:2302.10493. [Google Scholar]
  15. Lee, Y.W.; Tay, K.G.; Choy, Y.Y. Forecasting electricity consumption using time series model. Int. J. Eng. Technol. 2018, 7, 218–223. [Google Scholar] [CrossRef]
  16. Iftikhar, H.; Gonzales, S.M.; Zywiołek, J.; López-Gonzales, J.L. Electricity demand forecasting using a novel time series ensemble technique. IEEE Access 2024, 12, 88963–88975. [Google Scholar] [CrossRef]
  17. Son, H.; Kim, C. A deep learning approach to forecasting monthly demand for residential–sector electricity. Sustainability 2020, 12, 3103. [Google Scholar] [CrossRef]
  18. Tzortzis, A.M.; Pelekis, S.; Spiliotis, E.; Karakolis, E.; Mouzakitis, S.; Psarras, J.; Askounis, D. Transfer learning for day-ahead load forecasting: A case study on European national electricity demand time series. Mathematics 2023, 12, 19. [Google Scholar] [CrossRef]
  19. Moalem, S.; Ahari, R.M.P.; Shahgholian, G.; Moazzami, M.; Kazemi, S.M. Electricity supply chain hybrid long-term demand forecasting approach based on deep learning—A case study of basic metals industry. J. Eng. 2023, 2023, e12265. [Google Scholar] [CrossRef]
  20. Waheed, W.; Xu, Q. Optimal Short Term Power Load Forecasting Algorithm by Using Improved Artificial Intelligence Technique. In Proceedings of the 2020 2nd International Conference on Computer and Information Sciences (ICCIS) 2020, Sakaka, Saudi Arabia, 13–15 October 2020; pp. 1–4. [Google Scholar]
  21. Ul Islam, B.; Ahmed, S.F. Short-Term Electrical Load Demand Forecasting Based on LSTM and RNN Deep Neural Networks. Math. Probl. Eng. 2022, 2022, 2316474. [Google Scholar] [CrossRef]
  22. Dimd, B.D.; Voller, S.; Cali, U.; Midtgard, O. A Review of Machine Learning-Based Photovoltaic Output Power Forecasting: Nordic Context. IEEE Access 2022, 10, 26404–26425. [Google Scholar] [CrossRef]
  23. Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar] [CrossRef]
  24. Kenneth, B. Economic Development and End-Use Energy Demand. Energy 2001, 22, 2–5. [Google Scholar]
  25. Jovanovic, S.; Savic, S.; Bojic, M.; Djordjevic, Z.; Nikolic, D. The impact of the mean daily air temperature change on electricity consumption. Energy 2015, 88, 604–609. [Google Scholar] [CrossRef]
  26. Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2021, 37, 1748–1764. [Google Scholar] [CrossRef]
  27. Wu, Z.; Liu, Z.; Lin, J.; Lin, Y.; Han, S. Lite Transformer with Long-Short Range Attention. arXiv 2020, arXiv:2004.11886. [Google Scholar] [CrossRef]
  28. Wu, S.; Xiao, X.; Ding, Q.; Zhao, P.; Wei, Y.; Huang, J. Adversarial Sparse Transformer for Time Series Forecasting. Neural Inf. Process. Syst. 2020, 33, 17105–17115. Available online: https://api.semanticscholar.org/CorpusID:227275356 (accessed on 16 May 2024).
  29. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  30. Chaudhari, M.; Mayekar, H.; Mishra, A.; Bhave, D. Performance Analysis of Stock Price Prediction Model using LSTM and BiLSTM. Int. J. Res. Appl. Sci. Eng. Technol. 2022, 10, 810–812. [Google Scholar] [CrossRef]
  31. Hewage, P.; Behera, A.; Trovati, M.; Pereira, E.; Ghahremani, M.; Palmieri, F.; Liu, Y. Temporal convolutional neural (TCN) network for an effective weather forecasting using time-series data from the local weather station. Soft Comput. 2020, 24, 16453–16482. [Google Scholar] [CrossRef]
  32. Wang, Y.; Chen, J.; Chen, X.; Zeng, X.; Kong, Y.; Sun, S.; Guo, Y.; Liu, Y. Short-term load forecasting for industrial customers based on TCN-LightGBM. IEEE Trans. Power Syst. 2020, 36, 1984–1997. Available online: https://api.semanticscholar.org/CorpusID:225119880 (accessed on 17 May 2024). [CrossRef]
  33. Wang, W.; Liu, Y.; Sun, H. Tlnets: Transformation learning networks for long-range time-series prediction. arXiv 2023, arXiv:2305.15770. [Google Scholar] [CrossRef]
  34. Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KANs: Kolmogorov-arnold networks. arXiv 2024, arXiv:2404.19756. [Google Scholar] [CrossRef]
  35. Kolmogorov, A.N. On the Representation of Continuous Functions of Several Variables by Superpositions of Continuous Functions of a Smaller Number of Variables; American Mathematical Society: Providence, RI, USA, 1961. [Google Scholar]
  36. Braun, J.; Griebel, M. On a constructive proof of Kolmogorov’s superposition theorem. Constr. Approx. 2009, 30, 653–675. [Google Scholar] [CrossRef]
  37. Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
  38. Schmidt-Hieber, J. The Kolmogorov-Arnold representation theorem revisited. Neural Netw. Off. J. Int. Neural Netw. Soc. 2020, 137, 119–126. [Google Scholar] [CrossRef] [PubMed]
  39. Cabral, T.W.; Gomes, F.V.; de Lima, E.R.; Filho, J.C.S.S.; Meloni, L.G.P. Kolmogorov–Arnold Network in the Fault Diagnosis of Oil-Immersed Power Transformers. Sensors 2024, 24, 7585. [Google Scholar] [CrossRef] [PubMed]
  40. Jiang, M.; Zhou, H. D_KAN Model: Enhancing Power Load Forecasting with Kolmogorov-Arnold Networks. In Proceedings of the 2024 4th International Conference on Energy Engineering and Power Systems (EEPS), Hangzhou, China, 9–11 August 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 335–338. [Google Scholar]
  41. Shuai, H.; Li, F. Physics-informed kolmogorov-arnold networks for power system dynamics. arXiv 2024, arXiv:2408.06650. [Google Scholar] [CrossRef]
  42. Dao, M.H.; Liu, F.; Sidorov, D.N. Kolmogorov–Arnold Neural Networks Technique for the State of Charge Estimation for Li-Ion Batteries. Bulletin of the South Ural State University. Ser. Math. Model. Program. Comput. Softw. 2024, 17, 22–31. [Google Scholar]
  43. Sheng, P.; He, Y.; Guo, X. The impact of urbanization on energy consumption and efficiency. Energy Environ. 2017, 28, 673–686. [Google Scholar] [CrossRef]
  44. Mir, A.A.; Alghassab, M.; Ullah, K.; Khan, Z.A.; Lu, Y.; Imran, M. A Review of Electricity Demand Forecasting in Low and Middle Income Countries: The Demand Determinants and Horizons. Sustainability 2020, 12, 5931. [Google Scholar] [CrossRef]
  45. Kavianpour, P.; Kavianpour, M.; Jahani, E.; Ramezani, A. A CNN-BiLSTM model with attention mechanism for earthquake prediction. J. Supercomput. 2023, 17, 19194–19226. [Google Scholar] [CrossRef]
  46. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  47. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
Figure 1. Time-series graphs of monthly variables.
Figure 2. Pearson analysis.
Figure 3. Mutual information between factors and nd.
Figure 4. Network structure diagram.
Figure 5. The experimental framework for the power demand forecasting study.
Figure 6. Radar diagrams of various metrics.
Figure 7. Comparison of TCN and TCN-KANs on nd.
Figure 8. Comparison of Transformer and Transformer-KANs on nd.
Figure 9. Comparison of BiLSTM and BiLSTM-KANs on nd.
Table 1. Factor types, descriptions, and importance.

| Factors | Description | Pearson Coefficient | Mutual Information |
|---|---|---|---|
| National demand—nd | National demand is the sum of metered generation | 1 | - |
| Electricity consumption—x2 | Monthly electricity consumption from users in the UK | 0.98 | 0.83 |
| England electricity generation—x3 | Monthly electricity generation in England | 0.86 | 0.94 |
| Wales electricity generation—x4 | Monthly electricity generation in Wales | 0.56 | 1.06 |
| Regional total precipitation—x5 | Monthly measures of regional total precipitation | 0.43 | 0.58 |
| Regional average temperature—x6 | Monthly measures of regional average temperature | 0.40 | 0.60 |
| Regional average wind speed—x7 | Monthly measures of regional average wind speed | 0.35 | 0.42 |
| Regional economy—x8 | Monthly CPI in a region | −0.13 | 0.21 |
| Transmission system demand—x9 | Monthly measures of transmission system demand: additional generation required to meet station load, pump storage pumping, and interconnector exports | 0.11 | 0.28 |
Table 2. Missing value ratios of the factors.

| Factors | Missing Value Ratio |
|---|---|
| National demand—nd | 1.5% |
| Electricity consumption—x2 | 1.3% |
| England electricity generation—x3 | 2.1% |
| Wales electricity generation—x4 | 2.6% |
| Regional total precipitation—x5 | 4.3% |
| Regional average temperature—x6 | 2.1% |
| Regional average wind speed—x7 | 5.6% |
| Regional economy—x8 | 7.1% |
| Transmission system demand—x9 | 2.4% |
Table 3. Various metrics on the test set.

| Model | RMSE | MAE | R² | AIC |
|---|---|---|---|---|
| LSTM | 0.093 | 0.071 | 0.8945 | −2103.21 |
| GRU | 0.098 | 0.083 | 0.9092 | −2050.70 |
| TCN | 0.086 | 0.061 | 0.9362 | −2235.67 |
| TCN_KAN | 0.075 | 0.053 | 0.9672 | −2360.72 |
| Transformer | 0.095 | 0.068 | 0.9136 | −2187.03 |
| Transformer_KAN | 0.083 | 0.059 | 0.9435 | −2286.13 |
| BiLSTM | 0.085 | 0.061 | 0.9262 | −2246.35 |
| BiLSTM_KAN | 0.083 | 0.059 | 0.9513 | −2286.13 |
Table 4. Various metrics on the ETTh1 and ETTm1 datasets.

| Dataset | Model | RMSE | MAE | R² |
|---|---|---|---|---|
| ETTh1 | TCN | 0.694 | 0.578 | 0.9162 |
| ETTh1 | TCN_KAN | 0.563 | 0.453 | 0.9672 |
| ETTh1 | Transformer | 0.682 | 0.568 | 0.9136 |
| ETTh1 | Transformer_KAN | 0.541 | 0.459 | 0.9435 |
| ETTh1 | BiLSTM | 0.793 | 0.661 | 0.9262 |
| ETTh1 | BiLSTM_KAN | 0.551 | 0.459 | 0.9513 |
| ETTm1 | TCN | 0.629 | 0.524 | 0.9253 |
| ETTm1 | TCN_KAN | 0.476 | 0.402 | 0.9516 |
| ETTm1 | Transformer | 0.545 | 0.429 | 0.9373 |
| ETTm1 | Transformer_KAN | 0.442 | 0.386 | 0.9637 |
| ETTm1 | BiLSTM | 0.521 | 0.427 | 0.9165 |
| ETTm1 | BiLSTM_KAN | 0.458 | 0.392 | 0.9542 |
