1. Introduction
For electricity companies, providing accurate load forecasts is crucial to ensuring that the grid meets demand efficiently and reliably. Accurate forecasts make it possible to optimize supply while maintaining a continuous, stable, and economically viable service that avoids system interruptions and overloads. However, utilities face growing challenges in achieving higher accuracy, particularly with the increasing complexity of the grid due to the integration of intermittent renewable sources and varying consumption patterns [
1].
Electricity load forecasting can be classified into four time horizons: very short term (VSLF), covering fractions of a second up to fifteen minutes; short term (STLF), covering hours to days; medium term (MTLF), covering weeks to months; and long term (LTLF), involving annual projections [
1,
2]. Traditionally, most forecasting approaches focus on overall system demand, which is essential for operational functions such as economic dispatch, safety assessment, and maintenance scheduling [
3].
However, with the increasing complexity of power systems, multinodal forecasting that analyzes individual loads in substations, consumption zones, bars, or specific nodes has become equally important [
3,
4]. While global forecasting helps with the systemic balance between supply and demand, the multinodal approach is indispensable for static and dynamic system analysis such as load flow and voltage stability assessment [
4]. Despite the challenges inherent in its greater complexity, this approach enables more effective infrastructure management on a regional scale [
4,
5]. Both global and multinodal forecasting, however, face challenges arising from the nonlinear and nonstationary nature of load data, which are influenced by factors such as extreme weather conditions, seasonality, and socioeconomic events.
Faced with these complexities, traditional linear models such as ARIMA and linear regression often prove inadequate and unable to capture the required dynamic and nonlinear relations [
6,
7]. To address these limitations, a wide range of advanced data-driven methods, particularly those based on machine learning and Artificial Neural Networks (ANN), have emerged as superior alternatives for modeling these intricate relationships while offering greater precision in real-world scenarios.
A study by [
8] concluded that while traditional statistical and machine learning methods have their strengths, deep learning techniques are more suitable for the dynamic and nonlinear characteristics of electricity load data. Studies on the Greek electricity system have highlighted that even simple feedforward ANNs can achieve a decent level of predictive performance, though their results heavily depend on the careful selection and quality assurance of the input data [
9].
The literature reveals a variety of methods for predicting electrical loads, including both classic and innovative approaches. For instance, ref. [
10] obtained promising results by combining Autoregressive Integrated Moving Average (ARIMA) with ANNs for short-term electrical load forecasting, while [
11] conducted a comprehensive evaluation of various techniques based on artificial intelligence and time series analysis. Ref. [
12] demonstrated the importance of feature extraction and influencing factor analysis, showing that data mining improved results when combined with regression approaches such as Random Forest.
In recent years, more sophisticated approaches have been developed. Ref. [
13] proposed a Bidirectional Long Short-Term Memory (BiLSTM) architecture that reduced Mean Absolute Error (MAE) by 25% and 46% compared to unidirectional Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) in 24-h forecasts. Similarly, ref. [
14] further advanced the use of BiLSTM by incorporating stochastic weight averaging to improve generalization in substation-level forecasting.
Ref. [
15] integrated a deep neural network with time series analysis and feature selection, achieving an MAPE of 3.78% and RMSE of 432.4, which outperformed ARIMA, Random Forest, and Support Vector Regression. Ref. [
16] employed Temporal Convolutional Networks (TCNs) within a hierarchical framework that was resilient against missing or erroneous data, achieving reliable short-term forecasts for residential loads. At the distribution level, ref. [
17] introduced a lean forecasting approach combining clustering, neural networks, and Monte Carlo methods, while [
18] proposed a step-by-step framework that explicitly incorporates electric vehicle charging stations into load forecasting.
To further overcome existing limitations, a wide range of sophisticated and hybrid models has been explored. For example, ref. [
19] highlighted that while conventional models such as Autoregressive Integrated Moving Average (ARIMA) have their place, more complex approaches like Fuzzy Time Series (FTS) outperform them due to their ability to handle nonlinear and uncertain data at low computational cost. Other hybrid approaches show similar promise: Ref. [
20] proposed a method for multi-energy loads using a Radial Basis Function–AutoRegressive with eXogenous inputs (RBF-ARX) model for deterministic components and a Gaussian Mixture Model (GMM) for stochastic ones, improving forecast quality by up to 50%. Likewise, ref. [
21] combined an Autoencoder (AE) for dimensionality reduction with a Radial Basis Function Neural Network (RBFNN) for load forecasting, demonstrating better accuracy and efficiency compared to a standalone RBFNN.
Additionally, ref. [
22] compared the performance of ANN and Autoregressive with eXogenous inputs (ARX) models for net demand forecasting in distribution substations, concluding that the ANN model was better for direct forecasting due to the nonlinear nature of demand. Hybrid models have proven valuable in other domains as well, as seen in the work of [
23], who used a Particle Filtering–ARIMA (PF-ARIMA) fusion model to predict battery useful life, achieving a 70% improvement over other methods by handling data specificities such as capacity regeneration.
A particularly relevant advance was presented by [
24] with the Fuzzy-ARTMAP (FAM-ANN) continuous learning network, specifically developed for smart grids. This innovative solution proved capable of incorporating new data without requiring complete retraining, achieving a MAPE of ≈2% (60% better than the static version) and maintaining a consistent performance of 2.32% MAPE over 12 months, thereby combining precision, computational efficiency, and remarkable adaptability to dynamic scenarios.
Recently, multinodal forecasts have gained special prominence for their ability to integrate global planning with detailed local analysis. In this scenario, contributions include that of [
25], who proposed a single-stage forecasting method. This approach simultaneously considers scenarios with and without the load contribution factor, offering a more efficient solution compared to conventional two-stage methods.
While traditional approaches require first predicting the global load and then predicting the local loads, the proposed method significantly simplifies computational complexity by performing these steps in an integrated manner while also demonstrating improvements in the precision of the results. Another notable advance was presented by [
26], who introduced a methodology for forecasting consumption at all system nodes. Their method was based on the direct determination of specific rating coefficients for each voltage level, providing a comprehensive approach that encompasses everything from substations to supply zones and other critical components of the energy infrastructure. These advances in multinodal techniques pave the way for practical applications in real systems, such as the one presented in this paper.
However, it is crucial to note that the performance of these models depends fundamentally on the proper selection and optimization of their hyperparameters, such as learning rates and the number of neurons. An inadequate configuration can lead to problems of underfitting or overfitting, compromising the generalization of the model [
27,
28]. In this context, parameter optimization has proven to be a crucial strategy for improving the precision of load forecasting models. Techniques such as grid search, evolutionary algorithms, and Bayesian optimization make it possible to efficiently explore the parameter space in order to identify configurations that maximize predictive performance [
29]. This importance is corroborated by work such as that of [
30], which showed how hyperparameter optimization is decisive in reducing errors in short-term forecasting models for urban smart grids. Similarly, ref. [
31] obtained significant performance gains by applying optimization techniques to a fuzzy ARTMAP neural network for forecasting at disaggregated levels, combining it with singular spectrum analysis for noise treatment.
Despite the advances achieved with machine learning, deep learning, and hybrid approaches, several challenges remain. Many models, including deep learning architectures such as LSTM and transformer models, are highly sensitive to hyperparameter configurations, prone to overfitting, and often require large datasets and long training times. In multinodal forecasting, where the complexity is higher and data availability may be limited, systematic strategies for parameter optimization and data enrichment are still underexplored. These gaps open space for approaches such as Fuzzy ARTMAP, derived from Adaptive Resonance Theory (ART), which offers resistance to outliers, incremental learning without catastrophic forgetting, reduced dependence on large datasets, and fast convergence [
32]. These characteristics make ARTMAP a suitable alternative for scenarios that demand adaptability, accuracy, and efficiency.
This study applies and expands the methodology proposed by [33] for short-term (24 h ahead) multinodal electricity load forecasting, using data from nine substations of a system in New Zealand. The approach focuses on implementing the Fuzzy ARTMAP neural network [34], emphasizing an exhaustive search for optimal parameters, detailed analysis of parameter effects, and enrichment of input data with descriptive statistics (maximum, minimum, mean, and standard deviation) calculated from the participation factors at the three preceding time steps. Its performance is compared with well-known benchmark models, including Multilayer Perceptron (MLP), Random Forest, and Support Vector Regression (SVR), to evaluate the method’s effectiveness and investigate how network configurations and input variables influence the accuracy of short-term electricity demand forecasts.
The remainder of this article is organized as follows:
Section 2 outlines the key contributions of this study to the field of electrical load forecasting;
Section 3 describes the methodology, including the dataset, the fundamentals of the Fuzzy ARTMAP network, the strategies for global and multinodal load forecasting, the evaluation criteria, and the windowing techniques applied;
Section 4 presents and discusses the results, offering a comparative analysis of the performance of different techniques and a detailed investigation of the effect of network parameters on the forecasts; finally,
Section 5 summarizes the study’s conclusions and suggests future research directions.
3. Load Forecasting
This section details the methodology used for multinodal electricity load forecasting. It starts with a description of the dataset and then presents the Fuzzy ARTMAP neural network. This section also explains how global and multinodal forecasts are made, including the mathematical formulas for the participation factor. Finally, the criteria for comparing the results are presented.
3.1. Dataset
The database was supplied by the Electricity Commission of New Zealand and collected from the Centralized Dataset (CDS) from January 2007 to March 2009. It contains active power values measured every half an hour at each substation along with indicators for the day of the month, month, year, day of the week, holidays, time, and load sample value. The substations studied are in the Waikato region, located in the northern part of New Zealand’s North Island. The substations considered were Kopu, Waikino, Waihou, Hamilton 11, Hamilton 33, Cambridge, Te Awamutu, Hinuera, and Kinleith.
For the training and validation stages of the Fuzzy ARTMAP network, a specific subset of this data was used covering the period from 8 December 2007 to 7 January 2008, totaling 1488 samples. The hyperparameter optimization for all models was performed on data from 8 December 2007 to 6 January 2008, with the objective of minimizing the error on the final day, 7 January 2008. The forecast period consisted of 48 load values (half-hourly) for a 24-h day (8 January 2008).
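The sample counts in this split can be checked with a short calculation. The sketch below is illustrative only: it assumes one reading per half hour stamped from 00:00 to 23:30 on each day, and the helper name `half_hour_stamps` is hypothetical.

```python
from datetime import date, datetime, timedelta

STEP = timedelta(minutes=30)  # half-hourly sampling, as described in the text


def half_hour_stamps(first_day, last_day):
    """Return every half-hourly timestamp from 00:00 on first_day
    through 23:30 on last_day (both days included)."""
    current = datetime(first_day.year, first_day.month, first_day.day)
    end = datetime(last_day.year, last_day.month, last_day.day) + timedelta(days=1)
    stamps = []
    while current < end:
        stamps.append(current)
        current += STEP
    return stamps


# Training plus validation window: 8 December 2007 to 7 January 2008.
full_window = half_hour_stamps(date(2007, 12, 8), date(2008, 1, 7))
# Data used for fitting during the hyperparameter search.
fit_part = [s for s in full_window if s.date() <= date(2008, 1, 6)]
# Final day, on which the error is minimized.
validation_day = [s for s in full_window if s.date() == date(2008, 1, 7)]
# Forecast day.
forecast_day = half_hour_stamps(date(2008, 1, 8), date(2008, 1, 8))

print(len(full_window), len(fit_part), len(validation_day), len(forecast_day))
```

The window spans 31 days, so 31 × 48 = 1488 samples, of which the last 48 belong to the validation day.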
Figure 1 highlights various substations, distribution lines, and power stations on the map of the power transmission system in the north of New Zealand’s North Island.
3.2. Fuzzy ARTMAP
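The Fuzzy ARTMAP supervised learning network [34] consists of two Adaptive Resonance Theory modules, ARTa and ARTb. These modules perform calculations based on fuzzy set theory [36], specifically using the fuzzy AND operator (∧), which takes the component-wise minimum of two vectors x and y, as defined by (1):
$$(x \wedge y)_i = \min(x_i, y_i) \qquad (1)$$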
Figure 2 illustrates the ART modules of the network. The input data and desired outputs are presented to the ARTa and ARTb modules, respectively; the inter-ART associative memory module checks the correspondence between the input patterns and the expected output categories. A critical component of the network is the match-tracking mechanism used by the inter-ART module, which dynamically adjusts the vigilance parameter. This minimizes the prediction error while preserving the model’s ability to handle a variety of input patterns and to produce accurate results for unseen data. Prior to training, the weight matrices associated with the ARTa, ARTb, and inter-ART modules are initialized with values of 1, indicating that no categories are active in the network. During training, as the network is exposed to the input data, resonance occurs between the input patterns and the predicted output categories, activating the corresponding categories in the network. In turn, this triggers the updating of the weight matrices to capture the relationships learned between the input patterns and the output categories. Algorithm 1 shows the pseudocode for the Fuzzy ARTMAP network.
The input data for the global forecasting model is composed of a set of 11 binary bits related to the exogenous data plus three global load values from previous time steps, resulting in a 14-dimensional input space. For each substation, the input of the Fuzzy ARTMAP network (ARTa) is composed of the global load at the current time (h) and the participation factors from three past time steps, resulting in a four-dimensional input space. It should be noted that the raw data are not preprocessed or normalized externally, as the network performs L1 normalization internally.
The Fuzzy ARTMAP network’s performance is fundamentally influenced by the vigilance parameters of the ARTa and ARTb modules ($\rho_a$ and $\rho_b$) and by the learning rate. In this study, we perform an extensive hyperparameter search on these three parameters, while other parameters such as the training parameter, the inter-ART vigilance parameter, and the increment are kept fixed. The search ranges, in the order in which the three parameters are listed above, were defined as follows: from 0.90 to 0.99 (increment of 0.01), from 0.980 to 0.999 (increment of 0.001), and from 0.02 to 1.00 (increment of 0.01).
The vigilance parameter acts as a similarity constraint for category formation. A higher value enforces stricter similarity, leading to the creation of more categories (clusters). This is particularly relevant for $\rho_b$, which clusters the network’s output. A large number of clusters results in a narrower range of values per cluster; conversely, a low $\rho_b$ generates fewer clusters, each grouping a wider range of output values. This can lead to a “discretization” or “straight-line” effect on the forecast curve, as diverse output values are mapped to a single oversimplified category. The learning rate is a constant that indicates the extent to which new patterns influence a category’s weight update. The training rate controls the speed of learning, with a value of 1 implying faster training.
The inter-ART vigilance parameter acts as a match-tracking mechanism between the ARTa (input) and ARTb (output) modules. Its primary function is to ensure consistency between the input pattern and the expected output category. Because its role is to measure the compatibility between categories rather than to fine-tune the forecast precision itself, we considered its optimization to be non-critical for this study, and consequently kept its value fixed.
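To give a sense of the size of this search, the following sketch enumerates the three ranges quoted above. It rests on two assumptions that the text does not confirm: that the ranges are listed in the same order as the parameters (first vigilance, second vigilance, learning rate), and that all combinations are evaluated jointly. The variable and function names are illustrative.

```python
from itertools import product


def inclusive_range(start, stop, step, decimals):
    """Values from start to stop (inclusive) in increments of step,
    built from an integer count to avoid floating-point drift."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, decimals) for i in range(count)]


# Ranges as quoted in the text; the mapping to parameters is an assumption.
first_vigilance_values = inclusive_range(0.90, 0.99, 0.01, 2)
second_vigilance_values = inclusive_range(0.980, 0.999, 0.001, 3)
learning_rate_values = inclusive_range(0.02, 1.00, 0.01, 2)

# Joint grid: every combination of the three parameters (assumed).
grid = list(product(first_vigilance_values,
                    second_vigilance_values,
                    learning_rate_values))

print(len(first_vigilance_values), len(second_vigilance_values),
      len(learning_rate_values), len(grid))
```

Under these assumptions the grid has 10 × 20 × 99 = 19,800 candidate configurations per network.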
Algorithm 1. Pseudocode for training the Fuzzy ARTMAP neural network.
Input: a: input data; b: desired output; learning rate; training parameter; vigilance parameter of module A; vigilance parameter of module B; vigilance parameter of the inter-ART module; increase.
Output: [...]
1: procedure [...]
2–4: [...]
5: while data for training exists do
6: for all clusters do
7–8: [...]
9: while flag = True do
10: [...]
11: if [...] then
12–14: [...]
15: else
16: [...]
17: for all clusters do
18–19: [...]
20: while flag = True do
21: [...]
22: if [...] then
23: if [...] then
24–26: [...]
27: else
28–30: [...]
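The training loop of Algorithm 1 can be illustrated with a compact sketch. The code below is not the authors' implementation: it follows the standard Fuzzy ARTMAP formulation (complement coding, choice function, vigilance test, match tracking) under simplifying assumptions that are labeled in the comments. Names such as `FuzzyART`, `alpha`, `beta`, and `epsilon` are illustrative, inputs are assumed to be already scaled to [0, 1], and the inter-ART map is reduced to a dictionary.

```python
def complement_code(x):
    """Append 1 - x_i to the vector, the usual ART input coding."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(x, w):
    """Fuzzy AND: component-wise minimum."""
    return [min(a, b) for a, b in zip(x, w)]


class FuzzyART:
    """One ART module storing only committed categories."""

    def __init__(self, alpha, beta):
        self.alpha, self.beta = alpha, beta
        self.weights = []

    def ranked(self, x):
        """Category indices sorted by choice value, highest first."""
        scores = [sum(fuzzy_and(x, w)) / (self.alpha + sum(w))
                  for w in self.weights]
        return sorted(range(len(scores)), key=lambda j: -scores[j])

    def match(self, x, j):
        """Match value compared against the vigilance parameter."""
        return sum(fuzzy_and(x, self.weights[j])) / sum(x)

    def find(self, x, rho):
        """First category passing vigilance, or the index of a new one."""
        for j in self.ranked(x):
            if self.match(x, j) >= rho:
                return j
        return len(self.weights)

    def learn(self, x, j):
        """Update category j; a new category is committed as w = x (assumed)."""
        if j == len(self.weights):
            self.weights.append(list(x))
        else:
            w = self.weights[j]
            self.weights[j] = [self.beta * m + (1.0 - self.beta) * wi
                               for m, wi in zip(fuzzy_and(x, w), w)]
        return j


class FuzzyARTMAP:
    def __init__(self, rho_a, rho_b, alpha=0.001, beta=1.0, epsilon=0.001):
        self.rho_a, self.rho_b, self.epsilon = rho_a, rho_b, epsilon
        self.art_a = FuzzyART(alpha, beta)
        self.art_b = FuzzyART(alpha, beta)
        self.map_ab = {}  # ARTa category -> ARTb category (simplified map field)

    def train_one(self, a, b):
        A, B = complement_code(a), complement_code(b)
        # ARTb: cluster the desired output.
        k = self.art_b.learn(B, self.art_b.find(B, self.rho_b))
        # ARTa: search with match tracking.
        rho = self.rho_a
        for j in self.art_a.ranked(A):
            m = self.art_a.match(A, j)
            if m < rho:
                continue
            if self.map_ab[j] == k:
                self.art_a.learn(A, j)
                return
            rho = m + self.epsilon  # wrong prediction: raise vigilance
        j = self.art_a.learn(A, len(self.art_a.weights))
        self.map_ab[j] = k

    def predict(self, a):
        """Centre of the ARTb category linked to the best ARTa category
        (how a category is turned into a number is an assumption here)."""
        ranked = self.art_a.ranked(complement_code(a))
        if not ranked:
            raise ValueError("the network has not been trained")
        w = self.art_b.weights[self.map_ab[ranked[0]]]
        half = len(w) // 2
        return [(w[i] + (1.0 - w[half + i])) / 2.0 for i in range(half)]
```

Because the simplified map field links each ARTa category to exactly one ARTb category, any inter-ART vigilance value in (0, 1] gives the same behavior in this sketch, which is consistent with treating that parameter as fixed.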
3.3. Global Load Forecasting
To forecast the multinodal load, it is necessary to obtain the global (forecast) load; in turn, this is used to calculate the participation factor provided by Equation (2), which is fundamental to multinodal forecasting. Global forecasting is carried out by a system called the global load forecasting system, which is based on the Fuzzy ARTMAP neural network [33]. The input data set of the ARTa module in this system contains exogenous data such as time of day, day of the week, month, holidays, and daylight saving time, together with the global load at three previous time instants. This representation of the global load uses the “sliding window” technique proposed by [37] and aims to identify the behavior of the load profile. The forecasting result consists of the global load at time t, which is provided by the output of the ARTb module of the Fuzzy ARTMAP network. The exogenous variables are described in detail in Table 1; they are encoded as 11 binary bits which, together with the three global load values from previous time steps, result in a 14-dimensional input space.
Figure 3 details the global load forecasting strategy with the Fuzzy ARTMAP network.
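The sliding-window representation of the global load can be sketched as follows. This is a minimal illustration that assumes the three lags are consecutive and immediately precede the target; the 11 exogenous bits are omitted because their encoding is given in Table 1, and the function name is hypothetical.

```python
def sliding_window(series, n_lags=3):
    """Pair each value with the n_lags values that precede it.

    Returns (inputs, targets), where inputs[i] holds the lagged values
    (oldest first) and targets[i] is the value to be forecast.
    """
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return inputs, targets


# Illustrative numbers only, not taken from the dataset.
example_inputs, example_targets = sliding_window([10, 20, 30, 40, 50])
print(example_inputs, example_targets)
```

Each row of `example_inputs` would be concatenated with the exogenous bits to form the 14-dimensional input vector described above.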
3.4. Multinodal Load Forecasting
The multinodal load forecasting system is based on the Fuzzy ARTMAP neural network. Forecasting in this system is carried out in modules, where each module contains a neural network responsible for predicting the load at individual locations such as substations, transformers, and feeders. To perform substation load forecasting, the multinodal load forecasting system uses the global participation factor ($PF_j(t)$), defined as the share of each substation’s load within the total system load, as specified in (2) [37,38]:
$$PF_j(t) = \frac{L_j(t)}{L_G(t)} \qquad (2)$$
in which:
$L_j(t)$ is the multinodal load associated with substation j at time t.
$L_G(t)$ is the global load, which corresponds to the addition of the loads of each substation at time t.
The load forecast for each substation is produced by the corresponding module of the multinodal system. The input dataset for the Fuzzy ARTMAP network for each substation j consists of the global load at time t and the participation factors of substation j at the three preceding time instants. The network output in the multinodal system corresponds to the predicted participation factor at time t. The flowchart of the multinodal forecasting system is shown in Figure 4.
Within this system, statistical features derived from the global participation factors at the three preceding time instants were evaluated as potential input extensions to the network. The considered statistics included the mean, standard deviation, maximum, and minimum values, which were assessed individually in order to determine their impact on forecasting performance.
These statistical features are part of a feature engineering strategy. Feature engineering aims to transform raw data into informative input variables that help the model to capture relevant patterns. By deriving statistics from the recent participation factors, each feature provides a distinct perspective on the substation’s recent load behavior, such as its central tendency, variability, or extreme values [
39,
40,
41]. Evaluating these features separately allows the most effective input to be identified for each substation and forecasting horizon.
From a theoretical perspective, each derived feature adds a new dimension to the input space. For example, using only the mean of the past three participation factor values extends the original three-dimensional lag space by one dimension. This dimensionality increase provides the Fuzzy ARTMAP network with richer information, enhancing its ability to separate patterns corresponding to different load behaviors and improving generalization [42,43].
The mean is obtained as described by (3), computed over the global participation factors of substation j at the three preceding time instants. The standard deviation, which measures the variability of the data around the mean, is calculated by (4). The maximum and minimum values are determined by Equations (5) and (6).
By evaluating each statistic individually, the most informative feature can be selected for each substation, thereby optimizing the network’s predictive performance. Empirical studies indicate that networks using feature-augmented inputs outperform those trained solely on raw lagged values, particularly in capturing trends, variability, and extreme events.
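The four window statistics and the resulting input vector can be sketched as below. This is an illustration under stated assumptions: the population standard deviation is used (the text does not say whether the divisor is N or N − 1; `statistics.stdev` would give the sample version), and the function names and the ordering of the input vector are hypothetical.

```python
from statistics import mean, pstdev

# Candidate statistics; each one is evaluated on its own, as in the text.
WINDOW_STATISTICS = {
    "mean": mean,
    "std": pstdev,  # assumption: population standard deviation
    "max": max,
    "min": min,
}


def window_feature(lagged_pf, statistic):
    """Compute one statistic over the lagged participation factors."""
    return WINDOW_STATISTICS[statistic](lagged_pf)


def build_input(global_load_t, lagged_pf, statistic=None):
    """Input vector: global load, three lagged factors, optional statistic."""
    features = [global_load_t, *lagged_pf]
    if statistic is not None:
        features.append(window_feature(lagged_pf, statistic))
    return features


# Illustrative values only.
lags = [0.30, 0.32, 0.34]
print(build_input(0.75, lags, "mean"))
```

Adding a single statistic turns the four-dimensional input into a five-dimensional one, which is the dimensionality increase discussed above.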
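After the participation factor has been predicted at time t, the substation load $\hat{L}_j(t)$ is obtained by multiplying the predicted participation factor $\widehat{PF}_j(t)$ by the predicted global load $\hat{L}_G(t)$, as in (7) [33,37]:
$$\hat{L}_j(t) = \widehat{PF}_j(t)\,\hat{L}_G(t) \qquad (7)$$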
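As a worked illustration of the participation factor and of the recovery of a substation load from it, consider the sketch below. The load values are invented for the example, and the function names are hypothetical.

```python
def participation_factors(substation_loads):
    """Share of each substation in the global load at one time step.

    The global load is taken as the sum of all substation loads.
    """
    global_load = sum(substation_loads.values())
    return {name: load / global_load
            for name, load in substation_loads.items()}


def substation_load(predicted_pf, predicted_global_load):
    """Substation load as predicted participation factor times global load."""
    return predicted_pf * predicted_global_load


# Invented loads (arbitrary units) for three of the substations.
loads = {"Kopu": 12.0, "Waikino": 8.0, "Cambridge": 20.0}
factors = participation_factors(loads)
print(factors)

# If the factor for Kopu were predicted as 0.3 and the global load as 42.0,
# the substation forecast would be their product.
print(substation_load(0.3, 42.0))
```

By construction the factors of all substations sum to 1 at each time step, so errors in the global forecast propagate proportionally to every substation forecast.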
3.5. Benchmark Models and Hyperparameter Optimization
To validate the effectiveness of our proposed Fuzzy ARTMAP methodology, we compared its performance with three widely used machine learning models implemented in the Scikit-learn library in Python [
44]: Multi-layer Perceptron (MLP), Random Forest, and Support Vector Regression (SVR). For a fair comparison, each model went through a hyperparameter optimization process to find its best configuration, using the same training and validation period as Fuzzy ARTMAP. The optimization used a grid search, with MAPE as the main criterion for selecting the best model.
These models were chosen because they represent different approaches to time series forecasting. MLP is a type of artificial neural network that captures nonlinear relationships by learning patterns across multiple layers of neurons, making it a stronger alternative to linear models [
45,
46]. Random Forest combines many decision trees built from random subsets of the data, which lowers the risk of overfitting and produces more stable predictions [
47]. SVR, an extension of Support Vector Machines, models nonlinear relationships through kernels, is less sensitive to outliers, and balances model complexity with generalization by defining a margin of tolerance around the regression function [
48].
The hyperparameter search spaces for each model were defined as follows:
MLP
- Hidden layer sizes: (5), (10), (5, 5), (10, 5), (5, 10), (20), (20, 15), (20, 30), (100), (200), (100, 50)
- Maximum number of iterations: 1500
- Activation function: ‘logistic’, ‘tanh’, ‘relu’
- Optimization method (solver):
  - ‘lbfgs’ (Limited-memory Broyden–Fletcher–Goldfarb–Shanno)
  - ‘sgd’ (Stochastic Gradient Descent)
  - ‘adam’ (Adaptive Moment Estimation)
Random Forest
- Number of trees (estimators): 50, 100, 200
- Maximum depth: 2, 5, 8, 10, 15, 20, 25, 30, None
- Minimum samples required to split: 2, 5, 10
- Splitting criterion: ‘squared_error’, ‘absolute_error’, ‘friedman_mse’, ‘poisson’
SVR
- Kernel type: ‘rbf’, ‘poly’, ‘sigmoid’
- Regularization parameter (C): 1, 10, 100
- Tolerance margin (epsilon): 0.01, 0.1, 0.5
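The search spaces above can be written as parameter dictionaries. The sketch below uses the keyword names of Scikit-learn's `MLPRegressor`, `RandomForestRegressor`, and `SVR` estimators, but deliberately avoids importing the library so that it only counts candidates. How the authors wired the grid to their single validation day is not described, so the dictionaries should be read as one plausible encoding.

```python
from math import prod

mlp_grid = {
    "hidden_layer_sizes": [(5,), (10,), (5, 5), (10, 5), (5, 10), (20,),
                           (20, 15), (20, 30), (100,), (200,), (100, 50)],
    "max_iter": [1500],
    "activation": ["logistic", "tanh", "relu"],
    "solver": ["lbfgs", "sgd", "adam"],
}

random_forest_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 5, 8, 10, 15, 20, 25, 30, None],
    "min_samples_split": [2, 5, 10],
    "criterion": ["squared_error", "absolute_error", "friedman_mse", "poisson"],
}

svr_grid = {
    "kernel": ["rbf", "poly", "sigmoid"],
    "C": [1, 10, 100],
    "epsilon": [0.01, 0.1, 0.5],
}


def grid_size(grid):
    """Number of candidate configurations in a full grid."""
    return prod(len(values) for values in grid.values())


for name, grid in [("MLP", mlp_grid),
                   ("Random Forest", random_forest_grid),
                   ("SVR", svr_grid)]:
    print(name, grid_size(grid))
```

Dictionaries of this shape are the format accepted by the `param_grid` argument of Scikit-learn's `GridSearchCV` and by `ParameterGrid`. Since the selection criterion here is the MAPE on one held-out day rather than cross-validation, iterating over `ParameterGrid` and scoring each candidate on that day may be the closer match.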
3.6. Criteria for Comparing Results
The Mean Absolute Percentage Error (MAPE) is a metric that is commonly used to assess the precision of forecasting models. It is calculated by taking the mean of the absolute percentage errors between the predicted values and the actual values. The MAPE formula is provided by (8):
$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \qquad (8)$$
in which:
$y_t$ represents the actual value of the participation factor at time t.
$\hat{y}_t$ represents the predicted value of the participation factor at time t.
n is the number of readings in the evaluated period, with one reading every half hour.
MAPE is widely used due to its interpretability; it expresses the error as a percentage, which makes it easier to understand the model’s performance [49]. The use of MAPE is particularly well suited for this study, as it provides a normalized error measure across different substations and load profiles. This allows for a direct comparison of forecasting quality between substations with varying load levels, since the metric is not biased by the scale of the load. Lower MAPE values indicate more precise forecasts.
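A direct implementation of the standard MAPE definition is sketched below. The function name is illustrative, and the sketch assumes that no actual value is zero, since the metric is undefined in that case.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes both sequences have the same length and that no actual
    value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative values only: errors of 10% and 5% average to 7.5%.
print(mape([100.0, 200.0], [110.0, 190.0]))
```

For a one-day forecast at half-hourly resolution, both sequences would hold 48 values.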
4. Load Forecast Result
This section presents and discusses the results obtained from the multinodal electricity load forecasting. It begins with a detailed analysis of the MAPE values for the nine substations, comparing the performance of different windowing techniques, followed by an in-depth look at the behavior of the Fuzzy ARTMAP parameters and their impact on forecasting precision.
4.1. Results
The results presented in Table 2 provide a detailed analysis of the MAPE values for the nine substations, using windowing methods that consider the maximum, minimum, mean, and standard deviation of the participation factors at the three preceding time instants.
Analysis of the results reveals different behavior between the substations. For the Kopu substation, the MAPE was similar for three of the techniques, with values of 10.35% (maximum), 9.68% (minimum), and 9.64% (mean). The standard deviation, on the other hand, brought a considerable improvement in the forecast, reducing the MAPE to 4.79%.
Figure 5 shows the load curve, where it can be seen that the predicted values follow the actual load well throughout the day. The range around the forecast curve is relatively narrow, indicating that the variability within the cluster of the forecast values is low.
Waikino, on the other hand, revealed a trend of continuous improvement in forecasts when using different windowing techniques. The MAPE showed a significant reduction when using the minimum value (5.24%) compared to the maximum value (6.93%) and the mean (7.11%). In addition, using the standard deviation resulted in an MAPE of 5.00%, the lowest value observed for Waikino.
Figure 6 shows the forecast and actual load curves as well as the range obtained by the standard deviation within the cluster of forecasts. This range is slightly wider than that observed in Kopu, suggesting greater variability in the predicted values. However, the predictions still align well with the actual load, demonstrating the model’s effectiveness even in the face of this variability.
For Waihou, the MAPE results varied from 3.99% (minimum) to 3.68% (mean). This variation suggests that the different windowing techniques do not have a significant impact on the precision of the forecasts for this substation. The mean showed the best result, with an MAPE of 3.68%. The forecast curve in Figure 7 closely follows the actual load and the range around the forecast is relatively narrow, indicating that the model captures load fluctuations well and has low variability in the forecast clusters.
Hamilton 11 showed a moderate variation in MAPE values, with a maximum difference of 0.57% between the different methods. The best result was obtained using the mean (3.68%), followed by the minimum (3.79%). The moderate variation suggests that all the windowing techniques offer relatively accurate prediction, although the mean was able to better capture the trend of the data in this substation. As shown in Figure 8, the range around the forecasts is narrow, indicating that the forecast is quite accurate and that the variability in the forecasts is low.
For Hamilton 33, all MAPE values were lower than 4.00%, with the best result obtained by the mean (3.21%). The corresponding prediction curve can be seen in
Figure 9. The consistency of the results suggests that windowing techniques are highly effective for this substation. In particular, inclusion of the mean seems to capture important characteristics that contribute to more precise predictions.
Cambridge showed a smooth and constant variation in MAPE values, from 3.16% (maximum) to 2.72% (minimum), with the minimum providing the best result, as shown in Figure 10. It is worth noting that the 2.72% obtained with the minimum in Cambridge was the lowest MAPE across all substations and all four techniques. The forecast curve followed the actual curve well, with a slight divergence observed between 5:30 and 8:30. The interval around the forecast is quite narrow as a result of the low variability of the clusters.
Te Awamutu showed MAPE values of less than 4.5%, with the best result obtained with the minimum value (3.07%) and the highest with the maximum (4.34%). The forecast curve shown in
Figure 11 shows predicted behavior very close to the real values. In addition, the forecast showed no variability within the clusters, suggesting that for this substation there were no significant differences between the load values within the clusters.
Hinuera maintained a similar variation in MAPE values, between 3.39% and 3.60%, showing consistent performance across the different windowing techniques. This performance reveals that all the techniques offer relatively accurate predictions, with the standard deviation being slightly more effective in this substation. The range around the prediction is slightly wider, as shown in
Figure 12, indicating more variability in the predicted values; however, the predictions still follow the actual load well.
Finally, Kinleith showed an improvement in MAPE with the standard deviation (5.15%), standing out from the other techniques that kept MAPE constant at 5.40%. The prediction curve can be seen in
Figure 13. It shows greater variability in the interval around the prediction curve, indicating that there was a greater variety of values within the clusters.
In general, Kopu, Waikino, and Kinleith stand out in the forecasts when using the standard deviation, and Hinuera also obtained its best result, by a small margin, with this technique. Waihou, Hamilton 11, and Hamilton 33 showed better results when using the mean, while Cambridge and Te Awamutu performed best with the minimum value. The standard deviation therefore gave the best result for four of the nine substations, the mean for three, and the minimum for two, with each technique offering its own advantages for predicting electrical load. The combination of parameters that produced the best results can be analyzed in Table 3.
When analyzing the table with the configuration of parameters that obtained the best results for each substation, we observed a relationship between these parameters and those substations that showed the greatest variability in the intervals around the prediction curve. The Kopu, Waikino, Hamilton 11, and Kinleith substations showed the greatest variations. This is directly related to the parameter, which controls the required similarity for a given piece of data to belong to the cluster. Lower values of relax this restriction, allowing slightly different values to be included. On the other hand, values closer to 1 restrict similarity so that relatively close data does not fit into the cluster, which can result in the creation of a new cluster to house it. The Kinleith substation, although it had a wider range with close to 1, had high variability due to the parameter responsible for the learning rate being close to zero. This reduced the creation of new categories, increasing the variability within the existing clusters.
It can be seen that the Waihou, Hamilton 33, Cambridge, and Te Awamutu substations have high values for $\beta$, $\rho_a$, and $\rho_b$, which is directly related to lower variability and standard deviation within the clusters formed by the prediction labels that represent the actual load. These parameters indicate greater rigidity in the formation of these clusters, ensuring that only values with high similarity are grouped together. On the other hand, substations such as Waikino, Hamilton 11, and Kinleith have lower parameter values and show greater variability, as reflected in larger shaded areas around the prediction curves.
The results show notable differences in forecasting performance across the nine substations, reflecting the unique characteristics of each substation’s load profile. Substations such as Kopu and Kinleith benefited most from including the standard deviation as an input, suggesting more volatile or irregular load patterns; in contrast, Waihou and Hamilton 33 performed best when the mean was used, indicating more stable and predictable consumption. This demonstrates that the Fuzzy ARTMAP model adapts to the most informative statistical feature for each case, thereby maximizing prediction accuracy. A single forecasting strategy does not fit all substations in a multinodal system; tailoring the approach to each substation’s characteristics is necessary to achieve the best results.
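The statistical windowing idea discussed above can be illustrated with a minimal sketch. It assumes that each input row holds the three previous load values plus one statistic computed over them; the function name `window_features`, the use of the population standard deviation, and the toy load values are illustrative assumptions rather than details taken from this study.

```python
import numpy as np

# Statistics that can be appended to the lagged loads (one per windowing technique).
STATS = {"max": np.max, "min": np.min, "mean": np.mean, "std": np.std}

def window_features(load, stat, window=3):
    """Build rows [load(t-window), ..., load(t-1), stat(window)] with target load(t)."""
    load = np.asarray(load, dtype=float)
    fn = STATS[stat]
    X, y = [], []
    for t in range(window, len(load)):
        past = load[t - window:t]           # the `window` values preceding time t
        X.append(np.append(past, fn(past)))  # expand the input with one statistic
        y.append(load[t])
    return np.array(X), np.array(y)

# Hypothetical toy series, used only to show the shapes involved.
X_mean, y = window_features([10.0, 12.0, 11.0, 15.0, 14.0], "mean")
X_std, _ = window_features([10.0, 12.0, 11.0, 15.0, 14.0], "std")
```

Building one feature matrix per statistic in this way makes it straightforward to train the same model on each variant and keep, for every substation, the statistic that gives the lowest error.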
4.2. Computational Performance and Method Practicality
The computational performance of the Fuzzy ARTMAP method was evaluated during the forecasting phase to assess its practical applicability. Figure 14 shows the prediction time per sample (measured in microseconds, μs) for each substation and each statistical windowing technique.
The results indicate that the proposed method is highly efficient. For most substations and windowing approaches, the prediction time per sample remains below 1 μs. The mean, maximum, and minimum techniques consistently show the lowest computational cost across all substations.
The standard deviation approach, which often yields the most accurate forecasts, requires slightly more computation. Notable increases in prediction time occur at the Waihou and Te Awamutu substations, reaching approximately 4 μs and 2 μs per sample, respectively. Despite these peaks, the times remain in the microsecond range, supporting real-time application.
This efficiency highlights an advantage of the Fuzzy ARTMAP architecture. Unlike complex deep learning models, which demand extended training and specialized hardware, this method provides fast predictions with minimal computational resources. The combination of rapid training and prediction times makes the Fuzzy ARTMAP model suitable for scalable real-time load forecasting in dynamic smart grid environments.
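For readers who wish to reproduce this kind of measurement, a per-sample prediction time can be estimated as sketched below. The measurement procedure used in this study is not detailed here, so the helper `time_per_sample_us`, the best-of-several-runs strategy, and the stand-in predictor are all assumptions made for illustration.

```python
import time

def time_per_sample_us(predict, samples, repeats=5):
    """Return the best average prediction time per sample, in microseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in samples:
            predict(x)                      # one forecast per input sample
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)           # keep the least-disturbed run
    return best / len(samples) * 1e6        # seconds per sample -> microseconds

# Hypothetical stand-in for a trained model's prediction function.
elapsed_us = time_per_sample_us(lambda x: x * 2.0, list(range(1000)))
```

Taking the minimum over several runs reduces the influence of operating system scheduling noise, which matters when the quantity being measured is on the order of microseconds.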
4.3. Analysis of Parameters
Analysis of the parameters of the Fuzzy ARTMAP network applied to predicting electrical loads in substations is crucial to optimizing its performance, since the architecture is sensitive to changes in its parameters. To adjust the parameters, $\beta$ was varied from 0.02 to 1.00 with an increment of 0.02, $\rho_a$ of the ARTa module from 0.91 to 0.99 with an increment of 0.02, and $\rho_b$ of the ARTb module from 0.98 to 0.999 with an increment of 0.001. This analysis was applied to all substations, taking into account the expansion of the input data dimension with the maximum, minimum, mean, and standard deviation values of the load participation factor at the three previous time instants. The discussion of the behavior of the parameters is reported for the Kopu substation; similar behavior was observed for the other substations, especially in relation to the parameters $\beta$, $\rho_a$, $\rho_b$, and the number of categories in the ARTa and ARTb modules of Fuzzy ARTMAP.
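The search grid described above can be enumerated as in the following sketch. It assumes that the three tuned quantities are the learning rate and the vigilance parameters of the ARTa and ARTb modules; the variable names and the `evaluate` callback (which would train the network and return a validation MAPE) are hypothetical.

```python
import itertools
import numpy as np

# Ranges taken from the text; integer steps avoid floating-point drift.
beta_values = np.round(0.02 * np.arange(1, 51), 2)          # 0.02, 0.04, ..., 1.00
rho_a_values = np.round(0.91 + 0.02 * np.arange(5), 2)      # 0.91, 0.93, ..., 0.99
rho_b_values = np.round(0.98 + 0.001 * np.arange(20), 3)    # 0.980, 0.981, ..., 0.999

# Every (beta, rho_a, rho_b) combination visited by an exhaustive search.
grid = list(itertools.product(beta_values, rho_a_values, rho_b_values))

def exhaustive_search(evaluate):
    """Return the parameter triple with the lowest score given by `evaluate`."""
    return min(grid, key=lambda params: evaluate(*params))

# Toy objective standing in for "train the network and compute MAPE".
best = exhaustive_search(lambda b, ra, rb: abs(b - 0.6) + abs(ra - 0.95) + abs(rb - 0.99))
```

With these ranges the grid contains 50 × 5 × 20 = 5000 combinations per substation and per windowing technique, which gives a sense of why exhaustive search becomes costly as the number of substations grows.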
Figure 15 illustrates the three-dimensional relationship between the parameters $\beta$, $\rho_a$, and the number of categories (neurons) created in module A of the Fuzzy ARTMAP network. In Figure 15a, it can be observed that high values of $\beta$ combined with high values of $\rho_a$ result in a greater number of categories. This combination of parameters makes the network more prone to creating multiple categories because of the greater restriction imposed by $\rho_a$, which prevents new data from being allocated to existing clusters and necessitates the creation of new categories whenever the similarity does not reach the stipulated threshold ($\rho_a$). In addition, although the $\beta$ parameter allows for fast learning without many training cycles when $\beta = 1$, it prevents the old information contained in the clusters from being retained. Instead, this information is replaced by the current information, leading to the creation of more categories. Similar behavior is observed in Figure 15b–d, which illustrate the same relationship with the insertion of the minimum, mean, and standard deviation of the three previous loads.
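The dependence of the category count on the vigilance threshold can be reproduced with a minimal single-module Fuzzy ART sketch. This is the textbook algorithm with complement coding, not the implementation used in this study; the function names, the choice parameter `alpha`, and the random data are assumptions made for illustration.

```python
import numpy as np

def complement_code(x):
    """Concatenate x with 1 - x so that every input has the same L1 norm."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x, 1.0 - x])

def train_fuzzy_art(data, rho, beta, alpha=0.001, epochs=1):
    """Train one Fuzzy ART module; returns one weight row per category."""
    weights = []
    for _ in range(epochs):
        for x in data:
            i = complement_code(x)
            # Visit categories in decreasing order of the choice function.
            order = sorted(
                range(len(weights)),
                key=lambda j: -np.minimum(i, weights[j]).sum() / (alpha + weights[j].sum()),
            )
            for j in order:
                overlap = np.minimum(i, weights[j])
                if overlap.sum() / i.sum() >= rho:   # vigilance test passed
                    weights[j] = beta * overlap + (1.0 - beta) * weights[j]
                    break
            else:
                weights.append(i.copy())             # no match: create a category
    return np.array(weights)

# Hypothetical data in [0, 1]^4, used only to compare category counts.
rng = np.random.default_rng(0)
data = rng.random((200, 4))
counts = {rho: len(train_fuzzy_art(data, rho=rho, beta=1.0)) for rho in (0.5, 0.7, 0.9)}
print(counts)
```

Running the same loop over several values of `beta` as well would give a small-scale analogue of the surfaces in Figure 15, although the exact counts depend on the data and on presentation order.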
Figure 16 follows a similar structure to Figure 15 but focuses on module B of the Fuzzy ARTMAP network. In Figure 16a, it can be observed that high values of $\beta$ combined with high values of $\rho_b$ favor the creation of a greater number of categories, similar to what was observed in Figure 15a, but generating a smaller number of clusters because these clusters are responsible for housing the network’s output information.
The interaction between the parameters $\beta$, $\rho_b$, and the Mean Absolute Percentage Error (MAPE) can be seen in Figure 17 when inserting the maximum, minimum, mean, and standard deviation of the three previous loads into the input data. This makes it easier to understand which values of $\beta$ and $\rho_b$ produced the best results.
In Figure 17a, the lowest MAPE values are observed for combinations of high $\rho_b$ values, while $\beta$ shows more flexibility, oscillating across its tested range. This relationship between low MAPE values and high $\rho_b$ values indicates that the network needs to create a considerable number of categories in module B during training in order to obtain good results, which allows it to predict more accurately during forecasting. However, the value of $\beta$ does not seem to have a significant impact on the result, unlike its influence on the creation of clusters reported in Figure 16.
Similar behavior is observed when the other statistics are used to expand the input dimension of the data, as shown in Figure 17b–d. In addition, it can be seen that in Figure 17d there is a greater fluctuation of the MAPE along the walls of the surface, causing more irregularities compared to the others. This unevenness indicates that the model’s response is more unstable when the standard deviation is used as one of the inputs, reflecting greater sensitivity to variations in the data. The more uneven surface also reveals that small adjustments to the $\beta$ and $\rho_b$ parameters can lead to significant changes in MAPE.
Despite this irregularity, the standard deviation produced the best results for most substations, as shown in Table 2. This is because the standard deviation captures the variability of the data more effectively, allowing the model to better adjust to fluctuations and make more precise predictions. This highlights the importance of selecting the appropriate statistical feature for each unique dataset, which is a key contribution of this work.
Figure 18 explores the relationship between $\rho_a$, $\beta$, and MAPE. Analyzing Figure 18a–c, it can be seen that the highest error rates occur in regions where a low constraint in module A (low $\rho_a$) is combined with a high learning rate (high $\beta$), which can lead to greater imprecision in predictions.
However, a different behavior is observed when using the standard deviation to expand the input data, as shown in Figure 18d, where the lowest prediction errors are achieved when $\rho_a$ is medium or low and $\beta$ varies throughout its range, with values close to 0.6 obtaining the lowest MAPE.
Finally, Figure 19 shows the relationship between the parameters $\rho_a$, $\rho_b$, and MAPE. Figure 19a–c illustrates that the highest error rates occur when both $\rho_a$ and $\rho_b$ are low. In other words, relaxing the similarity of modules A and B can lead to greater inaccuracy in predictions. On the other hand, Figure 19d reveals that by including the standard deviation, the behavior of $\rho_a$ is modified. Higher values of $\rho_a$ tend to reduce the model’s performance, which means that a greater restriction on $\rho_a$ (resulting in a greater number of categories) compromises the precision of the forecasts when the standard deviation is taken into account.
4.4. Global Parameter Optimization Analysis
A global optimization was performed to evaluate whether a single set of parameters could provide satisfactory performance across all nine substations. The results are shown in Table 4. Using the standard deviation as the windowing feature yielded the lowest global MAPE of 9.98%.
The single best global parameter set optimized across all substations resulted in an MAPE of 14.85%. This is considerably higher than the substation-specific results, where some values reached 2.72% (Table 2). This outcome confirms that the load characteristics varied significantly between substations.
Although the global approach is less computationally demanding and may be suitable for large-scale grids, local optimization for each substation provides higher forecasting accuracy.
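The trade-off between the two strategies can be made concrete with a small sketch. It assumes that the global parameter set is the one minimizing the mean MAPE across substations, which is one plausible criterion and not necessarily the one used in this study; the function name and the toy error matrix are hypothetical.

```python
import numpy as np

def compare_local_global(mape):
    """`mape[p, s]` is the error of parameter set p evaluated at substation s."""
    mape = np.asarray(mape, dtype=float)
    local_best = mape.min(axis=0)                  # best set chosen per substation
    global_idx = int(mape.mean(axis=1).argmin())   # one set shared by all substations
    return local_best, global_idx, mape[global_idx]

# Hypothetical errors for 3 parameter sets (rows) at 2 substations (columns).
toy_mape = [[3.0, 9.0],
            [8.0, 4.0],
            [5.0, 5.5]]
local_best, global_idx, global_row = compare_local_global(toy_mape)
```

In this toy case the shared set is a compromise that is optimal for neither substation, which mirrors the pattern reported above: by construction, a single global set can never beat the per-substation minimum at any individual substation.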
4.5. Comparison with Other Models
The Fuzzy ARTMAP model was evaluated against three established machine learning techniques: Multi-Layer Perceptron (MLP), Support Vector Regression (SVR), and Random Forest (RF). All models were applied to the same dataset using identical statistical windowing techniques, with performance assessed by MAPE as reported in Table 5.
Fuzzy ARTMAP achieved the lowest error in several substations. For example, at Waihou and Hamilton 11, the mean windowing method yielded MAPEs of 3.68%, outperforming all other models. Cambridge and Te Awamutu achieved the best results with the minimum windowing method, reaching 2.72% and 3.07%, respectively. For Kopu and Waikino, the standard deviation windowing method provided the lowest errors.
These characteristics make Fuzzy ARTMAP a reliable alternative for multinodal electrical load forecasting, consistently providing accurate predictions across substations with diverse load profiles. Its ability to adjust to different statistical features allows it to maintain high precision while remaining computationally efficient, confirming its suitability for real-time forecasting in multinodal power systems.
4.6. Advantages and Limitations of the Study
The methodology presented in this paper combines data enrichment with the Fuzzy ARTMAP network. Using statistical features such as the mean, standard deviation, maximum, and minimum to expand the input data allows the model to adjust to different load profiles, improving the precision of predictions. This feature enables more stable loads to benefit from the mean, while more volatile loads benefit from the standard deviation. These results demonstrate the method’s adaptability to various conditions. Comparisons with other techniques such as MLP, SVR, and Random Forest show that Fuzzy ARTMAP achieves smaller errors in several substations.
Another clear advantage is its computational efficiency. The model makes predictions in microseconds without requiring specialized hardware or long training periods, which ensures its viability in real-time applications and scenarios with limited computational resources.
The flexibility gained from using statistics to enrich the data suggests that the proposed approach can be applied to other types of time series beyond electrical load data. There is potential for use in predicting vibrations in mechanical systems, as part of an equipment’s useful life analysis, or in other data series such as network traffic or financial indicators. The ability to recognize patterns and group information in an adaptive way allows the proposed method to be explored in different domains with internal variability and dynamic behavior.
Despite its advantages, this study has limitations that open the door for new work. The dataset we used is from 2010, a period before the widespread adoption of renewable energy and smart meters, which have altered consumption patterns. This means that application of the proposed method in modern grids still needs validation. In addition, the work was performed on a limited number of substations, which raises the challenge of scalability. For larger networks, optimization by exhaustive search becomes unfeasible. An exploration of distributed computing or other more efficient adjustment methods to process data in parallel could be a solution.
Another point is that the model’s performance on abnormal days such as holidays or under extreme weather conditions was not evaluated. These events generate load patterns that deviate from regular behavior. An analysis of these scenarios would be important for verifying the model’s reliability. It is also relevant to note that while using multiple statistics enriches the data, it also increases the complexity of the input set. Future work could investigate which combinations of statistics are the most relevant for different load types, thereby reducing redundancies and maintaining efficiency.
Finally, integration with probabilistic methods which can provide confidence intervals for predictions would be a valuable enhancement. Expanding the study to larger networks or other time series domains would help to evaluate the generality of the proposed approach as well as the adjustments needed for different contexts.
5. Conclusions
Precise forecasting of electricity demand is an essential tool for the efficient and sustainable management of today’s energy systems. This study applied the Fuzzy ARTMAP network to multinodal short-term electricity load forecasting at nine substations in New Zealand, presenting an approach that includes exhaustive parameter analysis and search along with the incorporation of statistics such as maximum, minimum, mean, and standard deviation.
The results showed that the standard deviation was the most effective of the windowing techniques we explored for improving the precision of the forecasts, resulting in a reduction in Mean Absolute Percentage Error (MAPE) for most of the substations. In addition, our analysis of the Fuzzy ARTMAP parameters highlighted that the combination of the $\beta$ and $\rho$ parameters has a significant impact on the creation of categories, and consequently on the precision of forecasts.
The shaded area around the load curve, which represents the variability of the data and the uncertainty of the forecasts, was particularly impacted by the configuration of the parameters. Substations with higher values of $\beta$, $\rho_a$, and $\rho_b$, such as Waihou, Hamilton 33, Cambridge, and Te Awamutu, showed a significant reduction in the shaded area of the load curve, indicating greater stability in the demand forecast. On the other hand, substations with lower parameter values, such as Waikino, Hamilton 11, and Kinleith, showed greater dispersion, reflecting a larger shaded area and less accurate forecasting.
This information indicates that fine adjustments to parameter settings, especially those such as $\rho$ that are responsible for checking the similarity of data within clusters, have a direct effect on the model’s predictive ability. When $\rho_a$ and $\rho_b$ are lower, greater variation in the data is allowed within each cluster, resulting in greater imprecision and a larger shaded area.
To assess the scalability and robustness of this methodology in different contexts, future work could explore its application to other data types, such as demand from industrial consumers or individual households. Furthermore, a comparative analysis with other advanced machine learning models such as transformers or hybrid models could be conducted to more comprehensively evaluate the performance of Fuzzy ARTMAP.