Analysis and Optimization of Fuzzy ARTMAP Parameters for Multinodal Electric Load Forecasting

Moreira Júnior, Joaquim Ribeiro; da Silva, Reginaldo José; dos Santos Júnior, Carlos Roberto; Abreu, Thays; Lopes, Mara Lúcia Martins

doi:10.3390/en19010192


by Joaquim Ribeiro Moreira Júnior ¹,*, Reginaldo José da Silva ¹, Carlos Roberto dos Santos Júnior ², Thays Abreu ³ and Mara Lúcia Martins Lopes ³

¹ Electrical Engineering Department, School of Engineering, São Paulo State University (UNESP), Ilha Solteira 15385-000, Brazil
² Federal Institute of Mato Grosso do Sul (IFMS), Campus Três Lagoas, Três Lagoas 79641-162, Brazil
³ Department of Mathematics, School of Engineering, São Paulo State University (UNESP), Ilha Solteira 15385-000, Brazil
* Author to whom correspondence should be addressed.

Energies 2026, 19(1), 192; https://doi.org/10.3390/en19010192 (registering DOI)

Submission received: 24 August 2025 / Revised: 29 September 2025 / Accepted: 12 October 2025 / Published: 30 December 2025

(This article belongs to the Section F: Electrical Engineering)


Abstract

Accurate electrical load forecasting is fundamental to the efficient operation of energy systems and plays a decisive role in both generation planning and the prevention of supply interruptions. Anticipating demand with precision enables energy generation and distribution to be adjusted effectively, reducing risks for both industrial and residential consumers. However, forecasting is challenged by climatic variations, demographic changes, and evolving consumption patterns, which limit the effectiveness of traditional approaches. Advanced machine learning techniques such as artificial neural networks have demonstrated potential to address these challenges, although their performance depends strongly on hyperparameter optimization. This study applies a multinodal forecasting methodology based on the Fuzzy ARTMAP network to predict short-term electricity demand at nine substations in New Zealand. The method involves an exhaustive search for network parameters, particularly the vigilance parameters $ρ_{a}$ and $ρ_{b}$ and the learning rate $β$, which are critical to model performance. The input data were extended with statistical measures (maximum, minimum, mean, and standard deviation) to evaluate their contribution to forecast accuracy. The results showed that the standard deviation provided the most consistent improvements among the windowing techniques, reducing the Mean Absolute Percentage Error (MAPE) in most substations. Parameter analysis further indicated that specific combinations, such as those of $ρ_{a}$ and $β$, strongly influence category formation within the network, and consequently the precision of the forecasts.

Keywords:

adaptive resonance theory; artificial neural networks; short-term forecasting; electricity system distribution; machine learning; multinodal load forecasting

1. Introduction

For electricity companies, providing accurate load forecasts is crucial to ensuring that the grid meets demand efficiently and reliably. Accurate forecasts make it possible to optimize supply while maintaining a continuous, stable, and economically viable service that avoids system interruptions and overloads. However, utilities face growing challenges in achieving higher accuracy, particularly with the increasing complexity of the grid due to the integration of intermittent renewable sources and varying consumption patterns [1].

Electricity load forecasting can be classified into four time horizons: very short term (VSLF), covering fractions of a second up to fifteen minutes; short term (STLF), covering hours to days; medium term (MTLF), covering weeks to months; and long term (LTLF), involving annual projections [1,2]. Traditionally, most forecasting approaches focus on overall system demand, which is essential for operational functions such as economic dispatch, safety assessment, and maintenance scheduling [3].

However, with the increasing complexity of power systems, multinodal forecasting that analyzes individual loads in substations, consumption zones, bars, or specific nodes has become equally important [3,4]. While global forecasting helps with the systemic balance between supply and demand, the multinodal approach is indispensable for static and dynamic system analysis such as load flow and voltage stability assessment [4]. Despite the challenges inherent in its greater complexity, this approach enables more effective infrastructure management on a regional scale [4,5]. Both global and multinodal forecasting, however, face challenges arising from the nonlinear and nonstationary nature of load data, which are influenced by factors such as extreme weather conditions, seasonality, and socioeconomic events.

Faced with these complexities, traditional linear models such as ARIMA and linear regression often prove inadequate and unable to capture the required dynamic and nonlinear relations [6,7]. To address these limitations, a wide range of advanced data-driven methods, particularly those based on machine learning and Artificial Neural Networks (ANN), have emerged as superior alternatives for modeling these intricate relationships while offering greater precision in real-world scenarios.

A study by [8] concluded that while traditional statistical and machine learning methods have their strengths, deep learning techniques are more suitable for the dynamic and nonlinear characteristics of electricity load data. Studies on the Greek electricity system have highlighted that even simple feedforward ANNs can achieve a decent level of predictive performance, though their results heavily depend on the careful selection and quality assurance of the input data [9].

The literature reveals a variety of methods for predicting electrical loads, including both classic and innovative approaches. For instance, ref. [10] obtained promising results by combining Autoregressive Integrated Moving Average (ARIMA) with ANNs for short-term electrical load forecasting, while [11] conducted a comprehensive evaluation of various techniques based on artificial intelligence and time series analysis. Ref. [12] demonstrated the importance of feature extraction and influencing factor analysis, showing that data mining improved results when combined with regression approaches such as Random Forest.

In recent years, more sophisticated approaches have been developed. Ref. [13] proposed a Bidirectional Long Short-Term Memory (BiLSTM) architecture that reduced Mean Absolute Error (MAE) by 25% and 46% compared to unidirectional Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) in 24-h forecasts. Similarly, ref. [14] further advanced the use of BiLSTM by incorporating stochastic weight averaging to improve generalization in substation-level forecasting.

Ref. [15] integrated a deep neural network with time series analysis and feature selection, achieving an MAPE of 3.78% and RMSE of 432.4, which outperformed ARIMA, Random Forest, and Support Vector Regression. Ref. [16] employed Temporal Convolutional Networks (TCNs) within a hierarchical framework that was resilient against missing or erroneous data, achieving reliable short-term forecasts for residential loads. At the distribution level, ref. [17] introduced a lean forecasting approach combining clustering, neural networks, and Monte Carlo methods, while [18] proposed a step-by-step framework that explicitly incorporates electric vehicle charging stations into load forecasting.

To further overcome existing limitations, a wide range of sophisticated and hybrid models has been explored. For example, ref. [19] highlighted that while conventional models such as ARIMA have their place, more complex approaches like Fuzzy Time Series (FTS) outperform them due to their ability to handle nonlinear and uncertain data at low computational cost. Other hybrid approaches show similar promise: ref. [20] proposed a method for multi-energy loads using a Radial Basis Function–AutoRegressive with eXogenous inputs (RBF-ARX) model for deterministic components and a Gaussian Mixture Model (GMM) for stochastic ones, improving forecast quality by up to 50%. Likewise, ref. [21] combined an Autoencoder (AE) for dimensionality reduction with a Radial Basis Function Neural Network (RBFNN) for load forecasting, demonstrating better accuracy and efficiency compared to a standalone RBFNN.

Additionally, ref. [22] compared the performance of ANN and Autoregressive with eXogenous inputs (ARX) models for net demand forecasting in distribution substations, concluding that the ANN model was better for direct forecasting due to the nonlinear nature of demand. Hybrid models have proven valuable in other domains as well, as seen in the work of [23], who used a Particle Filtering–ARIMA (PF-ARIMA) fusion model to predict battery useful life, achieving a 70% improvement over other methods by handling data specificities such as capacity regeneration.

A particularly relevant advance was presented by [24] with the Fuzzy-ARTMAP (FAM-ANN) continuous learning network, specifically developed for smart grids. This innovative solution proved capable of incorporating new data without requiring complete retraining, achieving a MAPE of ≈2% (60% better than the static version) and maintaining a consistent performance of 2.32% MAPE over 12 months, thereby combining precision, computational efficiency, and remarkable adaptability to dynamic scenarios.

Recently, multinodal forecasts have gained special prominence for their ability to integrate global planning with detailed local analysis. In this scenario, contributions include that of [25], who proposed a single-stage forecasting method. This approach simultaneously considers scenarios with and without the load contribution factor, offering a more efficient solution compared to conventional two-stage methods.

While traditional approaches require first predicting the global load and then predicting the local loads, the proposed method significantly simplifies computational complexity by performing these steps in an integrated manner while also demonstrating improvements in the precision of the results. Another notable advance was presented by [26], who introduced a methodology for forecasting consumption at all system nodes. Their method was based on the direct determination of specific rating coefficients for each voltage level, providing a comprehensive approach that encompasses everything from substations to supply zones and other critical components of the energy infrastructure. These advances in multinodal techniques pave the way for practical applications in real systems, such as the one presented in this paper.

However, it is crucial to note that the performance of these models depends fundamentally on the proper selection and optimization of their hyperparameters, such as learning rates and the number of neurons. An inadequate configuration can lead to problems of underfitting or overfitting, compromising the generalization of the model [27,28]. In this context, parameter optimization has proven to be a crucial strategy for improving the precision of load forecasting models. Techniques such as grid search, evolutionary algorithms, and Bayesian optimization make it possible to efficiently explore the parameter space in order to identify configurations that maximize predictive performance [29]. This importance is corroborated by work such as that of [30], which showed how hyperparameter optimization is decisive in reducing errors in short-term forecasting models for urban smart grids. Similarly, ref. [31] obtained significant performance gains by applying optimization techniques to a fuzzy ARTMAP neural network for forecasting at disaggregated levels, combining it with singular spectrum analysis for noise treatment.

Despite the advances achieved with machine learning, deep learning, and hybrid approaches, several challenges remain. Many models, including deep learning architectures such as LSTM and transformer models, are highly sensitive to hyperparameter configurations, prone to overfitting, and often require large datasets and long training times. In multinodal forecasting, where the complexity is higher and data availability may be limited, systematic strategies for parameter optimization and data enrichment are still underexplored. These gaps open space for approaches such as Fuzzy ARTMAP, derived from Adaptive Resonance Theory (ART), which offers resistance to outliers, incremental learning without catastrophic forgetting, reduced dependence on large datasets, and fast convergence [32]. These characteristics make ARTMAP a suitable alternative for scenarios that demand adaptability, accuracy, and efficiency.

This study applies and expands the methodology proposed by [33] for short-term (24 h ahead) multinodal electricity load forecasting, using data from nine substations of a system in New Zealand. The approach focuses on implementing the fuzzy ARTMAP neural network [34], emphasizing an exhaustive search for optimal parameters, detailed analysis of parameter effects, and enrichment of input data with descriptive statistics (maximum, minimum, mean, and standard deviation) calculated from the participation factors in the periods $t - 1$, $t - 2$, and $t - 3$. Its performance is compared with well-known benchmark models, including Multilayer Perceptron (MLP), Random Forest, and Support Vector Regression (SVR), to evaluate the method’s effectiveness and investigate how network configurations and input variables influence the accuracy of short-term electricity demand forecasts.

The remainder of this article is organized as follows: Section 2 outlines the key contributions of this study to the field of electrical load forecasting; Section 3 describes the methodology, including the dataset, the fundamentals of the Fuzzy ARTMAP network, the strategies for global and multinodal load forecasting, the evaluation criteria, and the windowing techniques applied; Section 4 presents and discusses the results, offering a comparative analysis of the performance of different techniques and a detailed investigation of the effect of network parameters on the forecasts; finally, Section 5 summarizes the study’s conclusions and suggests future research directions.

2. Contributions

This study offers contributions to the field of electricity load forecasting, focusing on the application and optimization of the Fuzzy ARTMAP network in a multinodal context. The main contributions are as follows:

Systematic Parameter Optimization: We present a rigorous approach based on an exhaustive search for critical Fuzzy ARTMAP network parameters ( $ρ_{a}$ , $ρ_{b}$ , and $β$ ). This goes beyond a simple application of the model, providing a detailed analysis of how parameter interactions influence network behavior, cluster formation, and forecast accuracy.
Statistical Feature Validation for Data Enrichment: The enrichment of input data with statistical features such as the standard deviation is shown to improve forecasting performance across multiple substations. This provides a new and effective strategy for preprocessing time series data.
Detailed Multinodal Forecasting Analysis: A comprehensive assessment of forecasting performance at a disaggregated level across nine individual substations is provided. The multinodal approach offers insight into local complexities of power networks that are often overlooked in global forecasting studies.
Discussion of Model Robustness and Limitations: Key limitations of the proposed method are analyzed, including its sensitivity to parameter configurations and the challenge of applying a single optimized parameter set to large-scale systems.

3. Load Forecasting

This section details the methodology used for multinodal electricity load forecasting. It starts with a description of the dataset and then presents the Fuzzy ARTMAP neural network. This section also explains how global and multinodal forecasts are made, including the mathematical formulas for the participation factor. Finally, the criteria for comparing the results are presented.

3.1. Dataset

The database was supplied by the Electricity Commission of New Zealand and collected from the Centralized Dataset (CDS) from January 2007 to March 2009. It contains active power values measured every half an hour at each substation along with indicators for the day of the month, month, year, day of the week, holidays, time, and load sample value. The substations studied are in the Waikato region, located in the northern part of New Zealand’s North Island. The substations considered were Kopu, Waikino, Waihou, Hamilton 11, Hamilton 33, Cambridge, Te Awamutu, Hinuera, and Kinleith.

For the training and validation stages of the Fuzzy ARTMAP network, a specific subset of this data was used covering the period from 8 December 2007 to 7 January 2008, totaling 1488 samples. The hyperparameter optimization for all models was performed on data from 8 December 2007 to 6 January 2008, with the objective of minimizing the error on the final day, 7 January 2008. The forecast period consisted of 48 load values (half-hourly) for a 24-h day (8 January 2008).
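As a quick consistency check on the sample count, the short sketch below recomputes the size of the training and validation window from the dates given above. The function and variable names are illustrative only and are not taken from the original study.

```python
from datetime import date, timedelta

SAMPLES_PER_DAY = 48  # one reading every half hour


def days_inclusive(start: date, end: date) -> int:
    """Number of calendar days from start to end, both included."""
    return (end - start).days + 1


train_val_start = date(2007, 12, 8)
train_val_end = date(2008, 1, 7)  # the last day serves as the validation day
forecast_day = train_val_end + timedelta(days=1)

n_days = days_inclusive(train_val_start, train_val_end)
n_samples = n_days * SAMPLES_PER_DAY

print(n_days, n_samples, forecast_day)
```

The 31 days of half-hourly readings give the 1488 samples stated in the text, and the forecast day that follows is 8 January 2008.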

Figure 1 highlights various substations, distribution lines, and power stations on the map of the power transmission system in the north of New Zealand’s North Island.

3.2. Fuzzy ARTMAP

The Fuzzy ARTMAP supervised learning network [34] consists of two Adaptive Resonance Theory modules, ART_a and ART_b. These modules perform calculations based on fuzzy set theory [36], specifically using the fuzzy AND operator (∧), which takes the component-wise minimum of two vectors as defined by (1).

{(x \land y)}_{i} = \min \{x_{i}, y_{i}\}

(1)
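A minimal sketch of the operator in (1) is given below, together with the city-block norm |·| that appears in the choice and vigilance expressions of Algorithm 1. The helper names `fuzzy_and` and `l1_norm` are hypothetical and chosen only for illustration.

```python
def fuzzy_and(x, y):
    """Component-wise fuzzy AND (minimum) of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [min(xi, yi) for xi, yi in zip(x, y)]


def l1_norm(x):
    """City-block norm |x|: the sum of the absolute components."""
    return sum(abs(xi) for xi in x)


x = [0.2, 0.9, 0.5]
y = [0.4, 0.3, 0.5]
z = fuzzy_and(x, y)  # component-wise minimum of x and y

print(z, l1_norm(z))
```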

Figure 2 illustrates the ART modules of the network. The input data and desired outputs are entered into the ART_a and ART_b modules, respectively; the inter-ART associative memory module is responsible for checking the correspondence between the input patterns and the expected output categories. A critical component of the network is the match-tracking mechanism used by the inter-ART module, which dynamically raises the ART_a vigilance parameter whenever the selected input category is associated with the wrong output category. This reduces the prediction error while preserving the model’s ability to handle a variety of input patterns and to produce accurate results for unseen data. Prior to training, the weight matrices associated with the ART_a, ART_b, and inter-ART modules are initialized with values of 1, indicating that no categories are active in the network. During training, as the network is exposed to the input data, resonance occurs between the input patterns and the predicted output categories, leading to the activation of the corresponding categories in the network. In turn, this triggers the updating of these weight matrices to capture the relationships learned between the input patterns and the output categories. Algorithm 1 shows the pseudocode for training the Fuzzy ARTMAP network.

The input data for the global forecasting model is composed of a set of 11 binary bits related to the exogenous data plus three global load values from previous time steps, resulting in a 14-dimensional input space. For each substation, the input of the Fuzzy ARTMAP network (ART_a) is composed of the global load at the current time (t) and the participation factors from three past time steps, resulting in a four-dimensional input space. It should be noted that the raw data are not preprocessed or normalized externally, as the network performs L1 normalization internally.

The Fuzzy ARTMAP network’s performance is fundamentally influenced by the vigilance parameters $ρ_{a}$ and $ρ_{b}$ and the learning rate $β$. In this study, we perform an extensive hyperparameter search on these critical parameters, while other parameters such as $ρ_{ab}$, $α$, and $ϵ$ are kept fixed. The search ranges were defined as follows: $ρ_{a}$ from 0.90 to 0.99 (increment of 0.01), $ρ_{b}$ from 0.980 to 0.999 (increment of 0.001), and $β$ from 0.02 to 1.00 (increment of 0.01).
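To give a sense of the size of this search, the sketch below enumerates the three ranges and counts the configurations. It assumes that the ranges are fully crossed (every value of each parameter combined with every value of the others), which the text suggests but does not state explicitly; the helper `inclusive_range` is hypothetical.

```python
from itertools import product


def inclusive_range(start, stop, step, decimals):
    """Evenly spaced values from start to stop (inclusive), rounded to avoid float drift."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, decimals) for i in range(n)]


rho_a_values = inclusive_range(0.90, 0.99, 0.01, 2)     # 10 values
rho_b_values = inclusive_range(0.980, 0.999, 0.001, 3)  # 20 values
beta_values = inclusive_range(0.02, 1.00, 0.01, 2)      # 99 values

# Full Cartesian product of the three ranges (assumed search strategy).
grid = list(product(rho_a_values, rho_b_values, beta_values))

print(len(rho_a_values), len(rho_b_values), len(beta_values), len(grid))
```

Under this assumption, each substation requires 10 × 20 × 99 = 19,800 training runs, which is feasible mainly because a Fuzzy ARTMAP network trains quickly.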

The vigilance parameter ($ρ$) acts as a similarity constraint for category formation. A higher $ρ$ value enforces stricter similarity, leading to the creation of more categories (clusters). This is particularly relevant for $ρ_{b}$, which clusters the network’s output. A large number of clusters results in a narrower range of values per cluster; conversely, a low $ρ_{b}$ generates fewer clusters, each grouping a wider range of output values. The latter can lead to a “discretization” or “straight-line” effect on the forecast curve, as diverse output values are mapped to a single oversimplified category. The $α$ parameter, or choice parameter, is a small positive constant that appears in the denominator of the choice function $T$ in Algorithm 1. The $β$ parameter, or learning rate, indicates the extent to which a new pattern influences a category’s weight update, with a value of 1 corresponding to fast learning.

The inter-ART vigilance parameter $ρ_{ab}$ acts as a match-tracking mechanism between the ART_a (input) and ART_b (output) modules. Its primary function is to ensure consistency between the input pattern and the expected output category. Because its role is to measure the compatibility between categories rather than to fine-tune the forecast precision itself, we considered its optimization to be non-critical for this study, and consequently kept its value fixed.

Algorithm 1 Pseudocode for training the Fuzzy ARTMAP neural network.

Input: a: Input data;
b: Desired output;
$α$: Choice parameter ($α > 0$);
$β$: Learning rate ($0 < β \leq 1$);
$ρ_{a}$: Vigilance parameter of module A ($0 < ρ_{a} \leq 1$);
$ρ_{b}$: Vigilance parameter of module B ($0 < ρ_{b} \leq 1$);
$ρ_{ab}$: Vigilance parameter of the inter-ART module ($0 < ρ_{ab} \leq 1$);
$ϵ$: Match-tracking increment.

Output: $W^{ab}$

procedure
  $W^{a} = 1$, $W^{b} = 1$, $W^{ab} = 1$
  $\bar{a} = \frac{a}{|\sum_{i}^{M} a_{i}|}$, $\bar{b} = \frac{b}{|\sum_{i}^{M} b_{i}|}$
  $\bar{a}_{i}^{c} = 1 - \bar{a}_{i}$, $I_{a} = [\bar{a} \; \bar{a}^{c}]$, $\bar{b}_{i}^{c} = 1 - \bar{b}_{i}$, $I_{b} = [\bar{b} \; \bar{b}^{c}]$
  while Data for training exists do
    for all clusters $W_{k}^{b}$ do
      $T_{k}^{b} = \frac{| I_{b} \land W_{k}^{b} |}{α + | W_{k}^{b} |}$
    end for
    flag = True
    while flag = True do
      $K = \arg\max_{k} \{T_{k}^{b} : k = 1, \ldots, N_{b}\}$
      if $\frac{| I_{b} \land W_{K}^{b} |}{| I_{b} |} \geq ρ_{b}$ then
        $W_{K}^{b} = β (I_{b} \land W_{K}^{b}) + (1 - β) \cdot W_{K}^{b}$
        $y_{k}^{b} = \begin{cases} 1, & k = K, \\ 0, & k \neq K \end{cases}$
        flag = False
      else
        $T_{K}^{b} = 0$
      end if
    end while
    for all clusters $W_{j}^{a}$ do
      $T_{j}^{a} = \frac{| I_{a} \land W_{j}^{a} |}{α + | W_{j}^{a} |}$
    end for
    flag = True
    while flag = True do
      $J = \arg\max_{j} \{T_{j}^{a} : j = 1, \ldots, N_{a}\}$
      if $\frac{| I_{a} \land W_{J}^{a} |}{| I_{a} |} \geq ρ_{a}$ then
        if $\frac{| y^{b} \land W_{J}^{ab} |}{| y^{b} |} \geq ρ_{ab}$ then
          $W_{J}^{a} = β (I_{a} \land W_{J}^{a}) + (1 - β) \cdot W_{J}^{a}$
          $W_{jk}^{ab} = \begin{cases} 1, & j = J, \ k = K, \\ 0, & j \neq J, \ k \neq K \end{cases}$
          flag = False
        else
          $T_{J}^{a} = 0$
          $ρ_{a} = \frac{| I_{a} \land W_{J}^{a} |}{| I_{a} |} + ϵ$
        end if
      else
        $T_{J}^{a} = 0$
      end if
    end while
  end while
end procedure
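The choice, vigilance, and update steps of Algorithm 1 can be sketched for a single ART module as follows. This is a simplified illustration rather than the full Fuzzy ARTMAP: it omits the second module, the inter-ART map, and match tracking, and it assumes inputs already scaled to [0, 1]. All function names and data values are hypothetical.

```python
def complement_code(a):
    """Return [a, 1 - a]; inputs are assumed to lie in [0, 1]."""
    return list(a) + [1.0 - ai for ai in a]


def fuzzy_and(x, y):
    """Component-wise minimum."""
    return [min(xi, yi) for xi, yi in zip(x, y)]


def norm(x):
    """City-block norm for non-negative vectors."""
    return sum(x)


def train_fuzzy_art(patterns, rho, alpha=0.001, beta=1.0):
    """Cluster patterns with one Fuzzy ART module and return the category weights."""
    weights = []
    for a in patterns:
        i_vec = complement_code(a)
        # Choice function T_j for every committed category.
        scores = [norm(fuzzy_and(i_vec, w)) / (alpha + norm(w)) for w in weights]
        # Visit categories from the highest to the lowest choice value.
        order = sorted(range(len(weights)), key=lambda j: scores[j], reverse=True)
        for j in order:
            matched = fuzzy_and(i_vec, weights[j])
            if norm(matched) / norm(i_vec) >= rho:  # vigilance test: resonance
                weights[j] = [beta * m + (1.0 - beta) * w
                              for m, w in zip(matched, weights[j])]
                break
        else:
            # No category resonated: commit a new one initialized to the input.
            weights.append(i_vec)
    return weights


data = [[0.10], [0.12], [0.50], [0.52], [0.90]]  # hypothetical scaled values
loose = train_fuzzy_art(data, rho=0.50)
strict = train_fuzzy_art(data, rho=0.95)

print(len(loose), len(strict))
```

On this toy data the stricter vigilance value produces more categories than the looser one, which is the behavior described for $ρ_{a}$ and $ρ_{b}$ above.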

3.3. Global Load Forecasting

To forecast the multinodal load, it is first necessary to obtain the forecast global load; in turn, this is used to calculate the participation factor given by Equation (2), which is fundamental to multinodal forecasting. Global forecasting is carried out by a system called the global load forecasting system, which is based on the Fuzzy ARTMAP neural network [33]. The input data set of the ${ART}_{a}$ module in this system contains exogenous data such as time of day, day of the week, month, holidays, and daylight saving time, together with the global load ($CG$) at three previous time instants ($t - 1$, $t - 2$, and $t - 3$). This representation of the global load follows the “sliding window” technique proposed by [37] and aims to identify the behavior of the load profile. The forecasting result consists of the $CG$ at time t, which is provided by the output of the ${ART}_{b}$ module of the Fuzzy ARTMAP network. The exogenous variables, described in detail in Table 1, are encoded as 11 binary bits; together with the three global load values from previous time steps, they form a 14-dimensional input space. Figure 3 details the global load forecasting strategy with the Fuzzy ARTMAP network.
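The sliding-window construction of the lagged inputs can be sketched as follows. Only the three lagged load values are built here; the 11 binary exogenous bits of Table 1 are omitted because their encoding is not reproduced in this section. The function name and the load values are hypothetical.

```python
def sliding_window(series, n_lags=3):
    """Return (inputs, targets); each input holds the n_lags values that
    precede its target, ordered [t-1, t-2, ..., t-n_lags]."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append([series[t - k] for k in range(1, n_lags + 1)])
        targets.append(series[t])
    return inputs, targets


global_load = [310.0, 305.0, 298.0, 301.0, 315.0]  # hypothetical half-hourly values
X, y = sliding_window(global_load)

for features, target in zip(X, y):
    print(features, "->", target)
```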

3.4. Multinodal Load Forecasting

The multinodal load forecasting system ($PCM$) is based on the Fuzzy ARTMAP neural network. Forecasting in this system is carried out in modules, where each module contains a neural network responsible for predicting the load at individual locations such as substations, transformers, and feeders. To perform substation load forecasting, the multinodal load forecasting system uses the global participation factor ($FPG$), defined as the share of each substation’s load in the total system load, as specified in (2) [37,38]:

FPG_{j}(t) = \frac{CM_{j}(t)}{CG(t)}

(2)

in which:

$CM_{j}(t)$ is the multinodal load associated with substation j at time t.
$CG(t)$ is the global load, which corresponds to the sum of the loads of all substations at time t.
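A small worked example of Equation (2) is given below. The substation names come from the dataset description, but the load values are invented for illustration, and the function name is hypothetical.

```python
def participation_factors(substation_loads):
    """FPG_j(t) = CM_j(t) / CG(t), where CG(t) is the sum of all substation loads at time t."""
    global_load = sum(substation_loads.values())
    if global_load == 0:
        raise ValueError("global load is zero; participation factors are undefined")
    return {name: load / global_load for name, load in substation_loads.items()}


loads_at_t = {"Kopu": 20.0, "Waikino": 5.0, "Cambridge": 25.0}  # hypothetical values
fpg = participation_factors(loads_at_t)

print(fpg)
```

Because the global load is the sum of the substation loads, the participation factors at any given time add up to 1.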

For each substation, the load forecast is produced by a dedicated module of the multinodal system. The input dataset for the Fuzzy ARTMAP network for each substation j consists of the global load at time t, $CG(t)$, and the participation factors of substation j at times $t - 1$, $t - 2$, and $t - 3$. The network output in the $PCM$ system corresponds to the predicted participation factor at time t. The flowchart of the multinodal forecasting system is shown in Figure 4.

Within this system, statistical features derived from the global participation factors ($FPG_{j}$) at times $t - 1$, $t - 2$, and $t - 3$ were evaluated as potential input extensions to the network. The considered statistics included the mean, standard deviation, maximum, and minimum values, which were assessed individually in order to determine their impact on forecasting performance.

These statistical features are part of a feature engineering strategy. Feature engineering aims to transform raw data into informative input variables that help the model to capture relevant patterns. By deriving statistics from the recent participation factors, each feature provides a distinct perspective on the substation’s recent load behavior, such as its central tendency, variability, or extreme values [39,40,41]. Evaluating these features separately allows the most effective input to be identified for each substation and forecasting horizon.

From a theoretical perspective, each derived feature adds a new dimension to the input space. For example, using only the mean of the past three $FPG_{j}$ values extends the original three-dimensional lag space by one dimension. This dimensionality increase provides the Fuzzy ARTMAP network with richer information, enhancing its ability to separate patterns corresponding to different load behaviors and improving generalization [42,43].

The mean is obtained as described by (3):

\overline{FPG}_{j} = \frac{1}{N} \sum_{i = 1}^{N} FPG_{j}(t - i)

(3)

in which:

$N = 3$ (considering the values at $t - 1$, $t - 2$, and $t - 3$).
$FPG_{j}(t - i)$ is the global participation factor of substation j at time $t - i$.

The standard deviation, which measures the variability of the data around the mean, is calculated by (4).

s_{FPG_{j}} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(FPG_{j}(t - i) - \overline{FPG}_{j})}^{2}}

(4)

The maximum and minimum values are determined by Equations (5) and (6).

FPG_{j}^{\max} = \max (FPG_{j}(t - 1), FPG_{j}(t - 2), FPG_{j}(t - 3))

(5)

FPG_{j}^{\min} = \min (FPG_{j}(t - 1), FPG_{j}(t - 2), FPG_{j}(t - 3))

(6)
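Equations (3) to (6) can be sketched as below, along with one possible way of appending a single statistic to the four-dimensional substation input. The standard deviation uses the 1/N (population) form of Equation (4). The function names, the input ordering, and the numerical values are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def lag_features(fpg_lags):
    """Mean, population standard deviation (1/N), maximum and minimum of the lagged factors."""
    return {
        "mean": fmean(fpg_lags),
        "std": pstdev(fpg_lags),
        "max": max(fpg_lags),
        "min": min(fpg_lags),
    }


def build_input(global_load, fpg_lags, feature=None):
    """Base input [CG(t), FPG(t-1), FPG(t-2), FPG(t-3)], optionally extended by one statistic."""
    vector = [global_load, *fpg_lags]
    if feature is not None:
        vector.append(lag_features(fpg_lags)[feature])
    return vector


lags = [0.10, 0.12, 0.14]  # hypothetical FPG_j(t-1), FPG_j(t-2), FPG_j(t-3)
features = lag_features(lags)
x_std = build_input(300.0, lags, feature="std")

print(features)
print(x_std)
```

Evaluating each statistic individually then amounts to calling `build_input` with a different `feature` name and retraining the network on the resulting five-dimensional inputs.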

By evaluating each statistic individually, the most informative feature can be selected for each substation, thereby optimizing the network’s predictive performance. Empirical studies indicate that networks using feature-augmented inputs outperform those trained solely on raw lagged values, particularly in capturing trends, variability, and extreme events.

After the participation factor has been predicted at time t, the substation load $PCM_{j}(t)$ is obtained by multiplying the predicted participation factor $\widehat{FPG}_{j}(t)$ by the predicted global load $\widehat{CG}(t)$, as in (7) [33,37]:

PCM_{j}(t) = \widehat{FPG}_{j}(t) \times \widehat{CG}(t).

(7)
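As a brief numerical illustration of Equation (7), the sketch below recovers a substation load from a predicted participation factor and a predicted global load. The values and the function name are hypothetical.

```python
def substation_load(predicted_fpg, predicted_global_load):
    """Eq. (7): substation load = predicted participation factor x predicted global load."""
    return predicted_fpg * predicted_global_load


# Hypothetical forecasts for one substation at one half-hourly instant.
forecast = substation_load(predicted_fpg=0.08, predicted_global_load=450.0)

print(forecast)
```

Since the two forecasts are multiplied, a relative error in either the participation factor or the global load carries over proportionally to the substation load.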

3.5. Benchmark Models and Hyperparameter Optimization

To validate the effectiveness of our proposed Fuzzy ARTMAP methodology, we compared its performance with three widely used machine learning models implemented in the Scikit-learn library in Python [44]: Multi-layer Perceptron (MLP), Random Forest, and Support Vector Regression (SVR). For a fair comparison, each model went through a hyperparameter optimization process to find its best configuration, using the same training and validation period as Fuzzy ARTMAP. The optimization used a grid search, with MAPE as the main criterion for selecting the best model.

These models were chosen because they represent different approaches to time series forecasting. MLP is a type of artificial neural network that captures nonlinear relationships by learning patterns across multiple layers of neurons, making it a stronger alternative to linear models [45,46]. Random Forest combines many decision trees built from random subsets of the data, which lowers the risk of overfitting and produces more stable predictions [47]. SVR, an extension of Support Vector Machines, models nonlinear relationships through kernels, is less sensitive to outliers, and balances model complexity with generalization by defining a margin of tolerance around the regression function [48].

The hyperparameter search spaces for each model were defined as follows:

MLP
- Hidden layer sizes: (5), (10), (5, 5), (10, 5), (5, 10), (20), (20, 15), (20, 30), (100), (200), (100, 50)
- Maximum number of iterations: 1500
- Activation function: ‘logistic’, ‘tanh’, ‘relu’
- Optimization method (solver):
  ∗ ‘lbfgs’ (Limited-memory Broyden–Fletcher–Goldfarb–Shanno)
  ∗ ‘sgd’ (Stochastic Gradient Descent)
  ∗ ‘adam’ (Adaptive Moment Estimation)

Random Forest
- Number of trees (estimators): 50, 100, 200
- Maximum depth: 2, 5, 8, 10, 15, 20, 25, 30, None
- Minimum samples required to split: 2, 5, 10
- Splitting criterion: ‘squared_error’, ‘absolute_error’, ‘friedman_mse’, ‘poisson’

SVR
- Kernel type: ‘rbf’, ‘poly’, ‘sigmoid’
- Regularization parameter (C): 1, 10, 100
- Tolerance margin (epsilon): 0.01, 0.1, 0.5
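The search spaces above can be written as parameter grids in the dictionary form accepted by scikit-learn's grid search utilities. The sketch below is an assumption about how such grids might be laid out: the estimator class names (`MLPRegressor`, `RandomForestRegressor`, `SVR`) and the keyword names are the standard scikit-learn ones, but the source does not state which classes or arguments were actually used. To stay dependency-free, the configurations are counted with `itertools` instead of being run.

```python
from itertools import product

# Keys follow scikit-learn constructor argument names (assumed mapping);
# values are taken from the search spaces listed in the text, with string
# options written in scikit-learn's lowercase spelling.
param_grids = {
    "MLPRegressor": {
        "hidden_layer_sizes": [(5,), (10,), (5, 5), (10, 5), (5, 10), (20,),
                               (20, 15), (20, 30), (100,), (200,), (100, 50)],
        "max_iter": [1500],
        "activation": ["logistic", "tanh", "relu"],
        "solver": ["lbfgs", "sgd", "adam"],
    },
    "RandomForestRegressor": {
        "n_estimators": [50, 100, 200],
        "max_depth": [2, 5, 8, 10, 15, 20, 25, 30, None],
        "min_samples_split": [2, 5, 10],
        "criterion": ["squared_error", "absolute_error", "friedman_mse", "poisson"],
    },
    "SVR": {
        "kernel": ["rbf", "poly", "sigmoid"],
        "C": [1, 10, 100],
        "epsilon": [0.01, 0.1, 0.5],
    },
}


def count_combinations(grid):
    """Number of configurations a full grid search would evaluate."""
    return len(list(product(*grid.values())))


sizes = {name: count_combinations(grid) for name, grid in param_grids.items()}

print(sizes)
```

Each dictionary could be passed as `param_grid` to `sklearn.model_selection.GridSearchCV`. Selecting by MAPE would additionally require a matching `scoring` argument, which is not specified in the text.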

3.6. Criteria for Comparing Results

The Mean Absolute Percentage Error (MAPE) is a metric that is commonly used to assess the precision of forecasting models. It is calculated by taking the mean of the absolute percentage errors between the predicted values and the actual values. The MAPE formula is provided by (8):

MAPE = \frac{1}{n} \sum_{t = 1}^{n} \left| \frac{FPG_{t} - \widehat{FPG}_{t}}{FPG_{t}} \right| \times 100

(8)

in which:

$FPG_{t}$ represents the actual value of the participation factor at time t.
$\widehat{FPG}_{t}$ represents the predicted value of the participation factor at time t.
n is the number of half-hourly readings in the evaluated period (48 for one 24-h day).

MAPE is widely used due to its interpretability; it expresses the error as a percentage, which makes it easier to understand the model’s performance [49]. Smaller MAPE values indicate more precise forecasts. The use of MAPE is particularly well suited for this study, as it provides a normalized error measure across different substations and load profiles. This allows for a direct comparison of forecasting quality between substations with varying load levels, since the metric is not biased by the scale of the load.
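A minimal implementation of Equation (8) might look as follows. The function name and the sample values are hypothetical, and the sketch assumes that no actual value is zero, since the metric is undefined in that case.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [0.10, 0.20, 0.25]     # hypothetical participation factors
predicted = [0.11, 0.18, 0.25]  # hypothetical forecasts
error = mape(actual, predicted)

print(round(error, 2))
```

In this example the first two readings are each off by 10% and the third is exact, so the mean is 20/3, or about 6.67%.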

4. Load Forecast Result

This section presents and discusses the results obtained from the multinodal electricity load forecasting. It begins with a detailed analysis of the MAPE values for the nine substations, comparing the performance of different windowing techniques, followed by an in-depth look at the behavior of the Fuzzy ARTMAP parameters and their impact on forecasting precision.

4.1. Results

The results presented in Table 2 provide a detailed analysis of the MAPE values for nine substations, using windowing methods that consider the maximum, minimum, mean, and standard deviation values of the samples calculated considering the participation factor at times $t-1$, $t-2$, and $t-3$.
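
One way to picture the windowing step is sketched below. It assumes that each input vector holds the three previous participation factors followed by a single statistic of that window, and that the standard deviation is the population form; the function name `build_windowed_inputs` and this exact input layout are assumptions, since the full input vector used in the study may contain additional variables.

```python
from statistics import mean, pstdev

# Statistic appended to each window. Using pstdev (population standard
# deviation) is an assumption; the sample form would also be plausible.
WINDOW_STATS = {"max": max, "min": min, "mean": mean, "std": pstdev}


def build_windowed_inputs(series, stat_name):
    """Return (inputs, targets) with inputs [x[t-3], x[t-2], x[t-1], stat]."""
    stat = WINDOW_STATS[stat_name]
    inputs, targets = [], []
    for t in range(3, len(series)):
        window = list(series[t - 3:t])   # values at t-3, t-2, t-1
        inputs.append(window + [stat(window)])
        targets.append(series[t])        # value to be forecast at time t
    return inputs, targets


# Hypothetical participation factors, for illustration only.
toy_series = [0.1, 0.2, 0.3, 0.4, 0.5]
toy_inputs, toy_targets = build_windowed_inputs(toy_series, "max")
```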

Analysis of the results reveals different behavior between the substations. The Kopu substation showed relatively stable MAPE values for three of the techniques: 10.35% (maximum), 9.68% (minimum), and 9.64% (mean). The standard deviation, in contrast, reduced the MAPE to 4.79%, a considerable improvement in the forecast. Figure 5 shows the load curve, where it can be seen that the predicted values follow the actual load well throughout the day. The range around the forecast curve is relatively narrow, indicating that the variability of the forecast values within the cluster is low.

Waikino, on the other hand, revealed a trend of continuous improvement in forecasts when using different windowing techniques. The MAPE showed a significant reduction when using the minimum value (5.24%) compared to the maximum value (6.93%) and the mean (7.11%). In addition, using the standard deviation resulted in an MAPE of 5.00%, the lowest value observed for Waikino. Figure 6 shows the forecast and actual load curves as well as the range obtained by the standard deviation within the cluster of forecasts. This range is slightly wider than that observed in Kopu, suggesting greater variability in the predicted values. However, the predictions still align well with the actual load, demonstrating the model’s effectiveness even in the face of this variability.

For Waihou, the MAPE results varied from 3.99% (minimum) to 3.68% (mean). This variation suggests that the different windowing techniques do not have a significant impact on the precision of the forecasts for this substation. The mean showed the best result, with an MAPE of 3.68%. The forecast curve in Figure 7 closely follows the actual load, and the range around the forecast is relatively narrow, indicating that the model captures load fluctuations well and has low variability in the forecast clusters.

Hamilton 11 showed a moderate variation in MAPE values, with a maximum difference of 0.57% between the different methods. The best result was obtained using the mean (3.68%), followed by the minimum (3.79%). The moderate variation suggests that all the windowing techniques offer relatively accurate prediction, although the mean was able to better capture the trend of the data in this substation. As shown in Figure 8, the range around the forecasts is narrow, indicating that the variability within the forecast clusters is low.

For Hamilton 33, all MAPE values were lower than 4.00%, with the best result obtained by the mean (3.21%). The corresponding prediction curve can be seen in Figure 9. The consistency of the results suggests that windowing techniques are highly effective for this substation. In particular, inclusion of the mean seems to capture important characteristics that contribute to more precise predictions.

Cambridge showed a smooth and constant variation in MAPE values, from 3.16% (maximum) to 2.72% (minimum), with the minimum providing the best result, as shown in Figure 10. It is worth noting that the minimum in Cambridge was not only the best result among all the substations but also had the lowest MAPE among the four techniques used, showing uniform and accurate performance. The forecast curve followed the actual curve well, with a slight divergence observed between 5:30 and 8:30. The interval around the forecast is quite narrow as a result of the low variability of the clusters.

Te Awamutu showed MAPE values of less than 4.5%, with the best result obtained with the minimum value (3.07%) and the highest with the maximum (4.34%). The forecast curve in Figure 11 is very close to the real values. In addition, the forecast showed no variability within the clusters, suggesting that for this substation there were no significant differences between the load values within the clusters.

Hinuera maintained a similar variation in MAPE values, between 3.39% and 3.60%, showing consistent performance across the different windowing techniques. This performance reveals that all the techniques offer relatively accurate predictions, with the standard deviation being slightly more effective in this substation. The range around the prediction is slightly wider, as shown in Figure 12, indicating more variability in the predicted values; however, the predictions still follow the actual load well.

Finally, Kinleith showed an improvement in MAPE with the standard deviation (5.15%), standing out from the other techniques that kept MAPE constant at 5.40%. The prediction curve can be seen in Figure 13. It shows greater variability in the interval around the prediction curve, indicating that there was a greater variety of values within the clusters.

In general, substations such as Kopu, Waikino, and Kinleith stand out in the forecasts when using the standard deviation. On the other hand, Waihou, Hamilton 11, and Hamilton 33 showed better results when using the mean. Cambridge and Te Awamutu performed best with the minimum value, and Hinuera with the standard deviation. No single technique was best everywhere, with each offering its own advantages for predicting electrical load. The combination of parameters that produced the best results can be analyzed in Table 3.

When analyzing the table with the configuration of parameters that obtained the best results for each substation, we observed a relationship between these parameters and those substations that showed the greatest variability in the intervals around the prediction curve. The Kopu, Waikino, Hamilton 11, and Kinleith substations showed the greatest variations. This is directly related to the $\rho_{b}$ parameter, which controls the required similarity for a given piece of data to belong to the cluster. Lower values of $\rho_{b}$ relax this restriction, allowing slightly different values to be included. On the other hand, values closer to 1 restrict similarity so that relatively close data does not fit into the cluster, which can result in the creation of a new cluster to house it. The Kinleith substation, although it had a wider range with $\rho_{b}$ close to 1, had high variability due to the $\beta$ parameter responsible for the learning rate being close to zero. This reduced the creation of new categories, increasing the variability within the existing clusters.
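
The role of the vigilance parameter described above can be sketched with the standard Fuzzy ART match test, in which a pattern is accepted by a category only if |I ∧ w| / |I| reaches the vigilance value. The function names and the example vectors below are illustrative assumptions and are not taken from the study's implementation.

```python
def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND operator."""
    return [min(x, y) for x, y in zip(a, b)]


def match_value(pattern, weight):
    """Match function |pattern AND weight| / |pattern| with the L1 norm."""
    return sum(fuzzy_and(pattern, weight)) / sum(pattern)


def passes_vigilance(pattern, weight, rho):
    """True if the category with `weight` may accept `pattern`."""
    return match_value(pattern, weight) >= rho


# Hypothetical complement-coded pattern (a = 0.3) and category weight.
toy_pattern = [0.3, 0.7]
toy_weight = [0.25, 0.65]
accepted_loose = passes_vigilance(toy_pattern, toy_weight, rho=0.85)
accepted_strict = passes_vigilance(toy_pattern, toy_weight, rho=0.95)
```

With a match value of about 0.9, the same pattern is accepted at a vigilance of 0.85 but rejected at 0.95, in which case the network would search for another category or create a new one.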

It can be seen that the Waihou, Hamilton 33, Cambridge, and Te Awamutu substations have high values for $\rho_{a}$, $\rho_{b}$, and $\beta$, which is directly related to lower variability and standard deviation within the clusters formed by the prediction labels that represent the actual load. These parameters indicate greater rigidity in the formation of these clusters, ensuring that only values with high similarity are grouped together. On the other hand, substations such as Waikino, Hamilton 11, and Kinleith have lower $\rho_{a}$ and $\beta$ values and show greater variability, as reflected in larger shaded areas around the prediction curves.

The results show notable differences in forecasting performance across the nine substations, reflecting the unique characteristics of each substation’s load profile. Substations such as Kopu and Kinleith benefited most from including the standard deviation as an input, suggesting more volatile or irregular load patterns; in contrast, Waihou and Hamilton 33 performed best when the mean was used, indicating more stable and predictable consumption. This demonstrates that the Fuzzy ARTMAP model adapts to the most informative statistical feature for each case, thereby maximizing prediction accuracy. A single forecasting strategy does not fit all substations in a multinodal system; tailoring the approach to each substation’s characteristics is necessary to achieve the best results.
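
The per-substation selection discussed above amounts to taking, for each substation, the windowing technique with the lowest MAPE. The short sketch below does this for Kopu and Waikino using only the MAPE values quoted in this section; the dictionary layout and function name are assumptions made for illustration.

```python
# MAPE values (%) quoted in Section 4.1 for two of the nine substations.
MAPE_BY_TECHNIQUE = {
    "Kopu": {"maximum": 10.35, "minimum": 9.68, "mean": 9.64, "std": 4.79},
    "Waikino": {"maximum": 6.93, "minimum": 5.24, "mean": 7.11, "std": 5.00},
}


def best_technique(mape_by_technique):
    """Name of the windowing technique with the lowest MAPE."""
    return min(mape_by_technique, key=mape_by_technique.get)


BEST = {name: best_technique(values)
        for name, values in MAPE_BY_TECHNIQUE.items()}
```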

4.2. Computational Performance and Method Practicality

The computational performance of the Fuzzy ARTMAP method was evaluated during the forecasting phase to assess its practical applicability. Figure 14 shows the prediction time per sample (measured in microseconds, μs) for each substation and each statistical windowing technique.

The results indicate that the proposed method is highly efficient. For most substations and windowing approaches, the prediction time per sample remains below 1 μs. The mean, maximum, and minimum techniques consistently show the lowest computational cost across all substations.

The standard deviation approach, which often yields the most accurate forecasts, requires slightly more computation. Notable increases in prediction time occur at the Waihou and Te Awamutu substations, reaching approximately 4 μs and 2 μs per sample, respectively. Despite these peaks, the times remain in the microsecond range, supporting real-time application.
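
A per-sample prediction time of this kind could be measured along the following lines. This is only a sketch: `dummy_predict` is a hypothetical stand-in for a trained model's prediction call, and the best-of-several-runs strategy is an assumption rather than the timing protocol used in the study.

```python
import time


def time_per_sample_us(predict, samples, repeats=5):
    """Best-of-`repeats` wall-clock prediction time per sample, in microseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best / len(samples) * 1e6


def dummy_predict(sample):
    # Hypothetical stand-in for a trained model's prediction call.
    return sum(sample) / len(sample)


toy_samples = [[0.1, 0.2, 0.3, 0.3]] * 1000
per_sample_us = time_per_sample_us(dummy_predict, toy_samples)
```

Taking the minimum over several runs reduces the influence of background activity on the machine, although absolute values remain hardware dependent.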

This efficiency highlights an advantage of the Fuzzy ARTMAP architecture. Unlike complex deep learning models, which demand extended training and specialized hardware, this method provides fast predictions with minimal computational resources. The combination of rapid training and prediction times makes the Fuzzy ARTMAP model suitable for scalable real-time load forecasting in dynamic smart grid environments.

4.3. Analysis of Parameters

Analysis of the parameters of the Fuzzy ARTMAP network applied to predicting electrical loads in substations is crucial to optimizing its performance, since it is an architecture that is sensitive to changes in parameters. To adjust the parameters, $\beta$ was varied from 0.02 to 1.00 with an increment of 0.02, $\rho_{a}$ of the ART_a module from 0.91 to 0.99 with an increment of 0.02, and $\rho_{b}$ of the ART_b module from 0.98 to 0.999 with an increment of 0.001. This analysis was applied to all substations, taking into account the expansion of the input data dimension with the maximum, minimum, mean, and standard deviation values of the load participation factor at times $t-1$, $t-2$, and $t-3$. The discussion on the behavior of the parameters is reported for the Kopu substation; similar behavior was observed for the other substations, especially in relation to the parameters $\rho_{a}$, $\rho_{b}$, $\beta$, and the number of categories in the ART_a and ART_b modules of Fuzzy ARTMAP.
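
The parameter ranges above can be enumerated as shown in the sketch below, assuming both endpoints of each range are included. The variable names are illustrative; values are generated from integer steps and rounded to avoid floating-point drift from repeated addition.

```python
from itertools import product

# Integer steps avoid accumulating floating-point error in the increments.
BETA_VALUES = [round(0.02 * i, 2) for i in range(1, 51)]         # 0.02 ... 1.00
RHO_A_VALUES = [round(0.91 + 0.02 * i, 2) for i in range(5)]     # 0.91 ... 0.99
RHO_B_VALUES = [round(0.980 + 0.001 * i, 3) for i in range(20)]  # 0.980 ... 0.999


def parameter_grid():
    """All (beta, rho_a, rho_b) combinations of the exhaustive search."""
    return list(product(BETA_VALUES, RHO_A_VALUES, RHO_B_VALUES))


GRID = parameter_grid()
```

Under this reading, the grid contains 50 × 5 × 20 = 5000 combinations for each substation and windowing technique.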

Figure 15 illustrates the three-dimensional relationship between the parameters $\rho_{a}$, $\beta$, and the number of categories (neurons) created in module A of the Fuzzy ARTMAP network. In Figure 15a, it can be observed that high values of $\rho_{a}$ combined with high values of $\beta$ result in a greater number of categories. This combination of parameters makes the network more prone to creating multiple categories due to the greater restriction imposed by $\rho_{a}$, which prevents new data from being allocated to existing clusters, necessitating the creation of new categories to accommodate them whenever the similarity does not reach the stipulated threshold ($\rho_{a}$). In addition, although the $\beta$ parameter allows for fast learning without many training cycles when $\beta = 1$, it prevents the old information contained in the clusters from being retained. Instead, this information is replaced by the current information, leading to the creation of more categories. Similar behavior is observed in Figure 15b–d, which illustrate the same relationship with the insertion of the minimum, mean, and standard deviation of the three previous loads.
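
The effect of the learning rate can be illustrated with the standard Fuzzy ART weight update, w_new = β (I ∧ w_old) + (1 − β) w_old. The sketch below uses hypothetical vectors and an assumed function name; it is not the study's implementation.

```python
def update_weight(pattern, weight, beta):
    """Fuzzy ART update: beta * (pattern AND weight) + (1 - beta) * weight."""
    return [beta * min(i, w) + (1.0 - beta) * w
            for i, w in zip(pattern, weight)]


# Hypothetical pattern and category weight, for illustration only.
toy_pattern = [0.2, 0.8]
toy_weight = [0.5, 0.5]
fast = update_weight(toy_pattern, toy_weight, beta=1.0)   # fast learning
slow = update_weight(toy_pattern, toy_weight, beta=0.5)   # partial update
frozen = update_weight(toy_pattern, toy_weight, beta=0.0) # no learning
```

With β = 1 the weight jumps directly to the fuzzy AND of the pattern and the old weight, whereas smaller values of β retain part of the previous weight.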

Figure 16 follows a similar structure to Figure 15 but focuses on module B of the Fuzzy ARTMAP network. In Figure 16a, it can be observed that high values of $\rho_{b}$ combined with high values of $\beta$ favor the creation of a greater number of categories, similar to what was observed in Figure 15a, but generating a smaller number of clusters because these clusters are responsible for housing the network’s output information.

The interaction between the parameters $\rho_{b}$, $\beta$, and the Mean Absolute Percentage Error (MAPE) can be seen in Figure 17 when inserting the maximum, minimum, mean, and standard deviation of the three previous loads into the input data. This makes it easier to understand which values of $\beta$ and $\rho_{b}$ produced the best results.

In Figure 17a, the lowest MAPE values are observed for combinations of high $\rho_{b}$ values, while $\beta$ shows more flexibility, oscillating within the $[0, 1]$ range. This relationship between low MAPE values and high $\rho_{b}$ values indicates that the network needs to create a considerable number of categories in module B during training in order to obtain good results. This allows the network to predict results more accurately when it comes to forecasting. However, the value of $\beta$ does not seem to have a significant impact on the result, unlike its influence on the creation of clusters reported in Figure 16.

Similar behavior is observed when the other statistics are used to expand the input dimension of the data, as shown in Figure 17b–d. In addition, Figure 17d shows greater fluctuation of the MAPE across the surface, making it more irregular than the others. This unevenness indicates that the model’s response is more unstable when the standard deviation is used as one of the inputs, reflecting greater sensitivity to variations in the data. The more uneven surface also reveals that small adjustments to the $\rho_{b}$ and $\beta$ parameters can lead to significant changes in MAPE.

Despite this irregularity, the standard deviation produced the best results for the largest number of substations (Kopu, Waikino, Hinuera, and Kinleith), as shown in Table 2. This is because the standard deviation captures the variability of the data more effectively, allowing the model to better adjust to fluctuations and make more precise predictions. This highlights the importance of selecting the appropriate statistical feature for each unique dataset, which is a key contribution of this work.

Figure 18 explores the relationship between $\rho_{a}$, $\beta$, and MAPE. Analyzing Figure 18a–c, it can be seen that the highest error rates occur in regions where $\rho_{a}$ is low and $\beta$ is high; in other words, a low constraint in module A (low $\rho_{a}$) combined with a high learning rate (high $\beta$) can lead to greater imprecision in predictions.

However, a different behavior is observed when using the standard deviation to expand the input data, as shown in Figure 18d, where it can be seen that the lowest prediction errors are achieved when $\rho_{a}$ is medium or low and $\beta$ oscillates throughout the range, with values close to 0.6 obtaining the lowest MAPE.

Finally, Figure 19 shows the relationship between the parameters $\rho_{a}$, $\rho_{b}$, and MAPE. Figure 19a–c illustrates that the highest error rates occur when both $\rho_{a}$ and $\rho_{b}$ are low. In other words, relaxing the similarity of modules A and B can lead to greater inaccuracy in predictions. On the other hand, Figure 19d reveals that by including the standard deviation, the behavior of $\rho_{b}$ is modified. Higher values of $\rho_{b}$ tend to reduce the model’s performance, which means that a greater restriction on $\rho_{b}$ (resulting in a greater number of categories) compromises the precision of the forecasts when the standard deviation is taken into account.

4.4. Global Parameter Optimization Analysis

A global optimization was performed to evaluate whether a single set of parameters could provide satisfactory performance across all nine substations. The results are shown in Table 4. Using the standard deviation as the windowing feature yielded the lowest global MAPE of 9.98%.

The single best global parameter set optimized across all substations resulted in an MAPE of 14.85%. This is considerably higher than the substation-specific results, where some values reached 2.72% (Table 2). This outcome confirms that the load characteristics varied significantly between substations.

Although the global approach is less computationally demanding and may be suitable for large-scale grids, local optimization for each substation provides higher forecasting accuracy.
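
The difference between global and local optimization can be sketched as follows. The MAPE values, parameter-set names, and substation labels are invented for illustration and are not taken from Table 2 or Table 4; scoring a global set by its mean MAPE across substations is also an assumption about how the global criterion was defined.

```python
# Hypothetical MAPE values (%) for three candidate parameter sets at three
# substations. These numbers are invented for illustration only.
MAPE_TABLE = {
    "set_1": {"A": 4.0, "B": 9.0, "C": 6.0},
    "set_2": {"A": 6.0, "B": 5.0, "C": 7.0},
    "set_3": {"A": 8.0, "B": 8.0, "C": 3.0},
}


def mean_mape(scores):
    """Average MAPE of one parameter set over all substations."""
    return sum(scores.values()) / len(scores)


def global_best(table):
    """Single parameter set with the lowest mean MAPE across substations."""
    return min(table, key=lambda name: mean_mape(table[name]))


def local_best(table):
    """For each substation, the parameter set with the lowest MAPE."""
    substations = next(iter(table.values())).keys()
    return {sub: min(table, key=lambda name: table[name][sub])
            for sub in substations}


GLOBAL_CHOICE = global_best(MAPE_TABLE)
LOCAL_CHOICE = local_best(MAPE_TABLE)
LOCAL_MEAN = sum(MAPE_TABLE[name][sub]
                 for sub, name in LOCAL_CHOICE.items()) / len(LOCAL_CHOICE)
```

In this toy example the best global set averages 6.0%, while choosing a set per substation averages 4.0%, which mirrors the qualitative gap between the global and substation-specific results.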

4.5. Comparison with Other Models

The Fuzzy ARTMAP model was evaluated against three established machine learning techniques: Multi-Layer Perceptron (MLP), Support Vector Regression (SVR), and Random Forest (RF). All models were applied to the same dataset using identical statistical windowing techniques, with performance assessed by MAPE as reported in Table 5.

Fuzzy ARTMAP achieved the lowest error in several substations. For example, at Waihou and Hamilton 11, the mean windowing method yielded MAPEs of 3.68%, outperforming all other models. Cambridge and Te Awamutu achieved the best results with the minimum windowing method, reaching 2.72% and 3.07%, respectively. For Kopu and Waikino, the standard deviation windowing method provided the lowest errors.

These characteristics make Fuzzy ARTMAP a reliable alternative for multinodal electrical load forecasting, consistently providing accurate predictions across substations with diverse load profiles. Its ability to adjust to different statistical features allows it to maintain high precision while remaining computationally efficient, confirming its suitability for real-time forecasting in multinodal power systems.

4.6. Advantages and Limitations of the Study

The methodology presented in this paper combines data enrichment with the Fuzzy ARTMAP network. Using statistical features such as the mean, standard deviation, maximum, and minimum to expand the input data allows the model to adjust to different load profiles, improving the precision of predictions. This feature enables more stable loads to benefit from the mean, while more volatile loads benefit from the standard deviation. These results demonstrate the method’s adaptability to various conditions. Comparisons with other techniques such as MLP, SVR, and Random Forest show that Fuzzy ARTMAP achieves smaller errors in several substations.

Another clear advantage is its computational efficiency. The model makes predictions in microseconds without requiring specialized hardware or long training periods, which ensures its viability in real-time applications and scenarios with limited computational resources.

The flexibility gained from using statistics to enrich the data suggests that the proposed approach can be applied to other types of time series beyond electrical load data. There is potential for use in predicting vibrations in mechanical systems, as part of an equipment’s useful life analysis, or in other data series such as network traffic or financial indicators. The ability to recognize patterns and group information in an adaptive way allows the proposed method to be explored in different domains with internal variability and dynamic behavior.

Despite its advantages, this study has limitations that open the door for new work. The dataset we used is from 2010, a period before the widespread adoption of renewable energy and smart meters, which have altered consumption patterns. This means that application of the proposed method in modern grids still needs validation. In addition, the work was performed on a limited number of substations, which raises the challenge of scalability. For larger networks, optimization by exhaustive search becomes unfeasible. An exploration of distributed computing or other more efficient adjustment methods to process data in parallel could be a solution.

Another point is that the model’s performance on abnormal days such as holidays or under extreme weather conditions was not evaluated. These events generate load patterns that deviate from regular behavior. An analysis of these scenarios would be important for verifying the model’s reliability. It is also relevant to note that while using multiple statistics enriches the data, it also increases the complexity of the input set. Future work could investigate which combinations of statistics are the most relevant for different load types, thereby reducing redundancies and maintaining efficiency.

Finally, integration with probabilistic methods that can provide confidence intervals for predictions would be a valuable enhancement. Expanding the study to larger networks or other time series domains would help to evaluate the generality of the proposed approach as well as the adjustments needed for different contexts.

5. Conclusions

Precise forecasting of electricity demand is an essential tool for the efficient and sustainable management of today’s energy systems. This study applied the Fuzzy ARTMAP network to multinodal short-term electricity load forecasting at nine substations in New Zealand, presenting an approach that includes exhaustive parameter analysis and search along with the incorporation of statistics such as maximum, minimum, mean, and standard deviation.

The results showed that standard deviation was the most effective among the windowing techniques we explored in improving the precision of the forecasts, resulting in a reduction in Mean Absolute Percentage Error (MAPE) for most of the substations. In addition, our analysis of the Fuzzy ARTMAP parameters highlighted that the combination of the $\rho_{a}$ and $\beta$ parameters has a significant impact on the creation of categories, and consequently on the precision of forecasts.

The shaded area around the load curve, which represents the variability of the data and the uncertainty of the forecasts, was particularly impacted by the configuration of the parameters. Substations with higher values of $\rho_{a}$, $\beta$, and $\rho_{b}$, such as Waihou, Hamilton 33, Cambridge, and Te Awamutu, showed a significant reduction in the shaded area of the load curve, indicating greater stability in the demand forecast. On the other hand, substations with lower parameter values, such as Waikino, Hamilton 11, and Kinleith, showed greater dispersion, reflected in a larger shaded area and less accurate forecasting.

This information indicates that fine adjustments to parameter settings, especially those such as $\rho_{b}$ that are responsible for checking the similarity of data within clusters, have a direct effect on the model’s forecasting ability. Lower values of $\rho_{a}$ and $\rho_{b}$ allow for greater variation in the data within each cluster, resulting in greater imprecision and a larger shaded area.

To assess the scalability and robustness of this methodology in different contexts, future work could explore its application to other data types, such as demand from industrial consumers or individual households. Furthermore, a comparative analysis with other advanced machine learning models such as transformers or hybrid models could be conducted to more comprehensively evaluate the performance of Fuzzy ARTMAP.

Author Contributions

Conceptualization, J.R.M.J., R.J.d.S., C.R.d.S.J. and M.L.M.L.; methodology, J.R.M.J. and R.J.d.S.; software, J.R.M.J. and R.J.d.S.; validation, J.R.M.J., R.J.d.S., C.R.d.S.J., T.A. and M.L.M.L.; formal analysis, J.R.M.J., R.J.d.S., C.R.d.S.J., T.A. and M.L.M.L.; investigation, J.R.M.J., R.J.d.S. and M.L.M.L.; resources, J.R.M.J., R.J.d.S. and M.L.M.L.; data curation, J.R.M.J. and R.J.d.S.; writing—original draft preparation, J.R.M.J. and R.J.d.S.; writing—review and editing, J.R.M.J., R.J.d.S., C.R.d.S.J., T.A. and M.L.M.L.; visualization, J.R.M.J.; supervision, M.L.M.L.; project administration, M.L.M.L.; funding acquisition, M.L.M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 2016, 32, 914–938.
Alfares, H.K.; Nazeeruddin, M. Electric load forecasting: Literature survey and classification of methods. Int. J. Syst. Sci. 2002, 33, 23–34.
Amjady, N. Short-Term Bus Load Forecasting of Power Systems by a New Hybrid Method. IEEE Trans. Power Syst. 2007, 22, 333–341.
Amorim, A.J.; Abreu, T.A.; Tonelli-Neto, M.S.; Minussi, C.R. A new formulation of multinodal short-term load forecasting based on adaptive resonance theory with reverse training. Electr. Power Syst. Res. 2020, 179, 106096.
Panapakidis, I.P. Clustering based day-ahead and hour-ahead bus load forecasting models. Int. J. Electr. Power Energy Syst. 2016, 80, 171–178.
Liu, D.; Zeng, L.; Li, C.; Ma, K.; Chen, Y.; Cao, Y. A Distributed Short-Term Load Forecasting Method Based on Local Weather Information. IEEE Syst. J. 2018, 12, 208–215.
Hussain, A.; Rahman, M.; Memon, J.A. Forecasting electricity consumption in Pakistan: The way forward. Energy Policy 2016, 90, 73–80.
Dong, Q.; Huang, R.; Cui, C.; Towey, D.; Zhou, L.; Tian, J.; Wang, J. Short-Term Electricity-Load Forecasting by Deep Learning: A Comprehensive Survey. arXiv 2024, arXiv:2408.16202. Available online: http://arxiv.org/abs/2408.16202 (accessed on 7 April 2025).
Stamatellos, G.; Stamatelos, T. Short-Term Load Forecasting of the Greek Electricity System. Appl. Sci. 2023, 13, 2719.
Tarmanini, C.; Sarma, N.; Gezegin, C.; Ozgonenel, O. Short term load forecasting based on ARIMA and ANN approaches. Energy Rep. 2023, 9, 550–557.
Habbak, H.; Mahmoud, M.; Metwally, K.; Fouda, M.M.; Ibrahem, M.I. Load Forecasting Techniques and Their Applications in Smart Grids. Energies 2023, 16, 1480.
Kamath, A.K.; Rajalakshmi, S.B.L. Data Mining and Identification of Influencing Factors for Effective Load Forecasting of A Substation Using Machine Learning Regressors. In Proceedings of the 2024 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), Mangalore, India, 18–19 October 2024; pp. 14–19.
Pavlatos, C.; Makris, E.; Fotis, G.; Vita, V.; Mladenov, V. Enhancing Electrical Load Prediction Using a Bidirectional LSTM Neural Network. Electronics 2023, 12, 4652.
Zhu, Q.; Zeng, S.; Chen, M.; Wang, F.; Zhang, Z. Short-Term Load Forecasting Method Based on Bidirectional Long Short-Term Memory Model with Stochastic Weight Averaging Algorithm. Electronics 2024, 13, 3098.
Waheed, W.; Xu, Q. Data-driven short term load forecasting with deep neural networks: Unlocking insights for sustainable energy management. Electr. Power Syst. Res. 2024, 232, 110376.
Türkoğlu, A.S.; Erkmen, B.; Eren, Y.; Erdinç, O.; Küçükdemiral, İ. Integrated Approaches in Resilient Hierarchical Load Forecasting via TCN and Optimal Valley Filling Based Demand Response Application. Appl. Energy 2024, 360, 122722.
Wen, H.; Li, X.; Chen, B.; Chen, J.; Yang, S.; Wu, Q. Lean load forecasting method for distribution substation area. In Proceedings of the 2024 3rd International Conference on Energy, Power and Electrical Technology (ICEPET), Chengdu, China, 17–19 May 2024; pp. 1320–1324.
Huang, J.; Zhu, C.; Liang, Q.; Jiang, N.; Luo, S.; Wu, Q. A step-by-step load forecasting method considering electric vehicle charging stations. In Proceedings of the 2024 3rd International Conference on Energy, Power and Electrical Technology (ICEPET), Chengdu, China, 17–19 May 2024; pp. 1325–1329.
Züge, C.V.; Coelho, L.d.S. Granular Weighted Fuzzy Approach Applied to Short-Term Load Demand Forecasting. Technologies 2024, 12, 182.
Xie, X.; Ding, Y.; Sun, Y.; Zhang, Z.; Fan, J. A novel time-series probabilistic forecasting method for multi-energy loads. Energy 2024, 306, 132456.
Veeramsetty, V.; Konda, P.K.; Dongari, R.C.; Salkuti, S.R. Short-Term Load Forecasting in Distribution Substation Using Autoencoder and Radial Basis Function Neural Networks: A Case Study in India. Computation 2025, 13, 75.
Garcia-Garrido, E.; Mendoza-Villena, M.; Lara-Santillan, P.M.; Zorzano-Alba, E.; Falces, A. Net demand short-term forecasting in a distribution substation with PV power generation. E3S Web Conf. 2020, 152, 01001.
He, N.; Yang, Z.; Qian, C.; Li, R.; Gao, F.; Cheng, F. Remaining useful life prediction of lithium-ion battery based on fusion model considering capacity regeneration phenomenon. J. Energy Storage 2024, 85, 111068.
da Silva, M.A.; Abreu, T.; Santos-Júnior, C.R.; Minussi, C.R. Load forecasting for smart grid based on continuous-learning neural network. Electr. Power Syst. Res. 2021, 201, 107545.
Rai, S.; De, M. NARX: Contribution-factor-based short-term multinodal load forecasting for smart grid. Int. Trans. Electr. Energy Syst. 2021, 31, e12726.
Osgonbaatar, T.; Matrenin, P.; Safaraliev, M.; Zicmane, I.; Rusina, A.; Kokin, S. A Rank Analysis and Ensemble Machine Learning Model for Load Forecasting in the Nodes of the Central Mongolian Power System. Inventions 2023, 8, 114.
Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
Probst, P.; Boulesteix, A.L.; Bischl, B. Tunability: Importance of Hyperparameters of Machine Learning Algorithms. J. Mach. Learn. Res. 2019, 20, 1–32.
Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020, 415, 295–316.
Li, C. Designing a short-term load forecasting model in the urban smart grid system. Appl. Energy 2020, 266, 114850.
Müller, M.R.; Gaio, G.; Carreno, E.M.; Lotufo, A.D.P.; Teixeira, L.A. Electrical load forecasting in disaggregated levels using Fuzzy ARTMAP artificial neural network and noise removal by singular spectrum analysis. SN Appl. Sci. 2020, 2, 1218.
Grossberg, S. A Path Toward Explainable AI and Autonomous Adaptive Intelligence: Deep Learning, Adaptive Resonance, and Models of Perception, Emotion, and Action. Front. Neurorobotics 2020, 14, 36.
Abreu, T.; Amorim, A.J.; Santos-Junior, C.R.; Lotufo, A.D.; Minussi, C.R. Multinodal load forecasting for distribution systems using a fuzzy-artmap neural network. Appl. Soft Comput. 2018, 71, 307–316.
Carpenter, G.A.; Grossberg, S.; Markuzon, N.; Reynolds, J.H.; Rosen, D.B. Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Trans. Neural Netw. 1992, 3, 698–713.
Mukhtar, K.; Ingham, M.; Rodger, C.J.; Mac Manus, D.H.; Divett, T.; Heise, W.; Bertrand, E.; Dalzell, M.; Petersen, T. Calculation of GIC in the North Island of New Zealand Using MT Data and Thin-Sheet Modeling. Space Weather 2020, 18, e2020SW002580.
Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
Altran, A.B. Sistema Inteligente Para Previsão de Carga Multinodal em Sistemas Elétricos de Potência. Ph.D. Thesis, Departamento de Engenharia Elétrica, Universidade Estadual Paulista, Ilha Solteira, Brazil, 2010.
Nose-Filho, K.; Lotufo, A.D.P.; Minussi, C.R. Short-Term Multinodal Load Forecasting Using a Modified General Regression Neural Network. IEEE Trans. Power Deliv. 2011, 26, 2862–2869.
Christ, M.; Braun, N.; Neuffer, J.; Kempa-Liehr, A.W. Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh—A Python package). Neurocomputing 2018, 307, 72–77.
Hochma, Y.; Felendler, Y.; Last, M. Efficient Feature Ranking and Selection Using Statistical Moments. IEEE Access 2024, 12, 105573–105587.
Felice, F.; Ley, C.; Bordas, S.P.A.; Groll, A. Boosting any learning algorithm with Statistically Enhanced Learning. Sci. Rep. 2025, 15, 1605.
Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer: Berlin/Heidelberg, Germany, 2006.
Zheng, A.; Casari, A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists, 1st ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2018.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Mouloodi, S.; Rahmanpanah, H.; Gohery, S.; Burvill, C.; Davies, H.M. Feedforward backpropagation artificial neural networks for predicting mechanical responses in complex nonlinear structures: A study on a long bone. J. Mech. Behav. Biomed. Mater. 2022, 128, 105079.
Shrestha, A.; Mahmood, A. Review of Deep Learning Algorithms and Architectures. IEEE Access 2019, 7, 53040–53065.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. In Proceedings of the 10th International Conference on Neural Information Processing Systems (NIPS’96), Cambridge, MA, USA, 2–5 December 1996; pp. 155–161. [Google Scholar]
de Myttenaere, A.; Golden, B.; Le Grand, B.; Rossi, F. Mean Absolute Percentage Error for regression models. Neurocomputing 2016, 192, 38–48. [Google Scholar] [CrossRef]

Figure 1. Map of the power transmission network in the north of New Zealand’s North Island (adapted from [35]).

Figure 2. Fuzzy ARTMAP architecture.

Figure 3. Strategy for global load forecasting with FAM-ANN.

Figure 4. Multinodal load forecasting system.

Figure 5. Actual and forecast values for Kopu substation.

Figure 6. Actual and forecast values for Waikino substation.

Figure 7. Actual and forecast values for Waihou substation.

Figure 8. Actual and forecast values for Hamilton 11 substation.

Figure 9. Actual and forecast values for Hamilton 33 substation.

Figure 10. Actual and forecast values for the Cambridge substation.

Figure 11. Actual and forecast values for the Te Awamutu substation.

Figure 12. Actual and forecast values for the Hinuera substation.

Figure 13. Actual and forecast values for the Kinleith substation.

Figure 14. Prediction time per sample for all substations and windowing techniques.

Figure 15. Three-dimensional visualization of the relation between the parameters ρ_a, β, and the number of categories in module A: (a) maximum, (b) minimum, (c) mean, and (d) standard deviation.


Figure 16. Three-dimensional visualization of the relation between the parameters ρ_b, β, and the number of categories in module B: (a) maximum, (b) minimum, (c) mean, and (d) standard deviation.
