Electronics
  • Article
  • Open Access

9 February 2024

Resilient Electricity Load Forecasting Network with Collective Intelligence Predictor for Smart Cities †

Graduate School of Sciences and Technology for Innovation, Yamaguchi University, Yamaguchi 753-8511, Japan
Authors to whom correspondence should be addressed.
This paper is an extended version of our paper published in Bin Kamilin, M.H.; Yamaguchi, S.; Bin Ahmadon, M.A. Fault-Tolerance and Zero-Downtime Electricity Forecasting in Smart City. In Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE), Nara, Japan, 10–13 October 2023; pp. 298–301.
This article belongs to the Special Issue Security and Privacy in Networks and Multimedia

Abstract

Accurate electricity forecasting is essential for smart cities to maintain grid stability by allocating resources in advance, ensuring better integration with renewable energies, and lowering operation costs. However, most machine learning forecasting models cannot handle missing values and possess a single point of failure. With rapid technological advancement, smart cities are becoming lucrative targets for cyberattacks that induce packet loss or take servers offline via distributed denial-of-service attacks, disrupting the forecasting system and introducing missing values into the electricity load data. This paper proposes a collective intelligence predictor, which uses modular three-level forecasting networks to decentralize forecasting and strengthen it against missing values. Compared to existing forecasting models, it achieves a coefficient of determination score of 0.98831 with no missing values using the base model in the Level 0 network. As the missing values in the forecasted zone rise to 90% and a single-model forecasting method is no longer effective, it achieves a score of 0.89345 with a meta-model in the Level 1 network that aggregates the results from the base models in Level 0. Finally, as missing values reach 100%, it achieves a score of 0.81445 by reconstructing the forecast from other zones using the meta-model in the Level 2 network.

1. Introduction

With digital technologies becoming more incorporated into smart city management systems, machine learning (ML) is widely proposed for forecasting models that predict the electricity load in smart cities with high accuracy []. Accurate electricity load forecasts allow smart grids to distribute electric power in advance to avoid overloading the electricity delivery network [], to better integrate renewable energy with traditional generation [], and to minimize operation losses during peak hours []. As the scale and reliability of electricity infrastructure directly correlate with economic growth, maintaining reliable service is crucial to avoid financial losses and interruption of other essential services [].
However, digitalizing essential infrastructure in smart cities opens up new problems, such as cyberattacks against that infrastructure. IBM Security observed this trend, with 10.7% of cyberattacks in 2022 occurring in the energy sector alone []. Looking deeper into distributed denial-of-service (DDoS) attacks, which can cause packet loss and bring servers offline [], the Azure Network Security Team reported that 89% of DDoS attacks span up to one hour [], which may add missing values (MV) to the electricity load data and disrupt a centralized forecasting system. Given the importance of energy services, the attack on electricity infrastructure in Ukraine during the Russo–Ukrainian War in 2016 shows a potential weakness of the current system that adversaries could exploit []. Hence, it is necessary to create a decentralized and resilient forecasting method to solve these issues.
Still, recent studies on electricity load forecasting show that most ML implementations overlook the issue posed by MV [,], which can occur due to packet loss and potentially degrade forecasting accuracy in real-world applications. Several methods exist to tackle this problem. Jung et al. [] proposed a novel imputation technique to fill MV accurately. There are also lightweight alternatives that sacrifice accuracy when training and evaluating MLs based on artificial neural networks (ANN), such as padding, which replaces MV with a placeholder value, and masking, which excludes MV from the computation, as noted by Rodenburg et al. []. Besides inadequate MV handling, recent studies also disregard the single point of failure (SPoF) vulnerability, which can bring the entire forecasting system down when the server hosting the centralized ML architecture goes offline []. Although existing distributed ML architectures could solve this [], they are inefficient, and data heterogeneity can negatively impact accuracy [].
In this study, we tackle both the MV in the electricity load data caused by packet loss and the SPoF caused by DDoS attacks taking the forecasting server offline. We propose the Collective Intelligence Predictor (CIP), which forms the modular three-level forecasting networks of distributed MLs shown in Figure 1 to forecast the next one hour of electricity load data, matching the typical DDoS duration. Although weather and calendar data have been shown to improve electricity load forecasting accuracy in existing studies [,], this paper focuses solely on the electricity load data to investigate how well the CIP design performs against existing methods without relying on external data to offset the accuracy penalty when forecasting with MV.
Figure 1. Generalized overview of a Collective Intelligence Predictor forming modular three-level forecasting networks to forecast the electricity load.
The three levels in CIP correspond to the three forecasting methods it can use to forecast the electricity load. The modularity comes from CIP’s behavior of activating the networks based on the MV percentage in the electricity load data used as independent variables, reducing unnecessary computation. In addition, it increases the effective range of MV over which CIP can forecast the electricity load.
During regular operations, where the independent variables have no MV, CIP relies exclusively on the base model trained with 0% MV in Level 0 to “predict” the electricity load in the zone CIP was assigned. As there is no MV, the forecasting accuracy of a single base model trained with 0% MV is sufficient. When the MV percentage in the independent variables ranges from 1% to 90%, CIP uses the meta-model in Level 1 to “improvise” the forecast by combining and refining the predictions from the base models in Level 0. Each base model is trained with a different MV percentage to contribute diversity in handling different MV percentages, giving CIP a broader effective range as the MV percentage rises. Finally, when the MV percentage in the independent variables ranges from 91% to 100% and Level 1 is no longer effective, CIP uses the meta-model in Level 2 to create a “copycat” by reconstructing the forecast from other CIPs’ meta-models in Level 1. Figure 2 summarizes the CIP behavior in activating the networks.
Figure 2. Collective Intelligence Predictor behaviors in activating the networks to handle different missing values percentages.
The primary contribution of this paper lies in developing a decentralized multi-level network of MLs with a modular structure, the capability to handle a broader range of MV percentages, and a failsafe mechanism in Level 2 that reconstructs the forecast when the MV percentage is too high, which is unattainable with existing electricity load forecasting methods. In addition, with multiple levels of networks, CIP reduces unnecessary computation by activating only the MLs needed for a given forecast and extends the effective range of MV percentages it can handle when needed. Furthermore, CIP uses two feature selection methods to choose the best electricity load data to improve forecasting accuracy and reconstruction. These contributions are significant in pioneering electricity load forecasting research that addresses security and reliability issues.
After the introduction in Section 1, Section 2 provides the preliminaries for the dataset, feature selection algorithms, hyperparameter optimization, network construction, and comparison with previous studies in this field. Section 3 presents the concept, application, and model training used to implement CIP. Section 4 evaluates CIP under different MV percentages and compares its forecasting accuracy with existing centralized model architectures. Finally, Section 5 summarizes the work and outlines future plans for this research.

3. Implementation

3.1. Overview

This section presents the CIP concept for implementing modular three-level forecasting networks, its application to feature selection, hyperparameter optimization, and network construction to forecast the electricity load in zone WEST of New York State, and the methods used to train the ML models in the Level 0, Level 1, and Level 2 networks. Figure 7 shows a high-level summary of the CIP concept, implementation, and training.
Figure 7. High-level summary of the Collective Intelligence Predictor concept, its application to forecast the electricity load in zone WEST of New York State, and the methods used to train the models.

3.2. Concept

Referring to the generalized overview of CIP in Figure 1, CIP utilizes multi-level networks to distribute the ML models as a countermeasure against the SPoF vulnerability. The models are connected to form forecasting networks similar to multi-layer stacking ensemble learning, shown in Figure 6, to reduce the accuracy penalty when forecasting with MV. Figure 8 represents the CIP network architecture to forecast the electricity load in zone α, where we define CIP_α’s forecast using the “predict” method as α̂^Predict_{12≤t<24}, the “improvise” method as α̂^Improvise_{12≤t<24}, and the “copycat” method as α̂^Copycat_{12≤t<24}.
Figure 8. Collective Intelligence Predictor implementation CIP_α to forecast the electricity load in zone α using the “predict”, “improvise”, and “copycat” methods.
Referring to the summarized CIP behavior in Figure 2, CIP has a hierarchical network structure of Level 0, Level 1, and Level 2 to handle different MV percentages accordingly. With the Predict(), Improvise(), and Copycat() functions representing the “predict”, “improvise”, and “copycat” forecasting methods in CIP_α, Algorithm 1 shows the pseudocode to choose a forecasting method based on the total MV percentage in the independent variables (α_{0≤t<12}, β_{0≤t<12}, γ_{0≤t<12}).
Algorithm 1 Network activation in CIP_α.
Input: α_{0≤t<12}, β_{0≤t<12}, γ_{0≤t<12}
Output: α̂_{12≤t<24} ∈ {α̂^Predict_{12≤t<24}, α̂^Improvise_{12≤t<24}, α̂^Copycat_{12≤t<24}}
 1: concatenate ← α_{0≤t<12} + β_{0≤t<12} + γ_{0≤t<12}
 2: mv_count ← |{c_i ∈ concatenate : c_i = null}|
 3: mv_percentage ← mv_count / |concatenate| × 100%
 4: if mv_percentage = 0 then
 5:     α̂^Predict_{12≤t<24} ← Predict()
 6:     return α̂^Predict_{12≤t<24}
 7: else if 1 ≤ mv_percentage ≤ 90 then
 8:     α̂^Improvise_{12≤t<24} ← Improvise()
 9:     return α̂^Improvise_{12≤t<24}
10: else if 91 ≤ mv_percentage ≤ 100 then
11:     α̂^Copycat_{12≤t<24} ← Copycat()
12:     return α̂^Copycat_{12≤t<24}
13: end if
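Algorithm 1’s dispatch logic can be sketched in plain Python as follows. This is a minimal illustration, not the authors’ implementation: the `predict`, `improvise`, and `copycat` callables stand in for the Level 0, Level 1, and Level 2 networks, and `None` marks a missing value.

```python
def activate_network(alpha, beta, gamma, predict, improvise, copycat):
    """Choose the CIP forecasting method from the missing-value percentage.

    alpha, beta, gamma: lists of 12 past load readings for each zone,
    with None marking a missing value. The three callables stand in for
    the Level 0, Level 1, and Level 2 forecasting networks.
    """
    concatenated = alpha + beta + gamma
    mv_count = sum(1 for c in concatenated if c is None)
    mv_percentage = mv_count / len(concatenated) * 100

    if mv_percentage == 0:
        return predict()      # Level 0: single base model trained with 0% MV
    elif mv_percentage <= 90:
        return improvise()    # Level 1: meta-model over all base models
    else:
        return copycat()      # Level 2: reconstruct from other zones
```

Only the network needed for the observed MV percentage is activated, which is how CIP avoids unnecessary computation during regular operation.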

3.2.1. Level 0

When the MV percentage in the independent variables is 0%, CIP relies on the “predict” forecasting method in Level 0, using only the Base^α_0% base model to obtain α̂^Predict_{12≤t<24}, which reduces unnecessary computation and operation cost during regular operation. CIP activates all the base models in Level 0 only when the MV percentage in the independent variables is 1% or more, as the meta-model in Level 1 needs to combine the forecasts from the base models to obtain α̂^Improvise_{12≤t<24}.
CIP_α uses ten base models Base^α_MV trained on the datasets simulated with MV in Section 2.2 to introduce diversity in handling a wide range of MV percentages during deployment. The base-model architecture shown in Figure 9 is a multivariable stacked LSTM that uses the hyperbolic tangent (TanH) as the activation function in each layer, where k and l represent the number of LSTM units in the first and second layers of the base model.
Figure 9. Multivariable stacked Long Short-Term Memory architecture implementation for the base model in Level 0 network.
Assuming the electricity load from zones β and γ can improve the electricity load forecast in zone α, we choose a multivariable stacked LSTM as the base-model architecture because of its capability to grasp the dependencies between the independent variable CIP wants to forecast (α_{0≤t<12}) and the strongly correlated independent variables (β_{0≤t<12}, γ_{0≤t<12}). This gives each base model in Level 0 a broader range of MV percentages it can handle before the forecasting accuracy degrades.
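A minimal Keras sketch of the Figure 9 base model follows, assuming TensorFlow/Keras (consistent with the Keras Tuner used in Section 2.4). The 12-step input window over three zones and the linear 12-step output head are assumptions inferred from the α_{0≤t<12} and α̂_{12≤t<24} notation, not details stated in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_base_model(k, l, n_steps=12, n_zones=3, horizon=12):
    """Multivariable stacked LSTM base model (sketch of Figure 9).

    Inputs: n_steps past load readings from zones alpha, beta, gamma;
    output: the next `horizon` readings for zone alpha. k and l are the
    LSTM unit counts of the first and second layers.
    """
    model = keras.Sequential([
        layers.Input(shape=(n_steps, n_zones)),
        layers.LSTM(k, activation="tanh", return_sequences=True),
        layers.LSTM(l, activation="tanh"),
        layers.Dense(horizon),  # assumed linear output head for the horizon
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="mse")
    return model
```

With the tuned values from Table 3, `build_base_model(192, 96)` would instantiate one Base^WEST_MV model; ten copies, each trained on a dataset with a different MV percentage, make up the Level 0 network.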
With Base^α_0%() representing the base model trained on the dataset with 0% MV, Algorithm 2 shows the pseudocode for the Predict() function to obtain α̂^Predict_{12≤t<24}.
Algorithm 2 Predict() function in CIP_α.
Input: α_{0≤t<12}, β_{0≤t<12}, γ_{0≤t<12}
Output: α̂^Predict_{12≤t<24}
 1: α̂^Predict_{12≤t<24} ← Base^α_0%(α_{0≤t<12}, β_{0≤t<12}, γ_{0≤t<12})
 2: return α̂^Predict_{12≤t<24}

3.2.2. Level 1

When the MV percentage in the independent variables ranges from 1% to 90%, CIP relies on the “improvise” forecasting method in Level 1, where all Base^α_MV models in CIP_α’s Level 0 are activated so that the Meta^1_α meta-model in Level 1 can combine their forecasts. As each Base^α_MV has its own effective MV percentage range, combining the results with Meta^1_α ensures minimal forecasting accuracy degradation as the MV percentage rises, which is impossible with bagging ensemble learning that simply averages the forecasts from the base models.
Following the same assumption as in Level 0, CIP combines the forecasts from all Base^α_MV models in Level 0 (α̂^{Base_0%}_{12≤t<24}, α̂^{Base_10%}_{12≤t<24}, α̂^{Base_20%}_{12≤t<24}, ..., α̂^{Base_90%}_{12≤t<24}) and the same electricity load data (α_{0≤t<12}, β_{0≤t<12}, γ_{0≤t<12}) used by Base^α_MV using Meta^1_α. The meta-model architecture shown in Figure 10 is a multivariable deep neural network (DNN) that uses TanH as the activation function in each dense layer. The numbers 156, 75, and 75 represent the dense unit counts in the first, second, and third layers of Meta^1_α. We choose a multivariable DNN as the meta-model architecture because of its capability to fine-tune the combined forecasts from Base^α_MV according to the amount of MV present in the electricity load data used to forecast in zone α, which is impossible with other algorithms that do not consider the amount of MV in the independent variables.
Figure 10. Multivariable Deep Neural Network architecture implementation for the meta-model in Level 1 network.
With the Meta^1_α() function representing the multivariable DNN meta-model in the Level 1 network, Algorithm 3 shows the pseudocode to concatenate the forecasts from all Base^α_MV models with the electricity load data they used to obtain α̂^Improvise_{12≤t<24}.
Algorithm 3 Improvise() function in CIP_α.
Input: α_{0≤t<12}, β_{0≤t<12}, γ_{0≤t<12}
       and α̂^{Base_0%}_{12≤t<24}, α̂^{Base_10%}_{12≤t<24}, α̂^{Base_20%}_{12≤t<24}, ..., α̂^{Base_90%}_{12≤t<24}
Output: α̂^Improvise_{12≤t<24}
 1: concat_a ← α_{0≤t<12} + β_{0≤t<12} + γ_{0≤t<12}
 2: concat_b ← α̂^{Base_0%}_{12≤t<24} + α̂^{Base_10%}_{12≤t<24} + α̂^{Base_20%}_{12≤t<24} + ... + α̂^{Base_90%}_{12≤t<24}
 3: concat_c ← concat_a + concat_b
 4: α̂^Improvise_{12≤t<24} ← Meta^1_α(concat_c)
 5: return α̂^Improvise_{12≤t<24}
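The concatenation in Algorithm 3 explains the 156 units of Meta^1_α’s first dense layer: assuming 12-step windows, three zones contribute 3 × 12 = 36 load values, and the ten base models contribute 10 × 12 = 120 forecast values, for 156 input features in total. A NumPy sketch of the shape bookkeeping (with zero placeholders, purely for illustration):

```python
import numpy as np

n_steps = 12  # readings per zone in the input window (assumed)

loads = [np.zeros(n_steps) for _ in range(3)]            # alpha, beta, gamma
base_forecasts = [np.zeros(n_steps) for _ in range(10)]  # Base_0% .. Base_90%

concat_a = np.concatenate(loads)            # 36 electricity load values
concat_b = np.concatenate(base_forecasts)   # 120 base-model forecast values
concat_c = np.concatenate([concat_a, concat_b])

assert concat_c.shape == (156,)  # matches the 156-unit first dense layer
```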

3.2.3. Level 2

When the MV percentage in the independent variables exceeds 90%, CIP relies on the “copycat” forecasting method in Level 2, where Meta^2_α reconstructs the forecast for zone α by combining the forecasts from the Meta^1 meta-models in the Level 1 networks of CIP_β, CIP_γ, and CIP_δ. Although inferior in accuracy, it performs well in high-MV environments where “predict” and “improvise” fail.
Similar to Meta^1_α in Level 1, the meta-model Meta^2_α shown in Figure 11 is a multivariable DNN that uses TanH as the activation function in each dense layer. The only differences are the dense unit counts in the first, second, third, and fourth layers, which are 144, 144, 72, and 36. Under the assumption that electricity load zones β, γ, and δ can reconstruct the electricity load forecast in zone α, Meta^2_α combines the forecasts with strong causality taken from the Meta^1 meta-models in Level 1 of CIP_β, CIP_γ, and CIP_δ (β̂^Improvise_{12≤t<24}, γ̂^Improvise_{12≤t<24}, δ̂^Improvise_{12≤t<24}) with the electricity load data of the zones those CIPs are assigned (β_{0≤t<12}, γ_{0≤t<12}, δ_{0≤t<12}) to reconstruct the electricity load forecast in zone α as α̂^Copycat_{12≤t<24}. As redundancy is necessary for reconstruction, Meta^2_α uses Granger causality to avoid selecting the same data chosen by the Kendall rank correlation coefficient used in the Level 0 and Level 1 networks.
Figure 11. Multivariable Deep Neural Network architecture implementation for the meta-model in Level 2 network.
With the Meta^2_α() function representing the multivariable DNN meta-model in the Level 2 network, Algorithm 4 shows the pseudocode to concatenate the electricity load data and the Level 1 forecasts from CIP_β, CIP_γ, and CIP_δ to obtain α̂^Copycat_{12≤t<24}.
Algorithm 4 Copycat() function in CIP_α.
Input: β_{0≤t<12}, γ_{0≤t<12}, δ_{0≤t<12}
       and β̂^Improvise_{12≤t<24}, γ̂^Improvise_{12≤t<24}, δ̂^Improvise_{12≤t<24}
Output: α̂^Copycat_{12≤t<24}
 1: concat_a ← β_{0≤t<12} + γ_{0≤t<12} + δ_{0≤t<12}
 2: concat_b ← β̂^Improvise_{12≤t<24} + γ̂^Improvise_{12≤t<24} + δ̂^Improvise_{12≤t<24}
 3: concat_c ← concat_a + concat_b
 4: α̂^Copycat_{12≤t<24} ← Meta^2_α(concat_c)
 5: return α̂^Copycat_{12≤t<24}

3.3. Application

3.3.1. Feature Selections

In this study, CIP_WEST was implemented to forecast the electricity load in zone WEST of New York State. To construct the Level 0, Level 1, and Level 2 networks in CIP_WEST, the Kendall rank correlation coefficient and Granger causality introduced in Section 2.3 were applied to the training dataset prepared in Section 2.2 with 0% MV. Figure A1 and Figure A2 in Appendix A show the feature selection heatmaps generated on the training dataset. Based on the Kendall rank correlation coefficient, zones GENESE and CENTRL are used to construct the Level 0 and Level 1 networks in CIP_WEST. Based on Granger causality, zones GENESE, MHKVL, and CAPITL are suggested for the Level 2 network in CIP_WEST. However, as the Kendall rank correlation coefficient has already selected GENESE, we replace it with NORTH as the next zone with high causality to ensure redundancy.
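The Kendall rank correlation used for zone selection can be illustrated with a minimal pure-Python implementation (tau-a, i.e. no tie handling; a library such as SciPy would normally be used instead):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # the pair is ranked the same way in x and y
        elif s < 0:
            discordant += 1   # the pair is ranked oppositely
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

In this setting, the load series of candidate zones would be scored against zone WEST, and the highest-scoring zones (here GENESE and CENTRL) kept as extra inputs for the Level 0 and Level 1 networks.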
Figure 12 shows the CIP_WEST network implementation based on the zones selected by the Kendall rank correlation coefficient and Granger causality to construct the Level 0, Level 1, and Level 2 networks. As Meta^2_WEST in CIP_WEST requires the Meta^1 forecasts from CIP_NORTH, CIP_MHKVL, and CIP_CAPITL, we implemented those CIPs up to the Level 1 network; the zones selected for each CIP network are shown in Table 2.
Figure 12. CIP_WEST network implementation based on the recommended zones that may improve the forecasting accuracy and reconstruction in zone WEST.
Table 2. The zones selected by the Kendall rank correlation coefficient to create the Level 0 and Level 1 networks in CIP_NORTH, CIP_MHKVL, and CIP_CAPITL.

3.3.2. Hyperparameter Optimization

Using the Keras Tuner introduced in Section 2.4, we optimized the hyperparameters for the base models implemented in CIP_WEST, CIP_NORTH, CIP_MHKVL, and CIP_CAPITL with Bayesian optimization. Using fixed randomization, we tuned the base models on the training dataset with 0% MV prepared in Section 2.2, with the optimization objective set to minimize the root-mean-square error (RMSE) score, five initial random points, and a maximum of five trials. Furthermore, we set the search range for the units of the first and second LSTM layers from 32 to 256 in steps of 32, and let the Adam learning rate be chosen from 0.001, 0.0001, and 0.00001.
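The search space described above contains 8 × 8 × 3 = 192 candidate configurations, of which Bayesian optimization evaluates only five trials. A plain-Python enumeration of the space (the Keras Tuner equivalent would declare these ranges with `hp.Int` and `hp.Choice`):

```python
# Keras-Tuner-style search space, enumerated as plain Python
first_units = list(range(32, 257, 32))    # 32, 64, ..., 256 in steps of 32
second_units = list(range(32, 257, 32))
learning_rates = [1e-3, 1e-4, 1e-5]       # Adam learning-rate choices

search_space = [(k, l, lr)
                for k in first_units
                for l in second_units
                for lr in learning_rates]

assert len(search_space) == 8 * 8 * 3     # 192 candidates, 5 trials sampled
```

The tuned outcome for CIP_WEST in Table 3 (192 and 96 units, learning rate 0.001) is one point in this space.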
Table 3 shows the hyperparameter optimization outcome. Most base models share the same hyperparameters, except the base model in CIP_NORTH. Most likely, this is because most zones have only a weak correlation with zone NORTH, leading to a different optimization outcome.
Table 3. The hyperparameters obtained for the base models in CIP_WEST, CIP_NORTH, CIP_MHKVL, and CIP_CAPITL with Bayesian optimization.

3.4. Training

3.4.1. Level 0

To train the Base^WEST_MV models in the Level 0 network of CIP_WEST, ten untrained base models are prepared based on the hyperparameters in Table 3, where the first and second LSTM layers use TanH with 192 and 96 units, respectively, and the Adam optimizer uses a learning rate of 0.001. Using a random seed to replicate the weight initialization, each base model is trained on a dataset with a different MV percentage prepared in Section 2.2, where MV = 0%, 10%, 20%, ..., 90%, with a batch size of 1000, 100 training epochs, and early stopping with a patience of 3 and 0.0001 as the minimum observable improvement in the mean squared error (MSE).
The same method is used to train the Base_MV models in the Level 0 networks of CIP_NORTH, CIP_MHKVL, and CIP_CAPITL for Meta^2_WEST to use in reconstructing the forecast in zone WEST.
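The early-stopping rule used throughout the training procedures above maps directly onto Keras’s `EarlyStopping` callback; a sketch, assuming TensorFlow/Keras and placeholder `x_train`/`y_train` arrays:

```python
from tensorflow import keras

# Stop training once the MSE loss improves by less than 0.0001
# for 3 consecutive epochs, as described in the training setup.
early_stop = keras.callbacks.EarlyStopping(monitor="loss",
                                           min_delta=1e-4,
                                           patience=3)

# Training call for one base model (x_train/y_train are placeholders):
# model.fit(x_train, y_train, batch_size=1000, epochs=100,
#           callbacks=[early_stop])
```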

3.4.2. Level 1

To train Meta^1_WEST in the Level 1 network of CIP_WEST, the forecasts from the base models Base^WEST_MV = {Base^WEST_0%, Base^WEST_10%, Base^WEST_20%, ..., Base^WEST_90%} made with different MV percentages on the training dataset are aggregated. Using the same hyperparameters described in Section 3.2 for Meta^1_α, Meta^1_WEST is prepared and trained on the training dataset together with the aggregated forecasts from Base^WEST_MV, with a batch size of 1000, 100 training epochs, a 0.0001 learning rate for Adam, and early stopping with a patience of 3 and 0.0001 as the minimum observable improvement in MSE.
The same method is used to train the Meta^1 models in the Level 1 networks of CIP_NORTH, CIP_MHKVL, and CIP_CAPITL for Meta^2_WEST to use in reconstructing the forecast in zone WEST.

3.4.3. Level 2

To train Meta^2_WEST in the Level 2 network of CIP_WEST, the forecasts from Meta^1_NORTH, Meta^1_MHKVL, and Meta^1_CAPITL, taken from the Level 1 networks of CIP_NORTH, CIP_MHKVL, and CIP_CAPITL and made with different MV percentages on the training dataset, are aggregated. Using the same hyperparameters described in Section 3.2 for Meta^2_α, Meta^2_WEST is prepared and trained on the training dataset together with the aggregated Meta^1 forecasts from CIP_NORTH, CIP_MHKVL, and CIP_CAPITL, with a batch size of 1000, 100 training epochs, a 0.0001 learning rate for Adam, and early stopping with a patience of 3 and 0.0001 as the minimum observable improvement in MSE.

4. Evaluation

4.1. Overview

This section presents the Transformer, boosting ensemble learning, TCN, and stacked LSTM models as the previous methods compared against CIP in forecasting the electricity load in zone WEST, the forecasting outcomes at different percentages of simulated MV, and the forecasting outcome when part of the CIP network was offline due to a DDoS attack. Figure 13 shows a high-level summary of the previous methods, the simulation with various MV percentages, and the compromised network simulation.
Figure 13. High-level summary of the previous forecasting methods, forecasting outcome on various simulated missing values percentages, and forecasting outcome with compromised network.

4.2. Previous Methods

4.2.1. Transformer

Figure 14 shows the Transformer model implementation to forecast the electricity load in zone WEST, where head_size represents the size of the attention heads, num_head the number of attention heads in the multi-head attention layer, ff_dim the size of the feed-forward layer inside the Transformer block, and num_transformer_blocks the number of Transformer blocks stacked in the model. In addition, mlp_units represents the number of units in each fully connected layer of the multi-layer perceptron (MLP) following the Transformer blocks, mlp_dropout the dropout rate at the output of each fully connected layer in the MLP, and ovl_dropout the dropout rate at the output of the multi-head attention layer in each Transformer block.
Figure 14. Transformer-based electricity load forecasting model to forecast the electricity load in zone WEST.
As the Transformer model tends to overfit when trained with the ten training datasets that contain the same electricity load data with varying MV percentages prepared in Section 2.2, we instead took the training dataset with 0% MV and simulated 25% MV in it, the technique we used in our previous study to prevent overfitting. Models that overfit show unexpected behavior, where the forecasting accuracy increases as the MV percentage increases, making them impractical for normal operations.
We trained the Transformer model with a batch size of 1000, 100 training epochs, a 0.0001 learning rate for Adam, and early stopping with a patience of 3 and 0.0001 as the minimum observable improvement in MSE.

4.2.2. Boosting Ensemble Learning

Figure 15 shows the boosting ensemble learning model implementation to forecast the electricity load in zone WEST. We implemented boosting ensemble learning based on eXtreme Gradient Boosting (XGBoost) [], where max_depth represents the maximum depth of each tree in the boosting process, learning_rate the step size at each iteration while moving toward a minimum of the loss function, and the objective reg:squarederror the specified learning task and objective function, which shows the model is trained for a regression problem that minimizes the MSE.
Figure 15. Boosting ensemble learning-based electricity load forecasting model to forecast the electricity load in zone WEST.
Similar to the Transformer model, even with early stopping set to halt training when the MSE score no longer improves after three epochs, the XGBoost model exhibits overfitting tendencies when trained with the ten training datasets with varying MV percentages prepared in Section 2.2. We solved this issue by using the same training dataset with 25% MV used for the Transformer model to train the XGBoost model over 500 epochs.

4.2.3. Temporal Convolutional Network

Figure 16 shows the TCN-based model implementation to forecast the electricity load in zone WEST, where the first and second convolutional layers use 64 filters each, a kernel size of 3, and causal padding to ensure the current output depends only on current and past inputs, while the third, dense layer has 50 units. The convolutional and dense layers use rectified linear units (ReLU) as the activation function.
Figure 16. Temporal Convolutional Network-based electricity load forecasting model to forecast the electricity load in zone WEST.
As the TCN model does not exhibit the overfitting behavior shown by the Transformer and XGBoost models, we used the ten training datasets that contain the same electricity load data with varying MV percentages prepared in Section 2.2, concatenated into one long sequence, to train the TCN model with a batch size of 1000, 100 training epochs, a 0.0001 learning rate for Adam, and early stopping with a patience of 3 and 0.0001 as the minimum observable improvement in MSE.
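The TCN comparison model can be sketched in Keras from the layer details given above (64 filters, kernel size 3, causal padding, ReLU, a 50-unit dense layer). The 12-step input window, three input zones, and the linear 12-step output head are assumptions, since Figure 16 is not reproduced here:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_tcn(n_steps=12, n_zones=3, horizon=12):
    """TCN-style comparison model (sketch of Figure 16)."""
    model = keras.Sequential([
        layers.Input(shape=(n_steps, n_zones)),
        # Causal padding keeps each output dependent only on current
        # and past time steps.
        layers.Conv1D(64, kernel_size=3, padding="causal", activation="relu"),
        layers.Conv1D(64, kernel_size=3, padding="causal", activation="relu"),
        layers.Flatten(),
        layers.Dense(50, activation="relu"),
        layers.Dense(horizon),  # assumed linear head for the forecast horizon
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
                  loss="mse")
    return model
```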

4.2.4. Stacked Long Short-Term Memory

Figure 17 shows the stacked LSTM-based model implementation to forecast the electricity load in zone WEST, where the first and second LSTM layers use 32 units and TanH as the activation function.
Figure 17. Stacked Long Short-Term Memory-based electricity load forecasting model to forecast the electricity load in zone WEST.
Similar to the TCN model, we used the ten training datasets that contain the same electricity load data with varying MV percentages prepared in Section 2.2, concatenated into one long sequence, to train the LSTM model with a batch size of 1000, 100 training epochs, a 0.0001 learning rate for Adam, and early stopping with a patience of 3 and 0.0001 as the minimum observable improvement in MSE.

4.3. 0–90% Missing Values Simulation

In this test, we used the evaluation datasets with different MV percentages prepared in Section 2.2 to evaluate the forecasting accuracy of the CIP, TCN, boosting ensemble learning, Transformer, and stacked LSTM models on the electricity load in zone WEST. Table 4 and Figure 18 show the r² forecasting scores for zone WEST. We used r² to quantify the forecasting accuracy, as its ease of interpretability gives a generalized idea of how closely the forecast matches the plotted real values [].
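The two metrics used in this evaluation are straightforward to compute; a minimal NumPy implementation (libraries such as scikit-learn provide equivalent functions):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

An r² of 1.0 indicates a perfect match with the real load curve, which is why scores such as 0.98831 at 0% MV indicate near-exact forecasts.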
Table 4. Coefficient of determination (r²) scores comparison between multiple forecasting methods at different missing value percentages in zone WEST.
Figure 18. Plotted coefficient of determination (r²) scores comparison between multiple forecasting methods at different missing value percentages in zone WEST.
With 0% MV in zones WEST, GENESE, and CENTRL, CIP utilizes the “predict” forecasting method in the Level 0 network, which achieves the highest r² score of 0.98831 compared to the previous forecasting methods. As the MV percentages in WEST, GENESE, and CENTRL rise from 1% to 90%, CIP utilizes the “improvise” forecasting method to combine the forecasts from the base models in the Level 0 network and fine-tune them into one forecast using the meta-model in the Level 1 network, which yields an r² score of 0.96225 with 80% MV in the independent variables. In contrast, none of the previous forecasting methods achieve an r² score of 0.9 or above with 80% MV. Even with 90% MV, the r² score of CIP only falls to 0.89345, showing the resilience of our proposed method against MV, while the r² scores of the previous methods have already fallen below 0.7.
Examining the forecasting accuracies of the previous forecasting methods, the TCN and stacked LSTM models are the only ones that perform equally well and maintain an r² score of 0.95 with 70% MV in the independent variables. These results show the capability of the TCN and stacked LSTM to capture the dependencies in the independent variables with convolutional layers or gating mechanisms, respectively, without MV negatively affecting the forecast.
Table 5 and Figure 19 show the RMSE forecasting scores for zone WEST, which support the results in Table 4 and Figure 18: CIP surpasses the previous forecasting methods in resilience against MV.
Table 5. Root-mean-square error scores of multiple forecasting methods at different missing-value percentages.
Figure 19. Plotted root-mean-square error scores of multiple forecasting methods at different missing-value percentages in zone WEST.
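RMSE complements r² by reporting the error in the units of the load itself (MW), so large absolute deviations are directly visible. A minimal sketch, assuming the standard definition of the metric:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the electricity load."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```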

4.4. 100% Missing Values Simulation

In this test, we set the MV percentages to 100% for zones WEST, GENESE, and CENTRL. As forecasting is impossible with 100% of MV, CIP relies on the “copycat” forecasting method to reconstruct the forecast for zone WEST from the other zones, whose MV percentages rise from 0% to 90% in 10% increments. Table 6 and Figure 20 show the results, where CIP obtained an r² score of 0.81445 with 0% of MV. In addition, the score only drops to 0.74013 with 90% of MV, a degradation of 9.56142%.
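The actual Level 2 meta-model is a trained neural network; as an illustration of the “copycat” data flow only (loads from the other zones in, the target zone's forecast out), the sketch below uses a linear least-squares stand-in on synthetic data. All names and the synthetic generator are assumptions for this example:

```python
import numpy as np

# Synthetic stand-in: one week of hourly loads for 8 other NYISO-like zones,
# and a target zone that is (by construction) a linear mix of them.
rng = np.random.default_rng(0)
other_zones = rng.uniform(800.0, 1600.0, size=(168, 8))
true_west = other_zones @ rng.uniform(0.05, 0.2, size=8)

# "Copycat" reconstruction: fit a mapping from the other zones to the target
# zone, then apply it when the target zone's own data are 100% missing.
weights, *_ = np.linalg.lstsq(other_zones, true_west, rcond=None)
reconstructed = other_zones @ weights
```

In the real system, the correlation between zones (Appendix A, Figures A1 and A2) is what makes such a cross-zone mapping viable at all.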
Table 6. Coefficient of determination (r²) and root-mean-square error scores obtained from the reconstructed electricity load forecast for zone WEST.
Figure 20. Plotted coefficient of determination (r²) and root-mean-square error scores obtained from the reconstructed electricity load forecast for zone WEST.
Although the forecast accuracy of the “copycat” method is lower, it reconstructs the forecast for zone WEST even with 100% of MV in WEST, GENESE, and CENTRL, which is unattainable with the previous methods.

4.5. Compromised Network Simulation

In the final test, we simulated a scenario where the Level 1 and Level 2 networks in CIP_WEST are offline. Using the base model trained with an MV percentage closest to the MV percentage in the input data, we could obtain a prediction with accuracy similar to that of the meta-model in Level 1. Table 7 and Figure 21 show the forecasting outcome using the individual base models.
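This fallback amounts to a nearest-neighbour lookup over the MV fractions the base models were trained with. The function name and the 10% training grid are assumptions inferred from the 0%-90% increments used elsewhere in this section:

```python
def pick_base_model(mv_fraction,
                    trained_fractions=(0.0, 0.1, 0.2, 0.3, 0.4,
                                       0.5, 0.6, 0.7, 0.8, 0.9)):
    """Fallback when the Level 1 and Level 2 networks are unreachable:
    return the training missing-value fraction of the Level 0 base model
    closest to the fraction observed in the input sequence."""
    return min(trained_fractions, key=lambda f: abs(f - mv_fraction))
```

Because every base model is independently usable, losing the meta-models degrades accuracy gracefully instead of taking the forecasting service offline.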
Table 7. Coefficient of determination (r²) and root-mean-square error scores obtained from the individual base-model load forecasts for zone WEST.
Figure 21. Plotted coefficient of determination (r²) and root-mean-square error scores obtained from the individual base-model load forecasts for zone WEST.

5. Conclusions

The digitalization of essential infrastructure in smart cities introduces new challenges. With the increasing threat of cyberattacks targeting the electricity infrastructure, we must design countermeasures to ensure that the service is not interrupted, as interruptions could negatively impact the economy and other essential services. We proposed CIP, a distributed forecasting network that can handle a high percentage of MV and removes the SPoF vulnerability to prevent interruption. CIP utilizes multi-level networks to forecast the electricity load based on the MV percentage in the input sequence. When there is no MV, it relies solely on the base model in Level 0 to “predict” the electricity load, avoiding unnecessary computation and achieving an r² score of 0.98831. As the MV rises from 1% to 90%, CIP utilizes the meta-model in the Level 1 network to “improvise” one forecast from the base-model “predictions” in Level 0, which allows our proposed method to handle up to 80% of MV while maintaining an r² score of 0.96225. Even when one of the data sources providing the electricity load data is offline, the meta-model in Level 2 creates a “copycat” forecast reconstructed from the electricity load data of other zones, with an r² score of 0.81445. Finally, as our proposed forecasting method is modular, the predictions from the individual base models trained with an MV percentage close to that of the input data remain accessible, with accuracy comparable to the meta-model in Level 1.
For future work, we aim to expand the capability of CIP to handle concept drift by integrating our previous research on radian scaling [], to detect data falsification, and to improve the forecasting accuracy in Level 2 using different types of data, as our current research is limited to the electricity load data from other zones.

Author Contributions

Data curation, M.H.B.K.; Formal analysis, M.H.B.K.; Funding acquisition, S.Y.; Investigation, M.H.B.K.; Methodology, M.H.B.K. and S.Y.; Project administration, S.Y.; Resources, M.H.B.K.; Software, M.H.B.K.; Supervision, S.Y.; Validation, M.H.B.K.; Writing—original draft, M.H.B.K.; Writing—review and editing, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by JST SPRING, Grant Number JPMJSP2111, and Interface Corporation, Japan.

Data Availability Statement

Data presented in this study are openly available from New York Independent System Operator at https://www.nyiso.com/load-data (accessed on 6 December 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ML: Machine Learning
DDoS: Distributed Denial-of-Service
MV: Missing Values
ANN: Artificial Neural Networks
SPoF: Single Point of Failure
CIP: Collective Intelligence Predictor
NYISO: New York Independent System Operator
RNN: Recurrent Neural Networks
LSTM: Long Short-Term Memory
GRU: Gated Recurrent Unit
TCN: Temporal Convolutional Network
LP: Linear Programming
MAE: Mean Absolute Error
TanH: Hyperbolic Tangent
DNN: Deep Neural Networks
RMSE: Root-Mean-Square Error
MSE: Mean Squared Error
MLP: Multi-Layer Perceptron
XGBoost: eXtreme Gradient Boosting
ReLU: Rectified Linear Units

Appendix A

Figure A1. Kendall rank correlation coefficient heatmap on the New York Independent System Operator’s dataset.
Figure A2. Granger causality heatmap on the New York Independent System Operator’s dataset.

References

  1. Nti, I.K.; Teimeh, M.; Nyarko-Boateng, O.; Adekoya, A.F. Electricity load forecasting: A systematic review. J. Electr. Syst. Inf. Technol. 2020, 7, 13. [Google Scholar] [CrossRef]
  2. Kruse, J.; Schäfer, B.; Witthaut, D. Predictability of Power Grid Frequency. IEEE Access 2020, 8, 149435–149446. [Google Scholar] [CrossRef]
  3. Sweeney, C.; Bessa, R.J.; Browell, J.; Pinson, P. The future of forecasting for renewable energy. WIREs Energy Environ. 2020, 9, e365. [Google Scholar] [CrossRef]
  4. Klyuev, R.V.; Morgoev, I.D.; Morgoeva, A.D.; Gavrina, O.A.; Martyushev, N.V.; Efremenkov, E.A.; Mengxu, Q. Methods of Forecasting Electric Energy Consumption: A Literature Review. Energies 2022, 15, 8919. [Google Scholar] [CrossRef]
  5. Sue Wing, I.; Rose, A.Z. Economic consequence analysis of electric power infrastructure disruptions: General equilibrium approaches. Energy Econ. 2020, 89, 104756. [Google Scholar] [CrossRef]
  6. IBM Security. X-Force Threat Intelligence Index 2023. Available online: https://www.ibm.com/reports/threat-intelligence/ (accessed on 21 November 2023).
  7. Li, Y.; Liu, Q. A comprehensive review study of cyber-attacks and cyber security; Emerging trends and recent developments. Energy Rep. 2021, 7, 8176–8186. [Google Scholar] [CrossRef]
  8. Azure Network Security Team. 2022 in Review: DDoS Attack Trends and Insights. Microsoft. Available online: https://www.microsoft.com/en-us/security/blog/2023/02/21/2022-in-review-ddos-attack-trends-and-insights/ (accessed on 10 August 2023).
  9. Gjesvik, L.; Szulecki, K. Interpreting cyber-energy-security events: Experts, social imaginaries, and policy discourses around the 2016 Ukraine blackout. Eur. Secur. 2023, 32, 104–124. [Google Scholar] [CrossRef]
  10. Rodrigues, F.; Cardeira, C.; Calado, J.M.F.; Melicio, R. Short-Term Load Forecasting of Electricity Demand for the Residential Sector Based on Modelling Techniques: A Systematic Review. Energies 2023, 16, 4098. [Google Scholar] [CrossRef]
  11. Wazirali, R.; Yaghoubi, E.; Abujazar, M.S.S.; Ahmad, R.; Vakili, A.H. State-of-the-art review on energy and load forecasting in microgrids using artificial neural networks, machine learning, and deep learning techniques. Electr. Power Syst. Res. 2023, 225, 109792. [Google Scholar] [CrossRef]
  12. Jung, S.; Moon, J.; Park, S.; Rho, S.; Baik, S.W.; Hwang, E. Bagging Ensemble of Multilayer Perceptrons for Missing Electricity Consumption Data Imputation. Sensors 2020, 20, 1772. [Google Scholar] [CrossRef] [PubMed]
  13. Rodenburg, F.J.; Sawada, Y.; Hayashi, N. Improving RNN Performance by Modelling Informative Missingness with Combined Indicators. Appl. Sci. 2019, 9, 1623. [Google Scholar] [CrossRef]
  14. Myllyaho, L.; Raatikainen, M.; Männistö, T.; Nurminen, J.K.; Mikkonen, T. On misbehaviour and fault tolerance in machine learning systems. J. Syst. Softw. 2022, 183, 111096. [Google Scholar] [CrossRef]
  15. Dehghani, M.; Yazdanparast, Z. From distributed machine to distributed deep learning: A comprehensive survey. J. Big Data 2023, 10, 158. [Google Scholar] [CrossRef]
  16. Drainakis, G.; Pantazopoulos, P.; Katsaros, K.V.; Sourlas, V.; Amditis, A.; Kaklamani, D.I. From centralized to Federated Learning: Exploring performance and end-to-end resource consumption. Comput. Netw. 2023, 225, 109657. [Google Scholar] [CrossRef]
  17. Aguilar Madrid, E.; Antonio, N. Short-Term Electricity Load Forecasting with Machine Learning. Information 2021, 12, 50. [Google Scholar] [CrossRef]
  18. Jiang, W. Deep learning based short-term load forecasting incorporating calendar and weather information. Internet Technol. Lett. 2022, 5, e383. [Google Scholar] [CrossRef]
  19. New York Independent System Operator. Load Data. Available online: https://www.nyiso.com/load-data/ (accessed on 18 July 2023).
  20. Puth, M.-T.; Neuhäuser, M.; Ruxton, G.D. Effective use of Spearman’s and Kendall’s correlation coefficients for association between two measured traits. Anim. Behav. 2015, 102, 77–84. [Google Scholar] [CrossRef]
  21. Makowski, D.; Ben-Shachar, M.S.; Patil, I.; Lüdecke, D. Methods and algorithms for correlation analysis in R. J. Open Source Softw. 2020, 5, 2306. [Google Scholar] [CrossRef]
  22. Pandas 2.1.3. 2023. Available online: https://pandas.pydata.org (accessed on 18 November 2023).
  23. Shojaie, A.; Fox, E.B. Granger Causality: A Review and Recent Advances. Annu. Rev. Stat. Its Appl. 2022, 9, 289–319. [Google Scholar] [CrossRef] [PubMed]
  24. Statsmodels 0.14.0. 2023. Available online: https://www.statsmodels.org (accessed on 17 June 2023).
  25. Kadhim, Z.S.; Abdullah, H.S.; Ghathwan, K.I. Artificial Neural Network Hyperparameters Optimization: A Survey. Int. J. Online Biomed. Eng. 2022, 18, 59–87. [Google Scholar] [CrossRef]
  26. Wu, J.; Chen, X.-Y.; Zhang, H.; Xiong, L.-D.; Lei, H.; Deng, S.-H. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
  27. Keras Tuner 1.4.6. 2023. Available online: https://github.com/keras-team/keras-tuner (accessed on 3 December 2023).
  28. TensorFlow 2.13.1. 2023. Available online: https://www.tensorflow.org (accessed on 4 September 2023).
  29. Shafieian, S.; Zulkernine, M. Multi-layer stacking ensemble learners for low footprint network intrusion detection. Complex Intell. Syst. 2023, 9, 3787–3799. [Google Scholar] [CrossRef]
  30. Abumohsen, M.; Owda, A.Y.; Owda, M. Electrical Load Forecasting Using LSTM, GRU, and RNN Algorithms. Energies 2023, 16, 2283. [Google Scholar] [CrossRef]
  31. Wan, R.; Mei, S.; Wang, J.; Liu, M.; Yang, F. Multivariate Temporal Convolutional Network: A Deep Neural Networks Approach for Multivariate Time Series Forecasting. Electronics 2019, 8, 876. [Google Scholar] [CrossRef]
  32. Lara-Benítez, P.; Carranza-García, M.; Luna-Romera, J.M.; Riquelme, J.C. Temporal Convolutional Networks Applied to Energy-Related Time Series Forecasting. Appl. Sci. 2020, 10, 2322. [Google Scholar] [CrossRef]
  33. Zhao, Z.; Xia, C.; Chi, L.; Chang, X.; Li, W.; Yang, T.; Zomaya, A.Y. Short-Term Load Forecasting Based on the Transformer Model. Information 2021, 12, 516. [Google Scholar] [CrossRef]
  34. L’Heureux, A.; Grolinger, K.; Capretz, M.A.M. Transformer-Based Model for Electrical Load Forecasting. Energies 2022, 15, 4993. [Google Scholar] [CrossRef]
  35. Stratigakos, A.; Andrianesis, P.; Michiorri, A.; Kariniotakis, G. Towards Resilient Energy Forecasting: A Robust Optimization Approach. IEEE Trans. Smart Grid 2024, 15, 874–885. [Google Scholar] [CrossRef]
  36. Mienye, I.D.; Sun, Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022, 10, 99129–99149. [Google Scholar] [CrossRef]
  37. Grotmol, G.; Furdal, E.H.; Dalal, N.; Ottesen, A.L.; Rørvik, E.-L.H.; Mølnå, M.; Sizov, G.; Gundersen, O.E. A robust and scalable stacked ensemble for day-ahead forecasting of distribution network losses. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 15503–15511. [Google Scholar]
  38. Gupta, H.; Agarwal, P.; Gupta, K.; Baliarsingh, S.; Vyas, O.P.; Puliafito, A. FedGrid: A Secure Framework with Federated Learning for Energy Optimization in the Smart Grid. Energies 2023, 16, 8097. [Google Scholar] [CrossRef]
  39. Shi, B.; Zhou, X.; Li, P.; Ma, W.; Pan, N. An IHPO-WNN-Based Federated Learning System for Area-Wide Power Load Forecasting Considering Data Security Protection. Energies 2023, 16, 6921. [Google Scholar] [CrossRef]
  40. Shi, Y.; Xu, X. Deep Federated Adaptation: An Adaptative Residential Load Forecasting Approach with Federated Learning. Sensors 2022, 22, 3. [Google Scholar] [CrossRef] [PubMed]
  41. eXtreme Gradient Boosting 2.0.2. 2023. Available online: https://github.com/dmlc/xgboost (accessed on 19 November 2023).
  42. Chicco, D.; Warrens, M.J.; Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 2021, 7, e623. [Google Scholar] [CrossRef] [PubMed]
  43. Bin Kamilin, M.H.; Yamaguchi, S.; Bin Ahmadon, M.A. Radian Scaling: A Novel Approach to Preventing Concept Drift in Electricity Load Prediction. In Proceedings of the 2023 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), Busan, Republic of Korea, 23–25 October 2023; pp. 1–4. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
