Article

Platform-Independent Web Application for Short-Term Electric Power Load Forecasting on 33/11 kV Substation Using Regression Tree

by Venkataramana Veeramsetty 1,†, Modem Sai Pavan Kumar 2,† and Surender Reddy Salkuti 3,*,†
1 Center for AI and Deep Learning, Department of Electrical and Electronics Engineering, SR University, Warangal 506371, India
2 Department of Electrical and Electronics Engineering, SR Engineering College, Warangal 506371, India
3 Department of Railroad and Electrical Engineering, Woosong University, Daejeon 34606, Korea
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Computers 2022, 11(8), 119; https://doi.org/10.3390/computers11080119
Submission received: 18 June 2022 / Revised: 19 July 2022 / Accepted: 25 July 2022 / Published: 29 July 2022
(This article belongs to the Special Issue Computing, Electrical and Industrial Systems 2022)

Abstract

Short-term electric power load forecasting is a critical and essential task for utilities in the electric power industry for proper energy trading, enabling the independent system operator to operate the network without any technical or economic issues. From an electric power distribution system point of view, accurate load forecasting is essential for proper planning and operation. In order to build the most robust machine learning model to forecast the load with good accuracy irrespective of the weather conditions and the type of day, features such as the season, temperature, humidity and day-status are incorporated into the data. In this paper, a machine learning model, namely a regression tree, is used to forecast the active power load one hour and one day ahead. Real-time active power load data to train and test the machine learning models are collected from a 33/11 kV substation located in Telangana State, India. Based on the simulation results, it is observed that the regression tree model is able to forecast the load with less error.

1. Introduction

An electric power distribution substation takes the power from one or more transmission or subtransmission lines and delivers this power to residential, commercial, and industrial customers through multiple feeders. Short-term load forecasting at the distribution level estimates the active power load on a substation over a time horizon ranging from 30 min to 1 week [1]. Load forecasting of a distribution system gives the operator advance warning about the overloading of feeders and substations. Load forecasting helps the distribution substation operator to schedule and dispatch storage batteries to shave the peak load in a smart grid environment [2]. Electrical power load forecasting is classified as very short term, short term, medium term and long term based on the length of the prediction horizon [3,4,5]. Due to the deregulated power system structure and the increasing liberalization of energy markets, electric power load forecasting has become even more essential [6]. Long-term load forecasting is generally used for planning and investment profitability analysis, determining upcoming sites, or acquiring fuel sources for production plants. Medium-term load forecasting is usually preferred for risk management, balance sheet calculations, and derivatives pricing [7]. Accurate short-term load forecasting helps an electric power distribution utility to optimize the power grid load, strengthen reliability, reduce the electricity consumption cost and take advantage of electric energy trading possibilities.
Forecasting the distribution-level load is far more challenging than forecasting the system-level load, such as Telangana State’s electric power demand, due to the intricate load characteristics, huge number of nodes, and probable switching actions in distribution systems. Since end-user behaviour has a far greater influence on distribution systems than it does on transmission systems, the load profiles of distribution systems show more stochastically abrupt departures. Operating an independent distribution system successfully necessitates significantly more precise and high-resolution load forecasting than today’s approaches can deliver [8]. Load estimates over a vast region are highly accurate because the aggregated load is steady and consistent. The distribution-level load, on the other hand, may be dominated by a few major clients, such as industrial businesses or schools, and the load pattern may not be as regular as that of a vast region. Furthermore, due to reconfigurations caused by switching activities, the load may be temporarily moved from one feeder to another, causing significant changes in distribution-level load profiles and affecting the trend at a given time. The main challenge in electric power load forecasting is data loss. Some works are available in the literature to deal with data loss, e.g., by designing a hierarchical voltage controller that is robust against communication delay and data loss [9]. Communication delay was treated using delay-tolerant power compensation control (DTPCC) in [10]; DTPCC uses normal power compensation control (PCC) when the communication delay is within the maximum tolerable communication delay, and switches to predictive PCC under abnormal communication delay conditions. In the present work, however, the historical data were collected directly from the distribution company rather than through any communication channels.
Four general categories are identified for short-term load forecasting, i.e., similar day, variable selection, hierarchical forecasting, and weather station selection [11]. The similar-day technique identifies the load data as a set of related daily load profiles, whereas the variable selection method assumes that the load data act as a series of variables that are either correlated or independent of one another. The hierarchical technique, on the other hand, treats data as an aggregated load that is extremely variable due to changes in the load at the lower levels of the hierarchy. Finally, weather station selection is a strategy for determining which weather data are best fitted into the load model [12]. Load forecasting is regarded as one of the most critical duties for power system operators in the demand management system (DMS) as shown in Figure 1.
Many researchers have been working on short-term load forecasting of distribution systems. An ANN-based methodology was developed in [13] to forecast the load on a 33/11 kV substation near Kakatiya University in Warangal, Telangana State. In that study, the authors used the load from the previous three hours and the load at the same time but in the previous four days as input features. Load forecasting on an electric power distribution system using various regression models was proposed in [14] by considering the load from the previous three hours and the load at the same time but on the previous day as input features. Short-term load forecasting on an electric power distribution system using factor analysis and long short-term memory was proposed in [15] by considering the load from the previous three hours, the load at the same time but in the previous three days, and the load at the same time but in the previous three weeks as input features. Electric power load forecasting at the distribution level using a random forest and gated recurrent unit was proposed in [16] by considering the load from the previous three hours, the load at the same time but in the previous three days, and the load at the same time but in the previous three weeks as input features. Electric power load forecasting at the distribution level using a correlation concept and an ANN was proposed in [17] by considering the load from the previous two hours and the load at the same time but in the previous three days as input features. Similarly, the active power demand on a 33/11 kV electric power distribution system using principal component analysis and a recurrent neural network was proposed in [18] by considering the load from the previous three hours, the load at the same time but in the previous three days, and the load at the same time but in the previous three weeks as input features.
Electric power load forecasting on a medium voltage level based on regression models and ANN was proposed in [19] using time series DSO telemetry data and weather records from the Portuguese Institute of Sea and Atmosphere, and applied to the urban area of Évora, one of Portugal’s first smart cities. A new top-down algorithm based on a similar day-type method to compute an accurate short-term distribution loads forecast, using only SCADA data from transmission grid substations was proposed in [20]. That study was evaluated on the RBTS test system with real power consumption data to demonstrate its accuracy. A convolutional-neural-network-based load forecasting methodology was proposed in [21]. Electric demand forecasting with a jellyfish search extreme learning machine, a Harris hawks extreme learning machine, and a flower pollination extreme learning machine was discussed in [22]. Electric power load forecasting using gated recurrent units with multisource data was discussed in [23]. Short-term load forecasting using a niche immunity lion algorithm and convolutional neural network was studied in [24]. Electricity demand forecasting using a dynamic adaptive entropy-based weighting was discussed in [25]. A demand-side management technique by identifying and mitigating the peak load of a building was studied in [26]. Electric power demand forecasting using a vector autoregressive state-space model was discussed in [27].
Electric power load forecasting using a random forest model was discussed in [28]. In that study, authors considered wind speed, wind direction, humidity, temperature, air pressure, and irradiance as input features. Electric power load forecasting using a group method of data handling and support vector regression was discussed in [29]. The electric power load prediction at the building and district levels for day-ahead energy management using a genetic algorithm (GA) and artificial neural network (ANN) power predictions was discussed in [30]. Short-term electric power load forecasting using feature engineering, Bayesian optimization algorithms with a Bayesian neural network was discussed in [31]. Active power load forecasting using a sparrow search algorithm (ISSA), Cauchy mutation, and opposition-based learning (OBL) and the long short-term memory (LSTM) network was studied in [32]. A new hybrid model was proposed in [33] based on CNN, LSTM, CNN_LSTM, and MLP for electric power load forecasting. All these methodologies provided valuable contributions towards the load forecasting problem, but these studies did not include the weather impact, season and day status in load forecasting.
The main contributions of this paper are as follows:
  • A new active power load dataset is developed for the load forecasting problem by collecting data from a 33/11 kV distribution substation in Godishala (village), Telangana State, India; it is available at https://data.mendeley.com/datasets/tj54nv46hj/1, accessed on 1 May 2022.
  • A machine learning model, i.e., a regression tree model is used to forecast the load on a 33/11 kV distribution substation in Godishala.
  • The active power load on a 33/11 kV substation is forecast one hour ahead based on input features L(T-1), L(T-2), L(T-24), L(T-48), day, season, temperature, and humidity.
  • The active power load on a 33/11 kV substation is forecast one day ahead based on input features L(T-24), L(T-48), day, season, temperature, and humidity.
  • A web application is developed based on a regression tree model to forecast the load on a 33/11 kV distribution substation in Godishala.
  • The impact of weather and days on short-term load forecasting is analysed by incorporating the season and day-status (weekday/weekend) in the data.
  • A practical implementation of the system is demonstrated in a prototype web application, in which a regression tree model is deployed and executes the forecasts on an hourly and daily basis.

2. Methodology

This section presents the active power load data that are used to train and test the machine learning models. Furthermore, we discuss the regression tree model that is used for electric power load forecasting on a 33/11 kV distribution substation in Godishala. This substation has four feeders: the first feeder (F1) supplies load to Godishala (town), the second feeder supplies load to Bommakal, the third feeder supplies load to Godishala (village), and the fourth feeder (F4) supplies load to Raikal. The complete pipeline to develop the web application for short-term load forecasting using a regression model is presented in Figure 2.

2.1. Active Power Load Data Analysis

To train and test the machine learning model, active power load data are required. Hourly data consisting of voltage (V), current (I) and power factor (cos(ϕ)) from a 33/11 kV distribution substation in Godishala were collected from 1 January 2021 to 31 December 2021. Based on these data, the hourly active power load was calculated using Equation (1), and sample load data are presented in Table 1.
P = √3 · V · I · cos(ϕ)
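As a minimal sketch of this calculation (the column names V, I and pf are placeholders of our own, not the original dataset's schema), the hourly active power can be computed from Equation (1) as follows:

import math
import pandas as pd

# Hypothetical hourly substation readings: line voltage V (in volts),
# line current I (in amperes) and power factor pf = cos(phi).
readings = pd.DataFrame({
    "V": [11000, 11050, 10980],
    "I": [120.5, 118.2, 130.1],
    "pf": [0.92, 0.90, 0.93],
})

# Equation (1): P = sqrt(3) * V * I * cos(phi), expressed here in kW.
readings["P_kW"] = math.sqrt(3) * readings["V"] * readings["I"] * readings["pf"] / 1000.0
print(readings)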

2.2. Features Information and Data Preparation

In this paper, the load at a particular time of the day “L(T)” was predicted based on the last two hours of load data, i.e., L(T-1), L(T-2), the load at the same time but in the last two days, i.e., L(T-24), L(T-48), the temperature, the humidity, the season, and the day. Hence, data that were prepared based on collected information from the 33/11 kV substation were rearranged as shown in Table 2. This was the approach used for hour-ahead forecasting, whereas for day-ahead forecasting, the load at a particular time of the day “L(T)” was predicted based on the load at the same time but in the last two days, i.e., L(T-24), L(T-48), the temperature, the humidity, the season, and the day as presented in Table 3. The dataset for hour-ahead forecasting had 8712 samples, 8 input features and 1 output feature. Similarly, the dataset for day-ahead forecasting had 8712 samples, 6 input features and one output feature.
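A minimal sketch of this rearrangement, assuming the collected records sit in a pandas DataFrame with hourly rows and illustrative column names (load, temperature, humidity, season, day), is given below; the actual schema and category coding of the published dataset may differ.

import pandas as pd

def make_supervised(df, hour_ahead=True):
    """Rearrange an hourly record into the supervised layout of Tables 2 and 3.
    Column names and the 0/1 coding of day and season are illustrative assumptions."""
    out = pd.DataFrame({
        "L(T-24)": df["load"].shift(24),
        "L(T-48)": df["load"].shift(48),
        "day": df["day"],            # e.g., 1 = weekday, 0 = weekend (assumed coding)
        "season": df["season"],      # e.g., 0/1/2 for the three seasons (assumed coding)
        "temperature": df["temperature"],
        "humidity": df["humidity"],
        "L(T)": df["load"],
    })
    if hour_ahead:                   # HALF additionally uses the last two hours of load
        out.insert(0, "L(T-2)", df["load"].shift(2))
        out.insert(0, "L(T-1)", df["load"].shift(1))
    return out.dropna()              # drop the first rows that lack a complete history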

2.3. Machine Learning Models

In this paper, a regression tree model was used to forecast the load on a 33/11 kV substation one hour ahead and one day ahead. The problem discussed here is a regression problem. Models need to predict the load on the substation based on input features such as L(T-1), L(T-2), L(T-24), L(T-48), day, season, temperature, and humidity in the case of hour-ahead forecasting, and based on input features such as L(T-24), L(T-48), day, season, temperature, and humidity in the case of day-ahead forecasting. The performance of each machine learning model for electric power load forecasting on a 33/11 kV substation was observed in terms of the MSE as shown in Equation (2)
Training MSE = (1/n_s) Σ_{i=1}^{n_s} (L_a(T) − L_p(T))²,   Testing MSE = (1/n_t) Σ_{i=1}^{n_t} (L_a(T) − L_p(T))²

Regression Tree

A regression tree is basically a decision tree model that is used for the task of regression, which can be used to predict continuous valued output. In this paper, a regression tree was used to forecast the load. For a regression problem, a tree is constructed by splitting the input features such that the mean squared error shown in Equation (3) is minimum. A step-by-step procedure to construct the regression tree with sample data is explained in Appendix B, as mentioned in Algorithm 1. The performance of the regression tree model on short-term load forecasting problem for Godishala substation was measured in terms of error metrics such as the MSE [34], RMSE [35,36,37], and MAE [38]. A decision tree can also used for classification problems [39,40,41].
MSE = (1/n_s) Σ_{i=1}^{n_s} (L_a(T) − L_p(T))²
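As a compact illustration of these error metrics, the following sketch computes the MSE of Equations (2) and (3) together with the RMSE and MAE for a pair of placeholder load arrays:

import numpy as np

def forecast_errors(actual, predicted):
    """MSE (Equations (2) and (3)), RMSE and MAE between actual load L_a(T)
    and predicted load L_p(T)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((actual - predicted) ** 2)
    return {"MSE": mse, "RMSE": np.sqrt(mse), "MAE": np.mean(np.abs(actual - predicted))}

print(forecast_errors([432, 2260, 2681], [432, 2260, 2681]))  # all zeros for a perfect fit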
Algorithm 1 Regression tree model formulation
1: Read data, initialize max-depth.
2: for Depth ∈ range(max-depth) do
3:   for Feature ∈ data table do
4:     for Value ∈ feature do
5:       Find the MSE for each unique split of the feature
6:       Find the best split among all feature values based on the minimum MSE
7:       Split the data table based on the feature corresponding to the best split
8:       Start building the tree by identifying the root/decision node among all features
9:     end for
10:   end for
11: end for
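The same procedure is available off the shelf; the sketch below uses scikit-learn's DecisionTreeRegressor, whose default splitting criterion is the mean squared error used in Algorithm 1, and selects the tree depth by testing MSE as done in Section 3.1. The arrays X_train, y_train, X_test and y_test are assumed to hold the prepared features and target described in Section 2.2.

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def fit_best_tree(X_train, y_train, X_test, y_test, max_depths=range(2, 11)):
    """Train one regression tree per candidate depth and keep the depth
    with the lowest testing MSE."""
    best = None
    for depth in max_depths:
        tree = DecisionTreeRegressor(max_depth=depth)
        tree.fit(X_train, y_train)
        test_mse = mean_squared_error(y_test, tree.predict(X_test))
        if best is None or test_mse < best[1]:
            best = (tree, test_mse, depth)
    return best  # (fitted tree, testing MSE, chosen depth)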
Figure 2. Workflow to develop web application.

3. Result Analysis

All the machine learning models were developed based on the data available in [42] using Google Colab. This section presents the data analysis, the training and testing performance of the machine learning models, and the web application developed to predict the load. Out of 8712 samples, 95% of the samples were used for training and the remaining 5% were used for testing. Data processing techniques for observing the data distribution and outliers and for data normalization were applied before using these data to train and test the regression model. A stochastic gradient descent optimizer was used to train the regression models.
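A minimal sketch of this preparation step is shown below, assuming df is the supervised table from Section 2.2; the paper specifies a 95%/5% split and normalization, while the use of a chronological hold-out and min-max scaling here are our own assumptions for illustration.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

def split_and_scale(df, target="L(T)"):
    """Hold out the last 5% of samples for testing and min-max scale the inputs
    (scaling choice assumed, not stated in the paper)."""
    X, y = df.drop(columns=[target]), df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.05, shuffle=False)
    scaler = MinMaxScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test, scaler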

3.1. Regression Tree Model

The performance of the regression tree model that was developed to forecast the load L(T) based on the features L(T-1), L(T-2), L(T-24), L(T-48), day status, season status, temperature, and humidity was observed in terms of training and testing errors for hour-ahead load forecasting (HALF). The training and testing error metrics of the regression tree model are presented in Table 4 for HALF. From Table 4, it is observed that the regression tree model with a depth of “5” had the lowest testing MSE, i.e., 0.005, and was also well fitted without much difference between the training and testing errors. Hence, the regression tree with a depth of “5” was considered the optimal model to deploy in a web application for hour-ahead forecasting. The complete architecture of the regression tree with a depth of “5” for hour-ahead load forecasting is shown in Figure 3.
Similarly, for day-ahead forecasting (DALF), the performance of the regression tree model that was developed to forecast the load L(T) based on the features L(T-24), L(T-48), day status, season status, temperature, and humidity was observed in terms of training and testing errors. The training and testing error metrics of the regression tree model are presented in Table 5 for DALF. From Table 5, it is observed that the regression tree model with a depth of “6” had the lowest testing MSE, i.e., 0.00869, and was also well fitted without much difference between the training and testing errors. Hence, the regression tree with a depth of “6” was considered the optimal model to deploy in a web application for day-ahead forecasting. The complete architecture of the regression tree with a depth of “6” for day-ahead load forecasting is shown in Figure 4.
The distribution of the predicted load with the regression tree model having a training MSE of 0.004 and a testing MSE of 0.005 was compared with actual load samples for the training and testing data for HALF and is presented in Figure 5. Similarly, the distribution of the predicted load with the regression tree model having a training MSE of 0.0065 and a testing MSE of 0.00869 was compared with actual load samples for the training and testing data for DALF and is presented in Figure 6. From Figure 5 and Figure 6, it is observed that most of the predicted and actual load samples are overlapping each other.
The predicted load using the regression tree model was compared with the actual load on 31 December 2021 and is presented in Figure 7. From Figure 7, it is observed that the predicted load almost follows the actual load curve during the night but deviates more during the day, and that the predicted load is slightly further from the actual load curve in the case of DALF than in the case of HALF. This is expected, since in the DALF case the load is predicted a full day earlier, i.e., over a 24 h time horizon.
A web application was developed using the optimal regression tree models to predict the load one hour ahead and one day ahead for a real-time usage as a prototype and is shown in Figure 8. This web application is accessible through the link https://loadforecasting-godishala-rt.herokuapp.com/, accessed on 1 May 2022 or through the QR code shown in Figure 8.
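The paper does not detail the web framework used for the prototype; as a purely hypothetical sketch, a small Flask endpoint could serve a saved hour-ahead regression tree as follows (the model file name "half_tree.pkl" and the JSON field names are placeholders of our own):

import pickle
from flask import Flask, request, jsonify

app = Flask(__name__)
with open("half_tree.pkl", "rb") as f:   # placeholder name for the saved HALF model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Illustrative JSON fields: L1, L2, L24, L48, day, season, temp, humidity
    d = request.get_json()
    features = [[d["L1"], d["L2"], d["L24"], d["L48"],
                 d["day"], d["season"], d["temp"], d["humidity"]]]
    return jsonify({"forecast_kW": float(model.predict(features)[0])})

if __name__ == "__main__":
    app.run()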

3.2. Impact of Season and Day on Regression Model Prediction

The forecasting performance of the trained machine learning model for HALF on various seasons, i.e., rainy, winter, and summer, is presented in Figure 9. From Figure 9, it is observed that the developed regression tree model is able to forecast the load with almost the same level of error (with very minor changes in error) for all seasons.
The forecasting performance of the trained machine learning model for HALF on weekdays and weekends is presented in Figure 10. From Figure 10, it is observed that the developed regression tree model is able to forecast the load with almost the same level of error irrespective of whether it is a weekday or a weekend.

3.3. Comparative Analysis

The performance of the machine learning model, i.e., the regression tree model, was compared with a linear regression model in terms of training and testing mean squared error for both HALF and DALF and is presented in Figure 11. From Figure 11, it is observed that the regression tree model forecasts the load with less error than the linear regression model for both HALF and DALF. In the case of DALF, the model forecasts the load with more error than for HALF, as the latter forecasts the load just one hour ahead. The regression tree model is well fitted without much difference between the training and testing errors.
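A minimal sketch of such a comparison, reusing the split prepared earlier (X_train, y_train, X_test, y_test are assumed to be available), could look as follows:

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

def compare_models(X_train, y_train, X_test, y_test, depth=5):
    """Testing MSE of a linear regression baseline versus a regression tree."""
    models = {"linear regression": LinearRegression(),
              "regression tree": DecisionTreeRegressor(max_depth=depth)}
    return {name: mean_squared_error(y_test, m.fit(X_train, y_train).predict(X_test))
            for name, m in models.items()}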

4. Conclusions

Electric power load forecasting one hour ahead and one day ahead is required for utilities to place bids successfully in hour-ahead and day-ahead energy markets. In this paper, the active power load on a 33/11 kV substation was predicted one hour ahead based on the load available in the last two hours and the last two days at the time of prediction, the day status, the season status, the temperature, and the humidity. Similarly, the load was predicted one day ahead based on the load available in the last two days at the time of prediction, the day status, the season status, the temperature, and the humidity. A robust machine learning model was developed to forecast the load with good accuracy irrespective of the weather conditions and the type of day by incorporating features such as the season, temperature, humidity, and day-status.
In this work, a machine learning model, i.e., a regression tree model was developed to predict the active power load on a 33/11 kV substation located in the Godishala village in Telangana State, India. Based on the results, it was observed that the regression tree model predicted the load one hour and one day ahead with less mean squared error in comparison with the linear regression model.
This work can be further extended by considering deep neural networks, sequence models, and conventional time series data prediction models. In this paper, the temperature and humidity data at the time of prediction were considered from an open-source website. However, we are currently further extending the model by integrating temperature and humidity forecasting models with the current load forecasting models.

Author Contributions

Conceptualization, V.V.; methodology, V.V.; software, V.V. and M.S.P.K.; validation, V.V.; formal analysis, V.V.; investigation, V.V.; resources, M.S.P.K.; data curation, M.S.P.K.; writing—original draft preparation, V.V.; writing—review and editing, V.V., M.S.P.K. and S.R.S.; visualization, V.V.; supervision, V.V.; project administration, V.V. All authors have read and agreed to the published version of the manuscript.

Funding

Woosong University’s Academic Research Funding—2022.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Active power load data used to train and test machine learning models are available at https://data.mendeley.com/datasets/tj54nv46hj/1, accessed on 7 February 2022.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
L(T-1): Active power load one hour before the time of prediction
L(T-2): Active power load two hours before the time of prediction
L(T-24): Active power load one day before the time of prediction
L(T-48): Active power load two days before the time of prediction
L(T): Active power load at hour “T”
MSE: Mean squared error
HALF: Hour-ahead load forecasting
DALF: Day-ahead load forecasting
RMSE: Root-mean-square error
MAE: Mean absolute error
L_a(T): Actual load at hour “T”
L_p(T): Predicted load at hour “T”
n_s: Number of training samples
n_t: Number of testing samples

Appendix A. Conversion of Continuous Data into Categorical Data

In this section, the step-by-step procedure that was used to build the regression tree model is presented. For this purpose, a sample dataset that was built from a few samples of the original dataset is shown in Table A1.
Table A1. Sample data to build the regression tree.
L(T-24) | L(T-48) | DAY | SEASON | TEMP | HUMIDITY | L(T)
2176 | 412 | 1 | 1 | 67 | 88 | 432
2354 | 1829 | 0 | 1 | 68 | 88 | 2260
2777 | 2647 | 0 | 2 | 70 | 83 | 2681
3112 | 3203 | 1 | 2 | 75 | 67 | 3343
1663 | 1549 | 1 | 0 | 75 | 73 | 1579
1010 | 1027 | 0 | 0 | 71 | 93 | 1018
To convert continuous features into categorical features, multiple subtables were formed from Table A1, each consisting of one input feature and the target feature. Each subtable was sorted in ascending order of its input feature, and the average of every two consecutive input feature values was calculated. The continuous input feature was then converted into a categorical feature based on each candidate average value, the mean squared error was computed for each candidate split, and the average value giving the lowest MSE was treated as the threshold for that feature.
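A short sketch of this threshold search for a single continuous feature, following the procedure just described, is given below (the printed values come from the L(T-24) column of Table A1; small differences from the hand calculations can arise from rounding of the group means):

import numpy as np

def best_threshold(x, y):
    """Best binary split of continuous feature x: candidate thresholds are the
    averages of consecutive sorted values, each side predicted by its mean."""
    order = np.argsort(x)
    x, y = np.asarray(x, float)[order], np.asarray(y, float)[order]
    best = (None, np.inf)
    for t in (x[:-1] + x[1:]) / 2.0:            # midpoints of consecutive values
        left, right = y[x < t], y[x >= t]
        if left.size == 0 or right.size == 0:   # skip degenerate splits
            continue
        pred = np.where(x < t, left.mean(), right.mean())
        mse = np.mean((y - pred) ** 2)
        if mse < best[1]:
            best = (t, mse)
    return best

print(best_threshold([2176, 2354, 2777, 3112, 1663, 1010],
                     [432, 2260, 2681, 3343, 1579, 1018]))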
  • Prepare the subtable for input feature L(T-24) and output feature L(T) as shown in Table A2.
Table A2. Sorted subtable—L(T-24) vs. L(T).
L(T-24) | L(T)
1010 | 1018
1663 | 1579
2176 | 432
2354 | 2260
2777 | 2681
3112 | 3343
  • Calculate the average between every two consecutive input feature values for L(T-24); the average values are [1336, 1919, 2265, 2566, 2945].
  • Convert the continuous input feature L(T-24) into a categorical feature based on the average value 1336 and Table A2, as shown in Table A3. The predicted value against each category of input feature is the average of all output variables for that category.
Table A3. Categorical subtable—L(T-24) vs. L(T).
L(T-24) | L(T) | L_P(T)
<1336 | 1018 | 1018
≥1336 | 1579 | (1579 + 432 + 2260 + 2681 + 3343)/5 = 2059
≥1336 | 432 | 2059
≥1336 | 2260 | 2059
≥1336 | 2681 | 2059
≥1336 | 3343 | 2059
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A3 and presented below
[(1018 − 1018)² + (1579 − 2059)² + (432 − 2059)² + (2260 − 2059)² + (2681 − 2059)² + (3343 − 2059)²] / 6 = 825,622.
  • Convert the continuous input feature L(T-24) into a categorical feature based on the average value 1919 and Table A2, as shown in Table A4. The predicted value against each category of input feature is the average of all output variables for that category.
Table A4. Categorical subtable—L(T-24) vs. L(T) with average value 1919.
L(T-24) | L(T) | L_P(T)
<1919 | 1018 | 1298
<1919 | 1579 | 1298
≥1919 | 432 | (432 + 2260 + 2681 + 3343)/4 = 2179
≥1919 | 2260 | 2179
≥1919 | 2681 | 2179
≥1919 | 3343 | 2179
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A4 and presented below
[(1018 − 1298)² + (1579 − 1298)² + (432 − 2179)² + (2260 − 2179)² + (2681 − 2179)² + (3343 − 2179)²] / 6 = 803,903.
  • Convert the continuous input feature L(T-24) into a categorical feature based on the average value 2265 and Table A2, as shown in Table A5. The predicted value against each category of input feature is the average of all output variables for that category.
Table A5. Categorical subtable—L(T-24) vs. L(T) with average value 2265.
L(T-24) | L(T) | L_P(T)
<2265 | 1018 | (1018 + 1579 + 432)/3 = 1010
<2265 | 1579 | 1010
<2265 | 432 | 1010
≥2265 | 2260 | (2260 + 2681 + 3343)/3 = 2761
≥2265 | 2681 | 2761
≥2265 | 3343 | 2761
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A5 and presented below
[(1018 − 1010)² + (1579 − 1010)² + (432 − 1010)² + (2260 − 2761)² + (2681 − 2761)² + (3343 − 2761)²] / 6 = 209,064.
  • Convert the continuous input feature L(T-24) into a categorical feature based on the average value 2566 and Table A2, as shown in Table A6. The predicted value against each category of input feature is the average of all output variables for that category.
Table A6. Categorical subtable—L(T-24) vs. L(T) with average value 2566.
L(T-24) | L(T) | L_P(T)
<2566 | 1018 | (1018 + 1579 + 432 + 2260)/4 = 1322
<2566 | 1579 | 1322
<2566 | 432 | 1322
<2566 | 2260 | 1322
≥2566 | 2681 | (2681 + 3343)/2 = 3012
≥2566 | 3343 | 3012
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A6 and presented below
[(1018 − 1322)² + (1579 − 1322)² + (432 − 1322)² + (2260 − 1322)² + (2681 − 3012)² + (3343 − 3012)²] / 6 = 341,675.
  • Convert the continuous input feature L(T-24) into a categorical feature based on the average value 2945 and Table A2, as shown in Table A7. The predicted value against each category of input feature is the average of all output variables for that category.
Table A7. Categorical subtable—L(T-24) vs. L(T) with average value 2945.
L(T-24) | L(T) | L_P(T)
<2945 | 1018 | (1018 + 1579 + 432 + 2260 + 2681)/5 = 1594
<2945 | 1579 | 1594
<2945 | 432 | 1594
<2945 | 2260 | 1594
<2945 | 2681 | 1594
≥2945 | 3343 | 3343
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A7 and presented below
[(1018 − 1594)² + (1579 − 1594)² + (432 − 1594)² + (2260 − 1594)² + (2681 − 1594)² + (3343 − 3343)²] / 6 = 551,591.
  • Prepare the subtable for input feature L(T-48) and output feature L(T), as shown in Table A8.
Table A8. Sorted subtable—L(T-48) vs. L(T).
L(T-48) | L(T)
412 | 432
1027 | 1018
1549 | 1579
1829 | 2260
2647 | 2681
3203 | 3343
  • Calculate the average between every two consecutive input feature values for L(T-48); the average values are [720, 1288, 1689, 2238, 2925].
  • Convert the continuous input feature L(T-48) into a categorical feature based on the average value 720 and Table A8, as shown in Table A9. The predicted value against each category of input feature is the average of all output variables for that category.
Table A9. Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | L_P(T)
<720 | 432 | 432
≥720 | 1018 | (1018 + 1579 + 2260 + 2681 + 3343)/5 = 2176.2
≥720 | 1579 | 2176.2
≥720 | 2260 | 2176.2
≥720 | 2681 | 2176.2
≥720 | 3343 | 2176.2
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A9 and presented below
[(432 − 432)² + (1018 − 2176.2)² + (1579 − 2176.2)² + (2260 − 2176.2)² + (2681 − 2176.2)² + (3343 − 2176.2)²] / 6 = 553,557.1333.
  • Convert the continuous input feature L(T-48) into a categorical feature based on the average value 1288 and Table A8, shown in Table A10. The predicted value against each category of input feature is the average of all output variables for that category.
Table A10. Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | L_P(T)
<1288 | 432 | (432 + 1018)/2 = 725
<1288 | 1018 | 725
≥1288 | 1579 | (1579 + 2260 + 2681 + 3343)/4 = 2465.75
≥1288 | 2260 | 2465.75
≥1288 | 2681 | 2465.75
≥1288 | 3343 | 2465.75
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A10 and presented below
[(432 − 725)² + (1018 − 725)² + (1579 − 2465.75)² + (2260 − 2465.75)² + (2681 − 2465.75)² + (3343 − 2465.75)²] / 6 = 302,709.4583.
  • Convert the continuous input feature L(T-48) into a categorical feature based on the average value 1689 and Table A8, as shown in Table A11. The predicted value against each category of input feature is the average of all output variables for that category.
Table A11. Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | L_P(T)
<1689 | 432 | (432 + 1018 + 1579)/3 = 1009.666667
<1689 | 1018 | 1009.666667
<1689 | 1579 | 1009.666667
≥1689 | 2260 | (2260 + 2681 + 3343)/3 = 2761.333333
≥1689 | 2681 | 2761.333333
≥1689 | 3343 | 2761.333333
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A11 and presented below
[(432 − 1009.67)² + (1018 − 1009.67)² + (1579 − 1009.67)² + (2260 − 2761.33)² + (2681 − 2761.33)² + (3343 − 2761.33)²] / 6 = 209,005.5556.
  • Convert the continuous input feature L(T-48) into a categorical feature based on the average value 2238 and Table A8, as shown in Table A12. The predicted value against each category of input feature is the average of all output variables for that category.
Table A12. Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | L_P(T)
<2238 | 432 | (432 + 1018 + 1579 + 2260)/4 = 1322.25
<2238 | 1018 | 1322.25
<2238 | 1579 | 1322.25
<2238 | 2260 | 1322.25
≥2238 | 2681 | (2681 + 3343)/2 = 3012
≥2238 | 3343 | 3012
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A12 and presented below
[(432 − 1322.25)² + (1018 − 1322.25)² + (1579 − 1322.25)² + (2260 − 1322.25)² + (2681 − 3012)² + (3343 − 3012)²] / 6 = 341,588.4583.
  • Convert the continuous input feature L(T-48) into a categorical feature based on the average value 2925 and Table A8, as shown in Table A13. The predicted value against each category of input feature is the average of all output variables for that category.
Table A13. Categorical subtable—L(T-48) vs. L(T).
L(T-48) | L(T) | L_P(T)
<2925 | 432 | (432 + 1018 + 1579 + 2260 + 2681)/5 = 1594
<2925 | 1018 | 1594
<2925 | 1579 | 1594
<2925 | 2260 | 1594
<2925 | 2681 | 1594
≥2925 | 3343 | 3343
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A13 and presented below
[(432 − 1594)² + (1018 − 1594)² + (1579 − 1594)² + (2260 − 1594)² + (2681 − 1594)² + (3343 − 3343)²] / 6 = 551,228.3333.
  • Prepare the subtable for input feature L(TEMP) and output feature L(T), as shown in Table A14.
Table A14. Sorted subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T)
67 | 432
68 | 2260
70 | 2681
71 | 1018
75 | 3343
75 | 1579
  • Calculate the average between every two consecutive input feature values for L(TEMP); the average values are [67.5, 69, 70.5, 73, 75].
  • Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 67.5 and Table A14, as shown in Table A15. The predicted value against each category of input feature is the average of all output variables for that category.
Table A15. Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | L_P(T)
<67.5 | 432 | 432
≥67.5 | 2260 | (2260 + 2681 + 1018 + 3343 + 1579)/5 = 2176.2
≥67.5 | 2681 | 2176.2
≥67.5 | 1018 | 2176.2
≥67.5 | 3343 | 2176.2
≥67.5 | 1579 | 2176.2
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A15 and presented below
[(432 − 432)² + (2260 − 2176.2)² + (2681 − 2176.2)² + (1018 − 2176.2)² + (3343 − 2176.2)² + (1579 − 2176.2)²] / 6 = 553,557.
  • Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 69 and Table A14, as shown in Table A16. The predicted value against each category of input feature is the average of all output variables for that category.
Table A16. Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | L_P(T)
<69 | 432 | (432 + 2260)/2 = 1346
<69 | 2260 | 1346
≥69 | 2681 | (2681 + 1018 + 3343 + 1579)/4 = 2155.25
≥69 | 1018 | 2155.25
≥69 | 3343 | 2155.25
≥69 | 1579 | 2155.25
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A16 and presented below
[(432 − 1346)² + (2260 − 1346)² + (2681 − 2155.25)² + (1018 − 2155.25)² + (3343 − 2155.25)² + (1579 − 2155.25)²] / 6 = 830,559.
  • Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 70.5 and Table A14, as shown in Table A17. The predicted value against each category of input feature is the average of all output variables for that category.
Table A17. Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | L_P(T)
<70.5 | 432 | (432 + 2260 + 2681)/3 = 1791
<70.5 | 2260 | 1791
<70.5 | 2681 | 1791
≥70.5 | 1018 | (1018 + 3343 + 1579)/3 = 1980
≥70.5 | 3343 | 1980
≥70.5 | 1579 | 1980
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A17 and presented below
[(432 − 1791)² + (2260 − 1791)² + (2681 − 1791)² + (1018 − 1980)² + (3343 − 1980)² + (1579 − 1980)²] / 6 = 967,159.
  • Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 73 and Table A14, as shown in Table A18. The predicted value against each category of input feature is the average of all output variables for that category.
Table A18. Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | L_P(T)
<73 | 432 | (432 + 2260 + 2681 + 1018)/4 = 1597.75
<73 | 2260 | 1597.75
<73 | 2681 | 1597.75
<73 | 1018 | 1597.75
≥73 | 3343 | (3343 + 1579)/2 = 2461
≥73 | 1579 | 2461
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A18 and presented below
[(432 − 1597.75)² + (2260 − 1597.75)² + (2681 − 1597.75)² + (1018 − 1597.75)² + (3343 − 2461)² + (1579 − 2461)²] / 6 = 810,489.
  • Convert the continuous input feature L(TEMP) into a categorical feature based on the average value 75 and Table A14, as shown in Table A19. The predicted value against each category of input feature is the average of all output variables for that category.
Table A19. Categorical subtable—L(TEMP) vs. L(T).
L(TEMP) | L(T) | L_P(T)
<75 | 432 | (432 + 2260 + 2681 + 1018 + 3343)/5 = 1946.8
<75 | 2260 | 1946.8
<75 | 2681 | 1946.8
<75 | 1018 | 1946.8
<75 | 3343 | 1946.8
≥75 | 1579 | 1579
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A19 and presented below
[(432 − 1946.8)² + (2260 − 1946.8)² + (2681 − 1946.8)² + (1018 − 1946.8)² + (3343 − 1946.8)² + (1579 − 1579)²] / 6 = 957,301.
  • Prepare the subtable for input feature Humidity and output feature L(T), as shown in Table A20.
Table A20. Sorted subtable—Humidity vs. L(T).
Humidity | L(T)
67 | 3343
73 | 1579
83 | 2681
88 | 432
88 | 2260
93 | 1018
  • Calculate the average between every two consecutive input feature values for Humidity; the average values are [70, 78, 85.5, 88, 90.5].
  • Convert the continuous input feature Humidity into a categorical feature based on the average value 70 and Table A20, as shown in Table A21. The predicted value against each category of input feature is the average of all output variables for that category.
Table A21. Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | L_P(T)
<70 | 3343 | 3343
≥70 | 1579 | (1579 + 2681 + 432 + 2260 + 1018)/5 = 1594
≥70 | 2681 | 1594
≥70 | 432 | 1594
≥70 | 2260 | 1594
≥70 | 1018 | 1594
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A21 and presented below
[(3343 − 3343)² + (1579 − 1594)² + (2681 − 1594)² + (432 − 1594)² + (2260 − 1594)² + (1018 − 1594)²] / 6 = 551,228.
  • Convert the continuous input feature Humidity into a categorical feature based on the average value 78 and Table A20, as shown in Table A22. The predicted value against each category of input feature is the average of all output variables for that category.
Table A22. Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | L_P(T)
<78 | 3343 | (3343 + 1579)/2 = 2461
<78 | 1579 | 2461
≥78 | 2681 | (2681 + 432 + 2260 + 1018)/4 = 1597.75
≥78 | 432 | 1597.75
≥78 | 2260 | 1597.75
≥78 | 1018 | 1597.75
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A22 and presented below
[(3343 − 2461)² + (1579 − 2461)² + (2681 − 1597.75)² + (432 − 1597.75)² + (2260 − 1597.75)² + (1018 − 1597.75)²] / 6 = 810,489.
  • Convert the continuous input feature Humidity into a categorical feature based on the average value 85.5 and Table A20, as shown in Table A23. The predicted value against each category of input feature is the average of all output variables for that category.
Table A23. Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | L_P(T)
<85.5 | 3343 | (3343 + 1579 + 2681)/3 = 2534.33
<85.5 | 1579 | 2534.33
<85.5 | 2681 | 2534.33
≥85.5 | 432 | (432 + 2260 + 1018)/3 = 1236.67
≥85.5 | 2260 | 1236.67
≥85.5 | 1018 | 1236.67
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A23 and presented below
[(3343 − 2534.33)² + (1579 − 2534.33)² + (2681 − 2534.33)² + (432 − 1236.67)² + (2260 − 1236.67)² + (1018 − 1236.67)²] / 6 = 555,105.
  • Convert the continuous input feature Humidity into a categorical feature based on the average value 88 and Table A20, as shown in Table A24. The predicted value against each category of input feature is the average of all output variables for that category.
Table A24. Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | L_P(T)
<88 | 3343 | (3343 + 1579 + 2681 + 432)/4 = 2008.75
<88 | 1579 | 2008.75
<88 | 2681 | 2008.75
<88 | 432 | 2008.75
≥88 | 2260 | (2260 + 1018)/2 = 1639
≥88 | 1018 | 1639
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A24 and presented below
[(3343 − 2008.75)² + (1579 − 2008.75)² + (2681 − 2008.75)² + (432 − 2008.75)² + (2260 − 1639)² + (1018 − 1639)²] / 6 = 945,708.
  • Convert the continuous input feature Humidity into a categorical feature based on the average value 90.5 and Table A20, as shown in Table A25. The predicted value against each category of input feature is the average of all output variables for that category.
Table A25. Categorical subtable—Humidity vs. L(T).
Humidity | L(T) | L_P(T)
<90.5 | 3343 | (3343 + 1579 + 2681 + 432 + 2260)/5 = 2059
<90.5 | 1579 | 2059
<90.5 | 2681 | 2059
<90.5 | 432 | 2059
<90.5 | 2260 | 2059
≥90.5 | 1018 | 1018
  • Calculate the mean squared error based on the actual and predicted load values shown in Table A25 and presented below
[(3343 − 2059)² + (1579 − 2059)² + (2681 − 2059)² + (432 − 2059)² + (2260 − 2059)² + (1018 − 1018)²] / 6 = 825,578.
From all the above calculations, the minimum MSE value for the feature “T-24” is 209,064 for the split ≥2265, the minimum MSE value for the feature “T-48” is 209,006 for the split ≥1689, the minimum MSE value for the feature “Temperature” is 553,557 for the split ≥67.5, and the minimum MSE value for the feature “Humidity” is 551,228 for the split ≥70. Hence, these splits against each feature were used to convert the continuous data shown in Table A1 into the categorical data shown in Table A26. Furthermore, the MSE value for the day feature with categories (1 and 0) is 965,922 and the MSE value for the season feature with categories (0, 1, and 2) is 747,157. All these MSE values are presented in the last row of Table A26.
Table A26. Sample categorical data to build regression tree.
L(T-24) | L(T-48) | DAY | SEASON | TEMP | HUMIDITY | L(T)
<2265 | <1689 | 1 | 1 | <67.5 | ≥70 | 432
≥2265 | ≥1689 | 0 | 1 | ≥67.5 | ≥70 | 2260
≥2265 | ≥1689 | 0 | 2 | ≥67.5 | ≥70 | 2681
≥2265 | ≥1689 | 1 | 2 | ≥67.5 | <70 | 3343
<2265 | <1689 | 1 | 0 | ≥67.5 | ≥70 | 1579
<2265 | <1689 | 0 | 0 | ≥67.5 | ≥70 | 1018
MSE per feature: 209,064 | 209,006 | 965,922 | 747,157 | 553,557 | 551,228

Appendix B. Regression Tree Model Formulation

From Table A26, we observe that L(T-48) has a minimum MSE value, i.e., 209,006 in comparison with all the remaining features. Hence, the input feature L(T-48) is considered as a root node for the regression tree and that node has two branches ≥1689 and <1689. In order to identify the decision node under each branch, Table A26 is divided into two subtables, presented in Table A27 and Table A28.
Table A27. Subtable: L(T-48) < 1689.
L(T-24) | DAY | SEASON | TEMP | HUMIDITY | L(T)
<2265 | 1 | 1 | <67.5 | ≥70 | 432
<2265 | 1 | 0 | ≥67.5 | ≥70 | 1579
<2265 | 0 | 0 | ≥67.5 | ≥70 | 1018
Table A28. Subtable: L(T-48) ≥ 1689.
L(T-24) | DAY | SEASON | TEMP | HUMIDITY | L(T)
≥2265 | 0 | 1 | ≥67.5 | ≥70 | 2260
≥2265 | 0 | 2 | ≥67.5 | ≥70 | 2681
≥2265 | 1 | 2 | ≥67.5 | <70 | 3343
  • In order to identify the decision node among L(T-24), day, season, temperature, and humidity under the branch <1689, Table A27 is further divided into multiple subtables based on each input feature.
  • A subtable based on input feature L(T-24) and target variable L(T) is presented in Table A29. From Table A29, it is observed that input feature L(T-24) has an MSE value of 219,303.
Table A29. L(T-24) vs. L(T) for L(T-48) < 1689.
L(T-24) | L(T) | Prediction | Squared Error | MSE
<2265 | 432 | 1010 | 333,699 |
<2265 | 1579 | 1010 | 324,140 | 219,303
<2265 | 1018 | 1010 | 69 |
  • A subtable based on input feature day and target variable L(T) is presented in Table A30. From Table A30, it is observed that input feature day has an MSE value of 219,268.
Table A30. Day vs. L(T) for L(T-48) < 1689.
Day | L(T) | Prediction | Squared Error | MSE
1 | 432 | 1005.5 | 328,902.25 |
1 | 1579 | 1005.5 | 328,902.25 | 219,268
0 | 1018 | 1018 | 0 |
  • A subtable based on input feature season and target variable L(T) is presented in Table A31. From Table A31, it is observed that input feature season has an MSE value of 52,454.
Table A31. Season vs. L(T) for L(T-48) < 1689.
Season | L(T) | Prediction | Squared Error | MSE
1 | 432 | 432 | 0 |
0 | 1579 | 1298.5 | 78,680.25 | 52,454
0 | 1018 | 1298.5 | 78,680.25 |
  • A subtable based on input feature temperature and target variable L(T) is presented in Table A32. From Table A32, it is observed that input feature temperature has an MSE value of 52,454.
Table A32. Temperature vs. L(T) for L(T-48) < 1689.
Temperature | L(T) | Prediction | Squared Error | MSE
<67.5 | 432 | 432 | 0 |
≥67.5 | 1579 | 1298.5 | 78,680.25 | 52,454
≥67.5 | 1018 | 1298.5 | 78,680.25 |
  • A subtable based on input feature humidity and target variable L(T) is presented in Table A33. From Table A33, it is observed that input feature humidity has an MSE value of 219,303.
Table A33. Humidity vs. L(T) for L(T-48) < 1689.
Humidity | L(T) | Prediction | Squared Error | MSE
≥70 | 432 | 1009.67 | 333,698.78 |
≥70 | 1579 | 1009.67 | 324,140.44 | 219,303
≥70 | 1018 | 1009.67 | 69.44 |
It is observed from the above calculations that season and temperature have a minimum MSE, i.e., 52,454. Here, season is considered as a decision node under branch L(T-48) < 1689. Now, the node season has two branches, i.e., season “1” and “0”. In order to identify the decision/leaf node under each branch, Table A27 is divided into two subtables, presented in Table A34 and Table A35. From Table A34, it is observed that the branch corresponding to season “1” has a leaf node with value 432.
Table A34. Subtable: L(T-48) < 1689 and season = “1”.
L(T-24) | DAY | TEMP | HUMIDITY | L(T)
<2265 | 1 | <67.5 | ≥70 | 432
Table A35. Subtable: L(T-48) < 1689 and season = “0”.
L(T-24) | DAY | TEMP | HUMIDITY | L(T)
<2265 | 1 | ≥67.5 | ≥70 | 1579
<2265 | 0 | ≥67.5 | ≥70 | 1018
In order to identify the decision node among L(T-24), day, temperature, and humidity under branch season “0”, Table A35 is divided into multiple subtables with respect to each feature.
  • A subtable based on input feature L(T-24) and target variable L(T) is presented in Table A36. From Table A36, it is observed that input feature L(T-24) has an MSE value of 78,680.
Table A36. L(T-24) vs. L(T) for L(T-48) < 1689 and season “0”.
L(T-24) | L(T) | Prediction | Squared Error | MSE
<2265 | 1579 | 1298.5 | 78,680 | 78,680
<2265 | 1018 | 1298.5 | 78,680 |
  • A subtable based on input feature day and target variable L(T) is presented in Table A37. From Table A37, it is observed that input feature day has an MSE value of 0.
Table A37. Day vs. L(T) for L(T-48) < 1689 and season “0”.
Day | L(T) | Prediction | Squared Error | MSE
1 | 1579 | 1579 | 0 | 0
0 | 1018 | 1018 | 0 |
  • A subtable based on input feature temperature and target variable L(T) is presented in Table A38. From Table A38, it is observed that input feature temperature has an MSE value of 78,680.
Table A38. Temperature vs. L(T) for L(T-48) < 1689 and season “0”.
Temperature | L(T) | Prediction | Squared Error | MSE
≥67.5 | 1579 | 1298.5 | 78,680 | 78,680
≥67.5 | 1018 | 1298.5 | 78,680 |
  • A subtable based on input feature humidity and target variable L(T) is presented in Table A39. From Table A39, it is observed that input feature humidity has an MSE value of 78,680.
Table A39. Humidity vs. L(T) for L(T-48) < 1689 and season “0”.
Humidity | L(T) | Prediction | Squared Error | MSE
≥70 | 1579 | 1298.5 | 78,680 | 78,680
≥70 | 1018 | 1298.5 | 78,680 |
It is observed from the above calculations that feature “Day” has a minimum MSE, i.e., 0. Here, “Day” is considered as decision node under the season “0” branch. Now, node “Day” has two branches, i.e., day “0” and “1” as presented in Table A37. From Table A37, it is observed that the branch corresponding to day “1” has a leaf node with value 1579 and day “0” has a leaf node with value 1018.
  • In order to identify the decision node among L(T-24), day, season, temperature, and humidity under the branch ≥1689, Table A28 is further divided into multiple subtables based on each input feature.
  • A subtable based on input feature L(T-24) and target variable L(T) is presented in Table A40. From Table A40, it is observed that input feature L(T-24) has an MSE value of 198,708.
Table A40. L(T-24) vs. L(T) for L(T-48) ≥ 1689.
L(T-24) | L(T) | Prediction | Squared Error | MSE
≥2265 | 2260 | 2761.33 | 251,335 |
≥2265 | 2681 | 2761.33 | 6453 | 198,708
≥2265 | 3343 | 2761.33 | 338,336 |
  • A subtable based on input feature day and target variable L(T) is presented in Table A41. From Table A41, it is observed that input feature day has an MSE value of 29,540.
Table A41. Day vs. L(T) for L(T-48) ≥ 1689.
Day | L(T) | Prediction | Squared Error | MSE
0 | 2260 | 2470.5 | 44,310 |
0 | 2681 | 2470.5 | 44,310 | 29,540
1 | 3343 | 3343 | 0 |
  • A subtable based on input feature season and target variable L(T) is presented in Table A42. From Table A42, it is observed that input feature season has an MSE value of 73,041.
Table A42. Season vs. L(T) for L(T-48) ≥ 1689.
Season | L(T) | Prediction | Squared Error | MSE
1 | 2260 | 2260 | 0 |
2 | 2681 | 3012 | 109,561 | 73,041
2 | 3343 | 3012 | 109,561 |
  • A subtable based on input feature temperature and target variable L(T) is presented in Table A43. From Table A43, it is observed that input feature temperature has an MSE value of 198,708.
Table A43. Temperature vs. L(T) for L(T-48) ≥ 1689.
Temperature | L(T) | Prediction | Squared Error | MSE
≥67.5 | 2260 | 2761.33 | 251,335 |
≥67.5 | 2681 | 2761.33 | 6453 | 198,708
≥67.5 | 3343 | 2761.33 | 338,336 |
  • A subtable based on input feature humidity and target variable L(T) is presented in Table A44. From Table A44, it is observed that input feature humidity has an MSE value of 29,540.
Table A44. Humidity vs. L(T) for L(T-48) ≥ 1689.
Humidity | L(T) | Prediction | Squared Error | MSE
≥70 | 2260 | 2470.5 | 44,310 |
≥70 | 2681 | 2470.5 | 44,310 | 29,540
<70 | 3343 | 3343 | 0 |
It is observed from the above calculations that day and humidity have a minimum MSE, i.e., 29,540. Here, day is considered as a decision node under branch L(T-48) ≥ 1689. Now, the node day has two branches, i.e., day “1” and “0”. In order to identify the decision/leaf node under each branch, Table A28 is divided into two subtables, as presented in Table A45 and in Table A46. From Table A45, it is observed that the branch corresponding to day “1” has a leaf node with value 3343.
Table A45. Subtable: L(T-48) ≥ 1689 and day = “1”.
L(T-24) | SEASON | TEMP | HUMIDITY | L(T)
≥2265 | 2 | ≥67.5 | <70 | 3343
Table A46. Subtable: L(T-48) ≥ 1689 and day = “0”.
L(T-24) | SEASON | TEMP | HUMIDITY | L(T)
≥2265 | 1 | ≥67.5 | ≥70 | 2260
≥2265 | 2 | ≥67.5 | ≥70 | 2681
In order to identify the decision node among L(T-24), season, temperature, and humidity under the branch day “0”, Table A46 is divided into multiple subtables with respect to each feature.
  • A subtable based on input feature L(T-24) and target variable L(T) is presented in Table A47. From Table A47, it is observed that input feature L(T-24) has an MSE value of 44,310.25.
Table A47. L(T-24) vs. L(T) for L(T-48) ≥ 1689 and day “0”.
L(T-24) | L(T) | Prediction | Squared Error | MSE
≥2265 | 2260 | 2470.5 | 44,310.25 | 44,310.25
≥2265 | 2681 | 2470.5 | 44,310.25 |
  • A subtable based on input feature season and target variable L(T) is presented in Table A48. From Table A48, it is observed that input feature season has an MSE value of 0.
Table A48. Season vs. L(T) for L(T-48) ≥ 1689 and day “0”.
Season | L(T) | Prediction | Squared Error | MSE
1 | 2260 | 2260 | 0 | 0
2 | 2681 | 2681 | 0 |
  • A subtable based on input feature temperature and target variable L(T) is presented in Table A49. From Table A49, it is observed that input feature temperature has an MSE value of 44,310.25.
Table A49. Temperature vs. L(T) for L(T-48) ≥ 1689 and day “0”.
Temperature | L(T) | Prediction | Squared Error | MSE
≥67.5 | 2260 | 2470.5 | 44,310.25 | 44,310.25
≥67.5 | 2681 | 2470.5 | 44,310.25 |
  • A subtable based on input feature humidity and target variable L(T) is presented in Table A50. From Table A50, it is observed that input feature humidity has an MSE value of 44,310.25.
Table A50. Humidity vs. L(T) for L(T-48) ≥ 1689 and day “0”.
Humidity | L(T) | Prediction | Squared Error | MSE
≥70 | 2260 | 2470.5 | 44,310.25 | 44,310.25
≥70 | 2681 | 2470.5 | 44,310.25 |
It is observed from the above calculations that feature “Season” has a minimum MSE, i.e., 0. Here, “Season” is considered as a decision node under branch day “0”. Now node “Season” has two branches, i.e., season “1” and “2” as presented in Table A48. From Table A48, it is observed that the branch corresponding to season “1” has a leaf node with value 2260 and season “2” has a leaf node with value 2681. Finally, the complete decision tree to predict load L(T) based on the features, i.e., L(T-24), L(T-48), day, season, temperature, and humidity is shown in Figure A1. The decision tree shown in Figure A1 was used to predict the load shown in Table A1 and the predicted load is shown in Table A51. From Table A51, it is observed that both actual and predicted load values are equal.
Figure A1. Regression tree architecture with sample data.
Table A51. Predicted load from sample data using regression tree.
L(T-24) | L(T-48) | DAY | SEASON | TEMP | HUMIDITY | L(T) | L_p(T)
2176 | 412 | 1 | 1 | 67 | 88 | 432 | 432
2354 | 1829 | 0 | 1 | 68 | 88 | 2260 | 2260
2777 | 2647 | 0 | 2 | 70 | 83 | 2681 | 2681
3112 | 3203 | 1 | 2 | 75 | 67 | 3343 | 3343
1663 | 1549 | 1 | 0 | 75 | 73 | 1579 | 1579
1010 | 1027 | 0 | 0 | 71 | 93 | 1018 | 1018
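As a small cross-check of this worked example, the sketch below fits a scikit-learn regression tree of depth 3 (the depth of the hand-built tree in Figure A1) to the six samples of Table A1; as in Table A51, the training samples are reproduced exactly when the tree is deep enough to isolate them.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Six samples from Table A1: L(T-24), L(T-48), DAY, SEASON, TEMP, HUMIDITY -> L(T)
X = np.array([[2176, 412, 1, 1, 67, 88],
              [2354, 1829, 0, 1, 68, 88],
              [2777, 2647, 0, 2, 70, 83],
              [3112, 3203, 1, 2, 75, 67],
              [1663, 1549, 1, 0, 75, 73],
              [1010, 1027, 0, 0, 71, 93]])
y = np.array([432, 2260, 2681, 3343, 1579, 1018])

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict(X))  # with sufficient depth, all six training targets are recovered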

References

  1. Kersting, W.H. Distribution System Modeling and Analysis; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  2. Willis, H.L. Spatial Electric Load Forecasting; CRC Press: Boca Raton, FL, USA, 2002. [Google Scholar]
  3. Henselmeyer, S.; Grzegorzek, M. Short-Term Load Forecasting Using an Attended Sequential Encoder-Stacked Decoder Model with Online Training. Appl. Sci. 2021, 11, 4927. [Google Scholar] [CrossRef]
  4. Shohan, M.J.A.; Faruque, M.O.; Foo, S.Y. Forecasting of Electric Load Using a Hybrid LSTM-Neural Prophet Model. Energies 2022, 15, 2158. [Google Scholar] [CrossRef]
  5. Grzeszczyk, T.A.; Grzeszczyk, M.K. Justifying Short-Term Load Forecasts Obtained with the Use of Neural Models. Energies 2022, 15, 1852. [Google Scholar] [CrossRef]
  6. Kiprijanovska, I.; Stankoski, S.; Ilievski, I.; Jovanovski, S.; Gams, M.; Gjoreski, H. Houseec: Day-ahead household electrical energy consumption forecasting using deep learning. Energies 2020, 13, 2672. [Google Scholar] [CrossRef]
  7. Shah, I.; Iftikhar, H.; Ali, S.; Wang, D. Short-term electricity demand forecasting using components estimation technique. Energies 2019, 12, 2532. [Google Scholar] [CrossRef] [Green Version]
  8. Jiang, H.; Zhang, Y.; Muljadi, E.; Zhang, J.J.; Gao, D.W. A short-term and high-resolution distribution system load forecasting approach using support vector regression with hybrid parameters optimization. IEEE Trans. Smart Grid 2016, 9, 3341–3350. [Google Scholar] [CrossRef]
  9. Zhang, Z.; Dou, C.; Yue, D.; Zhang, B. Predictive voltage hierarchical controller design for islanded microgrids under limited communication. IEEE Trans. Circuits Syst. I Regul. Pap. 2021, 69, 933–945. [Google Scholar] [CrossRef]
  10. Zhang, Z.; Mishra, Y.; Yue, D.; Dou, C.; Zhang, B.; Tian, Y.C. Delay-tolerant predictive power compensation control for photovoltaic voltage regulation. IEEE Trans. Ind. Inform. 2020, 17, 4545–4554. [Google Scholar] [CrossRef]
  11. Hong, T.; Fan, S. Probabilistic electric load forecasting: A tutorial review. Int. J. Forecast. 2016, 32, 914–938. [Google Scholar] [CrossRef]
  12. Fallah, S.N.; Ganjkhani, M.; Shamshirband, S.; Chau, K.w. Computational intelligence on short-term load forecasting: A methodological overview. Energies 2019, 12, 393. [Google Scholar] [CrossRef] [Green Version]
  13. Veeramsetty, V.; Deshmukh, R. Electric power load forecasting on a 33/11 kV substation using artificial neural networks. SN Appl. Sci. 2020, 2, 855. [Google Scholar] [CrossRef] [Green Version]
  14. Veeramsetty, V.; Mohnot, A.; Singal, G.; Salkuti, S.R. Short term active power load prediction on a 33/11 kv substation using regression models. Energies 2021, 14, 2981. [Google Scholar] [CrossRef]
  15. Veeramsetty, V.; Chandra, D.R.; Salkuti, S.R. Short-term electric power load forecasting using factor analysis and long short-term memory for smart cities. Int. J. Circuit Theory Appl. 2021, 49, 1678–1703. [Google Scholar] [CrossRef]
  16. Veeramsetty, V.; Reddy, K.R.; Santhosh, M.; Mohnot, A.; Singal, G. Short-term electric power load forecasting using random forest and gated recurrent unit. Electr. Eng. 2022, 104, 307–329. [Google Scholar] [CrossRef]
  17. Veeramsetty, V.; Rakesh Chandra, D.; Salkuti, S.R. Short Term Active Power Load Forecasting Using Machine Learning with Feature Selection. In Next Generation Smart Grids: Modeling, Control and Optimization; Springer: Berlin/Heidelberg, Germany, 2022; pp. 103–124. [Google Scholar]
  18. Veeramsetty, V.; Chandra, D.R.; Grimaccia, F.; Mussetta, M. Short Term Electric Power Load Forecasting Using Principal Component Analysis and Recurrent Neural Networks. Forecasting 2022, 4, 149–164. [Google Scholar] [CrossRef]
  19. Chemetova, S.; Santos, P.; Ventim-Neves, M. Load forecasting in electrical distribution grid of medium voltage. In Proceedings of the Doctoral Conference on Computing, Electrical and Industrial Systems, Costa de Caparica, Portugal, 11–13 April 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 340–349. [Google Scholar]
  20. Couraud, B.; Roche, R. A distribution loads forecast methodology based on transmission grid substations SCADA Data. In Proceedings of the 2014 IEEE Innovative Smart Grid Technologies-Asia (ISGT ASIA), Kuala Lumpur, Malaysia, 20–23 May 2014; pp. 35–40. [Google Scholar]
  21. Andriopoulos, N.; Magklaras, A.; Birbas, A.; Papalexopoulos, A.; Valouxis, C.; Daskalaki, S.; Birbas, M.; Housos, E.; Papaioannou, G.P. Short term electric load forecasting based on data transformation and statistical machine learning. Appl. Sci. 2020, 11, 158. [Google Scholar] [CrossRef]
  22. Boriratrit, S.; Srithapon, C.; Fuangfoo, P.; Chatthaworn, R. Metaheuristic Extreme Learning Machine for Improving Performance of Electric Energy Demand Forecasting. Computers 2022, 11, 66. [Google Scholar] [CrossRef]
  23. Wang, Y.; Liu, M.; Bao, Z.; Zhang, S. Short-Term Load Forecasting with Multi-Source Data Using Gated Recurrent Unit Neural Networks. Energies 2018, 11, 1138. [Google Scholar] [CrossRef] [Green Version]
  24. Li, Y.; Huang, Y.; Zhang, M. Short-Term Load Forecasting for Electric Vehicle Charging Station Based on Niche Immunity Lion Algorithm and Convolutional Neural Network. Energies 2018, 11, 1253. [Google Scholar] [CrossRef] [Green Version]
  25. Hu, Z.; Ma, J.; Yang, L.; Li, X.; Pang, M. Decomposition-Based Dynamic Adaptive Combination Forecasting for Monthly Electricity Demand. Sustainability 2019, 11, 1272. [Google Scholar] [CrossRef] [Green Version]
  26. Amoasi Acquah, M.; Kodaira, D.; Han, S. Real-Time Demand Side Management Algorithm Using Stochastic Optimization. Energies 2018, 11, 1166. [Google Scholar] [CrossRef] [Green Version]
  27. Nagbe, K.; Cugliari, J.; Jacques, J. Short-Term Electricity Demand Forecasting Using a Functional State Space Model. Energies 2018, 11, 1120. [Google Scholar] [CrossRef] [Green Version]
  28. Kiptoo, M.K.; Adewuyi, O.B.; Lotfy, M.E.; Amara, T.; Konneh, K.V.; Senjyu, T. Assessing the techno-economic benefits of flexible demand resources scheduling for renewable energy–based smart microgrid planning. Future Internet 2019, 11, 219. [Google Scholar] [CrossRef] [Green Version]
  29. Yu, J.; Park, J.H.; Kim, S. A New Input Selection Algorithm Using the Group Method of Data Handling and Bootstrap Method for Support Vector Regression Based Hourly Load Forecasting. Energies 2018, 11, 2870. [Google Scholar] [CrossRef] [Green Version]
  30. Kampelis, N.; Tsekeri, E.; Kolokotsa, D.; Kalaitzakis, K.; Isidori, D.; Cristalli, C. Development of demand response energy management optimization at building and district levels using genetic algorithm and artificial neural network modelling power predictions. Energies 2018, 11, 3012. [Google Scholar] [CrossRef] [Green Version]
  31. Jin, X.B.; Zheng, W.Z.; Kong, J.L.; Wang, X.Y.; Bai, Y.T.; Su, T.L.; Lin, S. Deep-learning forecasting method for electric power load via attention-based encoder-decoder with bayesian optimization. Energies 2021, 14, 1596. [Google Scholar] [CrossRef]
  32. Han, M.; Zhong, J.; Sang, P.; Liao, H.; Tan, A. A Combined Model Incorporating Improved SSA and LSTM Algorithms for Short-Term Load Forecasting. Electronics 2022, 11, 1835. [Google Scholar] [CrossRef]
  33. Taleb, I.; Guerard, G.; Fauberteau, F.; Nguyen, N. A Flexible Deep Learning Method for Energy Forecasting. Energies 2022, 15, 3926. [Google Scholar] [CrossRef]
  34. Aldhyani, T.H.; Alkahtani, H. A bidirectional long short-term memory model algorithm for predicting COVID-19 in gulf countries. Life 2021, 11, 1118. [Google Scholar] [CrossRef]
  35. Zhang, W.; Wu, P.; Peng, Y.; Liu, D. Roll motion prediction of unmanned surface vehicle based on coupled CNN and LSTM. Future Internet 2019, 11, 243. [Google Scholar] [CrossRef] [Green Version]
  36. Lu, Y.; Li, Y.; Xie, D.; Wei, E.; Bao, X.; Chen, H.; Zhong, X. The application of improved random forest algorithm on the prediction of electric vehicle charging load. Energies 2018, 11, 3207. [Google Scholar] [CrossRef] [Green Version]
  37. Maitah, M.; Malec, K.; Ge, Y.; Gebeltová, Z.; Smutka, L.; Blažek, V.; Pánková, L.; Maitah, K.; Mach, J. Assessment and Prediction of Maize Production Considering Climate Change by Extreme Learning Machine in Czechia. Agronomy 2021, 11, 2344. [Google Scholar] [CrossRef]
  38. López-Espinoza, E.D.; Zavala-Hidalgo, J.; Mahmood, R.; Gómez-Ramos, O. Assessing the impact of land use and land cover data representation on weather forecast quality: A case study in central mexico. Atmosphere 2020, 11, 1242. [Google Scholar] [CrossRef]
  39. Hevia-Montiel, N.; Perez-Gonzalez, J.; Neme, A.; Haro, P. Machine Learning-Based Feature Selection and Classification for the Experimental Diagnosis of Trypanosoma cruzi. Electronics 2022, 11, 785. [Google Scholar] [CrossRef]
  40. Alaoui, A.; Hallama, M.; Bär, R.; Panagea, I.; Bachmann, F.; Pekrun, C.; Fleskens, L.; Kandeler, E.; Hessel, R. A New Framework to Assess Sustainability of Soil Improving Cropping Systems in Europe. Land 2022, 11, 729. [Google Scholar] [CrossRef]
  41. Meira, J.; Carneiro, J.; Bolón-Canedo, V.; Alonso-Betanzos, A.; Novais, P.; Marreiros, G. Anomaly Detection on Natural Language Processing to Improve Predictions on Tourist Preferences. Electronics 2022, 11, 779. [Google Scholar] [CrossRef]
  42. Veeramsetty, V. Electric Power Load Dataset, 2022.
Figure 1. Main functions of a distribution management system.
Figure 3. Regression tree architecture for HALF.
Figure 4. Regression tree architecture for DALF.
Figure 5. Distribution of predicted and actual load samples with regression tree model for HALF. (a) Hour-ahead load forecasting: predicted load vs. actual load for training data. (b) Hour-ahead load forecasting: predicted load vs. actual load for testing data.
Figure 6. Distribution of predicted and actual load samples with regression tree model for DALF. (a) Day-ahead load forecasting: predicted load vs. actual load for training data. (b) Day-ahead load forecasting: predicted load vs. actual load for testing data.
Figure 7. Distribution of predicted and actual load samples with regression tree model on 31 December 2021.
Figure 8. Web application to predict active power load on a 33/11 kV substation in Godishala, Telangana State, India, developed using a regression tree model.
Figure 9. Regression tree performance with respect to various seasons for HALF.
Figure 10. Regression tree performance with respect to various day statuses for HALF.
Figure 11. Performance comparison of machine learning models in terms of training and testing mean squared errors.
Table 1. Sample load data for first 5 h on 1 January 2021 at the 33/11 kV substation in Godishala.
TIME | VOLTAGE (kV) | CURRENT (A) | cos(ϕ) | POWER (kW)
01-00 | 11.6 | 102 | 0.96 | 1967
02-00 | 11.6 | 102 | 0.96 | 1967
03-00 | 11.6 | 102 | 0.96 | 1967
04-00 | 11.3 | 130 | 0.96 | 2443
05-00 | 11.2 | 148 | 0.96 | 2756
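Although not stated in the table, the POWER column is consistent with a balanced three-phase calculation from the recorded voltage, current, and power factor, P = √3 · V · I · cos(ϕ). A minimal check under that assumption:

```python
import math

# Hourly readings from Table 1: (time, line voltage in kV, current in A, power factor).
readings = [
    ("01-00", 11.6, 102, 0.96),
    ("04-00", 11.3, 130, 0.96),
    ("05-00", 11.2, 148, 0.96),
]
for time, kv, amps, pf in readings:
    p_kw = math.sqrt(3) * kv * amps * pf   # three-phase active power in kW
    print(f"{time}: {p_kw:.0f} kW")
# 01-00: 1967 kW, 04-00: 2443 kW, 05-00: 2756 kW -- matching the POWER column
```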
Table 2. First 6 samples from the dataset which was used to train and test the machine learning models for hour-ahead forecasting.
Sample | L(T-1) | L(T-2) | L(T-24) | L(T-48) | DAY | SEASON | Temperature | Humidity | L(T)
0 | 2175.941 | 446.5747 | 1828.916 | 1967.388 | 1 | 1 | 65 | 92 | 2236.757
1 | 2236.757 | 2175.941 | 1828.916 | 1967.388 | 1 | 1 | 65 | 92 | 2236.757
2 | 2236.757 | 2236.757 | 1828.916 | 1967.388 | 1 | 1 | 65 | 92 | 2354.481
3 | 2354.481 | 2236.757 | 1892.266 | 2442.607 | 1 | 1 | 77 | 52 | 2511.446
4 | 2511.446 | 2354.481 | 2532.345 | 2756.206 | 1 | 1 | 77 | 52 | 2805.756
5 | 2805.756 | 2511.446 | 3012.158 | 3203.158 | 1 | 1 | 77 | 52 | 3212.469
Table 3. First 6 samples from the dataset which was used to train and test the machine learning models for day-ahead forecasting.
Sample | L(T-24) | L(T-48) | DAY | SEASON | Temperature | Humidity | L(T)
0 | 1828.916 | 1967.388 | 1 | 1 | 65 | 92 | 2236.757
1 | 1828.916 | 1967.388 | 1 | 1 | 65 | 92 | 2236.757
2 | 1828.916 | 1967.388 | 1 | 1 | 65 | 92 | 2354.481
3 | 1892.266 | 2442.607 | 1 | 1 | 77 | 52 | 2511.446
4 | 2532.345 | 2756.206 | 1 | 1 | 77 | 52 | 2805.756
5 | 3012.158 | 3203.158 | 1 | 1 | 77 | 52 | 3212.469
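The feature layout of Tables 2 and 3 follows directly from lagging the hourly load series. A sketch of how such a dataset could be assembled is shown below, assuming pandas; the helper name build_dataset and the exog frame are illustrative and not taken from the paper.

```python
import pandas as pd

def build_dataset(load: pd.Series, exog: pd.DataFrame, hour_ahead: bool) -> pd.DataFrame:
    """Assemble lag features in the style of Table 2 (HALF) or Table 3 (DALF).

    `load` is the hourly active-power series; `exog` carries DAY, SEASON,
    Temperature and Humidity aligned to the same hourly index.
    """
    df = exog.copy()
    df["L(T-24)"] = load.shift(24)      # load one day earlier
    df["L(T-48)"] = load.shift(48)      # load two days earlier
    if hour_ahead:                      # HALF additionally uses the two most recent hours
        df["L(T-1)"] = load.shift(1)
        df["L(T-2)"] = load.shift(2)
    df["L(T)"] = load                   # forecasting target
    return df.dropna().reset_index(drop=True)
```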
Table 4. Training and testing errors of regression tree model for HALF.
Depth | Error Metric | Training | Testing
5 | MSE | 0.004 | 0.005
5 | RMSE | 0.066 | 0.072
5 | MAE | 0.039 | 0.044
10 | MSE | 0.001 | 0.006
10 | RMSE | 0.037 | 0.080
10 | MAE | 0.021 | 0.043
15 | MSE | 0.000 | 0.008
15 | RMSE | 0.011 | 0.087
15 | MAE | 0.006 | 0.047
25 | MSE | 0.000 | 0.008
25 | RMSE | 0.001 | 0.090
25 | MAE | 0.000 | 0.049
30 | MSE | 0.000 | 0.008
30 | RMSE | 0.001 | 0.089
30 | MAE | 0.000 | 0.048
34 | MSE | 0.000 | 0.008
34 | RMSE | 0.001 | 0.087
34 | MAE | 0.000 | 0.048
Table 5. Training and testing errors of regression tree model for DALF.
Depth | Error Metric | Training | Testing
6 | MSE | 0.00650 | 0.00869
6 | RMSE | 0.08063 | 0.09322
6 | MAE | 0.04931 | 0.05647
12 | MSE | 0.00209 | 0.01201
12 | RMSE | 0.04569 | 0.10959
12 | MAE | 0.02621 | 0.06301
18 | MSE | 0.00019 | 0.01358
18 | RMSE | 0.01390 | 0.11655
18 | MAE | 0.00547 | 0.06793
24 | MSE | 0.00001 | 0.01419
24 | RMSE | 0.00436 | 0.11911
24 | MAE | 0.00068 | 0.06984
30 | MSE | 0.00001 | 0.01367
30 | RMSE | 0.00320 | 0.11692
30 | MAE | 0.00025 | 0.06930
36 | MSE | 0.00001 | 0.01387
36 | RMSE | 0.00319 | 0.11776
36 | MAE | 0.00024 | 0.06933
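Tables 4 and 5 report training and testing errors as the maximum tree depth is varied. A depth sweep of this kind could be produced with a routine like the one below; this is a sketch assuming scikit-learn, and details such as the random_state are assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.tree import DecisionTreeRegressor

def depth_sweep(X_train, y_train, X_test, y_test, depths=(5, 10, 15, 25, 30, 34)):
    """Return MSE, RMSE, and MAE on training and testing data for each tree depth."""
    rows = []
    for depth in depths:
        model = DecisionTreeRegressor(max_depth=depth, random_state=0)
        model.fit(X_train, y_train)
        for split, X, y in (("training", X_train, y_train), ("testing", X_test, y_test)):
            pred = model.predict(X)
            mse = mean_squared_error(y, pred)
            rows.append({"depth": depth, "split": split,
                         "MSE": mse, "RMSE": float(np.sqrt(mse)),
                         "MAE": mean_absolute_error(y, pred)})
    return rows
```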
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
