An Intelligent Data-Driven Approach for Electrical Energy Load Management Using Machine Learning Algorithms

Abstract: Data-driven electrical energy efficiency management is an emerging trend in electrical energy forecasting and management. This fusion of data science, artificial intelligence, and electrical energy management has proven to be the most precise and robust energy management solution. The Smart Energy Informatics Lab (SEIL) of the Indian Institute of Technology (IIT) conducted an experimental study in 2019 to collect massive data on university campus energy consumption. This work presents an exhaustive parametric and empirical comparative study of 24 machine learning algorithms on the SEIL dataset in order to recommend the optimal algorithm. The simulation results establish that Bagged Trees, Fine Trees, and Medium Trees are, respectively, the best-, second-best-, and third-best-performing algorithms in terms of efficacy. In terms of efficiency, the ranking is reversed. Bagged Trees is therefore the most effective algorithm for this application and Medium Trees is the most efficient one, while Fine Trees has the optimum tradeoff between efficacy and efficiency.


Introduction
Electrical energy is an essential and likely increasingly scarce resource around the globe. Its scarcity can be addressed in two ways: the first is to increase generation capacity, and the second is to improve load management and energy demand forecasting [1]. This study focuses on electrical energy management and demand forecasting. This domain is well represented in the literature, and a large number of researchers have contributed to it. Research has established that buildings, including residential, commercial, and industrial buildings, are the major sites of energy consumption [2]. It is also reported in the literature that building energy consumption accounts for 39% of global energy consumption and 38% of greenhouse gas emissions [3]. In 2019, the SEIL published a study that collected massive amounts of data on the energy consumption of residential buildings and university campuses. Both datasets are reported as the most recent benchmark datasets for data-driven energy forecasting systems for residential buildings and university campuses [4]. In this research study, the university campus energy consumption dataset is under consideration.
It has been observed that large-scale university campuses demand a significant amount of electricity to fulfill their HVAC requirements. This includes, but is not limited to, the power requirements of lab equipment, machines, office equipment, classes, auditorium IT, and support equipment [5]. However, electrical energy is a constrained

Literature Review
This section provides an in-depth review of the literature on the application of machine learning algorithms to energy prediction. The scope of this review is twofold. First, the performance evaluation of various machine learning algorithms for electrical energy forecasting is considered, which establishes the rationale for using machine learning algorithms in energy forecasting. Second, the existing work on the SEIL University Campus benchmark dataset is presented, which clarifies the research gap for that dataset.
In 2019, Johannesen et al. [16] investigated the response of regression models on a Sydney dataset comprising weather, time stamps, and load demand, gathered locally over four years. The authors fused the time stamp, weather, and load demand data into common instances and then employed the Random Forest Regressor, k-Nearest Neighbor (kNN) Regressor, and Linear Regression for load forecasting. They concluded that, for the Sydney region electrical consumption dataset, Random Forest is the best candidate for short-term (30 min) prediction, whereas the kNN Regressor is relatively efficient for long-term (24 h) prediction [17]. Another study was conducted on the power consumption of university campuses in Korea. In this study, the authors also mapped the weather data to the power consumption data. They reduced the feature dimension using principal component analysis (PCA) and then employed ANN and SVM for energy demand prediction, concluding that, for the said dataset, the ANN is the better candidate [18].
Like university campuses, residential buildings are major sites of energy consumption. Chou et al. [19] presented a study on energy demand forecasting for residential buildings. In particular, the authors proposed a hybrid model of prediction and optimization. They reported that the hybrid evolutionary-neuro system performs better than the classical machine learning network on their dataset, which was gathered locally from a residential building. In the same year, another group of authors also presented a study supporting the evolutionary-neuro system for energy forecasting. In this study, the authors presented a hybrid model combining an evolutionary algorithm, teaching learning-based optimization (TLBO), with an ANN, and likewise advocated the efficacy of the hybrid model on their dataset [20]. In another study, the author presented a hybrid model for optimization and energy demand forecasting. This approach was tested on a dataset of South Korea's hourly energy consumption. The author claimed that the proposed model could be useful for other datasets; however, the paper offers only limited justification for this claim [21]. Ahmad et al. [22] published a survey in Sustainable Cities and Society that compares, through a literature survey, the efficacy of machine learning algorithms for energy demand forecasting. They concluded that Bayesian regularization back-propagation neural networks (BRB-NNs) and Levenberg-Marquardt back-propagation neural networks (LMBNNs) are relatively efficient predictors for electrical energy demand forecasting.
The second part of this literature review concerns the existing investigations of the SEIL benchmark dataset. In this section, the research contributions on the SEIL University Campus dataset are presented. In addition, the research gap in the application of machine learning to electrical power prediction and in the existing work on the SEIL dataset is highlighted. A group of SEIL researchers used LSTM together with an improved sine cosine optimization algorithm, which yields accurate and reliable power consumption predictions for short-, medium-, and long-term forecasts. They argued that the hybridization of the enhanced sine cosine algorithm and LSTM results in a robust power consumption model [4]. In a separate publication, SEIL researchers used kCNN-LSTM to provide accurate predictions of energy consumption in buildings. This experiment was based on real-time energy consumption data from the Kanwal Rekhi building, an academic building of the Indian Institute of Technology (IIT), Mumbai. The proposed approach uses k-means clustering to conduct cluster analysis and understand the energy usage pattern. The proposed approaches were trained and tested using real-time energy consumption data from a four-floor building at IIT Bombay, India [4,23,24]. In addition to the above recent work, Table 1 presents a literature matrix of recent work on energy management using artificial intelligence, listing the reference, year, proposed work, and corresponding limitations of each study.
Table 1. Literature map on energy management using artificial intelligence.

| Ref | Year | Proposed Work | Limitations |
|---|---|---|---|
| [25] | 2018 | In this literature, large data is referred to as Big Data for energy management. | The availability of the referenced massive datasets is limited. |
| [26] | 2019 | Deep learning approaches have excelled in dealing with big data. | There is the challenge of managing the large data volume. |
| [27] | 2019 | The authors have strongly argued for the usefulness of Deep Learning frameworks to design the electrical energy efficiency management system. | Large standard reference datasets for electric energy are limited. |
| [28] | 2016 | This is a comparative study of two varieties of Deep Learning network: LSTM and a sequence-to-sequence (S2S) architecture. | In this study, coverage is limited to a single residential customer. The authors stated that they used data on submissions; however, the details of the data were not included in this document. |
| [31] | 2016 | This paper presented the Factored Conditional Restricted Boltzmann Machine (FCRBM) to forecast energy demands. The model has been tested on the EcoGrid EU dataset. | The author of this paper needed to compare his research study with other variations of Deep Learning architecture and currently used systems. |
| [32] | 2017 | The authors have compared the convolutional neural network (CNN/ConvNet) with the study presented in 2016. | No new CNN/ConvNet architecture was presented in this study, and neither was the pre-trained network described. |
| [33] | 2017 | Initially trained recurrent neural networks (RNN) using a data-driven approach. | This model-less, evidence-based approach has surpassed the approach of model-based studies in management of energy. |
| [34] | 2017 | This study provided a comprehensive comparison of conventional machine learning algorithms, including support vector machines, Gaussian processes, regression trees, ensemble boosting and linear regression, and the Deep Learning method. | Validation of the claim related to the energy management system is observed to be unclear. |
| [35] | 2018 | This method is optimized for building energy management, and explores two DL algorithms, namely, Deep Q-learning (DQN) and Deep Policy Gradient (DPG), at the same time. | The parametric fringe of the proposed technique proved insufficient. Moreover, the cognitive scope of the gadget seems to be very trendy. |
| [36] | 2018 | The authors have used recurrent neural networks (RNN) to forecast time series data of energy consumption for a university campus. | The robustness of this work could have been enhanced if the master dataset had been chosen. |
| [37] | 2019 | The authors have suggested using the alternating direction method of multipliers (ADMM) and its accelerated variant (AADM) to find the optimum value of operation of the micro network distribution. | This study did not include the data-based approach to energy forecasting. Moreover, it was felt that the parametric comparison was missing in this work. |
| [38] | 2019 | This work submitted a data-driven, Deep Learning approach to district-wide energy demand forecasting. | This study appears deficient because of the absence of a baseline dataset and extensive comparison with pre-existing models. |
| [39] | 2020 | The FS-FCRBM-GWDO hybrid model is superior to the existing models presented in this study. | The gap between the existing real-world reference dataset and the pre-established model is deficient. |
| [40] | 2020 | Major contributions include device-based real-time power management via a common cloud data monitoring server. | The actual application was outside the scope of study. |
| | | A new machine learning model for forecasting the energy usage on an hourly basis in a residential building is proposed. | The performance of the machine learning algorithm is compromised as a result of the performance plateau highlighted with big data. |
| [44] | 2021 | Deep Learning is the best candidate for power prediction based on time series. The concern in this study is increasingly associated with the fact that the performance of ML and DL is found. | Three small datasets were used to validate the study. The authors could not pursue the novel Deep Learning architecture of a machine learning model for data-driven energy efficiency forecasting. |
| [45] | 2020 | Comparison of Machine Learning and DL algorithms with the residential building dataset. The comparison was based on competition between ML and DL. | These works also highlight the urgent need for a master dataset for domain-specific applications such as hospitals, schools, universities, residential buildings, etc. |
| [46], [47], [48] | 2020, 2020, 2019 | The work continued to use the RNN with the LSTM approach to forecast energy demand. This work also suggested a gap in the development of any new DL architecture and a pre-trained network of master datasets. | This work also suggests a gap in the development of new ML on master datasets. |
| [50] | 2020 | SEIL-IIT introduced the database medium for adaptive data visualization of large sensors. | |
| [51] | 2020 | Smart Energy Informatics Lab (SEIL), the same group, studied the hybrid model for predicting energy consumption in buildings through LSTM networks. | |
| [52] | 2017 | There has been a push for a data-driven approach to the intelligent energy management system. | |
| [53] | 2019 | In this work reference is made to the dataset generated by their research. Another area that this group has targeted is solar photovoltaic optimisation and building thermal modelling [53][54][55]. This is beyond the scope of our study. | |

Gap Analysis
After careful analysis of the existing work in the domain of data-driven energy management, it is evident that artificial intelligence is now indispensable for robust and precise energy management. In this connection, benchmarking on a domain-specific dataset is essential, and the development of robust machine learning algorithms will further this objective. After detailed analysis, the SEIL dataset was found to be the most suitable dataset for energy forecasting for a university campus. However, a comprehensive empirical comparison of machine learning algorithms on this dataset is lacking in the literature. This study bridges that gap by presenting an exhaustive evaluation of a large number of machine learning algorithms on the SEIL dataset.
The key deliverable of this study is the recommendation of the best machine learning algorithm. The recommendation is based on empirical facts and figures.

Dataset Description and System Setup
The SEIL dataset was gathered from an IIT university building. The building has four floors and is divided into three wings. The dataset covers December 2016 through July 2018. All data files are in CSV format at one-minute granularity, with current, voltage, and power as input attributes and real energy consumption as the output attribute. The dataset is massive, with a volume of 20 GB. Data have been extracted from various units in the university building, such as the building level, class level, auditorium level, lab level, and office level. In this study, the building-level data are used in order to predict the total energy consumption of the building. Table 2 lists the attributes of the dataset. Since the dataset is labeled and the response is continuous, machine learning prediction (regression) algorithms were selected for training and testing. A total of 24 machine learning prediction algorithms were tested to determine the best one. The decision is based on RMSE, R-Squared, MSE, MAE, prediction speed, and computation time. With reference to Table 2, attributes 1 to 23 serve as the input attributes and attribute 24 is taken as the output attribute. The power meter records the phase voltages (V1, V2, and V3) and the phase currents (A1, A2, and A3). Attributes 8 to 10 describe the apparent power, whereas attributes 12 to 14 represent the real power consumed. The reactive power for each phase is shown in attributes 16 to 19, and the power factor for each phase is shown as PF1, PF2, and PF3 in attributes 20 to 22. The output real power in Wh is taken as the response variable. It is to be noted that in the dataset, the real power Wh is identical to the apparent and reactive power. Therefore, real power has been selected as the response variable.
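The per-phase attributes described above are linked by the standard AC power relations. The following is a minimal sketch of those relations, assuming RMS voltage and current readings and a power factor between 0 and 1. The function name `phase_powers` and the sample readings are illustrative assumptions and are not taken from the SEIL dataset.

```python
import math

def phase_powers(voltage_v, current_a, power_factor):
    """Return (apparent VA, real W, reactive var) for one phase.

    Assumes RMS voltage/current and 0 <= power_factor <= 1.
    """
    apparent = voltage_v * current_a          # S = V * I
    real = apparent * power_factor            # P = S * PF
    # Q follows from the power triangle S^2 = P^2 + Q^2; clamp for rounding.
    reactive = math.sqrt(max(apparent ** 2 - real ** 2, 0.0))
    return apparent, real, reactive

# Hypothetical readings for a single phase (illustration only).
s, p, q = phase_powers(230.0, 10.0, 0.8)
print(round(s, 1), round(p, 1), round(q, 1))
```

Because apparent, real, and reactive power are tied together in this way, several of the input attributes carry overlapping information about the response variable.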

Methodology
A four-phase cascaded methodology was used in this study. Figure 1 illustrates the proposed methodology. In the first phase, the SEIL dataset is used, and the total energy consumption at the building level is considered. The building-level energy consumption includes the auditorium, classroom, conference room, building floor, labs, offices, server room, and sub-server room. In the second phase, the building-level dataset is divided by random permutation into 70% training samples and 30% testing samples, and the training set is used to train the 24 machine learning algorithms. In the third phase, the performance of each ML algorithm in training is evaluated in terms of RMSE, R-squared, MSE, MAE, and prediction speed, and the same parameters are computed for the testing phase. Finally, in the fourth phase, the ranking of the algorithms based on their efficacy and efficiency is established. Figure 2 illustrates the internal workings of the training and testing phase, which constitutes Phase 2 of the proposed methodology (Figure 1).
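The split and timing steps of this methodology can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `split_70_30` and `timed`, the stand-in `MeanModel` regressor, and the synthetic data are all hypothetical, and any of the 24 algorithms would take the place of the stand-in model.

```python
import random
import time

def split_70_30(rows, seed=0):
    """Shuffle row indices and return (train, test) with a 70/30 split."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)          # random permutation of the samples
    cut = int(round(0.7 * len(rows)))
    train = [rows[i] for i in idx[:cut]]
    test = [rows[i] for i in idx[cut:]]
    return train, test

class MeanModel:
    """Stand-in regressor (hypothetical): predicts the training-set mean."""
    def fit(self, rows):
        self.mean = sum(y for _, y in rows) / len(rows)
        return self

    def predict(self, xs):
        return [self.mean for _ in xs]

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Synthetic (x, y) pairs standing in for building-level samples.
data = [(float(i), 2.0 * i + 1.0) for i in range(1000)]
train, test = split_70_30(data, seed=42)

model, train_time = timed(MeanModel().fit, train)
preds, pred_time = timed(model.predict, [x for x, _ in test])
# Prediction speed in observations per second (guard against a zero reading).
pred_speed = len(test) / max(pred_time, 1e-12)
print(len(train), len(test), train_time, pred_speed)
```

Fixing the seed makes the permutation reproducible, so that every candidate algorithm is trained and tested on the same partition.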

Results
This section presents an exhaustive empirical evaluation to identify the best candidate machine learning algorithm for energy demand prediction using the SEIL dataset. Figures 3-8 visualize the results of Table 3 for easy reference. Figures 9-20 illustrate the predicted vs. actual and residual plots of training and testing for each algorithm. In this study, 24 machine learning prediction algorithms were evaluated on benchmark performance parameters. In the predicted vs. actual graph, the x-axis shows the true response and the y-axis represents the predicted response. The approximately linear reference line is shown in black and the actual observations are depicted with blue dots. In the predicted vs. actual graph, the variance can be observed visually as the distance between the predicted value and the actual value. A larger variation corresponds to poorer prediction performance of the respective algorithm. The degree of variation for each algorithm is quantified in Table 3 in terms of RMSE, R-Squared, MSE, and MAE for training and testing. Likewise, prediction speed and computation time establish the efficiency of the algorithm. A higher value of an error measure translates into poorer performance of the candidate algorithm. Figures 9-20 also include the residual error of training and testing for the top three performing algorithms. The x-axis of the residual plot refers to the predicted response, and the y-axis corresponds to the residual error. Residuals that lie close to zero across the predicted responses correspond to higher efficacy of the candidate algorithm. This inference is consistent with the values reported in Table 3. First, the top three performing machine learning algorithms for energy demand prediction at a university campus based on the SEIL dataset are selected.
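The efficacy measures and residuals discussed above can be computed as in the following minimal sketch. It assumes the usual textbook definitions of MSE, RMSE, MAE, and R-squared, and defines the residual as actual minus predicted, which is a convention chosen for this illustration. The function name and the sample values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, R-squared and the residuals (actual - predicted)."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 = 1 - SS_res / SS_tot; undefined when the response is constant.
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2,
            "residuals": residuals}

# Hypothetical true and predicted responses (illustration only).
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
m = regression_metrics(actual, predicted)
print(m)
```

Lower MSE, RMSE, and MAE indicate better efficacy, whereas R-squared is better when it is closer to 1.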
Subsequently, an in-depth investigation of the performance parameters is performed. The graphical illustrations and empirical findings establish that Bagged Trees (1), Fine Trees (2), and Medium Trees (3) are the top three performing algorithms in terms of efficacy. In terms of efficiency, the ranking is reversed. This can also be inferred from Table 4. Performance measures such as RMSE, R-Squared, MSE, and MAE reflect the efficacy of the algorithm, whereas prediction speed and training time reflect its efficiency. Table 4 reveals an intriguing fact: on the SEIL dataset, Bagged Trees is the most effective algorithm for electrical energy demand prediction for university campuses, while Medium Trees is the most efficient. Likewise, Fine Trees has the optimum tradeoff between efficacy and efficiency. Bagged Trees produces 75%, 56%, and 76% improvements in RMSE, MSE, and MAE, respectively, in training and testing, as compared to Fine Trees. Similarly, Bagged Trees produces 56%, 32%, and 75% improvements in RMSE, MSE, and MAE, respectively, in training and testing, as compared to Medium Trees. These figures express the percentage improvement in efficacy between the algorithms. In terms of efficiency, Medium Trees is 32 times more efficient in prediction speed and 14.8 times more efficient in training time than Bagged Trees. Likewise, it is 1.3 times more efficient in prediction speed and 3 times more efficient in training time than Fine Trees.
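Comparisons of this kind can be reproduced with simple ratios. The sketch below assumes that a percentage improvement is the relative reduction in an error measure with respect to a baseline, and that "times more efficient" is the ratio of prediction speeds or of training times. Both definitions are assumptions, since the text does not state the formulas, and the input values are hypothetical and are not taken from Table 3 or Table 4.

```python
def percent_improvement(baseline_error, candidate_error):
    """Relative error reduction of the candidate versus the baseline, in percent."""
    return 100.0 * (baseline_error - candidate_error) / baseline_error

def speedup(reference, candidate, higher_is_better):
    """How many times better the candidate is than the reference.

    higher_is_better=True for prediction speed (obs/s),
    higher_is_better=False for training time (s).
    """
    return candidate / reference if higher_is_better else reference / candidate

# Hypothetical values for illustration only.
rmse_gain = percent_improvement(baseline_error=4.0, candidate_error=1.0)
speed_gain = speedup(reference=10_000.0, candidate=320_000.0, higher_is_better=True)
time_gain = speedup(reference=29.6, candidate=2.0, higher_is_better=False)
print(rmse_gain, speed_gain, time_gain)
```

Separating the two directions (higher is better for speed, lower is better for time) keeps every reported factor above 1 when the candidate is the more efficient algorithm.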

Discussion
Careful analysis of the simulation results presented in Section 6 establishes the rationale for the best candidate machine learning algorithm for energy demand forecasting. To the best of the authors' knowledge, little work has been presented in the literature to date that attempts to determine the best ML algorithm on an empirical basis. The systematic and empirical evaluation of a wide range of machine learning algorithms reveals that Bagged Trees, Fine Trees, and Medium Trees are the top three ranked algorithms for energy demand forecasting using the SEIL dataset. This finding adds to the SEIL project a recommendation of the best machine learning algorithm for energy demand forecasting. Moreover, new and customized algorithms are suggested as a route to further improvements in efficiency and efficacy. Based on this study, a customized variant of Medium Trees is strongly advised if efficiency is the primary goal. Similarly, a customized variant of Bagged Trees is recommended if higher efficacy is sought. These conclusions are based solely on the empirical data and graphical evidence presented in this work.
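The trade-off between a single tree and a bagged ensemble can be illustrated with a small sketch. It assumes that "Bagged Trees" denotes bootstrap aggregation of regression trees and that the Fine and Medium variants differ mainly in their minimum leaf size. The one-dimensional tree learner, the leaf sizes, the number of trees, and the synthetic data below are illustrative assumptions and do not reproduce the configuration used in this study.

```python
import random

def _sse(points):
    """Sum of squared errors of the responses around their mean."""
    mean = sum(y for _, y in points) / len(points)
    return sum((y - mean) ** 2 for _, y in points)

def fit_tree(points, min_leaf):
    """Fit a 1-D regression tree on (x, y) points; leaves hold the mean response."""
    leaf = ("leaf", sum(y for _, y in points) / len(points))
    pts = sorted(points)
    best = None
    for i in range(min_leaf, len(pts) - min_leaf + 1):
        if pts[i - 1][0] == pts[i][0]:
            continue                          # cannot split between equal x values
        left, right = pts[:i], pts[i:]
        sse = _sse(left) + _sse(right)
        if best is None or sse < best[0]:
            best = (sse, (pts[i - 1][0] + pts[i][0]) / 2.0, left, right)
    if best is None:                          # too few points to split further
        return leaf
    _, threshold, left, right = best
    return ("node", threshold, fit_tree(left, min_leaf), fit_tree(right, min_leaf))

def predict_tree(tree, x):
    """Walk from the root to a leaf and return its mean response."""
    while tree[0] == "node":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]

def fit_bagged(points, n_trees, min_leaf, seed=0):
    """Train n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]
        trees.append(fit_tree(sample, min_leaf))
    return trees

def predict_bagged(trees, x):
    """Average the predictions of all trees in the ensemble."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)

# Synthetic noisy data (illustration only).
rng = random.Random(1)
data = [(x / 10.0, (x / 10.0) ** 2 + rng.gauss(0.0, 0.5)) for x in range(100)]
single = fit_tree(data, min_leaf=12)          # one tree with larger leaves
ensemble = fit_bagged(data, n_trees=30, min_leaf=8, seed=2)
print(predict_tree(single, 5.0), predict_bagged(ensemble, 5.0))
```

Each ensemble prediction evaluates every tree and averages the results, which tends to reduce variance but multiplies the training and prediction cost by roughly the number of trees. This is in line with the observed pattern in which the ensemble ranks highest in efficacy and the single tree with larger leaves ranks highest in efficiency.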
It has been inferred from the literature that the performance of a load management system is mainly a function of its efficiency and effectiveness, depending upon the application area. In most cases, knowledge of the optimum trade-off between efficiency and effectiveness is needed. Moreover, machine learning algorithms have been reported to be the best candidates for load management and demand forecasting. However, selecting the relevant algorithm(s) for a specific application in order to attain higher performance remains a pressing need. This study extends the research on the SEIL dataset by proposing the best candidate machine learning algorithm for a higher degree of performance. This claim is supported by the empirical performance parameters of the machine learning algorithms.

Conclusions
This study has presented a comprehensive empirical evaluation of machine learning algorithms for energy demand prediction on the SEIL dataset. The Smart Energy Informatics Lab (SEIL) of the Indian Institute of Technology (IIT) Bombay, India, conducted an experimental study in 2019 to collect a massive dataset on university campus energy consumption. A comprehensive comparative study recommending the best candidate machine learning algorithm on the SEIL dataset was missing from the recent literature, and this study has filled that gap. After a careful and detailed empirical assessment of the performance parameters, it is concluded that Bagged Trees is the most effective algorithm for energy demand prediction applications and Medium Trees is the most efficient algorithm for real-time systems. Moreover, Fine Trees has the optimum tradeoff between efficacy and efficiency.