Hybrid Short-Term Load Forecasting Scheme Using Random Forest and Multilayer Perceptron

Abstract: A stable power supply is very important in the management of power infrastructure. One of the critical tasks in accomplishing this is to predict power consumption accurately, which usually requires considering diverse factors, including environmental, social, and spatial-temporal factors. Depending on the prediction scope, the building type can also be an important factor, since buildings of the same type show similar power consumption patterns. A university campus usually consists of several building types, including laboratories, administrative offices, lecture rooms, and dormitories. Depending on the temporal and external conditions, they tend to show a wide variation in the electrical load pattern. This paper proposes a hybrid short-term load forecast model for an educational building complex using random forest and multilayer perceptron. To construct this model, we collect six years of electrical load data from a university campus and split them into training, validation, and test sets. For the training set, we classify the data using a decision tree with input parameters including the date, day of the week, holiday, and academic year. In addition, we consider various configurations for the random forest and multilayer perceptron and evaluate their prediction performance using the validation set to determine the optimal configuration. Then, we construct a hybrid short-term load forecast model by combining the two models and predict the daily electrical load for the test set. Through various experiments, we show that our hybrid forecast model performs better than other popular single forecast models.


Introduction
Recently, the smart grid has been gaining much attention as a feasible solution to the current global energy shortage problem [1]. Since it offers many benefits related to reliability, economics, efficiency, the environment, and safety, diverse issues and challenges in implementing such a smart grid have been extensively surveyed [2]. A smart grid [1,2] is the next-generation power grid that merges information and communication technology (ICT) with the existing electrical grid to maximize electrical power efficiency by exchanging information between energy suppliers and consumers in real time [3]. This enables the energy supplier to perform efficient energy management for renewable generation sources (solar radiation, wind, etc.) by accurately forecasting power consumption [4]. Therefore, for more efficient operation, the smart grid requires precise electrical load forecasting in both the short term and the medium term [5,6]. Short-term load forecasting

Dataset
To build an effective STLF model for buildings or building clusters, it is crucial to collect real power consumption data that show the power usage of the buildings in the real world. For this purpose, we considered three clusters of buildings with different purposes and collected their daily power consumption data from a university in Korea. The first cluster is composed of 32 buildings with academic purposes, such as the main building, amenities, department buildings, and the central library. The second cluster is composed of 20 buildings with science and engineering purposes. Compared to the other clusters, this cluster showed a much higher electrical load, mainly due to the diverse experimental equipment and devices used in the laboratories. The third cluster comprises 16 dormitory buildings, whose power consumption follows the residence pattern. In addition, we gathered other data, including the academic schedule, weather, and event calendar. The university employs the i-Smart system to monitor the electrical load in real time. This is an energy portal service operated by the Korea Electric Power Corporation (KEPCO) that gives consumers electricity-related data, such as electricity usage and the expected bill, to help them use electricity efficiently. Through this i-Smart system, we collected six years of daily power consumption data, from 2012 to 2017. For weather information, we utilized the regional synoptic meteorological data provided by the Korea Meteorological Administration (KMA). KMA's mid-term forecast provides information including the date, weather, maximum and minimum temperatures, and its reliability for seven or more days ahead.
To build our hybrid STLF model, we considered nine variables: month, day of the month, day of the week, holiday, academic year, temperature, week-ahead load, year-ahead load, and the LSTM networks forecast. In particular, the day of the week is a categorical variable, and we represent the seven days using the integers 1 to 7 according to the ISO-8601 standard [32]. Accordingly, 1 indicates Monday and 7 indicates Sunday. Holiday, which includes Saturdays, Sundays, national holidays, and the school anniversary [33], indicates whether the campus is closed or not. A detailed description of the input variables can be found in [12].
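As an illustration, the calendar-based variables above can be encoded as follows. This is a minimal sketch: the holiday set shown here is a placeholder for illustration, not the actual holiday data used in the study.

```python
from datetime import date

# Placeholder holiday set for illustration only (e.g., National Foundation Day);
# the study's actual holiday calendar also includes the school anniversary [33].
NATIONAL_HOLIDAYS = {date(2017, 10, 3)}

def day_of_week(d: date) -> int:
    """ISO-8601 weekday: 1 = Monday, ..., 7 = Sunday."""
    return d.isoweekday()

def is_holiday(d: date) -> bool:
    """Campus closed: Saturdays, Sundays, and national holidays."""
    return d.isoweekday() >= 6 or d in NATIONAL_HOLIDAYS
```

For example, `day_of_week(date(2017, 1, 2))` returns 1, since 2 January 2017 was a Monday.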


Temperature Adjustment
Generally, power consumption increases in summer and winter due to the heavy use of air conditioning and electric heating appliances, respectively. Since the correlations between the electrical load and the maximum and minimum temperatures are not that high, we adjust the daily temperature for more effective training, based on the annual average temperature of 12.5 °C provided by KMA [34], using Equation (1). To show that the adjusted temperature has a higher correlation than the minimum and maximum temperatures, we calculated the Pearson correlation coefficients between the electrical load and the minimum, maximum, average, and adjusted temperatures, as shown in Table 1. In the table, the adjusted temperature shows higher coefficients for all building clusters than the other types of temperature.

Estimating the Week-Ahead Consumption
Past electrical load data are one of the best clues for forecasting future power consumption, and the consumption pattern depends on the day of the week and on whether the day is a workday or a holiday. Hence, in short-term load forecasting, it is necessary to consider several cases when representing the electrical load of the past. For instance, if the prediction time is a holiday and the same day in the previous week was a workday, their electrical loads can be very different. Therefore, it is better to calculate the week-ahead load not simply from the load of the corresponding day of the previous week, but by averaging the electrical loads of the days of the same type in the previous week. Thus, if the prediction time is a workday, we use the average electrical load of all workdays of the previous week as an independent variable; likewise, if the prediction time is a holiday, we use the average electrical load of all holidays of the previous week. In this way, we reflect the different electrical load characteristics of holidays and workdays in the forecast.
Figure 2 shows an example of estimating the week-ahead consumption. If the current time is Tuesday, we already know the electrical load of yesterday (Monday). Hence, to estimate the week-ahead consumption of the coming Monday, we use the average of the electrical loads of the workdays of the last week.
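The week-ahead averaging described above can be sketched as follows. The DataFrame layout (a daily datetime index with `load` and `holiday` columns) is an assumption for illustration, not the authors' actual code.

```python
import pandas as pd

# Sketch of the week-ahead load feature: average the previous week's loads
# over the days of the same type (workday vs. holiday) as the prediction day.
def week_ahead_load(df, pred_day, pred_is_holiday):
    start = pred_day - pd.Timedelta(days=7)
    end = pred_day - pd.Timedelta(days=1)
    prev_week = df.loc[start:end]                 # the seven preceding days
    same_type = prev_week[prev_week["holiday"] == pred_is_holiday]
    return same_type["load"].mean()
```

For a workday prediction time, this returns the mean load of the previous week's workdays; for a holiday, the mean over the previous week's holidays.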

Estimating the Year-Ahead Consumption
The year-ahead load aims to capture the trend of the annual electrical load by using the power consumption of the same week of the previous year. However, the electrical load of the exact same date of the previous year cannot always be used, because the days of the week differ and popular Korean holidays are celebrated according to the lunar calendar. Every week of the year has a unique week number based on ISO-8601 [32]. As mentioned before, we average the power consumption of all holidays or workdays of the week to which the prediction time belongs. Depending on the year, a year comprises 52 or 53 weeks; if the prediction time belongs to the 53rd week, there is no corresponding week number in the previous year. To solve this problem, we use the power consumption of the 52nd week of the previous year, since the two weeks have similar external factors. In particular, the electrical load is very low on special holidays, such as the Lunar New Year holidays and Korean Thanksgiving [35]. To capture this usage pattern, the average power consumption of the corresponding special holiday in the previous year represents the year-ahead load of a special holiday. Since the week number of such a holiday can differ from year to year, the year-ahead special-holiday load cannot be obtained directly from the week number; instead, we substitute the power consumption of the week that contained the holiday in the previous year.
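The ISO-8601 week mapping, including the week-53 fallback, can be sketched as follows. This is a minimal illustration of the rule, not the authors' implementation.

```python
from datetime import date

# Map a prediction date to the (year, week) of the previous year. If the
# prediction date falls in week 53 but the previous year has only 52 weeks,
# week 52 of the previous year is used instead.
def year_ahead_week(d):
    iso_year, week, _ = d.isocalendar()
    prev_year = iso_year - 1
    # December 28 always lies in the last ISO week of its year (52 or 53).
    weeks_in_prev = date(prev_year, 12, 28).isocalendar()[1]
    return prev_year, min(week, weeks_in_prev)
```

For example, 31 December 2015 falls in ISO week 53 of 2015, but 2014 has only 52 weeks, so the mapping yields week 52 of 2014.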

Load Forecasting Based on LSTM Networks
A recurrent neural network (RNN) is a class of ANN in which connections between units form a directed graph along a sequence. Unlike a feedforward neural network (FFNN), an RNN can use its internal state, or memory, to process input sequences [36]. RNNs can handle time series data in many applications, such as unsegmented connected handwriting recognition or speech recognition [37]. However, RNNs suffer from gradients that can become extremely small or large; these are called the vanishing gradient and exploding gradient problems. If the gradient is extremely small, RNNs cannot learn data with long-term dependencies. On the other hand, if the gradient is extremely large, it moves the RNN parameters far away and disrupts the learning process. To handle the vanishing gradient problem, previous studies [38,39] have proposed more sophisticated RNN architectures. One successful model is long short-term memory (LSTM), which solves the RNN problem through a cell state and a unit, called a cell, with multiple gates. LSTM networks propagate information learned from earlier data forward in time as learning progresses, so they are well suited to time series data, such as electrical load data. However, LSTM networks can only reflect yesterday's information in the next day's forecast. Since the daily load forecasting of a smart grid aims to support scheduling up to a week ahead, LSTM networks alone are not well suited to daily load forecasting because of the six-day gap. Furthermore, if the prediction is not valid, the LSTM method can give poor results; for instance, national holidays, rapid climate change, and unexpected institution-related events can produce unexpected power consumption. Therefore, the LSTM model alone is not sufficient for short-term load forecasting, due to its simple structure and weakness to volatility.

Nevertheless, a similar life pattern can be observed depending on the day of the week, which in turn gives a similar electrical load pattern. This study uses the LSTM networks method to capture this repeating pattern of power consumption depending on the day of the week. The input variables of the training dataset are composed of the electrical load from 2012 to 2015, and the dependent variable of the training set is composed of the electrical load from 2013 to 2015. We performed 10-fold cross-validation on a rolling basis to find the optimal hyper-parameters.

Discovering Similar Time Series Patterns
So far, diverse machine learning algorithms have been proposed to predict the electrical load [1,3,6,14]. However, their prediction performance differs depending on various factors. For instance, for time series data, one algorithm may give the best prediction performance on one segment, while another algorithm gives the best performance on other segments. Hence, one way to improve the accuracy in this case is to use more than one predictive algorithm. We consider electrical load data as time series data and utilize a decision tree to classify the electrical load data by pattern similarity. Decision trees [26,40] can handle both categorical and numerical data and are highly persuasive because each branch of the tree can be analyzed as a step in the classification or prediction process. In addition, they exhibit high explanatory power because they can show which independent variables have a higher impact when predicting the value of a dependent or target variable. On the other hand, continuous variables used in predicting values of the time series are treated as discontinuous, and hence prediction errors are likely to occur near the split boundaries. Therefore, using the decision tree, we divide the continuous dependent variable into several classes with similar electrical load patterns. To do this, we use the training dataset from the previous three years. We use the daily electrical load as the class label, or dependent variable, and the characteristics of the time series as independent variables, representing the year, month, day, day of the week, holiday, and academic year. Details on the classification of the time series data are given in the experimental section.

Building a Hybrid Forecasting Model
To construct our hybrid prediction model, we combine a random forest model and a multilayer perceptron (MLP) model. Random forest is a representative ensemble model, while MLP is a representative deep learning model; both have shown excellent performance in forecasting electrical load [5,12,15-17].
Random forest [41,42] is an ensemble method for classification, regression, and other tasks. It constructs many decision trees, which can classify a new instance by majority vote. Each decision tree node uses a subset of attributes randomly selected from the original set of attributes. Random forest runs efficiently on large amounts of data and provides high accuracy [43]. In addition, compared to other machine learning algorithms, such as ANN and SVR, it requires less fine-tuning of its hyper-parameters [16]. The basic parameters of random forest include the total number of trees to be generated (nTree) and the decision tree-related parameters (mTry), such as the minimum split and split criteria [17]. In this study, we find the optimal mTry and nTree for our forecasting model by using the training set and then verify their performance using the validation and test sets. The authors in [42] suggested that a random forest should have 64 to 128 trees, and we use 128 trees for our hybrid STLF model. In addition, the mTry values used in this study are the options provided by scikit-learn.

A typical ANN architecture, known as the multilayer perceptron, is a machine learning algorithm consisting of a network of individual nodes, called perceptrons, organized in a series of layers [5]. Each layer in an MLP is categorized into one of three types: an input layer, which receives the features used for prediction; a hidden layer, where hidden features are extracted; and an output layer, which yields the determined results. Among them, the hidden layer has many factors affecting performance, such as the number of layers, the number of nodes involved, and the activation function of the nodes [44]. Therefore, the network performance depends on how the hidden layers are configured. In particular, the number of hidden layers determines the depth or shallowness of the network; a network with two or more hidden layers is called a deep neural network (DNN) [45].
To establish our MLP, we use two hidden layers, since our prediction model does not require many input variables. In addition, we use the same number of epochs and batch size as in the LSTM model described previously. Furthermore, as the activation function, we use the exponential linear unit (ELU) instead of the rectified linear unit (ReLU). ReLU has gained increasing popularity recently, but its main disadvantage is that a perceptron can die during the learning process. ELU [46] is an approximate function introduced to overcome this disadvantage and can be defined by Equation (2), i.e., f(x) = x for x > 0 and f(x) = α(exp(x) − 1) for x ≤ 0. The next important consideration is the number of hidden nodes. Many studies have been conducted to determine the optimal number of hidden nodes for a given task [15,47,48], and we decided to use two different hidden node counts: the number of input variables and 2/3 of the number of input variables. Since we use nine input variables, the numbers of hidden nodes we use are 9 and 6. Since our model has two hidden layers, we can consider three configurations, depending on the hidden nodes of the first and second layers: (9, 9), (9, 6), and (6, 6). As with the random forest, we evaluate these configurations using the training data for each building cluster and identify the configuration that gives the best prediction accuracy. After that, we compare the best MLP model with the random forest model for each cluster type.
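A minimal scikit-learn sketch of the two base models with the stated settings might look as follows. Note that scikit-learn's MLPRegressor does not provide ELU, so ReLU is used here as a stand-in (the ELU-based network described in the text would be built with Keras); all hyper-parameters not stated in the text are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rf = RandomForestRegressor(
    n_estimators=128,     # nTree: 64-128 trees suggested in [42]
    max_features="sqrt",  # mTry: one of the scikit-learn options
    random_state=0,
)
mlp = MLPRegressor(
    hidden_layer_sizes=(9, 9),  # candidate configurations: (9, 9), (9, 6), (6, 6)
    max_iter=300,
    random_state=0,
)

# Toy data with nine input variables, mirroring the feature count above.
X = np.random.rand(100, 9)
y = X.sum(axis=1)
rf.fit(X, y)
mlp.fit(X, y)
```

In practice, each configuration would be fitted per building cluster and compared on the validation set, as described above.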

Time Series Cross-Validation
To construct a forecasting model, the dataset is usually divided into a training set and a test set. The training set is used to build the forecasting model, and the test set is used to evaluate the resulting model. However, in traditional time series forecasting techniques, the prediction performance degrades as the interval between the training and forecasting times increases. To alleviate this problem, we apply time series cross-validation (TSCV) based on the rolling forecasting origin [49]. A variation of this approach focuses on a single prediction horizon for each test set. In this approach, we use various training sets, each containing one more observation than the previous one. We calculate the prediction accuracy by first measuring the accuracy for each test set and then averaging the results over all test sets. This paper targets a one-week (hours 145 to 168) look-ahead view of smart grid operation. For this, a seven-step-ahead forecasting model is built to forecast the power consumption at a single time (h + 7 + i − 1) using the observations at times (1, 2, . . . , h + i − 1). If h observations are required to produce a reliable forecast, then, for T total observations, the process works as follows.
For i = 1 to T − h − 6:
(1) Select the observation at time h + 7 + i − 1 for the test set;
(2) Use the observations at times 1, 2, . . . , h + i − 1 to estimate the forecasting model;
(3) Calculate the 7-step error on the forecast for time h + 7 + i − 1;
(4) Compute the forecast accuracy based on the errors obtained.
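The procedure above can be sketched in plain Python as follows; `forecast_fn` stands for any model that returns a single seven-step-ahead forecast and is assumed here for illustration.

```python
# Rolling-origin evaluation: series[t - 1] holds the observation at time t,
# h is the minimum history length required for a reliable forecast, and
# forecast_fn(history) returns one seven-step-ahead forecast.
def rolling_origin_mae(series, h, forecast_fn):
    errors = []
    T = len(series)
    for i in range(1, T - h - 5):          # i = 1, ..., T - h - 6
        history = series[: h + i - 1]      # observations 1, ..., h + i - 1
        target = series[h + 7 + i - 2]     # observation at time h + 7 + i - 1
        errors.append(abs(forecast_fn(history) - target))
    return sum(errors) / len(errors)       # average error over all test sets
```

On a series that increases by exactly one per step, the forecast "last value plus seven" is perfect, so the averaged error is zero; this provides a quick sanity check of the indexing.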

Performance Metrics
To analyze the forecast model performance, several metrics, such as mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE), are used, which are well-known for representing the prediction accuracy.

Mean Absolute Percentage Error
MAPE is a measure of prediction accuracy for fitted time series values in statistics, specifically in trend estimation. It expresses the accuracy as a percentage of the error and can be easier to interpret than other statistics precisely because it is a percentage. MAPE can become huge when the actual value is very close to zero; however, in this work, we do not have such values. The formula for MAPE is shown in Equation (3), where A t and F t are the actual and forecast values, respectively, and n is the number of observations.

Root Mean Square Error
RMSE (also called the root mean square deviation, RMSD) aggregates the residuals into a single measure of predictive ability. As shown in Equation (4), it is the square root of the mean squared difference between the forecast value F t and the actual value A t. For an unbiased estimator, the RMSE is the square root of the variance, i.e., the standard error.

Mean Absolute Error
In statistics, MAE is used to evaluate how close forecasts or predictions are to the actual outcomes. It is calculated by averaging the absolute differences between the prediction values and the actual observed values. MAE is defined as shown in Equation (5), where F t is the forecast value and A t is the actual value.
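The three metrics above, Equations (3) to (5), can be implemented directly, as sketched below.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, Equation (3); assumes no zeros in actual."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((a - f) / a))

def rmse(actual, forecast):
    """Root mean square error, Equation (4)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((a - f) ** 2))

def mae(actual, forecast):
    """Mean absolute error, Equation (5)."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(a - f))
```

For example, `mape([100, 200], [110, 180])` gives 10.0, since both forecasts are off by 10% of the actual value.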

Experimental Results
To evaluate the performance of our hybrid forecast model, we carried out several experiments. We performed preprocessing for the dataset in the Python environment and performed forecast modeling using scikit-learn [50], TensorFlow [51], and Keras [52]. We used six years of daily electrical load data from 2012 to 2017. Specifically, we used electrical load data of 2012 to configure input variables for a training set. Data from 2013 to 2015 was used as the training set, the data of 2016 was the validation set, and the data of 2017 was the test set. Table 2 shows the statistics of the electric consumption data for each cluster, including the number of valid cases, mean, and standard deviation. As shown in the table, Cluster B has a higher power consumption and wider deviation than clusters A and C.

Forecasting Model Configuration
In this study, we used the LSTM networks method to capture the repeating pattern of power consumption depending on the day of the week. We tested diverse cases and investigated the load forecasting accuracy for each case to determine the best input data selection. As shown in Figure 4, the input variables consist of four electrical loads, from one week ago to four weeks ago, spaced at weekly intervals to reflect the one-month cycle. In the feature scaling process, we rescaled the measured values to the range 0 to 1. We used tanh as the activation function and calculated the loss using the mean absolute error. We used the adaptive moment estimation (Adam) method, which combines momentum and root mean square propagation (RMSProp), as the optimization method; Adam weighs the time series data and maintains the relative size difference between the variables. For the remaining hyper-parameters of the model, we set the number of hidden units to 60, the number of epochs to 300, and the batch size to 12.
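Assuming the configuration above and a window of 27 weekly time steps over the four lagged-load features (the time-step setting reported as best in the time-step experiments), a Keras sketch might look as follows; variable names such as X_train are illustrative.

```python
import numpy as np
from tensorflow import keras

# Sketch of the LSTM configuration: 60 hidden units, tanh activation,
# MAE loss, and the Adam optimizer. Inputs are assumed to be scaled to [0, 1].
model = keras.Sequential([
    keras.Input(shape=(27, 4)),              # 27 time steps x 4 weekly-lagged loads
    keras.layers.LSTM(60, activation="tanh"),
    keras.layers.Dense(1),                   # next-day load
])
model.compile(optimizer="adam", loss="mae")
# Training as described in the text (illustrative):
# model.fit(X_train, y_train, epochs=300, batch_size=12)
```

A forward pass on a batch of shape (1, 27, 4) yields a single scalar forecast per sample.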

We experimented with the LSTM model [52] by changing the time step from one cycle to a maximum of 30 cycles. Table 3 shows the mean absolute percentage error (MAPE) of each cluster for each time step. In the table, the predicted results with the best accuracy are marked in bold. Table 3 shows that the 27th time step gives the most accurate prediction performance. In general, the electricity demand is relatively high in summer and winter, compared to that in spring and autumn. In other words, it has a rise and fall curve in a half-year cycle, and the 27th time step corresponds to a week number of about half a year.
We performed a similar time series pattern analysis based on the decision tree through 10-fold cross-validation for the training set. Among the several options provided by scikit-learn to construct a decision tree, we considered the criterion, max depth, and max features. The criterion is a function for measuring the quality of a split; in this paper, we use the "mae" criterion for our forecasting model, since it gives the smallest error rate between the actual and classification values. Max depth is the maximum depth of the tree; we set it to 3, so that the number of leaves is 8. In other words, the decision tree classifies the training datasets into eight similar time series. Max features is the number of features to consider when looking for the best split; we chose the "auto" option to reflect all time variables.

Figure 5 shows the result of the similar time series recognition for each cluster using the decision tree. Here, samples indicates the number of tuples in each leaf. The total number of samples is 1095, since we are considering the daily consumption data over three years. Value denotes the classification value of the similar time series. Table 4 shows the number of similar time series samples according to the decision tree for 2016 and 2017.

Similar Time Series | Cluster A 2016 | Cluster A 2017 | Cluster B 2016 | Cluster B 2017 | Cluster C 2016 | Cluster C 2017
1     |  62 |  62 |  62 |  62 |  62 |  62
2     |  14 |  14 |  14 |  14 | 140 | 138
3     | 107 | 111 | 107 | 111 |  20 |  20
4     |  64 |  58 |  64 |  58 |  25 |  25
5     |  14 |  15 |  14 |  15 |   1 |   2
6     |  53 |  52 |  53 |  52 |  10 |   9
7     |  16 |  16 |   5 |   5 |  99 |  98
8     |  36 |  37 |  47 |  48 |   9 |  11
Total | 366 | 365 | 366 | 365 | 366 | 365

The predictive evaluation consists of two steps. Based on the forecast models of random forest and MLP, we first used the training set from 2013 to 2015 and predicted the validation period of 2016; the objectives are to detect models with optimal hyper-parameters and then to select the model with the better predictive performance for each similar time series. Next, we extended the training set to include data from 2013 to 2016 and predicted the test period of 2017.
Here, we evaluate the predictive performance of the hybrid model we have constructed. Table 5 presents the prediction results of the MLP models; MAPE is used as the measure of prediction accuracy, and the most accurate results are marked in bold. As the table shows, a model with nine nodes in each of the two hidden layers gave the best overall performance. Although the configuration with nine and six nodes performed better in Cluster A, the model with nine and nine nodes was selected to generalize the predictive model.
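The selected MLP configuration (two hidden layers of nine nodes each) can be sketched with scikit-learn's MLPRegressor. The nine input variables and the synthetic data below are assumptions for illustration only:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 9))  # nine hypothetical calendar/load input variables
y = X @ rng.normal(size=9) + rng.normal(scale=0.1, size=500)

mlp = MLPRegressor(
    hidden_layer_sizes=(9, 9),  # the best-performing configuration in Table 5
    max_iter=2000,
    random_state=1,
).fit(X, y)

pred = mlp.predict(X[:5])  # daily load forecasts for the first five samples
```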
Table 6 shows the MAPE of random forest for each cluster under different values of mTry; the most accurate results are marked in bold. Since there are nine input variables, both the sqrt and log2 settings evaluate to three features, so their results are identical. We chose sqrt, which is more commonly used [16,43]. Table 7 shows the electrical load forecast accuracy for each similar time series pattern for 2016. In the table, the more accurate predictions are marked in bold.
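The coincidence of the sqrt and log2 settings for mTry with nine input variables, and the adopted random forest configuration, can be sketched as follows (synthetic data; the tree count is an assumption):

```python
import math
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# With nine input variables, sqrt(9) = 3 and floor(log2(9)) = 3,
# so both settings inspect the same number of features per split.
assert int(math.sqrt(9)) == int(math.log2(9)) == 3

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 9))
y = X.sum(axis=1) + rng.normal(scale=0.1, size=300)

rf = RandomForestRegressor(
    n_estimators=100,      # illustrative tree count
    max_features="sqrt",   # the commonly used choice adopted in the paper
    random_state=2,
).fit(X, y)
```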
For instance, in the case of Cluster A, while random forest shows a better prediction accuracy for patterns 1 to 4, MLP shows a better accuracy for patterns 5 to 8. Using this table, we can choose the more accurate prediction model for each pattern and cluster type. Table 8 shows the prediction results of our model for 2017. Comparing Tables 7 and 8, we can see that MLP and random forest (RF) show matching relative performance in most cases. There are two exceptions in Cluster A and one exception in Cluster B, which are underlined and marked in bold; in the case of Cluster C, MLP and RF gave the same relative performance. This is good evidence that our hybrid model can be generalized.
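The selection rule just described, keeping, for each (cluster, pattern) pair, whichever model had the lower validation MAPE, can be sketched as below. The numeric MAPE values are placeholders, not the paper's results:

```python
# Hypothetical validation MAPE per (cluster, pattern) pair for the two models.
val_mape = {
    ("A", 1): {"RF": 3.1, "MLP": 3.8},
    ("A", 5): {"RF": 4.2, "MLP": 3.5},
    ("B", 2): {"RF": 2.9, "MLP": 3.0},
}

# The hybrid model keeps the model with the lower MAPE for each pair.
hybrid = {key: min(models, key=models.get) for key, models in val_mape.items()}
print(hybrid)  # {('A', 1): 'RF', ('A', 5): 'MLP', ('B', 2): 'RF'}
```

At forecast time, each day is assigned to its (cluster, pattern) pair and the corresponding selected model produces the prediction.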

Comparison of Forecasting Techniques
To verify the validity and applicability of our hybrid daily load forecasting model, we compared its predictive performance with that of other machine learning techniques, including ANN and SVR, which are very popular predictive techniques [6]. In this comparison, we consider eight models, including our model, as shown in Table 9. In the table, GBM (Gradient Boosting Machine) is an ensemble learning technique that implements a sequential boosting algorithm. A grid search can be used to find optimal hyper-parameter values for SVR and GBM [25]. SNN (Shallow Neural Network) has three layers (input, hidden, and output), and the optimal number of hidden nodes was found to be nine for all clusters.
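The grid search mentioned above can be sketched for SVR with scikit-learn as follows; the parameter grid and synthetic data are assumptions, not the paper's exact search space:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 9))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

# Exhaustively evaluate each (C, gamma) combination via cross-validation.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "gamma": ["scale", 0.1, 0.01]},
    scoring="neg_mean_absolute_error",
    cv=5,
).fit(X, y)

print(grid.best_params_)  # the hyper-parameters with the lowest CV error
```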
Tables 9-11 compare the prediction performance in terms of MAPE, RMSE, and MAE, respectively. In the tables, the most accurate results are marked in bold; we observe that our hybrid model performs best in all categories. Figure 7 shows the MAPE distribution for each cluster in more detail using a box plot. We can see that our hybrid model has fewer outliers and a smaller maximum error. In addition, the error rate increases during long holidays in Korea; for instance, during the 10-day holiday in October 2017, the error rate increased significantly. Another cause of high error rates is outliers or missing values arising from diverse causes, such as equipment malfunction and power surges. Figure 8 compares the daily load forecasts of our hybrid model with the actual daily usage on a quarterly basis. Overall, our hybrid model performed well in its predictions, regardless of diverse external factors such as long holidays. Nevertheless, we can see that there are several time periods in which the forecasting errors are high. For instance, from 2013 to 2016, Cluster B showed a steady increase in its power consumption due to building remodeling and construction. Even though the remodeling and construction were finished at the beginning of 2017, the input variable for estimating the year-ahead consumption still reflected this increase. This was eventually adjusted for the third and fourth quarters by the time series cross-validation. In addition, during the remodeling, the old heating, ventilation, and air conditioning (HVAC) system was replaced by a much more efficient one, which started operation in December 2017. Even though our hybrid model predicted much higher power consumption for the cold weather in the third week, the actual power consumption was quite low due to the new HVAC system. Lastly, Cluster A showed a high forecasting error on 29 November 2017.
It turned out that at that time, there were several missing values in the actual power consumption. This kind of problem can be detected by using an outlier detection technique.
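The three accuracy measures used in the comparison (MAPE, RMSE, and MAE) can be computed directly with NumPy; the actual/forecast series below are placeholder values for illustration:

```python
import numpy as np

actual = np.array([100.0, 120.0, 90.0, 110.0])
forecast = np.array([102.0, 115.0, 93.0, 108.0])

# Mean absolute percentage error, root mean squared error, mean absolute error.
mape = np.mean(np.abs((actual - forecast) / actual)) * 100
rmse = np.sqrt(np.mean((actual - forecast) ** 2))
mae = np.mean(np.abs(actual - forecast))

print(round(mape, 2), round(rmse, 2), round(mae, 2))  # 2.83 3.24 3.0
```

MAPE is scale-independent and so is convenient for comparing clusters with different consumption levels, while RMSE penalizes large errors, such as those caused by missing values, more heavily.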

Conclusions
In this paper, we proposed a hybrid model for short-term load forecasting for higher educational institutions, such as universities, using random forest and multilayer perceptron. To construct our forecast model, we first grouped university buildings into an academic cluster, a science/engineering cluster, and a dormitory cluster, and collected their daily electrical load data over six years. We divided the collected data into a training set, a validation set, and a test set. For the training set, we classified the electrical load data by pattern similarity using the decision tree technique. We then considered various configurations for random forest and multilayer perceptron and evaluated their prediction performance using the validation set to select the optimal model. Based on this work, we constructed our hybrid daily electrical load forecast model by selecting, for each similar time series, the model with the better predictive performance. Finally, using the test set, we compared the daily electrical load prediction performance of our hybrid model with that of other popular models. The comparison results show that our hybrid model outperforms them. In conclusion, we showed that random forest and multilayer perceptron are effective for reflecting the electrical load depending on the day of the week, and that the decision tree is effective in classifying time series data by similarity. Moreover, using the two forecasting models together in a hybrid model can complement their respective weaknesses.
In order to further improve the accuracy of electrical load prediction, we plan to use a supervised learning method that reflects various statistically significant data. We will also analyze the prediction performance at different look-ahead points (from the next day up to a week ahead) using probabilistic forecasting.