Daily Peak-Electricity-Demand Forecasting Based on a Residual Long Short-Term Memory Network

Abstract: Forecasting the electricity demand of buildings is a key step in preventing a high concentration of electricity demand and optimizing the operation of national power systems. Recently, the overall performance of electricity-demand forecasting has been improved through the application of long short-term memory (LSTM) networks, which are well-suited to processing time-series data. However, previous studies have focused on improving the accuracy of forecasting only overall electricity demand, not peak demand. Therefore, this study proposes adding residual learning to the LSTM approach to improve the forecast accuracy of both peak and total electricity demand. Using a residual block, the residual LSTM proposed in this study can map the residual function, which is the difference between the hypothesis and the observed value, and subsequently learn a pattern for the residual load. The proposed model delivered root mean square errors (RMSE) of 10.5 and 6.91 for the peak and next-day electricity-demand forecasts, respectively, outperforming the benchmark models evaluated. In conclusion, the proposed model provides highly accurate forecasting information, which can help consumers achieve an even distribution of load concentration and countries achieve the stable operation of the national power system.


Introduction
It is important for consumers to accurately forecast the electricity demand of their buildings at all times, based on the building's electricity-tariff structure, to reduce demand charges. In South Korea, the electricity-tariff system for buildings uses the peak load demand per hour from the previous year as the contract demand and divides demand charges into off-peak, mid-peak, and on-peak times [1]. Furthermore, the electricity-tariff system calculates the energy charge by imposing a weighted charge on electricity-load consumption based on the time at which electricity is used. When the maximum power used exceeds the contract demand, a surcharge of 1.5-2.5 times the basic rate is imposed on the excess consumption [2]. Accordingly, consumers must devise strategies to avoid increased electricity surcharges, considering the energy consumption of the building and adjusting their electricity demand so that the peak load does not exceed the contract demand, by accurately forecasting the peak-usage time and amount. Additionally, consumers must attempt to shift concentrated demand from peak-time periods to other time zones based on an accurate electricity-demand forecast [2].
If the concentration of energy consumption exceeds the supply capacity because of a mismatch between demand and supply at the peak, it may cause significant social problems such as blackouts. Supply reserves should be secured by adding facilities, possibly involving construction and maintenance costs for additional power plants [3,4]. According to the Korea Electric Power Corporation's (KEPCO) 2020 electricity statistics, building

• The residual LSTM can help consumers reduce demand charges by distributing the concentration of electricity demand based on accurate forecasting performance;
• The residual LSTM can help consumers in individual buildings to distribute the concentration of electricity demand during peak hours, reduce electricity-demand concentration at the regional level, and contribute to the stable operation of the national power system.

[Table of related studies: input variable(s), time step, method(s), and prediction objective.]

Some studies have used statistical models, such as multiple linear regression (MLR) or the auto-regressive moving average (ARMA), to forecast building electricity demand by learning linear relationships between variables. Fan et al., [22] predicted the peak demand of residential buildings using a general linear model (GLM) to identify which variables have the greatest effect on forecasting single demand peaks, obtaining a mean absolute percentage error (MAPE) of 4.6% when forecasting peak demand at 30-min intervals. Ke et al., [24] predicted the short-term electricity-load demand of campus buildings using direct curve fitting by polynomial regression, a similar-day approach, and MLR. After comparison, the similar-day approach had a MAPE of 3.37%, better than polynomial regression and MLR for direct curve fitting. When the observed relationship between the input variables and electricity demand is linear, such statistical models perform well. However, it is difficult to assign appropriate model parameters for electricity-demand data with nonlinear relationships [3,15,28,29].
New models have been developed because of technological advancements. Machine-learning models such as artificial neural networks (ANN), support vector regression (SVR), and ensemble models have been used to learn the nonlinear relationships in electricity-demand data. Liu and Chen [10] predicted lighting energy consumption in office buildings using ANN and SVR, with SVR achieving an R^2 of 0.9273, indicating a higher forecast accuracy. The authors of [18] predicted the peak electricity demand of an educational building using various statistical and machine-learning forecasting models to identify which variables have a major impact on the peak-demand forecast. After comparing the performances of the different methods, the ANN resulted in a MAPE of 4.89% when forecasting hourly demand peaks, a higher accuracy than those of the other statistical and machine-learning models evaluated. Fan et al., [21] predicted the electricity demand of non-residential buildings using an ensemble model to improve on the predictive performance of a single machine-learning model. The MAPE of the ensemble model was 2.32% for the hourly electricity-demand forecast and 2.85% for the peak-demand forecast, a higher accuracy than the other statistical and machine-learning models used as base learners. Therefore, machine-learning models can learn nonlinear data relationships to improve the predictive performance of electricity-demand forecasting [30]. However, because these machine-learning models cannot adapt the model architecture to the characteristics of the input variables [27], they cannot effectively learn the relationship between the electricity-demand observations and exogenous variables with time-series features.
In recent studies, deep-learning models such as LSTM and CNN have been used to learn data with sequential and spatial characteristics. Luo and Oyedele [12] employed an LSTM to forecast the electricity demand of educational buildings, obtaining an MAE of 2.4, which renders it more reliable than MLP, a machine-learning approach. Jin et al., [13] used LSTM to forecast the electricity demand of residential buildings and reported that LSTM, a deep-learning approach, gave lower prediction errors than MLP and SVM for their time-series data. To compare the forecast performance of various LSTM models, Ullah et al., [26] compared the electricity-demand forecast results of residential buildings using LSTM and BILSTM. As a result, the LSTM showed higher accuracy than the BILSTM, with a MAPE of 1.4574% in hourly electricity-demand forecasts. Kim and Cho [16] predicted the electricity demand of a residential building using the CNN LSTM model, which places a CNN layer before the LSTM layer to extract complex and difficult-to-understand features from the input variables; CNN LSTM had a MAPE of 32.83%, exhibiting higher predictive performance than MLR and LSTM. Additionally, Kim et al., [17] used the RICNN model, which combines CNN and LSTM layers to learn the hidden-state vectors of future and previous times, to determine the electricity demand of building complexes. The RICNN model had a MAPE of 4.48-8.79%, outperforming the MLP and LSTM models trained with the same data. The overall performance of these deep-learning models in forecasting electricity demand improved because their model architectures were consistent with the characteristics of the electricity-load demand and the input variables. However, previous studies that applied deep-learning models have not aimed to enhance the performance of peak-demand forecasts.
Recent studies on the prediction of the short-term electric-load demand of buildings have suggested using deep-learning models based on LSTM, or variants of it, which have demonstrated excellent performance in time-series forecasting, e.g., [12,13,16,17,26]. Such LSTM-based models can improve forecast accuracy by learning the relationship between the input variables, such as weather data, and the electricity-demand data [31]. However, previous studies have not considered the residual load, derived from various probabilistic factors including the behavior of building occupants, among the components of electricity demand. The residual load, which changes probabilistically according to the behaviors, needs, and desires of occupants, is a major cause of peak demand [32]. Therefore, a method is needed for learning and predicting the pattern in the residual load to improve the performance of both peak and total electricity-demand forecasting. Improving peak-demand forecasts is also expected to improve forecasts of data containing unexpected values.

Methodology
In this study, we propose the use of an LSTM-based deep learning architecture that uses a residual block to learn and accurately predict the residual load in the total electricity demand of buildings. The model learns the overall electricity demand through an LSTM layer suitable for forecasting time series, and the residual load, which is not forecast by the LSTM, through a residual block. The residual LSTM consists of a residual block, an LSTM layer, and a dense layer. First, the proposed model learns the sequential features of electricity-demand data through an LSTM layer appropriate for time series prediction. Second, the model uses the residual block to intensively learn the residual load. Finally, the model outputs the final prediction value through the dense layer. Figure 1 shows the structure of the residual LSTM proposed in this study.
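The three-stage flow described above (LSTM layer → residual block → dense layer) can be sketched as a toy forward pass. The layer sizes, random weights, and single-output dense head below are illustrative assumptions, not the paper's tuned architecture; dropout and training are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(X, W, b, hidden):
    """Run one LSTM layer over a (timesteps, features) sequence; return last hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in X:
        z = W @ np.concatenate([h, x_t]) + b
        f, i = sigmoid(z[:hidden]), sigmoid(z[hidden:2*hidden])
        g, o = np.tanh(z[2*hidden:3*hidden]), sigmoid(z[3*hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(0)
T, n_feat, hidden = 24, 17, 8     # 24 hourly steps; 17 input variables (Table 2)
X = rng.normal(size=(T, n_feat))  # hypothetical normalized input window

W1 = rng.normal(size=(4 * hidden, hidden + n_feat)) * 0.1
b1 = np.zeros(4 * hidden)
h = lstm_layer(X, W1, b1, hidden)      # 1) LSTM layer: sequential features

W_res = rng.normal(size=(hidden, hidden)) * 0.1
res_out = (W_res @ h) + h              # 2) residual block: F(h) + shortcut h

W_d = rng.normal(size=(1, hidden))
y_hat = float(W_d @ res_out)           # 3) dense layer: final forecast value
```

The shortcut addition in step 2 is what lets the block concentrate on the residual load rather than relearning the whole signal.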

Long Short-Term Memory (LSTM)
LSTM is a variant of the recurrent neural network (RNN) with modifications to its cells. A general RNN is suitable for learning data with a recursive characteristic, storing the result calculated at each time point in the internal memory of each cell. Equation (1) is calculated in the RNN prediction process:

h_t = tanh(W_h · [h_(t−1), x_t] + b_h)    (1)

where h_t denotes the output value of the RNN; W_h denotes the weight; b_h denotes the bias; and x_t denotes the input vector. The RNN calculates the output value h_t of the cell at time t using the value h_(t−1) calculated from the input vector x_(t−1) at time t−1, which allows it to learn the relationship between earlier and later data with a recursive characteristic. However, in an RNN, as the network depth increases owing to the use of multiple cells, h and W_h are repeatedly multiplied, and the long-time gradient accumulation value decreases to zero, causing a vanishing-gradient problem [33]. LSTM uses a cell that stores a calculation result in an internal memory through input, forget, and output gates to solve this problem. Figure 2 shows the cell structure of the LSTM. Equations (2)-(8) are calculated during the LSTM prediction process.
f_t^l = σ(W_f · [h_(t−1)^l, x_t] + b_f)    (2)
i_t^l = σ(W_i · [h_(t−1)^l, x_t] + b_i)    (3)
o_t^l = σ(W_o · [h_(t−1)^l, x_t] + b_o)    (4)
c̃_t^l = tanh(W_c · [h_(t−1)^l, x_t] + b_c)    (5)
c_t^l = f_t^l ∗ c_(t−1)^l + i_t^l ∗ c̃_t^l    (6)
h_t^l = o_t^l ∗ tanh(c_t^l)    (7)
σ(x) = 1/(1 + e^(−x))    (8)

where f_t^l, i_t^l, and o_t^l denote the forget, input, and output gates, respectively; c_t^l denotes the cell state (with candidate state c̃_t^l); h_t^l denotes the cell output; σ(x) denotes the activation function; and tanh(x) denotes the hyperbolic tangent. LSTM can solve the vanishing-gradient problem through the following process. First, the forget gate decides whether to retain the value of c_(t−1)^l by outputting a value between zero and one for the result calculated in the previous cell. Subsequently, the input gate stores the information on x_t. The x_t stored through the input gate and the cell state c_(t−1)^l at time t−1 are used to update c_t^l, the cell state at time t, thus making information stored by earlier cells available from the first cell through the cell at time t. Finally, the output gate outputs h_t^l, the cell's output value at time t, using x_t and the c_t^l updated through the input gate. Therefore, the vanishing-gradient problem can be solved by updating the cell state so that information with a large time gap does not vanish.
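Equations (2)-(8) can be checked with a minimal single-cell step in NumPy. Stacking the four gate weight blocks into one matrix is an implementation convenience here, not something prescribed by the paper.

```python
import numpy as np

def sigmoid(z):
    """Equation (8): logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Equations (2)-(7).

    W maps the concatenated [h_prev, x_t] to the four gate pre-activations
    (forget, input, candidate, output); b is the corresponding bias.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:n])           # forget gate, Eq. (2)
    i = sigmoid(z[n:2*n])         # input gate, Eq. (3)
    o = sigmoid(z[3*n:4*n])       # output gate, Eq. (4)
    g = np.tanh(z[2*n:3*n])       # candidate cell state, Eq. (5)
    c = f * c_prev + i * g        # cell-state update, Eq. (6)
    h = o * np.tanh(c)            # cell output, Eq. (7)
    return h, c

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W = rng.normal(size=(4 * hidden, hidden + inputs))
b = np.zeros(4 * hidden)
h, c = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden), W, b)
```

Because h = o ∗ tanh(c) with o ∈ (0, 1), every component of the cell output stays strictly inside (−1, 1).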

Residual Learning
He et al., [34] presented a residual network (ResNet) as the first example of residual learning. Residual learning is performed through a residual block with a structure through which an input vector is shortcut-connected to an output layer. Figure 3a shows the structure of a normal deep learning network, while Figure 3b shows the structure of a deep learning network composed of residual blocks.
The equation for residual learning created by the residual block is expressed as follows:

H(x) = F(x) + x    (9)

where x is the input vector for the first layer, H(x) denotes the output function computed by the stacked layers, and F(x) denotes the residual function learned by a residual block. A normal deep-learning network calculates H(x) to represent an input vector through a learning process. However, a deep-learning network with a residual block calculates H(x) as the sum of F(x) and x. The result is calculated through the residual block, as shown in Equation (9).
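Equation (9) can be illustrated numerically with a hypothetical two-layer stack as F(x). When the stacked layers output zero, the shortcut makes H(x) reduce to the identity mapping.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """H(x) = F(x) + x, Equation (9), with F a small two-layer stack."""
    F = W2 @ relu(W1 @ x)   # residual function learned by the stacked layers
    return F + x            # shortcut connection adds the input back

rng = np.random.default_rng(1)
d = 5
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
x = rng.normal(size=d)
H = residual_block(x, W1, W2)

# When the stacked layers contribute nothing, H(x) is exactly x:
H_id = residual_block(x, np.zeros((d, d)), np.zeros((d, d)))
```

This identity behavior is why stacking residual blocks cannot make the representation worse than the input, which underlies the gradient-flow argument in the text.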

In the backpropagation process, increasing the number of layers to improve the learning performance of a deep-learning model causes a vanishing-gradient problem. Conversely, residual learning can prevent the vanishing-gradient problem because the shortcut connection contributes a constant term of one to the gradient, preventing it from shrinking to zero [35,36].

Residual LSTM
Prakash et al., [37] proposed residual LSTM, using the residual block structure introduced by He et al., [34], to resolve the accuracy degradation caused by the vanishing gradient of LSTM. The proposed model consists of two LSTM layers and a shortcut connection, shown in Figure 4 as a red dotted line. Additionally, the input vector of the upper LSTM is connected to an output layer through a skip connection.
Compared with a plain LSTM model, residual LSTM has two advantages. First, residual LSTM can learn the residual through a residual function. In the residual LSTM shown in Figure 4, the output function of the residual block is derived as Equation (10) and transformed as Equation (11):

H(x_t^l) = F(x_t^l) + x_t^l    (10)
F(x_t^l) = H(x_t^l) − x_t^l    (11)

where H(x_t^l) denotes the output function of the residual block; F(x_t^l) denotes the residual function; and x_t^l denotes the input vector. As shown in Figure 4, F(x_t^l) can be expressed as the difference between H(x_t^l) of the residual block and x_t^l, which is analogous to the residual, the difference between the forecasted and observed values. Residual LSTM learns by driving F(x_t^l) toward zero so that H(x_t^l) ≈ x_t^l, finding the H(x_t^l) that best expresses x_t^l. Therefore, residual LSTM can directly learn the residual through the residual-learning process. Second, residual LSTM can solve the vanishing-gradient problem that occurs with increasing network depth.
In the learning performance-improvement method, a deep-learning model increases the network depth by adding layers, the most representative approach. As the network depth increases, the backpropagation process encounters a vanishing-gradient problem, preventing the model parameters from being updated. The residual LSTM connects the LSTM layers of the network in parallel through a shortcut connection so that the x_t^l used in the upper LSTM layer can also be used in the lower LSTM layer regardless of the network depth. Therefore, residual LSTM can solve the accuracy-degradation problem of the model by resolving the vanishing-gradient problem that occurs when adding LSTM layers. A hyperparameter-optimization algorithm was used to construct an optimized model architecture suitable for the experimental data. The residual LSTM adds a dropout layer, which prevents the model from overfitting owing to its regularization effect [38].

Data Collection and Preprocessing
This study proposed a predictive model applicable to all non-residential buildings. In South Korea, since 2017, the installation of sensors has been mandatory in newly built or expanded public buildings of 10,000 m² or more. However, according to statistics from the Korea Energy Agency, only 128 buildings had sensors installed in 2021, with the majority having none. Therefore, this study selected a non-residential building in South Korea without sensors and examined the predictive performance of the proposed model using the building's electricity-demand data.
To build the forecasting model proposed in this study, external data that can be collected without using any sensor other than a power meter were used as the input variables. After reviewing previous studies [9,11,14,16,17,21,23,39], we selected 17 input variables for predicting the electricity demand of buildings based on these external data. Variables that affect the maximum power demand were also chosen to accurately predict the maximum power demand. Electricity rates and time zones affect peak demand because consumers try to avoid peak demand to reduce demand charges [17]. Therefore, the peak-time-zone data of the electricity-rate system were selected as an input variable in this study. Three types of input variables were selected: (1) weather variables affecting the electricity consumption of appliances that consume considerable power in buildings, such as air conditioners and heaters; (2) time variables affecting the repeating pattern of electricity-load consumption depending on the time, date, and holidays; and (3) electricity-rate variables affecting the electricity-usage plans of consumers, based on electricity rates by the time of electricity use. Table 2 presents the input variables used for the electricity-demand forecast in this study.

This study used building electricity-demand data retrieved from the power data-sharing center of KEPCO [40], after de-identification, to train the predictive model. The weather variables were obtained from the Korea Meteorological Administration (KMA) weather-data open portal [41], and the electricity-rate variables were based on KEPCO's electricity-rate system [1]. All data were collected hourly from 1 January 2017 to 31 December 2018. The collected data were confirmed to include more than 2000 observations, with no missing values. Finally, the collected data were normalized to the interval (0, 1) to prevent the forecast model from overlearning a specific input variable [42].
The normalization equation is as follows:

x′ = (x − x_min)/(x_max − x_min)    (12)

where x denotes the original data; x_min denotes the minimum value of x; x_max denotes the maximum value of x; and x′ denotes the data after normalization.
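Equation (12) can be implemented directly; the demand values below are made up for illustration.

```python
import numpy as np

def min_max_normalize(x):
    """Scale data to the interval (0, 1) following Equation (12)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical hourly demand readings (kW)
demand = np.array([120.0, 180.0, 240.0, 300.0])
scaled = min_max_normalize(demand)
```

Note that in practice x_min and x_max should be computed on the training set only and reused for the validation and test sets, so no test-set information leaks into training.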

Benchmark Models
Four deep-learning models were chosen as benchmark models to validate the superiority of the proposed model. The first benchmark model was MLP, the simplest predictive model among neural networks. MLP is widely used in data mining because it can learn complex nonlinear relationships between data. Additionally, MLP was used in some related studies that predicted electricity demand [15,20].
Second, this study used LSTM as a benchmark model because it is a deep-learning model designed to process sequential data. Accordingly, in several studies [16,17,43], LSTM has been used as a benchmark model to verify models proposed for time-series prediction, such as electricity demand. CNN LSTM and RICNN were also chosen as benchmark models in this study because of their more complex architectures combining CNN and LSTM layers. CNN LSTM can learn input features through CNN layers [16], and RICNN can use the hidden-state vector through the CNN layer [17]. These were chosen because they outperformed the LSTM model, which has been used in several studies in the electricity demand-prediction field. For brevity, detailed descriptions of the benchmark models can be found in previous studies [11,15-17].


Hyperparameter Setting
In this study, the hyperparameters of the proposed and benchmark models were optimized using the electricity-demand data. This study divided the data into three datasets to optimize the hyperparameters, as shown in Figure 5. The data for the 12 months of 2017 were used as the training set, the data from July to September 2018 were used as the validation set, and the data from October to December 2018 were used as the test set. Figure 5 shows that the test set was located after the validation and training sets to prevent any test value from being used in the training of the proposed model [44].

The hyperparameters of the proposed and benchmark models were optimized using a grid search, which applies every hyperparameter combination to each model. In the first step, each model was trained on the training set and then evaluated on the validation set using the root mean square error (RMSE); the hyperparameter combination with the lowest RMSE was selected. Table 3 summarizes the hyperparameter space (HP-space) of the proposed and benchmark models. Additionally, this study applied two common settings to the proposed and benchmark models: following previous studies, the Adam optimizer [45] was adopted for model-parameter optimization, and the mean square error (MSE) was used as the loss function [44].
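The split-then-grid-search procedure can be sketched as follows. A one-parameter exponential-smoothing "model" stands in for the actual networks, and the series, split sizes, and grid are illustrative assumptions; only the selection logic (lowest validation RMSE wins) mirrors the text.

```python
import itertools
import numpy as np

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def forecast(series, alpha):
    """Toy one-step-ahead model: simple exponential smoothing with factor alpha."""
    pred, s = [], series[0]
    for y in series:
        pred.append(s)          # forecast for this step uses only past values
        s = alpha * y + (1 - alpha) * s
    return pred

rng = np.random.default_rng(2)
series = np.sin(np.arange(200) / 10.0) + 0.1 * rng.normal(size=200)

# Chronological split: the validation block strictly follows the training block.
n_train = 150
train, valid = series[:n_train], series[n_train:]

# Grid search: evaluate every combination, keep the lowest validation RMSE.
grid = {"alpha": [0.1, 0.3, 0.5, 0.7, 0.9]}
best = min(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda hp: rmse(valid, forecast(series, hp["alpha"])[n_train:]),
)
```

With more hyperparameters, extra lists are added to `grid` and `itertools.product` enumerates the full Cartesian space, exactly as a grid search over an HP-space does.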

Performance Measure
This study used three error metrics to assess the performance of each prediction model: the mean absolute error (MAE), expressed in Equation (13); the MAPE, expressed in Equation (14); and the RMSE, expressed in Equation (15):

MAE = (1/n) Σ_(i=1)^n |y_i − ŷ_i|    (13)
MAPE = (100/n) Σ_(i=1)^n |(y_i − ŷ_i)/y_i|    (14)
RMSE = √((1/n) Σ_(i=1)^n (y_i − ŷ_i)²)    (15)

where y_i and ŷ_i denote the actual and forecasted electricity consumption, respectively, for observation i; and n denotes the number of observations.
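Equations (13)-(15) translate directly into code; the sample values are illustrative.

```python
import numpy as np

def mae(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))              # Equation (13)

def mape(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)  # Equation (14), percent

def rmse(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))      # Equation (15)

y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 330.0]
```

RMSE penalizes large misses more heavily than MAE, which is why it is a natural metric when the concern is peak errors; MAPE is scale-free but undefined when an actual value is zero.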

Results and Discussion
In this study, we experimentally verify whether residual learning through residual LSTM can improve the forecast performance for the peak and overall electricity demands of buildings. The experimental results confirm the forecast errors for peak and total electricity demand, comparing residual LSTM with the benchmark models. Accordingly, the results of the experiment are presented in this section in two parts: the first subsection presents the model's peak-demand forecast results; the second presents the total electricity-demand and hourly forecast results.

Peak-Demand Forecast Results
Experiments were conducted to derive the forecast performance for peak demand by aggregating the daily peak among the electricity demands generated during each day in the test set. Table 4 presents the peak-demand forecast errors of residual LSTM and the benchmark models (MLP, LSTM, CNN LSTM, and RICNN), compared using three error metrics: MAE, MAPE, and RMSE. Table 4 shows that residual LSTM had the best forecast performance, with the lowest error across all error metrics, demonstrating that residual learning improves peak-demand forecast performance. Meanwhile, all error metrics showed that CNN LSTM had a higher prediction error than LSTM, indicating that using a CNN to learn relational features between input variables does not improve peak-demand forecast performance. Table 4. Performance of the prediction models for the next-day peak-electricity demand. The text in bold denotes the best performance for each performance measure.
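The daily-peak aggregation described above can be sketched as follows; the hourly series is synthetic and only illustrates taking the maximum hourly demand of each day.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical hourly demand series over two days (kW); the daily peak is the
# maximum hourly value within each calendar day.
start = datetime(2018, 10, 1)
hourly = [(start + timedelta(hours=h), 100.0 + (h % 24) * 5.0) for h in range(48)]

daily_peak = defaultdict(float)
for ts, load in hourly:
    day = ts.date()
    daily_peak[day] = max(daily_peak[day], load)
```

The resulting per-day peak series is what the peak-demand error metrics are computed on, as opposed to the full hourly series used for the overall forecast results.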

The accuracy of the forecast model is critical in peak-demand forecasting to ensure that the forecasted value is not underestimated. Consumers are likely to plan additional electricity usage during peak times if the predictive model underestimates peak demand. Additional electricity consumption during peak hours may result in a surcharge if consumption exceeds the contract demand [20], possibly resulting in an inflated base rate because consumers choose a higher contract demand than necessary during the electricity-rate contracting process. Accordingly, the errors of the underestimated cases were derived in this study to confirm the performance for underestimates in the test data at peak times. Table 5 shows the forecast-model errors for underestimated cases at peak times, with residual LSTM having the best predictive performance. Comparing these results with the peak-demand forecast results shows that residual LSTM reduced the error in all error metrics, whereas LSTM, CNN LSTM, and RICNN increased errors in some error metrics. These results indicate that residual LSTM can predict peak demand more accurately, particularly when peak demand is underestimated. Therefore, consumers can successfully reduce demand charges because residual LSTM prevents excessive consumption. The text in bold denotes the best performance for each performance measure.
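The underestimated-case analysis can be sketched by filtering forecast-observation pairs where the forecast falls below the observed peak; the demand values are illustrative.

```python
def underestimated_mae(y_true, y_pred):
    """MAE computed only over the cases where the forecast underestimates
    the observed peak (forecast < actual)."""
    pairs = [(y, p) for y, p in zip(y_true, y_pred) if p < y]
    if not pairs:
        return 0.0
    return sum(y - p for y, p in pairs) / len(pairs)

# Hypothetical daily peaks (kW): actual vs. forecast
y_true = [210.0, 215.0, 220.0, 230.0]
y_pred = [205.0, 220.0, 210.0, 230.0]
err = underestimated_mae(y_true, y_pred)
```

Only the first and third days count here (the forecast fell short by 5 and 10 kW), so the underestimation MAE is their average; overestimated and exact days are excluded by design.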

Overall and Hourly Forecast Results
The experiments were conducted to verify the overall results of residual LSTM. Accordingly, the forecast performances of residual LSTM and the four benchmark models were compared. The overall forecast performance was compared using the same three error metrics as the peak-demand forecast. Table 6 shows the experimental results for the overall electricity-demand prediction models. In terms of MAE and RMSE, residual LSTM outperformed all benchmark models, and in terms of MAPE, residual LSTM outperformed CNN LSTM. Based on the overall performance results, residual LSTM was considered a reliable method for forecasting electricity demand with low errors. The text in bold denotes the best performance for each performance measure. Although an accurate overall electricity-demand forecast is important for consumers in individual buildings to establish electricity plans, an accurate forecast of peak demand helps in the distribution of peak times. Distributing the peak in each building can help prevent energy consumption from exceeding supply capacity by improving regional electricity-demand concentration patterns. Conversely, from the perspective of both the consumer and the country, the accuracy of the off-peak forecast does not significantly affect the development of the electricity-usage plan. Therefore, it is necessary to examine the predictive performance for the on-peak period by dividing the day into periods according to energy consumption.
South Korea classifies the periods for electricity use based on the total energy consumption of the country to manage the supply of energy. Electricity demand is managed by dividing the day into off-peak, mid-peak, and on-peak periods [1]. The electricity-use periods defined by KEPCO are presented in Table 7. In this study, only data from October, November, and December were used as the test set to examine the performance of the forecast model. For these three months, the analysis focused on confirming the forecast performance for the on-peak periods of 10:00-12:00, 17:00-20:00, and 22:00-23:00. Table 8 shows the average MAPE of the hourly forecasts. Residual LSTM shows the best performance in five of the six on-peak time slots, indicating that residual LSTM can accurately predict power demand during on-peak periods. In mid-peak periods, residual LSTM and CNN LSTM had similar forecast performance; however, CNN LSTM had slightly better forecast performance in off-peak periods. Considering the differences in forecast performance for each period, the superiority of residual LSTM over CNN LSTM cannot be confirmed in terms of overall performance. Nevertheless, residual LSTM is a forecast model with higher utility than the other benchmark models for consumers and the country because it can accurately predict electricity demand during the on-peak periods used for electricity-demand management. The text in bold denotes the best performance for each time zone.
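The per-period evaluation behind Table 8 can be sketched as follows. This is our own illustration: the on-peak hours follow the text (10:00-12:00, 17:00-20:00, and 22:00-23:00), while the split of the remaining hours into off-peak and mid-peak is assumed for the example, since Table 7 is not reproduced here.

```python
import numpy as np

# Illustrative winter period map by hour of day. On-peak hours follow the
# text; the off-peak band (00:00-09:00) is an assumption for this sketch.
ON_PEAK = set(range(10, 12)) | set(range(17, 20)) | {22}
OFF_PEAK = set(range(0, 9))

def period_of(hour):
    """Map an hour of day (0-23) to its assumed tariff period."""
    if hour in ON_PEAK:
        return "on-peak"
    if hour in OFF_PEAK:
        return "off-peak"
    return "mid-peak"

def mape_by_period(actual, forecast):
    """Average MAPE (%) per tariff period over hourly series that start
    at hour 0 and cover whole days."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    hours = np.arange(a.size) % 24
    out = {}
    for p in ("off-peak", "mid-peak", "on-peak"):
        mask = np.array([period_of(h) == p for h in hours])
        out[p] = np.mean(np.abs(a[mask] - f[mask]) / a[mask]) * 100.0
    return out
```

Grouping the hourly errors this way makes it possible to see, as in Table 8, that a model can lead in on-peak hours while trailing slightly in off-peak hours.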

Statistical Tests
Using the Friedman test, we statistically compared the performance of the proposed model with those of the benchmark models. The Friedman test is a statistical method that evaluates the statistical differences between the performances of two or more forecasting algorithms [46,47]. The null (H0) and alternative (H1) hypotheses of the Friedman test are as follows:

• Null hypothesis (H0): The forecasting models have the same performance;
• Alternative hypothesis (H1): The performance of at least one model is statistically different from those of the other forecasting models.
Friedman tests with a significance level of α = 0.05 were performed for the error data of the five algorithms considered in the study. Tables 9 and 10 summarize the results of these tests for the overall and peak forecast performances. The results of both tests revealed the existence of significant differences between the proposed and benchmark models.
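Such a comparison could be run as follows (a minimal sketch, not the paper's code, assuming each model contributes a per-day error series over the same test days and using SciPy's implementation of the Friedman test):

```python
import numpy as np
from scipy.stats import friedmanchisquare

def compare_models(errors_by_model, alpha=0.05):
    """Friedman test over per-day errors of k forecasting models.

    `errors_by_model` maps a model name to a 1-D array of daily errors;
    all arrays must cover the same test days (matched blocks). Rejecting
    H0 at level `alpha` indicates at least one model performs differently.
    """
    stat, p_value = friedmanchisquare(*errors_by_model.values())
    return stat, p_value, p_value < alpha
```

Because the test is rank-based within each day, it is robust to days with unusually large absolute errors, which suits heavy-tailed demand data.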

Conclusions
This study proposed using residual LSTM to accurately predict the peak demand of a building and to improve the forecast performance for the total electricity demand. Residual LSTM consists of an architecture in which LSTMs and residual blocks are applied for learning time-series data and residual learning, respectively. This structure allows residual LSTM to more easily map the hypothesis for the electricity demand by minimizing the residual. The proposed model was compared with existing models based on the electricity-demand data from a non-residential building, considering both peak-demand forecast performance and overall forecast performance. Peak-demand forecasting shows that residual LSTM outperforms the benchmark models while also improving the overall electricity-demand forecast accuracy.
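The residual mapping at the core of the architecture — the inner network learns only the residual F(x) = H(x) − x, and the identity shortcut restores the hypothesis H(x) = F(x) + x — can be illustrated independently of any deep-learning framework (a conceptual sketch; `residual_block` and the toy inner function are our own, not the paper's implementation):

```python
import numpy as np

def residual_block(x, inner_fn):
    """Generic residual block: `inner_fn` models only the residual
    F(x) = H(x) - x, and the identity shortcut adds x back so the
    block outputs H(x) = F(x) + x."""
    return inner_fn(x) + x
```

In the paper's model, `inner_fn` is realized by the stacked LSTM layers, so the network only has to learn the (typically small) residual load rather than the full demand curve.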
Based on the above results, the key findings and contributions of this study can be summarized as follows:
• The historical electricity demand and weather data between January 2017 and December 2018 were obtained for the area where the building used in the experiments was located;
• Three performance metrics, namely, MAPE, MAE, and RMSE, were used for assessing the performance of the models when forecasting peak and next-day electricity demand;
• The RMSEs of the peak-demand forecasts by the MLP, LSTM, CNN LSTM, RICNN, and residual LSTM models were 11.85, 10.75, 11.13, 12.17, and 10.5 kW, respectively. Similarly, the RMSEs of the next-day electricity demand predicted by the models were 9.46, 7.7, 6.95, 7.73, and 6.91 kW, respectively;
• The performance evaluation of the models showed that the proposed residual LSTM was more accurate than MLP, LSTM, CNN LSTM, and RICNN in peak-demand forecasting;
• Regarding the next-day electricity-demand forecast, the performance of the proposed model was better for on-peak time slots with high electricity demand;
• This study demonstrates an improvement in performance when applying residual LSTM for forecasting the electricity demand of buildings;
• The proposed model can help distribute concentrated electricity demand in buildings and support the stable operation of the national power system.
For future studies, we first suggest constructing predictive models for various forecast resolutions, such as one week, one month, and one year ahead, to manage peak demand at the regional level. Second, we recommend adding a feature-selection process to the residual LSTM, which should improve the forecast performance of the model by identifying the important variables for forecasting electricity demand.

Conflicts of Interest:
The authors declare no conflict of interest.