Comparison of the Deep Learning Performance for Short-Term Power Load Forecasting

Electricity demand forecasting enables the stable operation of electric power systems and reduces electric power consumption. Previous studies have predicted electricity demand through a correlation analysis between power consumption and weather data; however, this analysis does not consider the influence of various factors on power consumption, such as industrial activities, economic factors, the power horizon, and the residents' living patterns in buildings. This study proposes an efficient power demand prediction using deep learning techniques for two industrial buildings with different power consumption patterns. The problems are presented by analyzing the correlation between the power consumption and weather data by season for industrial buildings with different power consumption patterns. Four models were analyzed using the most important factors for predicting power consumption and weather data (temperature, humidity, sunlight, solar radiation, total cloud cover, wind speed, wind direction, and vapor pressure). The prediction horizon for power consumption forecasting was kept at 24 h. The existing deep learning methods (DNN, RNN, CNN, and LSTM) cannot accurately predict power consumption when it increases or decreases rapidly. Hence, a method to reduce this prediction error is proposed. DNN, RNN, and LSTM were superior when using two years of electricity consumption rather than one year of electricity consumption and weather data.


Introduction
The average global temperature is steadily rising due to global warming. In July 2018, the Korean government announced a policy to reduce greenhouse gas emissions by 37% by 2030 [1]. Therefore, the Korean government has proposed technologies and policies to reduce coal power generation and drastically increase renewable energy. In 2018, Korea experienced the 'worst heatwave ever recorded' in 110 years of meteorological observations. Figure 1 shows the results of the power consumption analysis by application in Korea in 2018 [2]. Figure 1a shows the power consumption by application: industrial (55.70%), general (22.20%), residential (13.90%), agricultural (3.50%), educational (1.60%), late-night (2.40%), and street-light applications (0.70%). Figure 1b shows the rate of increase in power consumption by application in 2018 compared to the previous year (2017). The results were analyzed as follows.
1. The annual domestic power consumption exceeded 70,000 GWh for the first time in 2018. This is expected to increase steadily due to future heat waves.
3. Industrial use increased by only 2.5% compared to the previous year, but it accounted for 55.70% of the total; thus, the power consumption increased considerably, to over 2.92 million.
In addition, existing electricity demand forecasting studies have steadily progressed according to the forecast horizon, data characteristics, and data combinations.
1. The forecast horizon of electricity demand is essential because of the maintenance schedule of the energy management system (EMS) [6], the generation of power system utilities, the expansion of operations and plans [7], load switching, safety, market demand assessment, cost reduction, and the guarantee of a continuous power supply [8]. Forecast horizons can be divided into four types [9][10][11].
• Very short-term load forecasting (VSTLF) predicts power consumption and demand in real time by performing predictions 1 h, 30 min, and 15 min ahead to operate the power system stably.
• Short-term load forecasting (STLF) is the most widely used technique; it performs predictions from 1 h to 1 week ahead, and the forecasting step is time-based. This prediction plays an essential role in operating the power system and planning the arrival and departure of units and the optimal load distribution.
• Medium-term load forecasting (MTLF) performs predictions from 1 month to 1 year ahead and uses a daily forecasting step. This prediction is essential for medium-term planning, including the economic operation and repair planning of a system, which are directly related to the system's reliability.
• Long-term load forecasting (LTLF) performs long-term predictions of over one year. It has a forecast range of over ten years, with a monthly forecasting step. This prediction is essential in network development, such as increasing the number of power plants, transmission lines, and distribution equipment.
2. Data characteristics can be divided into data-driven (single) models and hybrid (combined) models.
• Data-driven models predict the collected data by using probability and statistics (exponential smoothing [12], autoregressive integrated moving average [13]), classification models (k-nearest neighbors [14], decision trees [15], support vector machines [16], particle swarm optimization [17]), and artificial intelligence (artificial neural networks [18], deep neural networks [19], recurrent neural networks [20], long short-term memory [21], and convolutional neural networks [22]). A single model has no particular computational complexity, but its results can be unreliable and inaccurate. Hence, a combined model is a good alternative that improves the load prediction accuracy and stability by combining the advantages and disadvantages of single models [23].

3. A model can be categorized as univariate or multivariate according to the data composition (combination). A univariate model performs predictions using only power consumption. A multivariate model performs predictions by combining the power consumption and weather data (temperature, humidity, sunlight, solar radiation, total cloudiness, wind speed, wind direction, and vapor pressure). In addition, many recent multivariate models have been applied to short-term wind speed prediction; for example, a model combining solar production, temperature, insolation, humidity, and wind speed has been used [40].
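The univariate/multivariate distinction above amounts to a difference in how the input rows are composed. The sketch below illustrates this with hypothetical values; the particular three-feature combination (load, temperature, humidity) is an illustrative assumption, not the study's actual feature set.

```python
# Hypothetical hourly series: load plus two weather variables.
load = [310.0, 295.5, 320.1, 305.7]   # power consumption
temp = [21.3, 22.1, 23.0, 22.4]       # temperature
humid = [55.0, 57.5, 60.2, 58.1]      # humidity

# Univariate input: each row contains only the past load.
univariate = [[l] for l in load]

# Multivariate input: each row combines load with weather features.
multivariate = [[l, t, h] for l, t, h in zip(load, temp, humid)]

assert len(univariate[0]) == 1
assert multivariate[0] == [310.0, 21.3, 55.0]
```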
The critical factors for forecasting the building power consumption vary according to the weather and indoor conditions, the size and use of the buildings, and the prediction horizons [41][42][43][44]. Among these factors, (1) meteorological conditions are the main predictive factors that can change the indoor conditions and the activities that govern the building power consumption. For example, ambient temperature, cloud cover, humidity, and insolation affect the load pattern through occupant comfort and the adjustment of lighting levels [45,46]. (2) In commercial buildings that are more than ten times larger than typical residential buildings, size and usage will most likely drive electricity consumption more than weather variation factors. For example, the load patterns of buildings vary according to usage and time; this includes single-family houses, multifamily houses, cultural and assembly facilities, religious facilities, education and research facilities, factories, and warehouse facilities. The varying complexity of load patterns makes it difficult to forecast the building power consumption accurately. (3) Mid/long-term and short-term power consumption forecasting enables the efficient operation of peer-to-peer (P2P) power transactions and EMSs suited to power supply and demand. In particular, changes in consumers' power consumption patterns are uncertain factors for future power demand. Accurate demand forecasting begins by responding to fluctuations in electricity demand, considering weather/environmental changes, and reflecting recent demand patterns. The response to real-time prediction is brought about by frequency control and economic dispatch. Short-term forecasts account for power generation plans, and mid-term forecasts account for the maintenance of power facilities. Meanwhile, long-term forecasts account for construction plans of generators and transmission networks.
In this study, seasonal factors and the building power consumption pattern are reflected in the proposed method to forecast the power consumption accurately.

1. Four models (TM1-TM4) are proposed, and an analysis of the correlation between the seasonal factors, weather data, and power consumption is conducted. These are univariate and multivariate models built from power consumption and weather data (temperature, humidity, sunlight, solar radiation, total cloud cover, wind speed, wind direction, and vapor pressure) collected for 1-2 years.
2. The prediction horizon for power consumption forecasting was 24 h, which ensures that power suppliers can operate peer-to-peer power transactions and energy management systems efficiently.
Two industrial buildings in the agricultural complex in Naju, Jeollanam-do, were selected to test and verify the proposed method. Building B is an industrial building with constant power consumption, which houses a manufacturing company. Building T is an industrial building with nonuniform power consumption, which houses a livestock processing company. In addition, the experimental data adopted in this study were collected for three years (from 2017 to 2019) under industrial real-time electricity rates. The usage was provided by the Korea Electric Power Corporation's (KEPCO) i-smart [47].
The structure of this study is as follows. Section 2 analyzes the power consumption for Companies B and T's buildings, which have different power consumption patterns. Section 3 briefly explains the experiments using deep learning methods. Section 4 describes the problems of deep learning, and it explains the proposed method. Section 5 describes the experiment and the analysis of the proposed method. Finally, Section 6 presents the conclusions of this study and the future directions of research.

Analysis of the Power Consumption of Companies B and T
Section 2.1 explains the meteorological factors affecting power consumption, which the Korea Meteorological Administration recently announced. Section 2.2 analyzes the power consumption of Companies B and T located in Naju, Jeollanam-do, by season. Finally, Section 2.3 analyzes the correlation between the meteorological data and the power consumption by season.

Existing Research on the Building Electricity Consumption
In previous studies [48][49][50], the meteorological factors affecting the building power consumption include the temperature (average, maximum, and minimum), humidity, wind speed, cloudiness, discomfort index, perceived temperature, and precipitation [48]. In Korea, power consumption in summer (July-August) and winter (December-February) correlates highly with the minimum temperature, cloudiness, and sensible temperature. During the changing seasons (May and October), the correlation between the meteorological factors and the power consumption is relatively low [49]. When forecasting the power consumption using only the temperature, the prediction error was 1.8%. In contrast, when weather factors such as humidity, cloudiness, sensible temperature, wind speed, and precipitation are added, the prediction error is reduced to 1.3%. When forecasting the power consumption, the prediction error can be improved by 25% depending on the combination of meteorological factors; the annual power generation amount can be reduced by approximately 1100 GWh, saving about 120 billion KRW [50]. However, this analysis has the disadvantage that it may differ slightly from the actual electricity demand forecast since it does not consider variables such as lifestyle, industrial activities, and economic factors.

Analysis of the Power Consumption by Season
To analyze the power consumption patterns of Companies B and T, the months of April, July, October, and January were selected, which represent Korea's spring, summer, fall, and winter, respectively. Figure 2 compares the power consumption of each month (per 24 h) for Companies B and T during spring (April), summer (July), fall (October), and winter (January). Company B's building uses electricity from 9 a.m. to 12 p.m. and from 1 p.m. to 6 p.m. on weekdays; it does not use electricity on holidays or outside working hours. Comparing the power consumption by season, the order is winter > fall > summer > spring. In the winter, the power consumption is high. In contrast, the power consumption is low in the summer (July) since this is the vacation season.
The power consumption of Company T has an irregular pattern. This is because Company T is an agroprocessed food company that consumes electricity according to economic factors (consumer demand), regardless of the season, time, or public holidays. Table 1 shows the average (Ave.), standard deviation (Std.), and coefficient of variation (CV) for the seasonal power consumption of Companies B and T. The CV can compare relative dispersion where the range or variance alone is not sufficient. Therefore, to compare the power consumption of different units of measurement, the CV is adopted (Equation (1)):

CV = σ/m (1)

The larger the value of the CV, the larger the relative difference. Here, σ and m represent the standard deviation and mean, respectively. Company B has a relatively small power consumption compared with Company T; however, its standard deviation and CV are large relative to its mean. This is because there is a relatively significant difference between the working hours (9 a.m. to 6 p.m.), in which electricity is used, and the nonworking hours, in which electricity is not used. Company T consumes more electricity than Company B, but it consumes electricity regardless of the working hours, holidays, and weather factors. Therefore, its standard deviation and CV values are small.
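Equation (1) can be sketched in a few lines. The two series below are hypothetical stand-ins for an intermittent, Company B-like load profile and a larger but steady, Company T-like profile; they are not the actual measurements.

```python
from statistics import mean, pstdev

def coefficient_of_variation(load):
    """CV = sigma / m: relative dispersion of a load series,
    comparable across series measured at different scales."""
    return pstdev(load) / mean(load)

# Hypothetical hourly loads (illustrative values only).
b_like = [5, 5, 5, 50, 55, 60, 58, 5, 5]                # strong working-hour swing
t_like = [120, 118, 122, 119, 121, 120, 118, 122, 120]  # near-constant consumption

# The intermittent series has the larger CV even though its mean is smaller.
assert coefficient_of_variation(b_like) > coefficient_of_variation(t_like)
```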

Correlation between the Seasonal Weather Data and Power Consumption
Figure 3 shows the analysis of the correlation (R 2 ) and trend line (Y) between the temperature and humidity by season for the Naju region, where Companies B and T are located. The temperature and humidity in the spring, fall, and winter did not show any correlation. The correlation (R 2 ) between the summer temperature and humidity was determined to be 0.76.

Concluding Remarks
Section 2 analyzed the relationships between temperature and humidity, seasonal meteorological factors, and temperature and power consumption. Although weather factors affect power consumption, it was determined that building habits, industrial activities, and economic factors have the greatest influence on power consumption.

Deep Learning for Power Consumption
Section 3.1 briefly describes the DNN, LSTM, RNN, and CNN. Section 3.2 presents the comparison and analysis of forecasting the power consumption using deep learning for Companies B and T.

Deep Neural Network
A DNN is an artificial neural network composed of several hidden layers between the input and output, as shown in Figure 6 [51,52]. Like general artificial neural networks, DNNs can model complex nonlinear relationships. For example, in the deep neural network structure of an object identification model, each object can be expressed as a hierarchical composition of the essential elements of an image [53]. Here, the additional layers aggregate the features gradually gathered by the lower layers. This feature of deep neural networks enables complex data to be modeled with fewer units (nodes) than similarly performing artificial neural networks [51].
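The stacked hidden layers described above can be sketched as a minimal forward pass in NumPy. The layer sizes and the 9-feature input (load plus eight weather variables) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def dnn_forward(x, layers):
    """Forward pass through stacked fully connected layers:
    ReLU on hidden layers, linear output for the regression target."""
    h = x
    for W, b in layers[:-1]:
        h = relu(h @ W + b)
    W, b = layers[-1]
    return h @ W + b

# Hypothetical architecture: 9 inputs -> 32 -> 16 -> 1 load forecast.
sizes = [9, 32, 16, 1]
layers = [(rng.normal(0, 0.1, (a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]

y = dnn_forward(rng.normal(size=(5, 9)), layers)  # 5 sample hours
assert y.shape == (5, 1)
```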

Recurrent Neural Network
An RNN is a neural network in which the connections between the units form a directed cycle, as shown in Figure 7 [54]. Unlike feed-forward neural networks [20], an RNN can use the memory inside the neural network to process arbitrary input sequences. Due to these characteristics, RNNs are used in handwriting recognition and show a high recognition rate [55]. A variety of structures can be used to construct RNNs: Hopfield networks [56], Elman networks [57], echo state networks (ESNs) [58], LSTMs [59], bidirectional RNNs [60], continuous-time RNNs (CTRNNs) [61], hierarchical RNNs (HRNNs) [62], and second-order RNNs (SORNNs) [63] are typical examples. Gradient descent, Hessian-free optimization, and global optimization are typically used to train an RNN. However, the RNN has a scaling issue and is difficult to train when there are many neurons or input units [64,65].


Convolutional Neural Network
A CNN is a multilayer perceptron designed to use minimal preprocessing, as shown in Figure 8 [66]. A CNN comprises one or several convolutional layers, with general artificial neural network layers on top; it uses shared weights and pooling layers. Because of this structure, a CNN can exploit input data with a two-dimensional structure. CNNs perform well in the video and audio fields [67,68]. CNNs can also be trained by using standard backpropagation. CNNs are easier to train than other feed-forward neural network techniques and have the advantage of using fewer parameters. Recently, the convolutional deep belief network (CDBN) has been developed in deep learning; it is structurally similar to a CNN and can exploit the two-dimensional structure of an image well [69]. Deep belief networks (DBNs) can also take advantage of pretraining [70]. The CDBN provides a general structure that can be used for a variety of image and signal processing techniques and has been used in several benchmark results for standard image data, such as those of the Canadian Institute for Advanced Research (CIFAR) [71].


Long Short-Term Memory
LSTM is a method used to solve the vanishing gradient problem, which is a disadvantage of the RNN [64,65]. Compared to the basic RNN model, a study has reported that the accuracy is approximately 10% higher when the LSTM model is implemented [72,73]. The interior of the LSTM block comprises a memory cell with a recurrent structure and three types of gates (input, forget, and output), as shown in Figure 9. Modifications of the LSTM model include the single-layer LSTM [21], multilayer LSTM [74], gated recurrent unit (GRU) [75], bidirectional LSTM [76], encoder-decoder (ED)-LSTM [77], and ConvLSTM [78]. Multilayer LSTMs achieve higher accuracy than single-layer LSTMs.
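The three-gate memory cell described above can be sketched as a single NumPy step. The weight shapes, the hidden size, and the 24-step hourly loop are illustrative assumptions, not the study's trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM step: input (i), forget (f), and output (o) gates
    regulate what enters, stays in, and leaves the memory cell c."""
    (Wi, bi), (Wf, bf), (Wo, bo), (Wg, bg) = params
    z = np.concatenate([x, h])       # each gate sees [input; hidden]
    i = sigmoid(Wi @ z + bi)         # input gate
    f = sigmoid(Wf @ z + bf)         # forget gate
    o = sigmoid(Wo @ z + bo)         # output gate
    g = np.tanh(Wg @ z + bg)         # candidate cell update
    c = f * c + i * g                # memory cell carries long-term state
    h = o * np.tanh(c)               # hidden state exposed downstream
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 9, 8                   # hypothetical feature/hidden sizes
params = [(rng.normal(0, 0.1, (n_hid, n_in + n_hid)), np.zeros(n_hid))
          for _ in range(4)]

h = c = np.zeros(n_hid)
for x in rng.normal(size=(24, n_in)):  # 24 hourly input vectors
    h, c = lstm_step(x, h, c, params)

# h = o * tanh(c) is always bounded strictly inside (-1, 1).
assert h.shape == (n_hid,) and np.all(np.abs(h) < 1.0)
```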


Persistence Model
The persistence model (PM) is the simplest way of producing a forecast [79,80]. The persistence model assumes that nothing changes between the current time and the forecast time, so the future value of a time series equals the last measured value (Equation (2)):

P̂_t = O_(t−1) (2)

In Equation (2), O_(t−1) is the measured power consumption at time (t − 1), and P̂_t is the predicted value at time t. The model's accuracy decreases if the time horizon is longer than 1 h.
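Equation (2) can be sketched in a couple of lines: each prediction simply repeats the previously observed value. The sample series is hypothetical.

```python
def persistence_forecast(load):
    """P_hat[t] = O[t-1]: forecast each value as the last observation."""
    return [load[t - 1] for t in range(1, len(load))]

observed = [10.0, 12.0, 11.5, 13.0, 12.5]  # illustrative load values
print(persistence_forecast(observed))       # [10.0, 12.0, 11.5, 13.0]
```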


Comparison of the Deep Learning Performance for Companies B and T
In this section, we used the power consumption in the spring, summer, autumn, and winter in 2018 to compare the deep learning performance of Companies B and T. The experimental data were divided into learning data (70%) and test data (30%) and then simulated. Table 2 lists the training and testing options for simulating the PM, DNN, RNN, CNN, and LSTM. In the simulation, the learning rate, loss function (MSE), optimizer (ADAM) [81], and activity function (RELU) [82] were all the same. The remaining options were set for each deep learning method.  Table 3 compares the persistence model and the deep learning performance of Companies B and T by season. Figure 10 shows comparison of the persistence model and the deep learning performance of Companies B and T in April 2018. In addition, the average (Ave.), standard deviation (Std.), CV, root-mean-square error (RMSE) [83], and mean absolute percentage error (MAPE) [84] were used for the prediction errors. The values with excellent performance for each deep learning method are shown in bold. The RMSE and MAPE are shown in Equations (3) and (4), respectively.
RMSE = sqrt((1/n) Σ_(i=1..n) (O_i − P_i)^2)   (3)
MAPE = (100/n) Σ_(i=1..n) |(O_i − P_i) / O_i|   (4)

In Equations (3) and (4), n is the total number of values to be predicted, O_i is the actual value, and P_i is the predicted value. For both RMSE and MAPE, the smaller the value, the better the performance. For Company B, whose CV fluctuates widely, the best-performing method differed depending on the prediction error measure, whereas for Company T, whose CV fluctuates little, LSTM performed best regardless of the measure.
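The two error measures of Equations (3) and (4) can be computed directly; this is a minimal sketch with function names chosen for illustration:

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error, Equation (3)."""
    n = len(actual)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error (%), Equation (4).
    Assumes the actual values contain no zeros."""
    n = len(actual)
    return 100.0 / n * sum(abs((o - p) / o) for o, p in zip(actual, predicted))

# Example: two hours of actual vs. predicted consumption.
err_rmse = rmse([100.0, 200.0], [110.0, 190.0])  # 10.0
err_mape = mape([100.0, 200.0], [110.0, 190.0])  # 7.5 (%)
```

Note that MAPE divides by the actual value, which is why it is unreliable when the data contain zeros or extreme values.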

Proposed Method
Section 4.1 analyzes the power consumption patterns of Companies B and T, and it describes the problems. Section 4.2 proposes the prediction error correction to solve the problems when simulating traditional deep learning for Companies B and T.

Problem of Existing Deep Learning
Section 2 analyzes the relationship between the weather data and power consumption by season. This analysis reveals that weather, lifestyle, industrial activity, and economic factors all affect power consumption. For example, Figure 11 compares the daily average electricity usage patterns by season for Companies B and T. Figure 11a shows that Company B has a regular power consumption pattern, but one that fluctuates rapidly depending on working hours and lifestyle: consumption rises during the intensive working hours from 8 AM to 6 PM and drops sharply during the remaining hours (7 PM to 8 AM) and lunchtime (12 PM to 1 PM). In contrast, as shown in Figure 11b, Company T does not have a uniform power consumption pattern, but its consumption does not increase or decrease significantly.

Proposed Forecasting Error Correction
The application of DNN, RNN, CNN, and LSTM in different seasons revealed that the performance of the LSTM was excellent for deep learning. However, as shown in Figure 11a, when the power consumption increases and decreases rapidly, LSTM cannot accurately predict the power consumption; therefore, this study proposes a method to correct the prediction error when the power consumption rapidly increases or decreases. Figure 12 shows the flowchart proposed in this study.
First, as a data collection and preprocessing step, the electricity and weather data of Companies B and T in Naju were collected at 1 h intervals and preprocessed for simulation. Second, after splitting the electricity and weather data into four test models, each was divided into training data (70%) and test data (30%). Third, the training data of the four test models were simulated using deep learning (DNN, RNN, CNN, and LSTM), with the proposed prediction error correction applied to the training data. Fourth, the trained deep learning models were applied to the test data of the four test models. Finally, the performances of the traditional and proposed methods were compared using the prediction error measures (RMSE, MAPE).
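The 70:30 chronological split in the second step can be sketched as follows; the function name and placeholder data are illustrative, not from the paper:

```python
def chronological_split(series, train_ratio=0.7):
    """Split a time series into training and test portions (70:30 by
    default), preserving chronological order as in the experimental
    setup: the earliest 70% is used for training, the rest for testing."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]

# Placeholder for one test model's hourly load data.
hourly_load = list(range(100))
train, test = chronological_split(hourly_load)
# train holds the first 70 hours; test holds the remaining 30.
```

Keeping the split chronological (rather than random) matters for load forecasting, since the model must be evaluated on data that lies strictly after its training period.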


Data Collection and Preprocessing
The electricity and weather data were collected from 2017 to 2018 for Companies B and T in Naju, Jeollanam-do. The electricity data were collected from Companies B and T in one-hour intervals using i-Smart, which provides a real-time power consumption service from KEPCO. The weather data were collected in one-hour intervals from the Naju Regional Meteorological Administration. The collected electricity and weather data were preprocessed monthly (January, April, August, and October).

Proposed Four Models
This section proposes four test models (TM1-TM4), as shown in Table 5, which combine the power consumption and weather data that affect power consumption prediction. TM1 is univariate (a single variable); TM2-TM4 are multivariate, with two or more variables. TM1 uses only power consumption, the variable most frequently used in existing power consumption prediction; the power consumption data for Companies B and T were collected by season in one-hour units in 2018. TM2 predicts power consumption using data collected over two years (2017-2018). TM3 combines the 2018 power consumption with weather data (temperature, humidity, solar radiation, and cloud cover). Finally, TM4 combines the two-year power consumption with weather data (temperature, humidity, solar radiation, cloud cover, precipitation, wind direction, wind speed, and vapor pressure).

TM3: power consumption in 2018; temperature (°C); humidity (%); solar radiation (MJ/m²); cloud cover.
TM4: power consumption for two years (2017-2018); temperature (°C); humidity (%); solar radiation (MJ/m²); cloud cover; precipitation; wind speed; wind direction; vapor pressure.

Figure 13 shows that each of the four test models splits the training and test data at a ratio of 70:30. The training data of the four test models were simulated for each deep learning method using the options in Table 2.
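As an illustration of the four test models' input variables, the feature sets can be summarized in a lookup table; the variable names below are placeholder labels for this sketch, not identifiers from the paper:

```python
# Illustrative summary of the four test models (TM1-TM4) from Table 5.
TEST_MODELS = {
    "TM1": ["load_2018"],
    "TM2": ["load_2017", "load_2018"],
    "TM3": ["load_2018", "temperature", "humidity", "solar_radiation",
            "cloud_cover"],
    "TM4": ["load_2017", "load_2018", "temperature", "humidity",
            "solar_radiation", "cloud_cover", "precipitation",
            "wind_speed", "wind_direction", "vapor_pressure"],
}

# TM1 is univariate; TM2-TM4 are multivariate.
univariate = [name for name, feats in TEST_MODELS.items() if len(feats) == 1]
```

Structuring the models this way makes the experimental contrast explicit: TM1/TM2 vary only the history length, while TM3/TM4 add progressively more weather variables.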



Proposed Error Correction
Section 4.1 states that when the power consumption increases or decreases rapidly, it cannot be accurately predicted. Therefore, we propose a prediction error correction method, shown in Equation (5), based on the difference between the previous and current power consumption. In Equation (5), O_i and O_(i−1) represent the actual power consumption at hours i and i − 1, respectively; P_i is the predicted power consumption; and P̂_i is the corrected prediction proposed in this study. Finally, the trained deep learning models were applied to the test data of the four test models for each deep learning method.
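Equation (5) itself is not reproduced in the extracted text, so its exact form is an assumption here; one plausible correction consistent with the description (shifting each raw prediction by the most recent observed change in consumption) can be sketched as:

```python
def corrected_forecast(p_i, o_prev, o_prev2):
    """Hypothetical form of the prediction error correction.

    The paper defines Equation (5) in terms of the predicted value P_i
    and the actual values O_(i-1), O_(i-2); since the equation is not
    reproduced in the text, this sketch ASSUMES the correction adds the
    most recent observed change: P-hat_i = P_i + (O_(i-1) - O_(i-2)).
    """
    return p_i + (o_prev - o_prev2)

# Example: raw prediction 100.0; consumption recently rose from
# 105.0 to 110.0, so the corrected prediction is nudged upward.
p_hat = corrected_forecast(100.0, 110.0, 105.0)  # 105.0
```

The intent matches the paper's motivation: when consumption swings rapidly, the raw deep learning prediction lags the swing, and a difference-based term pulls it toward the recent trend.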

Prediction Error Measurement
This study adopted RMSE and MAPE to measure the prediction error of the existing and proposed methods, using Equations (3) and (4) in Section 3.2. RMSE is the most commonly used prediction error measure [83]. MAPE is the most common measure of forecast error and works best when the data contain no extremes (and no zeros) [84].

Test Environment
To verify the traditional and proposed methods, the experiments were performed on a workstation equipped with an Intel Xeon W-2133 CPU (3.60 GHz) and 32 GB of RAM (Dell Precision 5820 Tower Workstation). The operating system was Windows 10 Pro for Workstations (64 bit). The traditional and proposed methods were implemented using the deep learning libraries provided by TensorFlow [85] and Keras [86].

Company B with a Constant Power Consumption Pattern
Section 2.2 states that Company B has a regular power consumption pattern. Table 6 shows the results of applying traditional deep learning and the proposed method to the four TMs. Although deep learning performance differs slightly by month, CNN performed worst across the four TMs, whereas DNN and RNN performed well. For LSTM, the TM1, TM2, and TM3 performances were similar to those of DNN and RNN, but TM4 did not perform well. Therefore, power consumption prediction performs well when TM1 or TM2 is adopted and DNN, RNN, or LSTM is simulated.

Figure 14 compares the MAPE (%) of the traditional and proposed methods for Company B in April 2018. Figure 14a compares the four TMs for each deep learning method; the error between the traditional and proposed methods appeared in the following descending order: TM4, TM3, TM2, and TM1. For DNN, RNN, and LSTM, the more variables adopted, the greater the error. Figure 14b compares the deep learning methods for each TM; the error between the traditional and proposed methods appeared in the following descending order: CNN, LSTM, RNN, and DNN. In particular, CNN performed poorly in all TMs. For DNN, RNN, and LSTM, the fewer the variables, the smaller the error.

Figure 15 compares the MAPE (%) performance by TM and deep learning method for the proposed method for Company B in January 2018. Figure 15a shows the deep learning performance for each TM, in the following order: DNN, RNN, LSTM, and CNN. Figure 15b shows the results of the four test models for each deep learning method; performance was excellent in the order TM1, TM2, TM3, and TM4.

Figure 16 shows the performance analysis of the proposed DNN, RNN, CNN, and LSTM for the four TMs in January 2018. Figure 16a shows the proposed DNN performance analysis for the four TMs.
The predicted values of DNN with TM1 and TM2 (DNN-TM1 and DNN-TM2) were similar to the actual values (power), whereas the predicted values with TM3 and TM4 (DNN-TM3 and DNN-TM4) deviated from the actual values between hours 97 and 121. Figure 16b shows the proposed RNN performance analysis for the four TMs. As with the proposed DNN, the predicted values with TM1 and TM2 (RNN-TM1 and RNN-TM2) were similar to the actual values. However, the predicted value with TM3 (RNN-TM3) exhibited a higher error (RMSE: 25.70) from the actual value than that with TM4 (RNN-TM4); thus, its performance was poor. Figure 16c shows the performance analysis of the proposed CNN for the four TMs. As with the proposed DNN, the predicted values with TM1 and TM2 (CNN-TM1 and CNN-TM2) were similar to the actual values. However, the predicted value with TM4 (CNN-TM4) deviated widely from the actual value (RMSE: 19.44); thus, its performance was worse than that with TM3 (CNN-TM3). Figure 16d shows the proposed LSTM performance analysis for the four TMs. As with the proposed DNN, the predicted value with TM1 (LSTM-TM1) was similar to the actual value. However, the predicted value with TM3 (LSTM-TM3) exhibited a high error (RMSE: 25.92) because its deviation from the actual value was too large.

Company T with the Irregular Power Consumption Pattern
Table 7 shows the deep learning performance results for Company T, which has an irregular power consumption pattern. Comparing the four test models, the prediction error differences between the test models were not significant for Company T, unlike Company B. In other words, although Company T shows slight month-to-month differences in deep learning performance, results similar to those for Company B are observed: (1) DNN and RNN performed well across all four TMs. (2) The performances of LSTM and CNN were similar to those of DNN and RNN for TM1-TM3, but TM4 did not perform well. Therefore, although Company T has an irregular power consumption pattern, power consumption prediction performs well when DNN, RNN, or LSTM is simulated with TM1 or TM2.

Figure 18 compares the MAPE (%) of the traditional and proposed methods for Company T in April 2018. Figure 18a compares the four TMs for each deep learning method. Unlike for Company B, the error between the traditional and proposed methods was almost the same across the TMs. As with Company B, CNN performed poorly in all TMs. Figure 18b compares the deep learning methods for each TM; the error between the traditional and proposed methods appeared in the following descending order: CNN, LSTM, RNN, and DNN. DNN, RNN, and LSTM performed similarly regardless of the number of variables.

Figure 19 compares the MAPE (%) performance by TM and deep learning method for the proposed method for Company T in January 2018. Figure 19a shows the deep learning performance for each TM, in the following order: DNN, RNN, LSTM, and CNN. Figure 19b shows the results of the four test models for each deep learning method, in the following order: TM2, TM3, TM1, and TM4.

Figure 20 shows the proposed DNN, RNN, CNN, and LSTM for the four TMs in January 2018 (winter) for Company T. Figure 20a shows the proposed DNN performance analysis for the four TMs; all four TMs displayed excellent performance. Thus, for a building such as Company T, whose power consumption pattern is not constant, prediction performs well when the existing power consumption and weather data are used. In Figure 20b-d, the predicted values of RNN, CNN, and LSTM with TM1-TM3 were similar to the actual values. However, the predicted values of RNN, CNN, and LSTM with TM4 showed a large error compared with the measured values; thus, their performance was poor.

Conclusions and Future Research
Forecasting building power consumption is necessary to improve the energy efficiency of buildings and to overcome the power peak problems and energy crises that occur every year worldwide. Building owners and energy management system (EMS) operators need to manage their buildings' electrical energy consumption. Since electrical energy is the main form of energy consumed in commercial buildings, the ability to predict electrical energy consumption would benefit building owners and operators. Data-driven models for energy forecasting have been extensively studied over the past few decades due to their improved performance, robustness, and ease of deployment. Among the different models, artificial neural networks are one of the most popular data-driven approaches applied to date.
In this study, power demand prediction using deep learning was proposed for two industrial buildings with different power consumption patterns. Four test models (TMs) were constructed and simulated, considering the main factors that influence power consumption: industrial activities, economic factors, resident living patterns, and weather data.
Although performance differed across the four TMs and deep learning methods, TM1 (power consumption for one year) and TM2 (power consumption for two years), which do not use weather data, showed excellent performance with DNN, RNN, and LSTM. In contrast, performance was poor when weather data that do not affect power consumption were included, as in TM4. Therefore, it is critical to select only the weather data that actually affect power consumption (temperature, humidity, solar radiation, and cloud cover), as in TM3.
In future research, the proposed method will be applied to buildings outside Korea to overcome the regional limitation of this study.
