A Two-Stage Industrial Load Forecasting Scheme for Day-Ahead Combined Cooling, Heating and Power Scheduling

Abstract: Smart grid systems, which have gained much attention due to their ability to reduce the operation and management costs of power systems, consist of diverse components including energy storage, renewable energy, and combined cooling, heating and power (CCHP) systems. CCHP has been investigated as a way to reduce energy costs by using the thermal energy generated during the power generation process. For efficient utilization of CCHP and numerous power generation systems, accurate short-term load forecasting (STLF) is necessary. Although many single algorithm-based STLF models have been proposed, they have shown limited success in terms of applicability and coverage. This problem can be alleviated by combining such single algorithm-based models in ways that take advantage of their strengths. In this paper, we propose a novel two-stage STLF scheme; extreme gradient boosting and random forest models are executed in the first stage, and deep neural networks are executed in the second stage to combine them. To show the effectiveness of our proposed scheme, we compare our model with other popular single algorithm-based forecasting models and then show how much electric charges can be saved by operating CCHP based on the schedules made by the economic analysis on the predicted electric loads.


Introduction
Recently, as the amount of resources consumed by one person has increased, there are growing concerns about environmental problems caused by carbon dioxide emitted during energy generation and energy shortage problems [1]. Smart grid technologies have been gaining much attention because they help to solve these problems by enabling more efficient use of energy [2]. A smart grid is an intelligent power grid that combines information and communication technology with the existing power grid and integrates the work of all users in the power network by using computer-based remote control and automation [3]. It allows monitoring, analyzing, controlling, and communication within the supply chain to improve efficiency, reduce energy consumption and costs, and maximize the transparency and reliability of the energy supply chain [4]. In addition, by intelligentizing the power grid, it is possible to construct a bi-directional supply system such as a microgrid and distributed power supply system where suppliers and consumers can exchange information that they need [5].
Based on this information, energy prosumers can be more active in the trade of electricity. For instance, prosumers on the demand side can choose the supplier that offers electricity at a lower price, and prosumers on the supply side can create opportunities to sell electricity at a higher price.
Typical smart grids are closely related to various energy systems such as the energy storage system (ESS), renewable energy system (RES), combined cooling, heating and power (CCHP), and so on [6]. In particular, CCHP is a cogeneration technology that integrates an absorption chiller to produce cooling. Thermal energy produced during the power generation process is collected to meet cooling and heating demands via the absorption chiller and heating unit [7]. Besides, natural gas-based CCHP has the advantage of lower fuel prices and lower carbon dioxide emissions compared to existing fossil fuel-based power generations [8]. For the efficient operation of CCHP, accurate short-term load forecasting (STLF) is required [9]. STLF is the basis of the design and implementation of the control strategy of the CCHP system, and the results of the STLF affect the overall energy efficiency of the system directly [10]. CCHP uses the primary energy to drive the generator to generate electricity and then recycle waste heat using waste heat equipment. Therefore, running CCHP without accurate predictions can increase the unnecessary operation cost of the power generation facility [11].
Electric energy consumption can be affected by diverse factors, which include architectural structures, thermal properties of physical materials, lighting, time zones, climatic conditions, and electric rates [12]. In addition, there are complicated correlations between current and previous electric loads [13]. These factors should be considered appropriately for accurate electric energy consumption forecasting. For instance, many STLF models have used a single machine learning algorithm to account for them [14]. However, such models do not always provide good prediction performance because electric energy consumption patterns are intricate, and uncertain external factors can cause a shift in the demand curve [15]. Besides, the domains in which each model performs well can differ. Thus, it is not effective to use a single STLF model for prediction in diverse domains. This limitation can be alleviated by combining multiple models of this type [16].
To address these issues, many previous studies have suggested a two-stage STLF model that uses linear regression in the second stage for improving the accuracy of electric load forecasting [17]. These models performed better than previous studies that use a single algorithm by combining the predicted values obtained in the first stage [18]. However, there still are many deficits in the linearly combined model. For instance, the fixed weights of the linear combination can ignore the importance of potential nonlinear terms, which leads to a reduction in prediction accuracy. Additionally, the linear combination can give poor forecasting results when there is a strong nonlinear relationship between individual predictors and outcomes [19]. South Korea is one of the highest energy consumption countries and is interested in using smart grids to improve energy efficiency [20]. However, although studies on the electric load forecasting model have been sufficiently conducted, there are not many cases of configuring a power system in conjunction with CCHP. We focus on the features of the Korean power system and develop an application for scheduling CCHP operations to provide a bi-directional benefit to power suppliers and users.
In this paper, we propose a novel two-stage STLF scheme based on nonlinear combination of forecasting methods to solve this problem. In the first stage, we build two STLF models based on extreme gradient boosting (XGBoost) and random forest (RF), which are known to be popular tree-based ensemble models in time series prediction. In the second stage, we build a deep neural network (DNN)-based STLF model to combine the predicted values of XGBoost and RF. Further, we propose an economic analysis-based operation scheduling scheme for CCHP to effectively utilize the results of the STLF. For instance, in Korea, electric rates and contract demand, especially for industrial services should be considered in the electric rate system. Contract demand indicates the instantaneous peak load contracted with the power supply company. Based on the contract demands, a power supply company can make a stable power plan. Basically, the lower the contract demand is, the lower the basic electricity bill is. Hence, to derive more accurate contract demands, the following policy is used: If the consumer sets the contract demand too low, a progressive tax penalty will be added to the excess power, which results in higher electricity charges. On the contrary, if the contract demand is set too high, consumers have to pay unnecessarily high electricity bills. The economic analysis shows the electric rate and contract demand that should be made to achieve the lowest electric charges. In order to intuitively display the outcome of the economic analysis, a graphical representation of CCHP scheduling is shown with the amount of economic benefits gained from the schedule. Figure 1 shows the overall architecture of our scheme.
The main contributions of this paper are as follows: (1) We propose a novel two-stage STLF scheme that can predict electric energy consumption accurately compared to other previous methods. (2) We propose a method to generate an optimal operation schedule of CCHP based on the predictive values of electric energy consumption and electric/gas charges in South Korea. (3) We propose a method to minimize the electric charge by calculating optimal contract demand and electric rate.
Energies 2020, 13, x FOR PEER REVIEW 3 of 23
The rest of this paper is organized as follows: In Section 2, related studies on STLF are reviewed. In Section 3, we explain the input variables for constructing the STLF model. In Section 4, we describe the structure of our forecasting model, and then, explain how to make CCHP operational scheduling in Section 5. In Section 6, we describe some of the experiments for performance evaluation of the proposed model and CCHP operation scheduling. Finally, in Section 7 we summarize our study.

Related Works
So far, many studies have been done to efficiently perform STLF. In recent years, diverse machine learning algorithms in particular have been tested to build more accurate STLF models [21]. In this section, we first introduce diverse STLF models and then describe our previous works for STLF.

Short-Term Load Forecasting
Typical approaches to STLF apply statistical and machine learning methods to diverse external information such as time factors, weather conditions, and so on. Table 1 summarizes STLF-related studies using statistical techniques and machine learning. Veghefi et al. [22] proposed an STLF model based on the Cochrane-Orcutt estimation technique that combines the multiple linear regression (MLR) and seasonal auto-regressive integrated moving average (SARIMA) models to predict cooling and electric energy consumption effectively. Bagnasco et al. [23] constructed an artificial neural network (ANN)-based STLF model considering holiday indicators and weather conditions as input variables for forecasting electric energy consumption of Cellini Medical Clinic. Powell et al. [24] constructed an STLF model based on a nonlinear autoregressive model with exogenous inputs (NARX) for heating, cooling, and electric energy consumption of a district energy system. This study was unique because it covered a large-scale district energy system that simultaneously produced combined heat and power (CHP), chilled water thermal energy storage (TES), gas turbines, steam turbines, heat recovery steam generators (HRSGs), and auxiliary boilers for a large campus. Jurado et al. [25] constructed several prediction models using RF, ANN, and fuzzy inductive reasoning (FIR). They then compared the prediction models with an ARIMA model by predicting electric energy consumption in three different buildings at Catalonia Technical University, Catalonia, Spain. They confirmed that FIR showed the best prediction performance. Sandels et al. [26] presented a data analysis framework for identifying and generating models that can predict energy consumption on load level in North European office building floors. The models were based on a simplified statistical approach that did not require detailed knowledge about the office building floor. Grolinger et al. [27] constructed two STLF models based on support vector regression (SVR) and ANN. They considered time data, historical electric load data, and event information and compared their prediction performances with other methods for a large entertainment building in Canada. With daily data, the ANN model achieved better accuracy than the SVR. Gerossier et al. [28] presented a forecasting model for hourly household electric load based on quantile smoothing spline regression using the previous day's hourly load, last week's mid-load, and temperature. They computed the mean of the predicted quantile distribution and used it as a single-point forecast. These statistical approaches exhibited excellent performance for simple demand patterns but inaccurate prediction performance for intricate demand patterns.
Chen et al. [29] developed a combination of a hybrid SVR model and multiresolution wavelet decomposition (MWD) to predict the hourly electric energy consumption of a hotel and mall. Dong et al. [30] proposed a seasonal SVR with a chaotic cuckoo search (CCS) named SSVRCCS to predict electric energy consumption in the National Electricity Market and New York Independent System Operator. By using the CCS model, their proposed model enlarges the population in cuckoo search (CS) to avoid local optima and increases the search space. By using the seasonal SVR model, it can deal with the seasonal cyclic nature of electric load for more accurate prediction. However, the computational time is increased due to a large number of iterations. Fan et al. [31] proposed a novel electricity load forecasting model by hybridizing the phase space reconstruction (PSR) algorithm with the bi-square kernel (BSK) regression model, namely the PSR-BSK model. The authors investigated the performance of the model using an hourly dataset of NYISO, USA, and New South Wales market. Hong et al. [32] proposed an electric load prediction model, namely the H-EMD-SVR-PSO model, which combines the empirical mode decomposition (EMD) method, particle swarm optimization (PSO) algorithm, and SVR, to improve predictive accuracy. Based on electrical load data from the Australian electricity market, experimental results showed that the proposed H-EMD-SVR-PSO model achieved more satisfactory prediction performance than other comparable models.
These studies suggested the construction of non-generic forecasting models by considering the characteristics of buildings and microgrids. On the other hand, CCHP can be installed and used in various places with possibly different features. Moreover, different types of schedules may be required for CCHP depending on electric energy consumption patterns.

Our Previous Works
In this section, we briefly describe several STLF models that we proposed in our previous studies and their differences from the proposed model.
In [33], we built two STLF models using the ANN and SVR for four building clusters of a private university in South Korea. For the prediction, we considered not only weather information and time data but also university events, office hours, and class hours. Subsequently, we evaluated the prediction performance of each model by using 5-fold cross-validation. The comparison showed that the ANN-based forecasting model had better performance than the SVR-based model. In [34], we proposed another STLF model based on an auto-encoder (AE) and RF. The AE was used to extract weather information features and time factors effectively. We constructed an RF-based forecasting model using feature extraction values and historical electric loads for day-ahead electric load forecasting. The model was evaluated using the electric energy consumption data of university campuses and the results showed that it gave a better performance than the proposed model in [33]. In [35], we proposed a recurrent inception convolution neural network (RICNN) that combines recurrent neural networks (RNN) and 1-dimensional convolutional neural networks (CNN) to forecast multiple short-term electric loads (48 time steps with an interval of 30 min). A 1-D convolution inception module was used to calibrate the prediction time and hidden state vector values calculated from nearby time steps. By doing so, the inception module could generate an optimized network via the prediction time generated in the RNN and nearby hidden state vectors. The proposed RICNN model was verified using the electric energy consumption data of three large distribution complexes in South Korea. In [36], we constructed diverse ANN models using different numbers of hidden layers and diverse activation functions and compared their performances in a 30 min STLF resolution. 
To compare the prediction performance, we considered electric load data collected for two years from five different types of buildings (including the dataset used in this study). The comparison showed that a scaled exponential linear unit (SELU)-based ANN model with five hidden layers had a better average performance than other ANN-based STLF models. In [37], we proposed a two-stage electric load forecasting model that combined XGBoost and RF using MLR for the efficient operation of CCHP. To construct this model, an hourly load forecasting was performed using XGBoost and RF. The forecasting results were then combined using a sliding-window based MLR to reflect the energy consumption pattern. The model had a better prediction performance compared with several popular single algorithm-based forecasting models.
The differences between the studies mentioned above and this paper are as follows. The models in [33] were tailored for university campuses and are difficult to apply to other types of buildings. The model in [34] used an AE to extract features. However, since the performance of an AE depends heavily on the size of the training set, it is difficult to achieve good performance when the amount of data is insufficient. In [35], we proposed the RICNN model. However, the RICNN was intended for a probabilistic approach and therefore serves a different purpose, because here we focus on day-ahead point load forecasting. In [36], the SELU-based ANN model with five hidden layers exhibited insufficient prediction accuracy on the dataset used in this study compared to the other building types because its electric loads are close to zero. In [37], we proposed a two-stage electric load forecasting model that combines XGBoost and RF using MLR. However, to use the forecasted values from the first stage more efficiently, we also have to consider the existing input variables. Finally, we extend our research and propose integrated applications with CCHP operation scheduling and electric rate recommendations, rather than ending with forecasts.

Input Variable Configuration
In this study, we use hourly electric energy consumption data collected from 1 January 2015 to 31 December 2018 from an industrial building in Incheon, South Korea. Table 2 summarizes some statistics of the collected data. To construct our STLF model, we consider time factors, weather information, historical electric energy consumption, and electric rates for input variable configuration. The details are described in the following subsections.

Time Data
As time is a critical factor in the trends of electric energy consumption, we consider all variables that express time such as month, day, hour, day of the week, and holiday. Table 3 shows a list of the time factors we considered as input variables. Herein, month, day, and hour are sequential values. It is difficult for machine learning algorithms to capture periodic information when data are in a sequential format, because adjacent points in a cycle (for example, hour 23 and hour 0) are numerically far apart. Therefore, we mapped these values to continuous 2-dimensional data through a periodic function [36]. Table 4 summarizes some regression statistics of 1-dimensional, 2-dimensional, and 1-dimensional + 2-dimensional time factors. The table shows that 1-dimensional + 2-dimensional data can represent the time factors most effectively. Therefore, we use both 1-dimensional data and continuous 2-dimensional data to represent the time factors.
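The following is a minimal sketch of one common way to map a sequential time value to 2-dimensional periodic data. It assumes a sine/cosine pair as the periodic function; the function name `encode_cyclic` is illustrative and the exact transformation used in [36] may differ.

```python
import math

def encode_cyclic(value, period):
    """Map a sequential time value onto the unit circle (assumed sin/cos form)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# Hour 23 and hour 0 differ by 23 as raw numbers but are neighbours on the circle.
h23 = encode_cyclic(23, 24)
h00 = encode_cyclic(0, 24)
gap = math.dist(h23, h00)  # Euclidean distance between the two encoded points
```

With this representation, the distance between 23:00 and 00:00 equals the distance between any other pair of consecutive hours, which the raw hour value cannot express.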

Weather Data
Because the frequency of use of high-power consumption products such as air conditioners and radiators is closely related to weather [38], weather conditions have generally been used for constructing STLF models in many studies [39]. In South Korea, various weather forecast information including temperature, humidity, wind speed, and so on is provided by the Korea Meteorological Administration (KMA). However, KMA provides weather data using two different time resolutions depending on the type of forecast. The very short-term weather forecast provides weather data up to 4 h ahead at 30 min resolution, and the short-term weather forecast provides weather data up to 67 h ahead at 3 h resolution. Since our goal is to predict day-ahead electric energy consumption, we need weather data for up to 24 h. Thus, we used the short-term weather forecast data that have 3 h resolution and used linear interpolation to calculate 1 h weather forecast data from them. The short-term weather forecast data consist of values such as daily minimum temperature, daily average temperature, daily maximum temperature, temperature, humidity, wind speed, and precipitation, as shown in Figure 2.
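As an illustration of the interpolation step, the sketch below converts values given every 3 h into hourly values by linear interpolation. The function name and the sample temperatures are hypothetical, and the input is assumed to be a non-empty list of equally spaced forecasts.

```python
def interpolate_hourly(values_3h):
    """Linearly interpolate readings spaced 3 h apart to 1 h spacing."""
    hourly = []
    for left, right in zip(values_3h, values_3h[1:]):
        for step in range(3):
            # step 0 reproduces the forecast value, steps 1 and 2 fill the gap
            hourly.append(left + (right - left) * step / 3)
    hourly.append(values_3h[-1])  # keep the final forecast point
    return hourly

temps_3h = [12.0, 15.0, 21.0]  # hypothetical forecasts at 00:00, 03:00, 06:00
temps_1h = interpolate_hourly(temps_3h)
```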

In addition, to establish a more direct correlation between weather data and electric energy consumption, we considered the discomfort index (DI) [40] and wind chill (WC) [41]. DI and WC are defined using Equations (1) and (2), respectively. Here, T, H, and WS represent the temperature, humidity, and wind speed, respectively.
As a result, we use nine types of weather data (i.e., daily maximum temperature, daily average temperature, daily minimum temperature, temperature, humidity, wind speed, precipitation, discomfort index, and wind chill) for the STLF model construction. Table 5 summarizes an example of weather conditions considered for the input variables.
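Equations (1) and (2) are not reproduced in this text, so the sketch below relies on two widely used definitions rather than on the exact forms in [40] and [41]: Thom's discomfort index and the 2001 North American wind chill formula. It assumes T in degrees Celsius, H in percent, and WS in km/h; these units and both formulas are assumptions and may differ from the ones used in this study.

```python
def discomfort_index(t_c, h_pct):
    """Thom's discomfort index (assumed form); t_c in deg C, h_pct in percent."""
    return t_c - 0.55 * (1.0 - 0.01 * h_pct) * (t_c - 14.5)

def wind_chill(t_c, ws_kmh):
    """Wind chill, 2001 North American formula (assumed form); ws_kmh in km/h."""
    v = ws_kmh ** 0.16
    return 13.12 + 0.6215 * t_c - 11.37 * v + 0.3965 * t_c * v

di = discomfort_index(30.0, 100.0)  # saturated air: DI equals the air temperature
wc = wind_chill(-10.0, 20.0)        # feels colder than the -10 deg C air temperature
```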

Historical Electric Energy Consumption
Historical electric energy consumption is a good predictor of future electricity usage as it shows electricity usage patterns and trends [42]. To reflect historical electric energy consumption, we consider the load at the forecast hour on 10 different days: the last six days and the same day of the week in the previous four weeks. For instance, if the forecast time is Saturday, 29 June 2019, 4 p.m., we use historical load data measured at 4 p.m. on 1, 8, 15, 22, 23, 24, 25, 26, 27, and 28 June. However, the weekday and weekend load patterns could be different. To reflect trends through historical load data more accurately, we use additional data that indicate whether the ten days used as input variables were holidays or not.
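The selection of the ten historical time points can be sketched as follows. The function name `lag_timestamps` is illustrative; the example reproduces the dates given above for 29 June 2019, 4 p.m.

```python
from datetime import datetime, timedelta

def lag_timestamps(target):
    """Return the 10 historical timestamps used as load features for `target`."""
    # Same weekday and hour in each of the previous four weeks
    same_weekday = [target - timedelta(weeks=w) for w in (4, 3, 2, 1)]
    # Same hour on each of the last six days
    recent_days = [target - timedelta(days=d) for d in (6, 5, 4, 3, 2, 1)]
    return same_weekday + recent_days

target = datetime(2019, 6, 29, 16)  # Saturday, 29 June 2019, 4 p.m.
lags = lag_timestamps(target)
lag_days = [t.day for t in lags]
```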

Electric Rates
Because one of the operational goals of the smart grid is to reduce electric charges, many STLF studies have used electric rate information as one of the input variables [43]. Thus, we also consider information on electric rates as input variables [44]. In South Korea, three different sections are used by Korea Electric Power Corporation (KEPCO) for electric rates: off-peak, mid-peak, and on-peak loads [45]. Electric rates are determined by the amount of electric energy consumption, the intended use of the building (i.e., residential, general, educational, industrial, etc.), and the season or month. As with the day of the week and holiday data, one-hot encoding is used to represent the electric rate information. Hence, the input variable corresponding to the rate section in effect at the forecast time is set to 1, and the others are set to 0.
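A minimal sketch of this one-hot encoding is given below. The hour boundaries in `rate_section` are placeholders chosen only for illustration; they are not the actual KEPCO schedule, which also depends on the season and contract type.

```python
SECTIONS = ("off_peak", "mid_peak", "on_peak")

def rate_section(hour):
    """Placeholder hour-to-section mapping (NOT the real KEPCO schedule)."""
    if hour < 9 or hour >= 23:
        return "off_peak"
    if 10 <= hour < 12 or 13 <= hour < 17:
        return "on_peak"
    return "mid_peak"

def one_hot_rate(hour):
    """Return a vector with 1 for the active rate section and 0 elsewhere."""
    section = rate_section(hour)
    return [1 if s == section else 0 for s in SECTIONS]

night = one_hot_rate(3)       # off-peak under the placeholder mapping
afternoon = one_hot_rate(14)  # on-peak under the placeholder mapping
```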

Two-Stage STLF Model Construction
So far, various single algorithm-based STLF models have been proposed [46]. Even though they showed good performance in the domains they focused on, their performance was limited in other domains or when electric energy consumption patterns were intricate. To alleviate this limitation, we propose a two-stage day-ahead STLF model that combines two single algorithm-based STLF models using DNNs.

The First Stage: Constructing Two STLF Models
In the first stage, we build two STLF models based on XGBoost and RF, which are well-known tree-based ensemble models in time series prediction [47], by using various input variables. They are based on boosting and bootstrap aggregating (bagging) algorithms, respectively. Compared to other boosting and bagging algorithms, the XGBoost and RF models show better predictive accuracy and have the highest correlation with actual power consumption. In addition, as XGBoost supports various loss functions, we can choose an appropriate loss function depending on the characteristics of the data. On the other hand, it can suffer from overfitting during training [48]. RF can handle high-dimensional data well, but its regression output tends to be imprecise because the final prediction is the average of the predictions from all of its trees [49]. By using the predicted values of the XGBoost model and the RF model together with other input variables, it is possible to prevent overfitting and to make more accurate predictions.

Extreme Gradient Boosting
XGBoost, which was proposed by Chen and Guestrin [50], is a scalable machine learning model based on tree boosting. It has been widely used for forecasting purposes such as STLF and store sales forecasting [51]. The basic principle of XGBoost is boosting, which iteratively combines weak base learners into a strong learner [52]. At each iteration of boosting, a new learner is fitted to the residuals of the previous predictor so as to optimize the specified loss function. XGBoost provides faster learning and scalability based on parallel and distributed computing by further developing the existing boosting technique. Its objective function adds a regularization term to the loss function to measure model performance and improve it. In addition, missing values can be handled easily because they are recognized and handled automatically during boosting. XGBoost gradually increases the depth of the tree during learning. If the gain obtained by increasing the depth is smaller than the gamma value, the depth stops increasing.

Random Forest
RF is a flexible machine learning algorithm that often produces good results even without hyperparameter tuning. It has become one of the most commonly used machine learning algorithms because it can be easily used for classification and regression. Moreover, it can work efficiently on a large amount of data and handle thousands of input variables without deleting them. The basic principle of RF is the bagging algorithm [53]. Bagging is an ensemble algorithm designed to improve the stability and accuracy of individual forecasting models such as decision trees. It repeatedly draws a random sample of size n from the training set, fits an individual forecasting model to each sample, and produces a result by averaging or voting over all individual forecasting models. The bagging algorithm in RF helps reduce the variance and the influence of overfitting of decision trees.
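The bagging principle can be sketched with a deliberately simple base learner. The example below is not RF itself: it uses one-split regression trees (stumps) on a single input variable and sampling with replacement, which is the usual bootstrap assumption, and it omits the random feature selection that RF adds. All names and data are illustrative.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree; returns (threshold, left_mean, right_mean)."""
    best = None
    for thr in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        if not right:
            continue  # splitting at the maximum leaves nothing on the right
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # every x in this bootstrap sample is identical
        m = mean(ys)
        return xs[0], m, m
    return best[1:]

def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Average the predictions of stumps fitted to bootstrap samples of size n."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n points with replacement
        thr, lm, rm = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(lm if x_new <= thr else rm)
    return mean(preds)

xs = [1, 2, 3, 4, 5, 6]
ys = [10, 10, 10, 20, 20, 20]
low = bagged_predict(xs, ys, 1)
high = bagged_predict(xs, ys, 6)
```

Each stump sees a slightly different sample, so individual split points vary; averaging them smooths out that variance, which is the effect described above.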

The Second Stage: Combining STLF Models Using DNN
An ANN, which is also known as a multilayer perceptron (MLP), is a type of machine learning algorithm that is a feed-forward neural network architecture with an input layer, hidden layer, and output layer [54]. It aims to learn the nonlinear and complex structure of data by duplicating human brain functions [55]. Each layer in the neural network consists of several nodes. Each node receives values from the nodes in the previous layer to determine the output and provide values for the nodes in the next layer. As this process repeats, the nodes in the output layer provide the required values [56]. The number of hidden layers determines whether the network is deep or shallow. For instance, when the number of hidden layers is two or more, then the network is called a deep neural network [57]. Recently, various DNN models have shown excellent prediction performances due to the remarkably improved computing performance [58].
In the second stage, we construct an STLF model by combining the results of the two STLF models built in the first stage using a DNN. For training the DNN model, we used the predicted values of XGBoost and RF as input variables to reflect the characteristics of bagging and boosting algorithms. We also considered time factors, weather data, historical electric energy consumption data, and electric rates as input variables to further improve the forecasting performance. In our DNN model, we use the SELU function as the activation function and set the number of hidden layers to five [36]. Additionally, the number of neurons in each hidden layer is set to two thirds of the number of input variables [59].
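The configuration described above can be sketched without a deep learning library. The helper names are hypothetical, the rounding of the layer width is an assumption, and the SELU constants are the standard published values; this shows only how the second-stage inputs and layer sizes could be assembled, not the trained model.

```python
import math

SELU_ALPHA = 1.6732632423543772  # standard SELU constants
SELU_SCALE = 1.0507009873554805

def selu(x):
    """Scaled exponential linear unit activation."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))

def second_stage_input(base_features, xgb_pred, rf_pred):
    """Append both first-stage forecasts to the original input variables."""
    return list(base_features) + [xgb_pred, rf_pred]

def hidden_layer_sizes(n_inputs, n_hidden_layers=5):
    """Five hidden layers, each about two thirds as wide as the input (rounding assumed)."""
    width = max(1, round(2 * n_inputs / 3))
    return [width] * n_hidden_layers

# Hypothetical example: four base features plus the two first-stage forecasts
features = second_stage_input([0.1, 0.2, 0.3, 0.4], xgb_pred=120.0, rf_pred=118.0)
layers = hidden_layer_sizes(len(features))
```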

Economic Analysis Based CCHP Operation Scheduling
CCHP is known to improve energy utilization, reduce energy costs, and respond to peak loads by using thermal energy generated from the power generation process for heating and cooling. In addition, by using natural gas, CCHP can be a solution to environmental pollution [60]. Natural gas is a relatively clean-burning fossil fuel [61]. Burning natural gas for energy gives fewer emissions of nearly all types of air pollutants and carbon dioxide than burning coal or petroleum products to produce an equal amount of energy [62]. In this section, to see the applicability of our proposed scheme, we describe how daily CCHP operation scheduling can be made based on the forecasted daily electric energy consumption of 1 h resolution. In particular, we consider natural gas as the primary energy source of CCHP, and the economic benefit of CCHP operation is changed according to power generation efficiency. Hence, for its economic analysis, the cost of natural gas consumed in power generation must be determined by the power generation efficiency. Table 6 summarizes the gas charge sections of the industrial service in South Korea. The electric rate system should be considered for a more accurate economic analysis. There are several considerations in the electric rate system of South Korea.

• The electric rate system divides the types of contracts according to the purpose of electricity usage, and applies the corresponding charges. The contract types are divided into six classes, namely, residential, general, industrial, educational, agricultural, and streetlights service. Some contract types have more granular rates, depending on the size of the voltage or the contract demand.
• The electric rate consists of the demand and energy charges. The demand charge recovers the fixed costs related to the electric energy supply equipment. It is determined based on the contract demand or peak load. On the other hand, the energy charge recovers the variable costs in proportion to usage.
• Seasonal and hourly differential electric rates are applied to some contract types, including industrial and general service. To reflect the differences in supply costs by time zone according to seasonal demand, high rates are charged in seasons and time zones with high electric energy consumption, and low rates are applied in seasons and time zones with low electric energy consumption.
• The electric rate system offers three options depending on the relative amount of the demand and energy charges: Option I, Option II, and Option III. For the demand charge, Option I > Option II > Option III and for the energy charges, Option I < Option II < Option III. These options are for reducing energy consumption, inducing voluntary peak time load management, and ultimately reducing the cost of power equipment by enabling consumers to select an electric rate depending on their load pattern.
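To illustrate how the demand and energy charges interact across the three options, the following sketch computes a monthly bill and picks the cheapest option for a given load pattern. All rate values are hypothetical placeholders chosen only to respect the ordering stated above (they are not the values in Table 7), and a flat energy rate is assumed per option for brevity, whereas the actual rates vary by season and hour.

```python
def monthly_bill(hourly_load_kwh, hourly_rate_per_kwh, contract_demand_kw, demand_rate_per_kw):
    """Demand charge (fixed, per kW of contract demand) plus energy charge (per kWh used)."""
    if len(hourly_load_kwh) != len(hourly_rate_per_kwh):
        raise ValueError("load and rate series must have the same length")
    demand_charge = contract_demand_kw * demand_rate_per_kw
    energy_charge = sum(load * rate for load, rate in zip(hourly_load_kwh, hourly_rate_per_kwh))
    return demand_charge + energy_charge


# Hypothetical (demand rate per kW, energy rate per kWh) pairs.
# Demand charge: Option I > II > III; energy charge: Option I < II < III.
HYPOTHETICAL_OPTIONS = {
    "Option I": (7.0, 0.08),
    "Option II": (6.0, 0.09),
    "Option III": (5.0, 0.10),
}


def cheapest_option(hourly_load_kwh, contract_demand_kw, options=HYPOTHETICAL_OPTIONS):
    """Return the name of the cheapest option and the bill under every option."""
    bills = {}
    for name, (demand_rate, energy_rate) in options.items():
        flat_rates = [energy_rate] * len(hourly_load_kwh)
        bills[name] = monthly_bill(hourly_load_kwh, flat_rates, contract_demand_kw, demand_rate)
    return min(bills, key=bills.get), bills


# A 30-day month at 1 h resolution with light and heavy usage, respectively.
light_usage = [5.0] * 720
heavy_usage = [40.0] * 720
light_choice, _ = cheapest_option(light_usage, contract_demand_kw=100)
heavy_choice, _ = cheapest_option(heavy_usage, contract_demand_kw=100)
```

Under these placeholder rates, the light-usage consumer is better off with the low demand charge of Option III, while the heavy-usage consumer benefits from the low energy charge of Option I.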
As we focus on an industrial building in this study, more refined electric rates apply depending on the supply voltage and the contract demand. First, depending on whether the contract demand exceeds 300 kW or not, there are two electric rates: Type A and Type B. For each rate, there are four groups depending on the supply voltage: low voltage, high voltage A, high voltage B, and high voltage C. Each group then offers three options: Option I, Option II, and Option III. Table 7 summarizes the electric rates for Industrial Service (B), High Voltage A, and Option I. Industrial service (B) is an electric rate that can be used when the contract demand is more than 300 kW. The operation schedule is created to maximize the annual economic benefits. Equations (3)-(6) represent the detailed formulas for calculating the annual economic benefits. The economic benefits are composed of two parts: (i) reduced electric charges, which are the direct economic benefits of using CCHP, and (ii) reduced heating/cooling charges obtained by using the thermal energy generated by CCHP. In the experiment, we assume that CCHP can produce 1.43 Mcal of thermal energy while generating 1 kWh of electricity [63]. We calculate how much it would cost to obtain this 1.43 Mcal of thermal energy using electric energy and reflect it in the formulas. Algorithm 1 shows the generation of an operational schedule for maximizing the annual economic benefits. Basically, the schedule specifies how much energy should be generated by CCHP and how much energy should be supplied by the public power system for each scheduling hour.
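The idea of an hourly schedule can be illustrated with a simple greedy rule. This sketch is not Algorithm 1 and does not reproduce Equations (3)-(6): it ignores the demand charge and assumes a fixed CCHP generation cost per kWh, a fixed value per Mcal of recovered heat, and an hourly CCHP capacity, all of which are hypothetical parameters. Only the 1.43 Mcal per kWh figure comes from the text.

```python
THERMAL_MCAL_PER_KWH = 1.43  # thermal energy recovered per generated kWh (from the text)


def schedule_day(forecast_kwh, grid_price_per_kwh, cchp_cost_per_kwh,
                 thermal_value_per_mcal, cchp_capacity_kwh):
    """Greedy hourly rule: run CCHP whenever its net cost is below the grid price.

    Returns a list of (cchp_kwh, grid_kwh) pairs and the total benefit
    relative to buying everything from the public power system.
    """
    # Net cost of one CCHP kWh after crediting the heat it produces.
    net_cchp_cost = cchp_cost_per_kwh - THERMAL_MCAL_PER_KWH * thermal_value_per_mcal
    schedule = []
    benefit = 0.0
    for load, price in zip(forecast_kwh, grid_price_per_kwh):
        if net_cchp_cost < price:
            cchp_kwh = min(load, cchp_capacity_kwh)
        else:
            cchp_kwh = 0.0
        grid_kwh = load - cchp_kwh
        benefit += cchp_kwh * (price - net_cchp_cost)
        schedule.append((cchp_kwh, grid_kwh))
    return schedule, benefit


# Two hypothetical hours: an off-peak hour and a peak hour.
example_schedule, example_benefit = schedule_day(
    forecast_kwh=[100.0, 200.0],
    grid_price_per_kwh=[0.05, 0.15],
    cchp_cost_per_kwh=0.12,
    thermal_value_per_mcal=0.0,
    cchp_capacity_kwh=150.0,
)
```

In this toy case CCHP stays off in the cheap off-peak hour and runs at capacity in the peak hour, with the remainder drawn from the public power system.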

Comparison of Prediction Performance with Various STLF Models
In this paper, we compare popular machine learning algorithms such as decision tree (DT), gradient boosting machine (GBM), and bagging to explain why we chose the XGBoost and RF models in the first stage. In addition, we compare their performance with that of the prediction model (Persistence) that is actually used in the data collection environment. The persistence model uses the value of the previous day (or of the corresponding day in the previous week) as the prediction. Persistence implies that future values of the time series are calculated on the assumption that conditions remain unchanged between the current time and the future time. As the second stage of our proposed model uses the predicted values of the two first-stage models, we divide the dataset into training set 1 (for training the first-stage models), training set 2 (for training the second-stage model), and a test set (for forecasting electric energy consumption and economic analysis), at a ratio of 50:25:25. Specifically, data collected from January 2015 to December 2016 were used as training set 1, data collected from January 2017 to December 2017 were used as training set 2, and data collected from January 2018 to December 2018 were used as the test set. The performance of each machine learning algorithm was measured using training set 2. Figure 3 shows the monthly energy consumption and the divided dataset. In addition, we compare our proposed model with various STLF models composed of different machine learning algorithms in the first stage, and with several forecasting models from our previous studies in the second stage. To do this, we divided the dataset into training and test sets at a ratio of 75:25.
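The chronological split and the persistence baseline described above can be sketched as follows. The record layout (a list of timestamp and load pairs) and the function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def split_by_year(records):
    """Split (timestamp, load) records chronologically into the three sets.

    2015-2016 -> training set 1, 2017 -> training set 2, 2018 -> test set.
    """
    train_1 = [r for r in records if r[0].year in (2015, 2016)]
    train_2 = [r for r in records if r[0].year == 2017]
    test = [r for r in records if r[0].year == 2018]
    return train_1, train_2, test


def persistence_forecast(history, timestamp, lag_days=1):
    """Predict the load at `timestamp` as the load observed `lag_days` earlier.

    `history` maps timestamps to observed loads; lag_days=7 gives the
    corresponding day in the previous week. Returns None if no value exists.
    """
    return history.get(timestamp - timedelta(days=lag_days))


# Hypothetical hourly readings.
records = [
    (datetime(2016, 6, 1, 9), 120.0),
    (datetime(2017, 6, 1, 9), 130.0),
    (datetime(2018, 6, 1, 9), 140.0),
    (datetime(2018, 6, 2, 9), 150.0),
]
train_1, train_2, test = split_by_year(records)
history = dict(records)
day_ahead = persistence_forecast(history, datetime(2018, 6, 2, 9))
```

Splitting by calendar year rather than at random keeps the evaluation free of information from the future, which matters for a day-ahead forecasting task.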

To evaluate the prediction performance of the proposed model, we selected the coefficient of variation of the root mean square error (CVRMSE) and the mean absolute percentage error (MAPE) because they are easier to understand than other performance indicators such as the root mean square error (RMSE) or mean squared error (MSE) [64]. The CVRMSE and MAPE equations are shown in (7) and (8), respectively. Here, n is the number of observations, Ȳ is the average of the actual values, and Y_i and Ŷ_i are the actual and predicted values, respectively.
Figure 4 exhibits the comparison of the CVRMSE and MAPE results for each machine learning algorithm. As shown in Figure 4, the machine learning techniques perform better than the persistence model, which is a statistical model. In addition, the XGBoost and RF models show better prediction performance on training set 2 than the other machine learning algorithms. XGBoost performed well because it allows users to choose an appropriate loss function depending on the characteristics of the data. RF performed well because it can handle high-dimensional data well. Table 8 summarizes the Pearson correlation coefficients between the forecasted values of the machine learning algorithms and the actual electric energy consumption. We found that the forecasted values of XGBoost and RF present higher correlation coefficients than those of the other machine learning algorithms. Therefore, we used the forecasted values of XGBoost and RF as new input variables for the second stage.
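The two metrics can be computed as in the sketch below. It assumes the commonly used definitions, with CVRMSE as the RMSE divided by the mean of the actual values and MAPE as the mean of the absolute relative errors, both expressed as percentages. Equations (7) and (8) are not reproduced in this text, so the exact normalization used by the authors is an assumption.

```python
import math


def cvrmse(actual, predicted):
    """CVRMSE (%) = 100 / mean(Y) * sqrt( (1/n) * sum((Y_i - Yhat_i)^2) )."""
    n = len(actual)
    mean_actual = sum(actual) / n
    rmse = math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n)
    return 100.0 * rmse / mean_actual


def mape(actual, predicted):
    """MAPE (%) = 100 / n * sum(|(Y_i - Yhat_i) / Y_i|); actual values must be nonzero."""
    n = len(actual)
    return 100.0 / n * sum(abs((y - y_hat) / y) for y, y_hat in zip(actual, predicted))


# Hypothetical hourly loads in kWh.
actual_load = [100.0, 200.0]
forecast_load = [110.0, 180.0]
example_cvrmse = cvrmse(actual_load, forecast_load)
example_mape = mape(actual_load, forecast_load)
```

Both metrics are scale-free percentages, which is why they are easier to compare across buildings and seasons than RMSE or MSE.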
Tables 9-11 summarize the comparison of our proposed model with other 2-stage models and several forecasting models from our previous studies in terms of CVRMSE and MAPE. As summarized in Tables 9-11, our proposed model showed better prediction performance than the other forecasting models in almost all cases. Finally, to verify that the proposed model significantly improves the forecasting accuracy, the Wilcoxon test and the Friedman test are conducted [30]. The Wilcoxon test is used to determine whether there is a significant difference between two models, under the null hypothesis that there is no such difference. If the p-value is less than the significance level, the null hypothesis is rejected and the two models are judged to be significantly different. The Friedman test is a multiple-comparison test that aims to detect significant differences between the results of two or more models. The results of the Wilcoxon test with the significance level set to 0.05 are shown in Table 12.
Since the p-value is below the significance level in all cases, the differences between the proposed model and the other models are statistically significant, which supports the superiority of the proposed model.
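For readers unfamiliar with the Wilcoxon signed-rank test, the following self-contained sketch shows the mechanics on paired forecast errors. It uses the large-sample normal approximation without tie or continuity corrections, which is a simplification; the paper does not state which variant or software was used, and the sample data are hypothetical.

```python
import math


def wilcoxon_signed_rank(errors_a, errors_b):
    """Two-sided Wilcoxon signed-rank test on paired samples.

    Returns (W, p_value), where W is the smaller of the positive and negative
    rank sums and the p-value comes from the normal approximation.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4.0
    std_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w - mean_w) / std_w
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return w, p_value


# Hypothetical absolute errors: model B is always better than model A.
errors_model_a = [float(i + 1) + 0.5 for i in range(20)]
errors_model_b = [0.5] * 20
w_stat, p_val = wilcoxon_signed_rank(errors_model_a, errors_model_b)
significant = p_val < 0.05
```

A small p-value only indicates that the two error distributions differ; the direction of the difference still has to be read from the error metrics themselves.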

Economic Analysis Based CCHP Operation Scheduling
In this section, we describe how CCHP operation scheduling is made based on economic analysis. To maximize the annual economic benefits, it is also essential to determine the electric rate and the amount of contract demand at the same time. We perform an experiment to find the optimal electric rate and contract demand that maximize the economic benefits.
A monthly economic analysis using the test set confirms that the economic benefits follow a pattern similar to that of the monthly energy consumption, as shown in Figure 5. We can see that high economic benefits can be obtained in summer and winter, when energy consumption is high.

Because the industrial building where the electric energy consumption data were collected is equipped with advanced meters, the electric rates of industrial service (A) II and industrial service (B) can be chosen. In addition, since the building's supply voltage is between 3300 V and 66,000 V, we choose high voltage A as the electric rate of the building. Industrial service (A) II has two options, and industrial service (B) has three options. As a result, five different electric rates are compared in the experiment. Figure 6 shows the annual economic benefit of each electric rate as a function of the contract demand.
Figure 6 shows that the "industrial service (A) II / high voltage A / Option II" electric rate with a 160 kW contract demand yields the highest annual economic benefit, and Figures 7-9 show the scheduling results of the CCHP operation according to this electric rate. In the figures, the yellow boxes represent the electric energy supplied by the public power system and the green boxes represent the electric energy generated by the CCHP system. According to the schedule, an economic benefit of USD 195 can be obtained by using CCHP together with the public power system for three days. Moreover, economic benefits of more than USD 14,000 annually can be achieved by using CCHP together with the public power system.
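The search over electric rates and contract demands behind Figure 6 can be sketched as an exhaustive comparison. The benefit function below is a toy placeholder with made-up numbers and generic rate names; in the paper the annual benefit comes from the economic analysis on the forecasted loads, which is not reproduced here.

```python
def best_configuration(rate_names, contract_demands_kw, annual_benefit):
    """Evaluate every (rate, contract demand) pair and keep the most beneficial one.

    `annual_benefit` is any callable taking (rate_name, contract_demand_kw)
    and returning the annual economic benefit for that configuration.
    """
    best = None
    for rate in rate_names:
        for demand in contract_demands_kw:
            benefit = annual_benefit(rate, demand)
            if best is None or benefit > best[2]:
                best = (rate, demand, benefit)
    return best


def toy_annual_benefit(rate, demand_kw):
    """Hypothetical benefit curve: each rate has a base level that peaks at 200 kW."""
    base = {"rate_1": 100.0, "rate_2": 120.0}[rate]
    return base - abs(demand_kw - 200)


best_rate, best_demand, best_benefit = best_configuration(
    ["rate_1", "rate_2"], range(100, 301, 20), toy_annual_benefit
)
```

Because only five rates and a limited range of contract demands are involved, a full enumeration of this kind is inexpensive and avoids any assumption about the shape of the benefit curve.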

Conclusions
In this study, we proposed a novel 2-stage STLF model that combines popular STLF models by using a DNN to further expand the domain of applicability. In the first stage, we used the XGBoost and RF algorithms to predict day-ahead electric energy consumption. In the second stage, we built a DNN-based load forecasting model by using the forecasted results of XGBoost and RF, together with other external data, as new input variables. To verify the forecasting performance of our proposed model, we performed day-ahead forecasting using actual factory electric energy consumption data and compared its accuracy with that of several machine learning methods and our previous forecasting models. The comparison showed that our proposed model achieved the best prediction performance in terms of CVRMSE and MAPE.
Additionally, to show the applicability of our model, we performed CCHP operation scheduling based on forecasting and economic analysis, decided the best electric rate and contract demand, and showed how much could be saved by the decision. According to the experiment, the electric cost was reduced by 37% annually.