Prediction of Air-Conditioning Energy Consumption in R&D Building Using Multiple Machine Learning Techniques

Abstract: With the global increase in demand for energy, energy conservation in research and development buildings has become of primary importance for building owners. Knowledge of the patterns in energy consumption of previous years can be used to predict the near-future energy usage of buildings, and thereby to optimize and facilitate more effective energy consumption. Hence, this research aimed to develop a generic model for predicting energy consumption. Air-conditioning was used to exemplify the generic model for electricity consumption, as it is the process that often consumes the most energy in a public building. The purpose of this paper is to present this model and the related findings. After causative factors were determined, linear regression and various machine learning techniques were applied for prediction, including the earlier machine learning techniques of support vector machine, random forest, and multilayer perceptron, and the later machine learning techniques of deep neural network, recurrent neural network, long short-term memory, and gated recurrent unit. Among them, random forest achieved an R² of 88% for prediction one month ahead and 81% for prediction three months ahead. These experimental results demonstrate that the prediction model is reliable and accurate. Building owners could further enrich the model for energy conservation and management.


Introduction
Ongoing global economic development has consumed ever more energy resources. According to the Statistical Review of World Energy, global primary energy consumption grew rapidly in 2018, at a rate of 2.9%, almost double its 10-year average of 1.5% per year and the fastest since 2010 [1]. Moreover, according to the Stated Policies Scenario, electricity use has been growing at more than double the pace of overall energy demand, confirming its place at the heart of modern economies [2]. For the sake of sustainable development and to mitigate the depletion of energy resources, energy conservation has become a critical task during economic development.
The Ministry of Economic Affairs of Taiwan reported that the national electricity consumption in 2018 was about 264.3 billion kWh. The energy consumed by buildings each year accounted for more than one-third of the national total, and the proportion grows year by year [3]. Therefore, the owners of public buildings with high electricity consumption [4] in Taiwan are often the main target of public policies promoting energy conservation. The optimal operation of buildings is crucial for reducing electricity consumption. Optimizing buildings' electricity consumption will be area roof area, overall height, orientation, glazing area and glazing area distribution) to predict the heating and cooling loads of a building based on a dataset for building energy performance. The DNN gives a very good prediction, and the wall area, relative compactness, and roof area have significant effects on heating and cooling loads. Mocanu et al. [13] presented five methods to predict energy consumption in a residential building, including DNN, or, more specifically, Conditional Restricted Boltzmann Machines and Factored Conditional Restricted Boltzmann Machines (FCRBM). The FCRBM is a powerful method which outperformed the other methods. Residential buildings over different time horizons with different time resolutions were discussed. Liu et al. [15] adopted four economic variables (gross domestic product (GDP), population, import and export trade volume) to forecast the primary energy consumption in China. Compared with MLP and SVM, GRU had the lowest average absolute percentage error.
This study differs from the aforementioned studies, which generally focused on the electricity consumption of a whole building. In contrast, this study explores a framework to predict the electricity consumption of each room. The model provides a baseline for managerial mechanisms that facilitate effective energy saving. To construct a modelling process for predicting the air-conditioning energy consumption of public buildings, the monthly electricity usage data of CTIC were collected to demonstrate the modelling process of a generic model. Readers should note that the emphasis is on the modelling process rather than on the model itself, and on the exemplification rather than on the CTIC case. The purpose of this paper is to present the research results based on the use of multiple ML techniques and to propose a generic modelling process for predicting electricity consumption. This research used experimental data measured in CTIC to construct a prediction model.
The paper begins by introducing CTIC, followed by the methodology used. Then, the modelling process is illustrated in detail. Next, the collected data are thoroughly analyzed, and the accuracy and reliability of the proposed model are demonstrated. Finally, a conclusion is provided, and future research is recommended.

Location
CTIC is located in the Advanced Research Park of Central Taiwan (close to the center of Taiwan at 23°56′12.1″ N, 120°41′53.3″ E).

Mission and Main Functions
With strong government support, the Ministry of Economic Affairs expects CTIC to lead technological upgrades and create job openings for small and medium enterprises in central Taiwan. CTIC was constructed and is operated by the Industrial Technology Research Institute (ITRI), and it opened on 15 September 2014. It covers a land area of 2.49 ha, with a total floor area of 42,000 m², and it can accommodate 700 people (400 for non-profit organizations and 300 for the information and communications technologies (ICT) industry). CTIC includes R&D space (offices, research laboratories, and a pilot plant), showroom and communication space (for exhibitions, training, conferences, a library, and information exchange), and public service areas (restaurant, pantry, and parking facilities); the design scheme and configured spaces are shown in Figure 1. CTIC hopes to transform into a high-level research and development park while maintaining the quality of urban life, and to combine with the study and research resources of central Taiwan to promote local economic development. At present, about 87% of the spaces are occupied by various organizations.

Building Features
CTIC is a four-floor steel structure with one underground parking level. It is a low-carbon, energy-saving, sustainable ecological park, and it uses energy-saving design as the basis for obtaining recognition as a highest-grade (diamond-rated) intelligent and green building in Taiwan.
Quantifiable metrics exist for rating green buildings and intelligent buildings. The green building rating is adapted to the subtropical climate of Taiwan. An assessment system was developed with reference to the core characteristics of a building in terms of energy consumption, water consumption, waste disposal, and environmental protection.
The assessment system consists of nine indicators: (1) biodiversity, (2) base greening, (3) base water conservation, (4) daily energy saving, (5) carbon dioxide reduction, (6) waste reduction, (7) indoor environment, (8) water resources, and (9) sewage and waste improvement. It is the fourth scientifically quantified green building evaluation system in the world, after those of the United Kingdom, the United States, and Canada.
An assessor assigns a score to each indicator, and the green building grade is awarded according to the total score obtained. There are five grades: qualified grade (9-25 points), bronze grade (26-33 points), silver grade (34-41 points), gold grade (42-52 points), and diamond grade (53 points or more). CTIC obtained 59.74 points and was awarded the diamond grade [16].
The intelligent building rating, on the other hand, emphasizes the application of smart technologies, such as networks, monitoring equipment, and system integration, that facilitate automatic sensing, analysis, and response functions and that take into account the convenience and optimal operation of future maintenance and management. Moreover, the building has to meet users' requirements for safety, comfort, convenience, and efficiency while achieving the goal of energy saving. Eight indexes are used for evaluating an intelligent building: integrated wiring, information communication, system integration, facility management, safety and disaster prevention, health and comfort, thoughtful convenience, and energy-saving management. Each index is scored out of 100 points. There are five grades: qualified, bronze, silver, gold, and diamond; the highest (diamond) grade requires all eight indexes to score higher than 80 points. CTIC achieved more than 80 points for each index and received the diamond grade badge.

Building Energy Management System
CTIC explores two energy conservation strategies. The first is based on the various spatial components (as shown in Table 1). If a building is regarded as a small city, then from the city perspective a more detailed division of space is more beneficial for building management, even though it remains inconvenient for the occupants' daily life. R&D buildings emphasize interaction among different types of space users, which makes energy consumption difficult to manage. Different types of spaces have different electricity supply and consumption rates due to differences in their usage, time of use, and associated equipment. The second is based on the composition of each facility system (as shown in Table 1). The facility systems are constructed in accordance with the different space usages; among them, the air-conditioning system is the most complex and consumes the most electricity. The sum of the electricity consumption of all facility systems equals the power consumption of the entire building.
CTIC integrates the space management of the building information modeling (BIM) model with the smart grid system to construct a complete energy consumption monitoring system, which measures the electricity consumption of the lighting system, sockets, power, and air-conditioning devices in each room every hour. To provide good indoor environmental quality and to implement the user-pays principle, detailed information and bills for the electricity consumption of each stationed unit are provided every month. This can serve as the basis for the continuous improvement of electricity conservation actions and for creating an eco-friendly workplace.

Electricity Consumption Data
Under the user-pays principle, and to maintain a comfortable working environment, CTIC sets up electric meters and independent electricity supply circuits for lighting, sockets, power, and air-conditioners in each room, and installs multifunction smart electric meters (as shown in Figure 2) to collect power consumption information. Thus, the energy management system can carry out various statistical analyses to strengthen electricity consumption management and achieve electricity savings. In addition, the price of electricity can be calculated based on the actual electricity consumption of each stationed unit. Beyond the user-pays principle, the stationed units can also strengthen their electricity-saving measures for continuous electricity-saving actions. The space can be categorized into public spaces and independent spaces: research offices, laboratories, pilot plants, and conference rooms fall under the independent space category; lighting, sockets, power, and air-conditioning fall under the public space category. The electricity consumption statistics of the whole building from 2015 to 2018 are shown in Figure 3.
The electricity consumption of 383 rooms was collected over the period from December 2014 to April 2019. The data cover four types of facilities: lighting, sockets, power, and air-conditioning. The period spans 52 months; thus, 79,664 monthly records of electricity usage (383 × 4 × 52 = 79,664) were collected. These consumption data were used for the construction of the prediction model. Although the electricity consumption of 383 rooms was collected, occupants belonging to projects and incubators normally move out of CTIC after their project ends. Therefore, this study selected 31 rooms with stable occupiers to predict electricity consumption. Table 2 shows the basic information of these 31 selected rooms.

Energy Consumption Factors
This study explored the six major factors affecting building air-conditioning energy consumption described in the relevant literature of the International Energy Agency (IEA) Energy in Buildings and Communities (EBC) Annex 53: climate, building envelope characteristics, building equipment, indoor environment, user behavior, and maintenance mode, together with social aspect-related factors [17].
This study explored the factors that affect building electricity consumption and selected 13 potential factors through multivariate statistical analysis. The selected factors were air pressure, temperature, humidity, wind speed, rainfall, sunshine hours, wet bulb temperature, season, month, area, floor, orientation, and the number of days of occupation. Among these factors, air pressure, temperature, humidity, wind speed, and rainfall were taken from the Nantou station of the Central Weather Bureau in Taiwan, and the hours of sunshine were taken from the Taichung station. These 13 factors were subsequently screened through the grid search method to identify the electricity consumption factors. Detailed descriptions of each factor are given in Table 3. As shown in Figure 4, the heat load of the building shell is divided into eight zones. The temperature changes of the external walls of the eight zones were measured and monitored. The measured temperature changes in several zones were found to be the same. Therefore, zones with the same result were combined, giving four heat load zones, as shown in the middle of Figure 4. In the lower right of Figure 4, the three zones marked in red are high-temperature zones with almost the same temperature, so all three were grouped into one high-temperature zone.

Research Methodology
This study used linear regression (LR) and multiple machine learning (ML) techniques, including earlier ML techniques and more recent ones, namely deep learning (DL) techniques, to construct an electricity consumption model that predicts air-conditioning electricity consumption for the next 3 months.

Linear Regression
Linear regression is one of the most frequently used algorithms for interpolation and extrapolation. The linear regression model can be written as

$$y = X\beta + \varepsilon \qquad (1)$$

In this equation, $y$ is the vector of $n$ observed responses, $X$ is the $n \times (p + 1)$ matrix of predictors (including a column of ones for the intercept), and $\beta$ and $\varepsilon$ represent the $(p + 1)$-dimensional vector of coefficients and the set of $n$ error terms, respectively. Ordinary least squares and maximum likelihood are the most commonly used methods for estimating the coefficient vector $\beta$. For maximum likelihood, different estimates of $\beta$ result from different assumptions about the underlying distribution of the error terms. Because of this drawback, ordinary least squares, together with inferential statistics based on the assumption that $\varepsilon_i \sim N(0, \sigma^2)$ iid for $i = 1$ to $n$, was employed in this study to estimate all linear regression models.
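As a minimal sketch of ordinary least squares for the model above, the following Python example estimates β with NumPy. The helper names `fit_ols` and `predict_ols` and the synthetic data are assumptions made for illustration; they are not the data or code of this study.

```python
import numpy as np

def fit_ols(X, y):
    """Return the OLS estimate of beta for y = X1 @ beta + eps.

    X is an (n, p) matrix of predictors. A column of ones is prepended,
    so beta has p + 1 entries with the intercept first.
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Apply a fitted coefficient vector to new predictors."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta

# Hypothetical noise-free data generated from y = 2 + 3*x1 - 1*x2
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]
beta_hat = fit_ols(X_demo, y_demo)
```

Because the synthetic data contain no noise, `beta_hat` recovers the generating coefficients up to floating-point error; with real meter data the residuals would carry the error term ε.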

Earlier Machine Learning
ML is the use of algorithms that learn from collected data to train a model for classification or prediction. New data obtained in the future can then be predicted through the trained model [18].

Random Forest
Breiman (2001) [19] proposed the random forest (RF) algorithm, an ensemble learning method for ML. RF uses bootstrapping to build a large number of regression trees that form a forest, and it ensures randomness by randomly selecting variables and samples. For classification and regression, only a few sensitive parameters (the number of regression trees, the maximum number of features m, and the maximum depth) need to be determined [20]. A small number of parameters simplifies the problem, and RF gives better prediction results than a traditional single model without a significant increase in computational complexity. Peters et al. (2007) [21] used RF to establish an eco-hydrological environment model, and Naghibi et al. (2015) [22] used RF in conjunction with geographic information systems to map groundwater potential. RF has also been widely applied in recent years to assess collapse potential [11,23]. Building an RF can be divided into three steps, and this structure is shown in Figure 5.
The first step is randomization. An RF combines many conditional inference trees, and to prevent high similarity between trees, RF uses bootstrapping to guarantee that the training data of each tree are not the same. Assuming that the data population contains a total of K pieces of information, the model randomly draws a single datum K times, and each datum can be selected repeatedly. The chance that a single datum is never selected is given by formula (2) [19], which is valid for a sufficiently large K (theoretically, K → ∞):

$$\lim_{K \to \infty} \left(1 - \frac{1}{K}\right)^{K} = e^{-1} \approx 0.368 \qquad (2)$$

The data that are not selected are called out-of-bag (OOB) data. The mean of the OOB errors is calculated to determine the parameters of the RF and to evaluate the sensitivity of the inputs.
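The share of data left out of a bootstrap sample can be checked numerically. The sketch below, using only the Python standard library, compares a simulated bootstrap draw with the exact probability (1 − 1/K)^K and its limit 1/e; the function names and the choice K = 100,000 are illustrative assumptions.

```python
import math
import random

def oob_fraction(K, seed=0):
    """Draw K items with replacement from K items; return the share never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(K) for _ in range(K)}
    return 1.0 - len(drawn) / K

def not_selected_probability(K):
    """Exact probability (1 - 1/K)**K that a given datum is never drawn."""
    return (1.0 - 1.0 / K) ** K

simulated = oob_fraction(100_000)        # empirical OOB share of one bootstrap
exact = not_selected_probability(100_000)
limit = math.exp(-1)                     # limiting value, about 0.368
```

Roughly 37% of the data are therefore out-of-bag for each tree and can serve as a built-in validation set.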
The formula of the OOB error can be written as follows [19]:

$$\overline{\mathrm{OOB}} = \frac{1}{N} \sum_{i=1}^{N} \frac{\varepsilon_i}{n_{L,i}}$$

where $N$ represents the total number of decision trees, $\varepsilon_i$ is the number of misjudgments in the $i$-th decision tree, and $n_{L,i}$ is the total number of OOB data in the $i$-th decision tree. In general, the smallest $N$ at which the mean OOB error becomes stable and convergent may be the best number of decision trees for the RF.
Random sampling determines how many variables are chosen for each decision tree. That is, before constructing a new decision tree, the model chooses $m$ variables as inputs from all $M$ inputs. Following the advice of Breiman (2001) [19], $m$ can be set either to all inputs or to the square root of all inputs (Sqrt):

$$m = M \quad \text{or} \quad m = \sqrt{M}$$

After the input variables and a subsample have been randomly selected, a regression tree is established for this subsample in the second step, and the process is repeated until the required number of regression trees is reached, that is, until the construction of the RF model is completed.
After the RF has been constructed, the testing data are input into all of the decision trees in the prediction step. Each decision tree calculates a single predicted value, and the final prediction is the mean of the predicted values of all these decision trees.
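The averaging in the prediction step can be illustrated with scikit-learn's `RandomForestRegressor`. This is a sketch under stated assumptions: the data are synthetic, and the hyper-parameter values (50 trees, `max_features="sqrt"`, depth 8) are arbitrary illustrations rather than the settings selected in this study. Note also that scikit-learn's `oob_score_` for a regressor is an out-of-bag R², not the misjudgment ratio used in the OOB error formula.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical data: 200 samples, 4 features, nonlinear target with small noise
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
y = 10 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(
    n_estimators=50,       # number of regression trees
    max_features="sqrt",   # m = sqrt(M) variables tried at each split
    max_depth=8,           # maximum depth of each tree
    bootstrap=True,        # each tree is trained on a bootstrap sample
    oob_score=True,        # also evaluate on the out-of-bag data
    random_state=0,
)
rf.fit(X[:150], y[:150])

# Prediction step: each tree predicts, and the forest returns the mean
per_tree = np.stack([tree.predict(X[150:]) for tree in rf.estimators_])
forest_prediction = rf.predict(X[150:])
```

Comparing `per_tree.mean(axis=0)` with `forest_prediction` confirms that the forest output is the plain average of the individual tree outputs.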
An RF has three sensitive parameters (the number of trees, the maximum number of features, and the maximum depth) that need to be calibrated for good performance. The basic concept of the grid search method is to evaluate the performance indicators at all grid points in the parameter space, and then to determine the best factor and parameter combination by comparison. The grid search method [24,25] is simple but requires considerable computing resources, so it is suitable for models with high computing efficiency. In this study, it was used to determine the hyper-parameters of the model that result in the highest test accuracy. The number of trees and the maximum depth are treated as ranges: the number of trees varies from 50 to 1000 in steps of 50, and the maximum depth increases from 3 to 23. The maximum number of features is set either to all inputs or to Sqrt. Different combinations of the three parameters were examined, and the optimal combination for the model was determined.
The grid search method evaluates all grid points in the parameter space and finds the one with the best performance. Its structure can be divided into two parts:
(1) Setting the upper and lower limits of the parameters: set the search range of each parameter, that is, the lower limit ($l_1$) and upper limit ($u_1$) of parameter 1 and the lower limit ($l_2$) and upper limit ($u_2$) of parameter 2.
(2) Setting the grid size: set the number of grids by dividing the range of parameter 1 into $p_1$ equal parts, so that the distance between two adjacent grid points is $\Delta p_1 = (u_1 - l_1)/p_1$; similarly, the range of parameter 2 is divided into $p_2$ equal parts, and the distance between two adjacent grid points is $\Delta p_2 = (u_2 - l_2)/p_2$.
Calculating the performance index at all grid points determines the best-performing parameter combination. The total number of calculations is $(p_1 + 1) \times (p_2 + 1)$. The calculation time becomes longer if the search range is larger or the grid is denser. This study selected several common parameters for the grid search.
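The two-part structure of the grid search can be sketched in plain Python. The ranges mirror those quoted for the RF (trees from 50 to 1000 in steps of 50, depth from 3 to 23), while the depth step of 1 and the scoring function `toy_score` are assumptions standing in for a real model evaluation.

```python
import itertools

def grid_points(lower, upper, p):
    """Return the p + 1 equally spaced grid values from lower to upper."""
    step = (upper - lower) / p          # delta_p = (u - l) / p
    return [lower + k * step for k in range(p + 1)]

def grid_search(score, grid1, grid2):
    """Evaluate score(a, b) at every grid point and return the best pair."""
    return max(itertools.product(grid1, grid2), key=lambda ab: score(*ab))

trees = grid_points(50, 1000, 19)    # 50, 100, ..., 1000
depths = grid_points(3, 23, 20)      # 3, 4, ..., 23 (step of 1 is assumed)

def toy_score(n_trees, depth):
    """Hypothetical score with a single peak at 400 trees and depth 12."""
    return -((n_trees - 400) ** 2) / 1e4 - (depth - 12) ** 2

best = grid_search(toy_score, trees, depths)
n_evaluations = len(trees) * len(depths)   # (p1 + 1) * (p2 + 1)
```

With p1 = 19 and p2 = 20, the search evaluates (19 + 1) × (20 + 1) = 420 grid points, which shows how quickly the cost grows as the grid becomes denser or a third parameter is added.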

Support Vector Machine
A support vector machine (SVM) is a supervised ML method. Vapnik developed SVM in the early 1990s to solve statistical classification problems [26], and in 1995 SVM was extended to regression analysis. SVM has two main characteristics. First, SVM uses structural risk minimization, which not only reduces the target error function but also considers the complexity of the model structure, so that the model can reach a certain accuracy [27] without the structure becoming so large that computing time increases. Second, determining the weights and structure of the SVM model is transformed into a quadratic programming problem, which can be solved quickly using a standard algorithm [26,28]. SVM has been widely used in various fields in recent years and has achieved quite good results in classification and forecasting. Five crucial parameters (kernel function, gamma, cost, epsilon, and degree) of SVM have to be determined [29]. These parameters influence the efficiency in handling non-linear relationships and the computational performance, and they were determined by the grid search method.
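A grid search over the five SVM parameters might look like the following scikit-learn sketch. The synthetic data, the candidate values in `param_grid`, the feature scaling step, and the three-fold cross-validation are all assumptions chosen to keep the example small; they are not the grid used in this study. In scikit-learn, the cost parameter is named `C`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical nonlinear regression data
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(120, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=120)

# Candidate values for kernel, gamma, cost (C), epsilon, and degree
param_grid = {
    "svr__kernel": ["rbf", "poly"],
    "svr__gamma": [0.1, 1.0],
    "svr__C": [1.0, 10.0],
    "svr__epsilon": [0.01, 0.1],
    "svr__degree": [2, 3],   # only used by the polynomial kernel
}

model = make_pipeline(StandardScaler(), SVR())
search = GridSearchCV(model, param_grid, cv=3, scoring="r2")
search.fit(X, y)
```

After fitting, `search.best_params_` holds the best of the 2⁵ = 32 combinations and `search.best_score_` its cross-validated R².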

Multilayer Perceptron
The multilayer perceptron (MLP), proposed by Rumelhart et al. in 1986 [30], is a feedforward neural network with a three-layer structure (input layer, hidden layer, and output layer) that uses back-propagation to achieve supervised learning. The MLP consists of multiple layers of nodes, each connected to the next layer. Except for the input nodes, each node is a neuron with a nonlinear activation function [31]. Since the MLP is a common neural network that has already been widely applied in research, further details can be found in the related references.
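To make the forward pass and back-propagation concrete, here is a minimal NumPy sketch of a three-layer MLP trained by full-batch gradient descent on a mean squared error loss. The tanh activation, layer width, learning rate, and synthetic data are assumptions for illustration only.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """Train an input-hidden-output MLP for regression; return weights and losses."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass: input -> hidden layer (tanh) -> linear output
        H = np.tanh(X @ W1 + b1)
        pred = (H @ W2 + b2).ravel()
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the MSE gradient from output to input
        g_pred = (2.0 / len(y)) * err[:, None]
        g_W2 = H.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_Z = (g_pred @ W2.T) * (1.0 - H ** 2)   # tanh'(z) = 1 - tanh(z)^2
        g_W1 = X.T @ g_Z
        g_b1 = g_Z.sum(axis=0)
        # Gradient descent update
        W2 -= lr * g_W2
        b2 -= lr * g_b2
        W1 -= lr * g_W1
        b1 -= lr * g_b1
    return (W1, b1, W2, b2), losses

# Hypothetical data with a nonlinear target
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1, 1, size=(100, 2))
y_demo = np.sin(2 * X_demo[:, 0]) + 0.5 * X_demo[:, 1]
params, losses = train_mlp(X_demo, y_demo)
```

The recorded `losses` list shows the training error falling as the weights are updated.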

Deep Neural Network
The deep neural network (DNN) approach was proposed by Hinton in 2006 [32]; it initializes the parameters using the restricted Boltzmann machine (RBM), which successfully solved the optimization problem of back-propagation and brought DL back to the forefront. However, as long as appropriate activation functions and sufficient training data are given, the benefits of the RBM method are not significant. Therefore, although the DNN in current use and the MLP of the 1980s are similar in nature, they still differ slightly. In the 1980s, networks did not usually exceed three layers, whereas current networks are often deeper than three layers. In the past, the sigmoid function was more commonly used as the activation function. Because it is hard to obtain good results from deeper network layers with it, the rectified linear unit (ReLU) has been used in recent years to address this problem [33]. In networks with many layers, ReLU performs much better than the sigmoid function.
In addition, training is usually based on the stochastic gradient descent (SGD) method [34]. When determining the best combination of weights in the network, the smaller the value of the learning objective, the better the optimization. There are also some newer training methods. For example, the Adam algorithm can reduce the number of parameter updates needed during training, so that the network completes training earlier. The dropout algorithm randomly discards some neurons during training, so the network performs better when it encounters data that it has never seen. These methods therefore make it possible to deepen the neural network, and the resulting deep network can achieve the same task using less training data than a shallow network.
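The advantage of ReLU over the sigmoid function in deep networks, and the effect of dropout, can be illustrated with a few lines of NumPy. The sketch multiplies activation derivatives across 10 layers while ignoring the weight matrices, which is a deliberate simplification; the depth, the pre-activation value of 2, and the inverted-dropout formulation are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid; never larger than 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (np.asarray(z) > 0).astype(float)

def dropout(a, rate, rng):
    """Inverted dropout: zero a share `rate` of activations, rescale the rest."""
    keep = rng.random(a.shape) >= rate
    return a * keep / (1.0 - rate)

# Product of activation derivatives over 10 layers at pre-activation z = 2
depth = 10
sigmoid_chain = sigmoid_grad(2.0) ** depth   # shrinks toward zero
relu_chain = relu_grad(2.0) ** depth         # stays at 1
```

With the sigmoid, the gradient signal reaching the early layers is many orders of magnitude smaller than with ReLU, which is one reason deeper networks became trainable once ReLU was adopted.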

Recurrent Neural Network
The original prototype of the recurrent neural network (RNN) was proposed by Jeffrey L. Elman in 1990 [35]. It adds a recursive term to the neural network and feeds the output values of the hidden or output layer neurons back as part of the next input. Through this recursive term, the network memorizes past information so that temporal information can be learned. The input of the RNN usually contains time series data, and the output value at the current time t is related to the output value at the previous time t − 1. The RNN memorizes the information of the previous output value and stores the sequence data in a hidden layer. Each stage in the chain structure of the hidden layer memorizes the output data of the previous stage and extends it. Time series data are input as x_t; at each time step, the hidden state h_t is calculated from x_t, the output y_t is produced, and a parameter w is passed to the next stage. The prediction of building air-conditioning energy consumption is a time series problem; thus, the RNN is suitable for forecasting it. The parameters, including hidden layers, activation, optimizer, batch, dropout, epoch, and loss, are determined by the grid search method.
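The recurrence can be sketched as a forward pass in NumPy. The weight names (`W_x`, `W_h`, `W_y`), the tanh activation, and the random inputs are assumptions for illustration; a trained network in Keras would learn these weights from data.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, W_y, b_h, b_y):
    """Run an Elman-style RNN over a sequence and return hidden states and outputs."""
    h = np.zeros(W_h.shape[0])          # initial hidden state
    hs, ys = [], []
    for x_t in xs:
        # The new hidden state mixes the current input with the previous state
        h = np.tanh(W_x @ x_t + W_h @ h + b_h)
        hs.append(h)
        ys.append(W_y @ h + b_y)        # output at time t
    return np.array(hs), np.array(ys)

# Hypothetical sizes: 3 input features, 4 hidden units, 6 time steps
rng = np.random.default_rng(0)
D, H, T = 3, 4, 6
W_x = rng.normal(scale=0.5, size=(H, D))
W_h = rng.normal(scale=0.5, size=(H, H))
W_y = rng.normal(scale=0.5, size=(1, H))
xs = rng.normal(size=(T, D))
hs, ys = rnn_forward(xs, W_x, W_h, W_y, np.zeros(H), np.zeros(1))
```

Because `h` is carried from one step to the next, changing the first input changes the hidden states of later steps, which is how the network retains information about the past.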

Long Short-Term Memory
The long short-term memory (LSTM) network was published by S. Hochreiter et al. in 1997 [36] to solve the long-term dependence problem of the traditional RNN. By design, it avoids this problem even when relevant events are separated by long intervals, and it performs well in identification tasks. The difference from the traditional RNN lies in the internal chain structure of the hidden layer. The LSTM replaces the neurons in the hidden layer with a set of recurrently connected subnets that act as memory blocks. Each block contains self-connected neurons and three gates: the input, output, and forget gates. The input gate performs selective memory on the input value of this node, the output gate determines whether the message is passed on as the output of this node, and the forget gate selectively forgets the input value of the previous node [36]. The LSTM is a type of RNN and is therefore suitable for time series data. The parameters, as with the RNN, are determined by the grid search method.
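The role of the three gates can be sketched with one step of a single-unit cell. The equations follow the commonly used LSTM formulation, which is an assumption here because the paper does not write them out; the weights in `params` are hypothetical and chosen so that the cell keeps its stored value.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell; p holds (w_x, w_h, b) per gate."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the previous cell state to keep
    i = sigmoid(pre("input"))         # how much of the new candidate to store
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value computed from the input
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

# Hypothetical weights: a large forget bias keeps the memory,
# and a very negative input bias blocks new input.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 0.8
for x in [0.3, -0.7, 0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

With these settings the cell state stays close to its initial value of 0.8 across all three steps, which illustrates how the gates let information persist over long intervals.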

Gated Recurrent Unit
The gated recurrent unit (GRU) is also a variant of the RNN. It was proposed by Cho et al. in 2014 to solve the problems of vanishing and exploding gradients, similarly to the LSTM [37]. Its main structure is similar to the LSTM in that it replaces the neurons with memory blocks, and each block also contains self-connected neurons and gates. However, the GRU reduces the three gates to two by combining the input and forget gates into a single update gate; thus, it does not decide separately whether to forget or to input messages but makes both decisions at the same time. The other gate, the reset gate, determines how new input is merged with previous messages. The GRU also merges the cell state and the hidden state. It has fewer parameters than the LSTM but achieves a comparable effect, and its calculation time is shorter [37]. The parameters of the GRU are determined by the grid search method.
This study adopted the ML and DL techniques to construct models for predicting the building air-conditioning energy consumption using the Keras and scikit-learn libraries in Python.
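The claim that the GRU has fewer parameters than the LSTM can be checked with a simple count. The sketch below assumes the textbook formulations, in which every gate or candidate block has an input matrix, a recurrent matrix, and one bias vector; specific library implementations may add extra bias terms. The layer sizes are hypothetical.

```python
def recurrent_layer_params(n_inputs, n_units, n_blocks):
    """Weights in one recurrent layer: each block has an input matrix,
    a recurrent matrix, and a bias vector."""
    per_block = n_units * n_inputs + n_units * n_units + n_units
    return n_blocks * per_block

def lstm_params(n_inputs, n_units):
    # input, forget, and output gates plus the candidate cell update
    return recurrent_layer_params(n_inputs, n_units, 4)

def gru_params(n_inputs, n_units):
    # update and reset gates plus the candidate hidden state
    return recurrent_layer_params(n_inputs, n_units, 3)

n_inputs, n_units = 6, 32  # hypothetical sizes, not taken from the paper
print(lstm_params(n_inputs, n_units), gru_params(n_inputs, n_units))
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same size.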

Model Constructing and Data Processing
The purpose of this study was to establish a general model for predicting air-conditioning electricity consumption and to provide a reference for regulating electricity consumption, using data collected from CTIC. Data pre-processing comprised two procedures: data addendum and removal of extreme errors. In data addendum, missing values of the factors that affect electricity consumption are generated by interpolation. In addition, an R&D building often houses occupants working on different projects with different power demands, which results in different power consumption rates. All the data are automatically recorded by electricity meters. Recorded extreme or abnormal values of electricity consumption near zero were removed from the training data set.
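The two pre-processing procedures can be sketched as follows. The paper does not state the interpolation scheme or the cut-off for "near zero", so linear interpolation and the 1 kWh threshold are assumptions, and the sample values and room labels are hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between the nearest known
    neighbours; gaps at either end take the nearest known value."""
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = [(j, w) for j, w in known if j < i]
        right = [(j, w) for j, w in known if j > i]
        if left and right:
            (j0, v0), (j1, v1) = left[-1], right[0]
            filled[i] = v0 + (v1 - v0) * (i - j0) / (j1 - j0)
        else:
            filled[i] = (left[-1] if left else right[0])[1]
    return filled

def drop_near_zero(records, threshold_kwh=1.0):
    """Remove records whose metered consumption is at or below the threshold."""
    return [r for r in records if r["kwh"] > threshold_kwh]

temperature = [18.0, None, None, 24.0, 26.0]  # hypothetical monthly means
filled_temperature = interpolate_missing(temperature)

records = [
    {"room": "A", "kwh": 420.0},
    {"room": "A", "kwh": 0.2},   # abnormal reading near zero
    {"room": "B", "kwh": 910.5},
]
clean = drop_near_zero(records)
print(filled_temperature, len(clean))
```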
After data pre-processing, the aforementioned eight methods were used to establish models, and the grid search method was then applied to find the optimal input factors and model parameters. The grid search method determines the hyperparameters of a model that result in the highest test accuracy, for example, the gamma in SVM, the number of trees in RF, and the number of hidden layers in MLP. As shown in Table 4, all candidate combinations are evaluated, and the evaluation indices are then used to pick the optimal model. The framework of this research is shown as a flowchart in Figure 6.
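An exhaustive grid search can be sketched in a few lines. The candidate values and the scoring function below are hypothetical stand-ins: in practice the score would come from fitting a model and evaluating it, and the real candidate ranges are those of Table 4. scikit-learn also offers a cross-validated variant (GridSearchCV); whether the authors used it is not stated.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one.
    score_fn maps a parameter dict to a score where higher is better."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical candidate values for a random forest.
grid = {"n_trees": [50, 200, 350, 500], "max_depth": [6, 14, 22, 30]}

def toy_score(params):
    # Stand-in for "fit the model and measure test accuracy";
    # it peaks at (350, 22) purely by construction.
    return -abs(params["n_trees"] - 350) / 500 - abs(params["max_depth"] - 22) / 30

best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the number of candidates per parameter (16 evaluations here), which is why the candidate ranges are kept coarse.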
[Table 4, grid search candidates; recoverable entries: (2^−7, 2^7, 2), (2^−7, 2^7, 2), (2^−7, 2^7, 2), (2, 3, 1)]
When constructing neural network (NN)-based models, the collected data are usually partitioned into two parts: training and testing. The training data are used for adjusting the parameters of the models, and the testing data are then used to evaluate model performance. This study analyzed the rooms in the building and partitioned the data into training and testing sets by year. However, different assignments of training and testing data may yield different results and sometimes lead to different conclusions [29].
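A year-based partition can be sketched as a chronological split. The record layout, the helper name `split_by_year`, and the sample values are hypothetical; the cut-off year is a parameter chosen here for illustration.

```python
def split_by_year(records, last_training_year):
    """Chronological split: records up to and including last_training_year
    go to training, later ones go to testing."""
    train = [r for r in records if r["year"] <= last_training_year]
    test = [r for r in records if r["year"] > last_training_year]
    return train, test

# Hypothetical monthly records for one room.
records = [
    {"year": 2016, "month": 11, "kwh": 512.0},
    {"year": 2017, "month": 6, "kwh": 840.0},
    {"year": 2018, "month": 1, "kwh": 301.0},
    {"year": 2019, "month": 7, "kwh": 905.0},
]
train, test = split_by_year(records, last_training_year=2017)
print(len(train), len(test))
```

Splitting by time rather than at random keeps all testing months strictly later than the training months, so the evaluation resembles real forecasting.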
In terms of training and testing data, an NN-based model is generally more accurate when it is trained on more data. Thus, this study used the data of 29 rooms from 2014 to 2017 (37 months) for training and the data of 29 rooms from 2018 to 2019 (16 months) for testing. The remaining rooms, Room B318 and Room B114, were tested between 2014 and 2019. The ratio of training data to testing data is about 2 to 1 (1265 and 560 records in total, respectively). To further evaluate the prediction performance of the NN-based models, the root mean square error (RMSE) [38], mean absolute error (MAE) [39], coefficient of determination (R²) [38], and coefficient of efficiency (CE) [39] were applied, which indicate the discrepancy between the observed and forecasted air-conditioning electricity consumption. RMSE and MAE represent the errors between the two sets of data, whereas R² and CE represent the consistency between the observed and predicted air-conditioning electricity consumption; the greater the consistency, the better the results.
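The four indicators can be sketched as follows. This section does not give their formulas, so the definitions below are assumptions: R² is taken as the squared Pearson correlation and CE as the Nash-Sutcliffe form, a reading that is consistent with R² staying high while CE turns negative in the later results. The sample values are hypothetical.

```python
import math

def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)

def coefficient_of_efficiency(obs, pred):
    """Nash-Sutcliffe form: 1 - squared error / total variation of observations."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    return 1.0 - sse / sst

observed = [100.0, 200.0, 300.0, 400.0]   # hypothetical kWh values
predicted = [110.0, 190.0, 310.0, 390.0]
print(rmse(observed, predicted), mae(observed, predicted),
      r_squared(observed, predicted),
      coefficient_of_efficiency(observed, predicted))
```

RMSE and MAE are in kWh and are better when smaller, whereas R² and CE are dimensionless and are better when closer to 1; unlike R² as defined here, CE has no lower bound.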

Factor Screening and Parameter Calibration
The factors and model parameters preferred by the grid search method are shown in Tables 5 and 6. In terms of the model factors, the input factors of different models are slightly different, and most of the models perform the best with the combination of factors presented in Table 6. Therefore, this set was selected for subsequent model parameter calibration and future forecasting. The screened effective factors include the average temperature of the month, the antecedent mean temperature, the average humidity, the air-conditioning energy consumption, seasonal factors related to time, and spatial factors (area, floor, and orientation). The changes in temperature and humidity can directly affect air-conditioning electricity consumption and can also be deduced from seasonal factors. The conditions of each room, such as area, floor, and orientation, are also effective factors affecting electricity consumption, and adding them to the model can increase its accuracy.
Note: Monthly mean temperature and humidity are average values from 08:00 to 18:00, air-conditioning electricity consumption is the total amount per month, and the lag lengths t, t − 1, and t − 2 denote the current month, one month before, and two months before, respectively.
In terms of model parameters, the SVM model performs better with the radial basis function (RBF) as the kernel function than with the other three kernel functions (linear (LN), polynomial (PL), and sigmoid (SIG)). The RF model is stable at 350 trees and obtains the best result when using the number of inputs as the maximum number of features, with a maximum depth of 22. The MLP model requires a large number of neurons in a single layer and uses the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm as the optimizer to obtain the best results.
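The lag notation in the note above can be illustrated by building lagged input samples from one monthly series. The helper `make_lagged_samples` and the sample values are hypothetical, and pairing lags t, t − 1, and t − 2 with a target a fixed number of months ahead is an assumed set-up, not a confirmed description of the authors' pipeline.

```python
def make_lagged_samples(series, n_lags=3, horizon=1):
    """Build (inputs, target) pairs from one monthly series.

    inputs = [x_t, x_{t-1}, ..., x_{t-n_lags+1}], target = x_{t+horizon}.
    """
    samples = []
    for t in range(n_lags - 1, len(series) - horizon):
        inputs = [series[t - k] for k in range(n_lags)]
        samples.append((inputs, series[t + horizon]))
    return samples

monthly_kwh = [310.0, 295.0, 402.0, 530.0, 760.0, 815.0]  # hypothetical values
one_month_ahead = make_lagged_samples(monthly_kwh, n_lags=3, horizon=1)
three_months_ahead = make_lagged_samples(monthly_kwh, n_lags=3, horizon=3)
print(one_month_ahead[0], len(three_months_ahead))
```

A longer horizon leaves fewer usable samples from the same series, since the target must lie within the recorded months.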
The number of hidden layers used in DL models such as DNN, RNN, LSTM, and GRU is greater than 2, which shows that multiple hidden layers perform better than a single hidden layer in DL models. The networks with recursive characteristics (RNN, LSTM, and GRU) also use fewer layers than DNN, which suggests that a recursive network can use fewer neurons to store data. The remaining parameters are consistent across these models: RMSprop as the optimizer, ReLU as the activation function, and the mean squared logarithmic error (MSLE) as the loss, with no dropout layer added.

Model Results and Comparison
After factor screening, the prediction models of air-conditioning electricity consumption were constructed from six factors using eight methods. Rigorous evaluation indicators were used to evaluate model performance, and the predicted and actual air-conditioning electricity consumption were analyzed in detail. This study first compared the LR, ML, and DL results for the following month to determine the differences in their predictions. The evaluation indicators for the 1-month-ahead predictions are listed in Table 7. The table shows that the RMSE of the models in training and testing is lower than 404.7 kWh, and the R² value is higher than 0.57. RF has the best performance, followed by GRU, RNN, DNN, LSTM, SVM, and MLP, while the LR model has the worst performance. In testing, RF has the lowest RMSE and MAE values and the best CE and R² values. Its CE reaches 0.75 and its R² reaches 0.88, which indicates that the model has 88% interpretive ability. The DL models rank from second to fourth place. This may be because the collected training data (1265 records) are too few to train such complex DL models; hence, the training is not sufficient to show the advantages of DL.
The prediction results of air-conditioning electricity consumption during the next month are shown in Figure 7. All models can capture the changes in electricity consumption during the next month. The predicted values are very close to the observed values, especially when the values are below 3000 kWh (highlighted in Figure 7b). However, as shown in Figure 7c, some models cannot capture the peak value of the Room B114 test plant (about 10,000 kWh).
The RF and DNN models can accurately predict the peak value of the Room B114 test plant, followed by the LR, SVM, and MLP models. In addition, some models (MLP, RNN, and LSTM) report negative values near the minimum; thus, the vertical axis does not start from zero. Negative values with small errors are still acceptable.
Based on the results above, RF shows the best performance and produces no negative values. It is more accurate than the other models for both peak and small values. Therefore, the RF model was used for predicting the electricity consumption during the following 6 months. Its modeling process, parameters, and inputs are shown in Figure 8.
Figure 7. Comparison of the predicted electricity consumption of (a) all rooms, (b) rooms below 3000 kWh, and (c) rooms above 3000 kWh with 1-month-ahead predictions of eight models in the testing phase.

Prediction
From the previous section, it can be seen that RF is the best model for predicting the electricity consumption of air-conditioning. Therefore, the RF model was used to predict the electricity consumption in the following 6 months as a basis for energy conservation. Table 8 presents the evaluation indicators of the RF model forecast results. The error indicators (RMSE and MAE) increase with the forecasting time. As the error gradually grows, the trend indicator CE progressively worsens, and the analytical capability, R², also deteriorates. Despite this, the RMSE of the predictions for the next 6 months is only 363.3 kWh (47.4% of the observed mean electricity consumption of air-conditioning). CE becomes negative from the 3-month-ahead forecast onward, but R² still has a level of 0.81, so the model maintains a certain accuracy.
The CE value of RF is 0.52 for the 2-month-ahead prediction but −1.53 for the 3-month-ahead prediction, and this gap is too large. After reviewing the data, it was found that Room B445 has low electricity consumption with small fluctuations, which causes the averaged CE value to become negative. If Room B445 is removed, the CE becomes 0.85, 0.74, 0.60, 0.59, 0.43, and 0.09 for the 1- to 6-month-ahead predictions, respectively. This indicates that the performance for the remaining rooms is good; the errors (RMSE and MAE) after removing Room B445 are small, and R² is still at the level of 0.81.
Figure 9 shows the air-conditioning electricity consumption in the following 6 months as predicted by the RF model. The longer the prediction time, the worse the prediction results, but the trend can still be predicted. Rooms with lower electricity consumption are predicted better. For Room B114, which has large electricity consumption, the predicted consumption is much lower than observed, and the longer the prediction time, the greater the lag of the prediction. The results demonstrate that the RF model can provide quite reliable predictive information for the following 3 months. Figure 10 summarizes the step-by-step procedure that yields R² values of 0.88, 0.84, and 0.81 for the 1- to 3-month-ahead predictions and provides better information on how to save energy and reduce carbon for relevant decision-makers.
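The effect of a low-consumption room with small fluctuations on the averaged CE can be reproduced with a small numerical sketch. It assumes that CE takes the Nash-Sutcliffe form and is computed per room before averaging, as the phrase "averaged CE value" suggests; both rooms and all values are hypothetical.

```python
def coefficient_of_efficiency(obs, pred):
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

# Hypothetical rooms: both predictions are off by the same 20 kWh every month.
high_use_obs = [2000.0, 3000.0, 4000.0, 5000.0]
high_use_pred = [o + 20.0 for o in high_use_obs]
low_use_obs = [48.0, 52.0, 50.0, 50.0]   # small consumption, small fluctuation
low_use_pred = [o + 20.0 for o in low_use_obs]

ce_high = coefficient_of_efficiency(high_use_obs, high_use_pred)
ce_low = coefficient_of_efficiency(low_use_obs, low_use_pred)
ce_avg = (ce_high + ce_low) / 2
print(ce_high, ce_low, ce_avg)
```

The same absolute error gives a CE near 1 for the high-consumption room and a strongly negative CE for the nearly constant room, because CE divides the squared error by the variation of the observations. A single such room can therefore pull the average below zero even when the other rooms are predicted well.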

Conclusions
To construct a generic modelling process for predicting the air-conditioning energy consumption of public buildings, nearly 79,664 monthly records of electricity usage data were collected. Meanwhile, the grid search method was employed to calibrate the model parameters. Consequently, six valid factors were selected from the 13 factors: temperature, humidity, season, area, floor, and orientation. These factors significantly affected the change in the electricity consumption of air-conditioners. Based on these parameters and factors, linear regression and various ML and DL techniques were applied and compared. These methods included LR, SVM, RF, MLP, DNN, RNN, LSTM, and GRU.
The overall results show that RF produces the best prediction of electricity consumption, with the smallest RMSE and the highest CE. Its R² was 0.88 for the first-month prediction and 0.81 for the third-month prediction. Thus, the reliability and accuracy of the constructed model in predicting the air-conditioning electricity consumption for the next 3 months were demonstrated. Nevertheless, the prediction model proposed in this research could be limited to the same type of building, i.e., public R&D buildings of CTIC in Taiwan, and it may not be applicable when different spaces in different buildings are considered. To enhance the model, the continuous collection of electricity consumption data from different types of buildings is necessary in future research. A more generic and robust model could then be constructed to provide more reliable information to aid decision making in energy conservation and management.

Conflicts of Interest:
The authors declare no conflict of interest.

Figure 10. Summary of procedures for 3-month-ahead predictions.
