Demand Forecasting for a Mixed-Use Building Using Agent-Schedule Information with a Data-Driven Model

There is great interest in data-driven modelling, using machine learning (ML), for forecasting building energy consumption. However, little research considers classification-based ML models. This paper compares regression and classification ML models for daily electricity and thermal load modelling in a large, mixed-use university building. The independent feature variables of the model include outdoor temperature, historical energy consumption data, and several types of 'agent schedules' that provide proxy information based on broad classes of activity undertaken by the building's inhabitants. The case study compares four different ML models tested with three different feature sets, with a genetic algorithm (GA) used to optimize the feature sets for those ML models that lack an embedded feature selection process. The results show that the regression models perform significantly better than the classification models for the prediction of electricity demand and slightly better for the prediction of heat demand. GA feature selection improves the performance of all models and shows that historical heat demand, temperature, and the 'agent schedules' that derive from large occupancy fluctuations in the building are the main factors influencing heat demand prediction. For electricity demand prediction, feature selection picks almost all of the available 'agent schedule' features and the historical electricity demand. The GA feature selection does not pick historical heat demand as a feature for electricity demand prediction, and vice versa. However, excluding historical heat/electricity demand from the selected features significantly reduces the performance of demand prediction.


Introduction
Worldwide domestic energy consumption has doubled since 1982 [1]. In developed countries, the energy consumed in buildings represents over 40% of total energy use [2]. In developing countries, buildings have already become the largest source of energy consumption and CO2 emissions, which are predicted to increase in the future [3]. Accurate daily demand prediction is important for understanding future energy use and can help reduce building energy costs and emissions; for example, the building operator can choose to preheat or precool in different seasons according to the prediction results. While accurate and reliable demand predictions can improve building energy performance, prediction is a complex problem that depends strongly on the specific building. Many factors affect heat and electricity consumption, directly or indirectly, for example, outdoor temperature, equipment efficiency, and occupancy [4]. For thermal loads, increasing numbers of new and refurbished commercial buildings use building management systems (BMS) to regulate heat consumption, typically using measurements of the difference between indoor and outdoor temperature and assumptions regarding the thermal efficiency of the building itself [5]. For electrical loads, occupancy is seen as a major driver, with temperature contributing to a lesser extent [6]. Numerous load modelling approaches are based on these factors; they fall into two broad categories, physical and data-driven methods [4], whose respective advantages are documented in published studies.
Physical models capture the interactions between building efficiency, lighting, occupancy, the heating, ventilation, and air conditioning (HVAC) system, and weather to predict consumption. They use physical equations to describe the different factors and calculate demand, taking a wide range of mechanisms into account, including conduction and ventilation [7]. A range of software tools integrating these complex physical principles have been developed, e.g., TRNSYS, EnergyPlus, and ESP-r. Nan et al. [8] applied ESP-r to model demand in a modern domestic dwelling based on the weather, building information, and some social components. Muhammad et al. [9] demonstrated the potential for electricity and heating energy savings for households in Thailand through modelling in EnergyPlus. Lizana et al. [10] developed a low-carbon heating technology model for energy-flexible buildings in TRNSYS.
The main limitation of the physical method is that the model requires a deep level of detail regarding building geometry, material properties, and heating and ventilation systems to calculate reliable results. Unfortunately, this information might not always be available or reliable, particularly for older buildings that have been refurbished one or more times [11]. Data-driven tools, by contrast, have the power to generate models from recorded or proxy data, and these have been used in building simulations and energy performance predictions. Multiple regression and Artificial Neural Networks (ANN) represent two commonly used techniques [12].
Regression models are widely used due to their interpretability and the ease of use of their parameters. For heat demand modelling, Rosa et al. [13] proposed a simple dynamic degree-day model for predicting the heating/cooling demand of residential buildings. Catalina et al. [14] built a multiple regression model based on the building's global heat loss coefficient, the south-facing equivalent surface, and the temperature difference to determine the heating demand. Jaffal et al. [15] used an alternative evaluation regression model to model UK annual heat demand on the basis of dynamic simulation results. Regression models have also been used for electricity modelling. Newsham and Birt [16] placed special emphasis on the influence of occupancy, which can increase model accuracy. Fan et al. [17] used a multiple linear regression model, along with eight other models, to predict the electricity load of the tallest commercial building in Hong Kong. Renaldi et al. [18] developed a synthetic linear heat demand model based on an "energy signature method". Irrespective of the approach, a single regression model can only capture one specific relationship between the selected variables. For example, the relationship between outdoor temperature and electricity demand varies between seasons, so the same model cannot express both winter and summer scenarios. Researchers therefore commonly partition the data and manually build a series of seasonal or calendar models to reduce the uncertainty [19]. This method works well, although it can be time-consuming and computationally expensive.
Machine learning (ML) algorithms, such as support vector machines (SVM), have become new tools for researchers in demand modelling. Apart from their fast calculation speed, the main advantages of these methods over traditional ones are their capability to discover patterns and to automatically capture non-numerical information from large datasets. Samuel et al. [20] used four ML techniques to forecast the heat demand in a district heating system with outdoor temperature, historical heat loads, and time-factor variables as inputs; SVM performed best among the ML algorithms used. Jang et al. [21] optimized an ML model for predicting the thermal energy consumption of buildings by extracting major variables through feature selection. In modelling building electricity use, Nizami and Garni [22] used a simple feed-forward ANN to relate electrical demand to the number of occupants and weather data. Similarly, Kampelis et al. [23] proposed an ANN power prediction for day-ahead energy management at the building and district levels. Wong et al. [24] used an ANN to predict energy consumption for office buildings with daylighting controls in subtropical climates; the model outputs include daily electricity usage for cooling, heating, and lighting. Some data-driven models employ lagged variables, e.g., actual historical consumption in previous time steps, within the demand prediction, leading to significantly improved results [25,26].
The variables or features included in models are usually chosen on the basis of expert knowledge or previous experience rather than through a formalized procedure based on the model characteristics, which can lead to a gap between the predicted and actual values [27]. Feature selection processes can be used to find the most important features from a 'feature pool' in a formalized and reproducible way. Feature selection approaches are categorized as filter, embedded, and wrapper methods [28]. Filter methods rank features based on statistical properties, independently of how the learning algorithm itself processes the features, so there is no way to ensure that the selected features suit a given model [28]. Embedded methods incorporate feature selection into the model training process [29] and are typically used in regression models, e.g., stepwise regression (LMSR), or in classification trees. Wrapper methods find the best feature sets according to model performance, so the selection results vary with the algorithm. Wrapper methods are widely used [29] and treat feature selection as an optimization problem; the methods applied include Particle Swarm Optimization and the well-known Genetic Algorithms (GA) [30,31].
While it is clear that many papers use machine learning technologies with different features as inputs to predict heat and electricity demand, there are still gaps in existing studies that need to be overcome. First, most of the work in the literature has treated consumption as a continuous numerical variable and used regression models for prediction; however, each daily consumption level could conceivably be described as a discrete class in its own right, lending itself to the application of classification models. Second, the use of classifiers for specific types of day (e.g., weekday/weekend) is well established in demand modelling, as individuals tend to follow established routines and groups of individuals tend to behave similarly; universities, however, are unusual entities with a wide range of discrete cohorts of users and a complex pattern of activities that take place throughout the year. Given that there are distinct cohorts of occupants, this work treats agent behaviour as activities that are aggregated across cohorts and are considered to be determined by schedule. Publicly available information, such as semester and holiday schedules and timetabling, acts as a proxy for people's behaviour. The hypothesis is that classifying different types of day will reflect different activities with associated energy consumption. The term 'agent schedules' is used to describe these behaviours, but it should be noted that this is explicitly not a form of agent-based modelling (of which there are many examples in the literature), and it avoids the complex data collection and modelling process that such modelling requires. This paper examines the scope for different types of data-driven prediction of daily electricity and heat energy consumption in a complex, large, mixed-use university building. A GA feature selection approach is used to find the best feature set for both heat and electricity demand prediction.
The main contributions are (1) a detailed comparison of the performance of classification and regression ML algorithms in modelling daily heat and electricity demand profiles, (2) an examination of the value of feature selection in enhancing model performance, and (3) an examination of the value of daily historical measured energy data.
The paper is laid out as follows. Section 2 describes the methodology, covering the data requirements and the ML techniques applied. Section 3 presents the case study, and the final section discusses the findings and concludes.

Methodology
This paper uses a range of machine learning models in conjunction with GA feature selection and compares their accuracy in modelling daily electrical and heating demand within a university building. This section describes the data considered in the model and provides a technical overview of the four ML methods as well as the GA feature selection approach. The difference between indoor and outdoor temperatures is the most important factor affecting heat demand [12–20]. If the indoor temperature is set at a fixed value, the outdoor temperature becomes the determining factor for the heat load. According to [20], solar radiation, wind speed, and other factors also affect heat consumption, but their impact is more limited than that of temperature. Beyond the heat load, different outdoor temperatures may determine whether electric heating or cooling equipment is activated, thus affecting power consumption. Therefore, outdoor temperature is the first independent variable considered in both the thermal and the electrical energy consumption models.

Agent Schedule and Its Modelling
The behaviour of a building's occupants is another important factor that determines energy consumption, especially electricity consumption [26]. While a university office building will be used by a range of people with different individual working patterns (Appendix A, Table A1), the operating schedule of the organisation largely determines their aggregate behaviour, which may in turn affect thermal and electrical loads.
The example used in this paper is a mixed-use building within a university with a wide range of occupants: academic staff, postgraduate and postdoctoral researchers, administrative and support staff, undergraduate students, and potentially members of the public. Their individual schedules are strongly determined by their 'job description', but there is likely to be considerable similarity within each category. For example, academic, research, and other staff will tend to be present in the building during typical weekday working hours and absent during weekends and well-defined university-closure periods, so the electrical load is higher on weekdays than at weekends. Undergraduate students use individual buildings in a very different way, typically being present only for specific scheduled academic activities, for use of computing infrastructure associated with specific exercises, and perhaps for use of study space. As such, their influence on energy demand is driven much more by the pattern of the academic year, weekly scheduled classes, and assessment.
In addition, although the internal temperature of modern buildings is generally set by the BMS to a fixed target value, people can still affect the heat load by adjusting the settings of heating equipment, opening windows, etc. Therefore, occupancy can be important in describing energy consumption. This paper proposes a detailed method for modelling 'agent schedules' that considers the differences between individual weekdays and holidays. For example, Mondays during the Easter break and during semesters need to be modelled differently, as taught students will be off-campus during 'vacation' periods, but staff will not (Appendix A, Tables A2 and A3).
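As an illustration only, the sketch below maps a calendar date to a weekday-specific 'agent schedule' label, so that, for example, a semester Monday and a vacation Monday fall into different categories. The semester dates, closure days, label strings, and the function name `agent_schedule_label` are hypothetical placeholders and do not reproduce the categories defined in Appendix A.

```python
from datetime import date

# Hypothetical teaching periods (start, end inclusive); real dates would come
# from the university's published academic calendar.
SEMESTERS = [
    (date(2018, 9, 24), date(2018, 12, 14)),
    (date(2019, 1, 28), date(2019, 5, 10)),
]
# Hypothetical university-closure days (e.g., a Christmas shutdown).
CLOSURES = {date(2018, 12, 25), date(2018, 12, 26)}

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def agent_schedule_label(day: date) -> str:
    """Return a categorical 'agent schedule' label for one day."""
    if day in CLOSURES:
        return "closure"
    in_semester = any(start <= day <= end for start, end in SEMESTERS)
    period = "semester" if in_semester else "vacation"
    # Weekday-specific labels let a semester Monday differ from a vacation Monday.
    return f"{DAY_NAMES[day.weekday()]}-{period}"
```

For instance, `agent_schedule_label(date(2019, 7, 1))` falls outside both hypothetical semesters and returns `"Mon-vacation"`.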
Because agent behaviour is non-numerical information, it needs to be translated into a form suitable for the ML algorithms: many ML algorithms cannot operate directly on categorical data and require all input and output variables to be numeric [32]. There are two common methods for converting categorical data to numerical data: integer encoding and 'one-hot' encoding [33]. Integer encoding has a drawback, in that it imposes a natural ordering on the categories that ML algorithms might interpret literally; e.g., if Friday is labelled 5 and Saturday 6, this does not mean that Saturday is 'larger'. One-hot encoding, which is widely used, solves this problem by representing each level of the categorical variable with its own binary variable. It transforms a single variable with n observations and d distinct values into d binary variables with n observations each. Each binary variable denotes the presence (1) or absence (0) of its category in an observation; e.g., with the days ordered from Monday to Sunday, Friday is (0,0,0,0,1,0,0).
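A minimal Python sketch of one-hot encoding follows, assuming the seven day names, in Monday-to-Sunday order, are the categories. `one_hot` is a hypothetical helper written for illustration, not part of the paper's MATLAB implementation.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def one_hot(values, categories):
    """Map each categorical value to a binary indicator vector.

    A variable with n observations and d distinct categories becomes
    n rows of d binary entries, with exactly one 1 per row.
    """
    index = {c: j for j, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # raises KeyError for a value outside `categories`
        rows.append(row)
    return rows


encoded = one_hot(["Friday", "Saturday"], DAYS)
print(encoded[0])  # [0, 0, 0, 0, 1, 0, 0]
```

Unlike integer codes, no two rows differ by a 'larger' or 'smaller' value; every pair of distinct days is equally far apart.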

Electricity/Heat Demand
A minimum requirement in data-driven modelling is actual demand data on which to train the models and evaluate their performance. In addition, where there are distinct types of energy demand, one type might be a predictor of the other, e.g., yesterday's heat demand as a predictor of today's electricity or heat demand. For heat demand prediction, the physical interpretation is that the whole building acts as a virtual heat store, with indoor heat accumulating from historical heat consumption. Lagged historical data might also be useful in electricity prediction, since electricity demand may be higher when the outdoor temperature is low because electric heating equipment contributes to maintaining the indoor temperature. The historical heat demand and historical electricity demand are labelled 'HHD' and 'HED', respectively.
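The sketch below shows one way to build lagged HHD and HED features from two aligned daily series. The one-day default lag (matching the 'yesterday's demand' example), the dictionary keys, and the choice to drop the first days, which have no history, are assumptions for illustration.

```python
def lagged_features(heat, elec, lag=1):
    """Pair each day's demand with the heat (HHD) and electricity (HED) demand `lag` days earlier.

    heat, elec: equal-length lists of daily demand, oldest first.
    The first `lag` days are dropped because they have no history.
    """
    if lag < 1:
        raise ValueError("lag must be at least one day")
    if len(heat) != len(elec):
        raise ValueError("heat and elec must cover the same days")
    rows = []
    for i in range(lag, len(heat)):
        rows.append({
            "HHD": heat[i - lag],    # historical heat demand
            "HED": elec[i - lag],    # historical electricity demand
            "heat_target": heat[i],  # today's heat demand
            "elec_target": elec[i],  # today's electricity demand
        })
    return rows
```

Both lagged columns can then be offered to every model, leaving feature selection to decide, for example, whether HHD is useful for electricity prediction.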

Machine-Learning Models
ML can be classified into three principal groups according to the learning task: supervised learning, unsupervised learning, and reinforcement learning [34]. Supervised learning divides the data into a training set and a test set, and uses the training set to learn general 'rules' that map inputs to outputs. Supervised learning encompasses classification and regression, and it is the approach traditionally used in energy modelling. Here, four supervised learning algorithms have been selected as representative ML algorithms for modelling daily energy consumption: support vector regression (SVR), linear model stepwise regression (LMSR), distance-weighted K-nearest neighbours (KNN), and naive Bayes (NB). SVR and LMSR are regression models that have been used in energy modelling [35,36]. NB and KNN are classification algorithms that have been used for water demand forecasting [37,38]; however, few papers discuss their application to energy consumption. This paper compares the regression and classification models by evaluating the results of these four algorithms.
The following notation applies to each algorithm. Figure 1 shows a schematic of the data structure and indexing. The model input variables (predictors) X, actual output y, and model output f(X) are time series with a maximum of m periods, with individual periods represented by index i. The set of input variables X comprises a maximum of n features (each representing an individual variable type), with individual features indexed by r.
Energies 2020, 13, x FOR PEER REVIEW 5 of 20
The four different data-driven models are each separately applied for thermal and electrical demand prediction, i.e., four models for heat demand and four models for electrical demand. Figure 2 gives the algorithm workflow, which indicates the full algorithms that are applied in this paper.

Support Vector Regression (SVR)
SVR adapts the support vector machine (SVM), a traditional classification algorithm, to regression [39]. SVR seeks a function that minimizes the error between the data and the hyperplane represented by that function. In this respect, SVR appears no different from traditional regression methods, such as the least squares method. However, traditional regression treats a prediction as correct only if f(x) is exactly equal to y and penalizes any difference between them, whereas SVR considers a prediction correct as long as the difference between f(x) and y lies within a given error range ε.
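The first stage is to define the linear function for the hyperplane
$$f_{\mathrm{SVR}}(x) = w_{\mathrm{svr}}^{T} x + b_{\mathrm{svr}}$$
where $w_{\mathrm{svr}} = (w_{\mathrm{svr},1}, w_{\mathrm{svr},2}, \dots, w_{\mathrm{svr},n})$ is the vector of weights associated with the individual input features and $b_{\mathrm{svr}}$ is the intercept; the hyperplane is uniquely determined by this equation.
The second stage finds the $f_{\mathrm{SVR}}(x)$ with the minimal norm value $w_{\mathrm{svr}}^{T} w_{\mathrm{svr}}$. According to [40], this is formulated as a convex optimization problem:
$$\min_{w_{\mathrm{svr}},\, b_{\mathrm{svr}}} \ \tfrac{1}{2}\, w_{\mathrm{svr}}^{T} w_{\mathrm{svr}} \quad \text{subject to} \quad \left| y_i - \left( w_{\mathrm{svr}}^{T} x_i + b_{\mathrm{svr}} \right) \right| \le \varepsilon, \quad i = 1, \dots, m$$
where $y_i$ is the training sample target and $\varepsilon$ denotes the desired error range for all points. However, it is sometimes impossible to find a function that satisfies these constraints for all points, so slack variables $\xi_i$ and $\xi_i^{*}$ are introduced to guarantee that a solution exists. The objective function becomes
$$\min_{w_{\mathrm{svr}},\, b_{\mathrm{svr}},\, \xi,\, \xi^{*}} \ \tfrac{1}{2}\, w_{\mathrm{svr}}^{T} w_{\mathrm{svr}} + C \sum_{i=1}^{m} \left( \xi_i + \xi_i^{*} \right)$$
subject to the constraints
$$y_i - \left( w_{\mathrm{svr}}^{T} \varphi(x_i) + b_{\mathrm{svr}} \right) \le \varepsilon + \xi_i, \qquad \left( w_{\mathrm{svr}}^{T} \varphi(x_i) + b_{\mathrm{svr}} \right) - y_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i \ge 0, \quad \xi_i^{*} \ge 0$$
where C is the box constraint, a positive value that controls the penalty imposed on observations that lie outside the error margin $\varepsilon$ [40], and $\varphi$ is the mapping, defined implicitly by a kernel function, from the input space to a higher-dimensional feature space. Reference [41] provides a detailed overview. A successful SVR model generally needs to be tuned for $\varepsilon$, the kernel function, and the box constraint C; Appendix B Table A4 shows the available range of each parameter.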
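To make the roles of ε and C concrete, the sketch below evaluates the soft-margin SVR objective for a linear model, i.e., taking φ as the identity (no kernel). It relies on the fact that, for fixed weights and intercept, the smallest feasible slacks give ξ_i + ξ_i* = max(0, |y_i − f(x_i)| − ε). It only evaluates the objective and does not solve the optimization; the function names are hypothetical.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the epsilon tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)


def linear_svr_objective(w, b, X, y, C, eps):
    """Soft-margin SVR objective for a linear model f(x) = w.x + b (no kernel).

    For fixed w and b the smallest feasible slacks satisfy
    xi_i + xi_i* = max(0, |y_i - f(x_i)| - eps), so the slack penalty
    equals the epsilon-insensitive loss summed over the training points.
    """
    reg = 0.5 * sum(wj * wj for wj in w)  # (1/2) w^T w
    penalty = 0.0
    for xi, yi in zip(X, y):
        f = sum(wj * xj for wj, xj in zip(w, xi)) + b
        penalty += eps_insensitive_loss(yi, f, eps)
    return reg + C * penalty


# Usage: one point inside the tube, one point 0.5 outside it.
print(linear_svr_objective([2.0], 1.0, [[1.0], [2.0]], [3.0, 6.0], C=10.0, eps=0.5))  # 7.0
```

Raising C makes points outside the ε-tube more expensive relative to the flatness term, while widening ε lets more points incur no penalty at all.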

Linear Model Stepwise Regression (LMSR)
LMSR differs from other regression methods, such as SVR, in that it includes a built-in process for choosing features, 'the feature extraction process' [42]. For example, when establishing a regression model, SVR uses all features $X = (x_1, \dots, x_r, \dots, x_n)$ in the dataset. In contrast, LMSR chooses a subset of X on which to build the regression model, based on the significance level of each feature obtained by an F-test. When establishing the model, individual features, e.g., $x_r$, are tested for inclusion in the model [43]. A bidirectional process is used, with features selectively added to and removed from the regression model.
The first stage is to establish an initial regression model with a single, randomly chosen input feature $x_r$:
$$y = w_r^{\mathrm{lmsr}} x_r + b_{\mathrm{lmsr}} + \varepsilon$$
where $x_r$ denotes a feature chosen by the regression model, $w_r^{\mathrm{lmsr}}$ is the weight associated with that feature, $b_{\mathrm{lmsr}}$ is the intercept, and $\varepsilon$ is a vector of error terms.
The second stage is to identify a feature not currently in the model that 'improves' the regression. Each available term is tested for significance: if any term has a p-value less than an entrance tolerance (pEnter), the term with the smallest p-value is added. This is repeated until no additional feature meets the entrance criterion.
The next stage is to identify whether any term already in the model no longer adds value to the regression. The terms in the model are again tested for significance, this time for p-values greater than an exit tolerance (pRemove), i.e., terms for which the hypothesis of a zero coefficient cannot be rejected. If such a term exists, the term with the largest p-value is removed and the procedure returns to the second stage; otherwise it stops. Appendix B Table A5 lists the parameter sets used.
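A minimal NumPy sketch of bidirectional stepwise selection follows. To avoid depending on a statistics library for p-values, it substitutes partial F-statistic thresholds (F-to-enter and F-to-remove) for pEnter and pRemove: a larger F corresponds to a smaller p-value, so a term enters when its F exceeds `f_enter` and leaves when its F falls below `f_remove`. It also starts from an intercept-only model rather than a random single feature. The thresholds and function names are illustrative assumptions, not the settings in Appendix B Table A5.

```python
import numpy as np


def _rss(X, y, cols):
    """Residual sum of squares of an OLS fit with an intercept on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def _partial_f(X, y, reduced, full):
    """Partial F statistic for the one term that is in `full` but not in `reduced`."""
    rss_reduced, rss_full = _rss(X, y, reduced), _rss(X, y, full)
    if rss_full <= 0.0:
        return np.inf  # perfect fit: treat the term as maximally significant
    dof = len(y) - len(full) - 1  # residual degrees of freedom of the full model
    return (rss_reduced - rss_full) / (rss_full / dof)


def stepwise_select(X, y, f_enter=4.0, f_remove=3.9, max_steps=100):
    """Bidirectional stepwise selection; returns the sorted selected column indices.

    f_remove < f_enter (analogous to pRemove > pEnter) prevents a term from
    cycling in and out of the model.
    """
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the candidate with the largest partial F if it exceeds f_enter.
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        if candidates:
            f_add = {c: _partial_f(X, y, selected, selected + [c]) for c in candidates}
            best = max(f_add, key=f_add.get)
            if f_add[best] > f_enter:
                selected.append(best)
                changed = True
        # Backward step: drop the term with the smallest partial F if it is below f_remove.
        if selected:
            f_drop = {c: _partial_f(X, y, [s for s in selected if s != c], selected)
                      for c in selected}
            worst = min(f_drop, key=f_drop.get)
            if f_drop[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(selected)


# Usage on synthetic data in which only columns 0 and 2 drive y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + rng.normal(scale=0.5, size=200)
print(stepwise_select(X_demo, y_demo, f_enter=10.0, f_remove=9.0))
```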

Distance Weighted K-Nearest Neighbours (KNN)
KNN is a classic classification model based on a simple principle: samples that are 'similar' should receive the same classification. Similarity is defined by the 'distance' between a sample and its nearest neighbours, the number of which is set by the K-value [44]. The K-value is found by trial and error: performance is expected to improve as K increases, because each decision draws on more similar samples and the classes become clearer and more inclusive, but beyond a certain K-value performance decreases, and this point defines the optimal value. Distance can be calculated in several ways, with Euclidean distance being common, and the distance weighting can also take various forms (e.g., inverse distance). Appendix B Table A6 shows the range of parameters available for the algorithm [45]. In this algorithm, the demand value itself is used as the class name, rather than relative terms such as 'high' or 'low'.
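The sketch below illustrates distance-weighted KNN classification with Euclidean distance and inverse-distance weights, one of the weighting options mentioned above. Class labels are daily demand values, as in this paper; the function name and the handling of an exact match (zero distance) are assumptions.

```python
import math
from collections import defaultdict


def knn_classify(train_X, train_y, x, k=3):
    """Distance-weighted KNN: the K nearest samples vote with weight 1/distance."""
    neighbours = sorted(
        ((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        if dist == 0.0:
            return label  # identical feature vector: return its class directly
        votes[label] += 1.0 / dist
    return max(votes, key=votes.get)
```

Because each class is a demand value seen during training, the prediction is always one of the training demand values rather than an interpolated number.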

Naive Bayes (NB)
Naive Bayes is a probabilistic classifier that allocates individual samples, each described by a vector of feature values X, to classes $C_k$. It uses an initial classification of a set of samples to define the overall probability of each class occurring, $P(C_k)$. It then uses the distribution of feature values within each class to estimate the likelihood of an instance belonging to that class, and hence the conditional probability $P(C_k \mid X)$. It differs from the other algorithms in assuming that all features are independent of each other [46]. Although this assumption is strong and often unrealistic, several Bayesian models have proven their capability in practice [47].
The first stage is to define the collection of unique values, or classes, in the time series of actual output, $C_1, \dots, C_k, \dots, C_K$ ($K \le m$). The value of the daily demand is used as the class name. Each unique output value $C_k$ is associated with multiple combinations of input features X.
The second stage is to calculate the conditional probabilities. Bayes' theorem allows the conditional probability to be rewritten as
$$P(C_k \mid X) = \frac{P(X \mid C_k)\, P(C_k)}{P(X)}$$
Because the evidence $P(X)$ is the same for every class, $P(C_k \mid X) \propto P(X \mid C_k)\, P(C_k)$. If the set of features is large, calculating the conditional probabilities can be challenging, as this implies a substantial probability tree. However, because the Naive Bayes approach regards the features as independent of each other, the calculation reduces to a far simpler multiplication:
$$P(C_k \mid X) \propto P(C_k) \prod_{r=1}^{n} P(x_r \mid C_k)$$
where $(x_1, \dots, x_r, \dots, x_n)$ are the individual features. The final stage is to construct the classifier: X is assigned to the class $C_k$ with the largest conditional probability,
$$P(C_k \mid X) = \max\left\{ P(C_1 \mid X), P(C_2 \mid X), \dots, P(C_K \mid X) \right\} \qquad (11)$$
The distribution of feature values within each class is an important assumption. Here, the Gaussian (normal) Naive Bayes was found to perform best (Equations (12) and (13)). The details of the parameters are listed in Table A7 in Appendix B [48].
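A standard-library sketch of Gaussian naive Bayes, following the independence assumption described above: each class gets a prior P(C_k) and, for every feature, a normal likelihood with a per-class mean and variance. Log-probabilities are summed instead of multiplying probabilities, which avoids numerical underflow without changing which class has the maximum. The variance floor `var_floor` is an assumption needed because a class holding a single sample (common when each distinct demand value is a class) has zero variance; it is not the parameter listed in Table A7, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate P(C_k) and per-feature Gaussian mean/variance for every class C_k."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            max(sum((v - mu) ** 2 for v in col) / n, var_floor)
            for col, mu in zip(zip(*rows), means)
        ]
        model[label] = (n / len(y), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class maximising log P(C_k) + sum_r log P(x_r | C_k)."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for xr, mu, var in zip(x, means, variances):
            # log of the normal density N(xr; mu, var)
            total -= 0.5 * math.log(2 * math.pi * var) + (xr - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))
```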

Implementation and Optimization of ML Algorithms
In the modelling process, the data are divided into a training set and a test set. The test set is independent of the training data: it is not used in training at all and serves only to evaluate the final model, which provides a more credible and objective assessment of each algorithm's performance. The algorithms were developed using the MATLAB R2018b Statistics and Machine Learning Toolbox [49] and its built-in optimization function. The optimization is a multiple-iteration training process that uses k-fold cross-validation: the original training data are divided into k subsets, each subset is used once as the validation set, and the remaining k − 1 subsets are used as the training set. The k models are each evaluated, and the best performing passes to the next iteration until the maximum number of iterations is reached. Different algorithms can use different evaluation functions; the regression models normally use the mean squared error. The detailed optimization settings are given in Appendix B Table A8, and reference [49] explains the terms used in that table.
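The sketch below shows k-fold index splitting and a cross-validated mean squared error. It assumes contiguous, unshuffled folds and averages the validation error over the folds, which is the usual way a k-fold score is formed; MATLAB's own partitioning, and the way the paper's optimizer carries models between iterations, may differ. `fit` and `predict` are hypothetical caller-supplied functions.

```python
def k_fold_indices(m, k):
    """Yield (train, validation) index lists for k contiguous folds of 0..m-1."""
    if not 2 <= k <= m:
        raise ValueError("k must satisfy 2 <= k <= m")
    start = 0
    for f in range(k):
        size = m // k + (1 if f < m % k else 0)  # spread the remainder over the first folds
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, m))
        yield train, val
        start += size


def cross_val_mse(X, y, k, fit, predict):
    """Average validation MSE over k folds, using caller-supplied fit/predict functions."""
    fold_errors = []
    for train, val in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq = [(y[i] - predict(model, X[i])) ** 2 for i in val]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / k
```

Every observation is used for validation exactly once, so each candidate parameter set is scored on all of the training data without ever being scored on the rows it was fitted to.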

Feature Selection
Feature selection provides an automated way to choose the set of features that gives the best performance for a given model. Here, the full feature set, or feature pool, contains a range of schedule information, outdoor temperature, and historical energy consumption data. Different ML algorithms may need different combinations of features to perform well, and these combinations are not known in advance. LMSR automatically chooses important features, but the other algorithms have no inherent feature selection function. For these algorithms, the full feature set is first used as the input, and the results serve as the benchmark for evaluation. A genetic algorithm (GA) is then used to choose the combination of features that delivers the lowest root mean square error (RMSE) (Equation (14)) between modelled and historical demand. The GA is a stochastic search optimization method that generally consists of three main operators: selection, crossover, and mutation [50].

Selection generates a new population of feature combinations by choosing strongly performing combinations from the existing population. This study uses the roulette wheel method, a random sampling method [51] that uses relative performance to determine the probability of retaining a given feature set. Crossover simulates breeding by picking two feature sets and combining parts of each. A mutation operator randomly changes the combination of features within each set. After these steps, a new population is formed, evaluated, and sorted according to performance, and the best combinations of features are selected for the next iteration. The algorithm is repeated until the maximum number of iterations is reached. Table A9 in Appendix B shows the parameters used in GA feature selection.
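The selection, crossover, and mutation loop described above can be sketched in Python. Each candidate feature set is represented as a boolean mask. This is a minimal illustration, not the authors' code, and it rests on several assumptions:

- Roulette wheel probabilities are proportional to 1/RMSE.
- Crossover is single-point and mutation is bit-flip.
- One elite mask is carried forward between generations.
- The default parameter values are placeholders, not those in Table A9.

`fitness` is any function that returns a validation error for a mask.

```python
import numpy as np


def ga_feature_selection(fitness, n_features, pop_size=20, n_generations=30,
                         crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Search boolean feature masks to minimise fitness(mask), e.g. a validation RMSE."""
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_features)) < 0.5
    pop[~pop.any(axis=1), 0] = True  # every mask keeps at least one feature
    best_mask, best_err = pop[0].copy(), np.inf
    for _ in range(n_generations):
        errors = np.array([fitness(mask) for mask in pop], dtype=float)
        i = int(np.argmin(errors))
        if errors[i] < best_err:
            best_mask, best_err = pop[i].copy(), float(errors[i])
        # Roulette wheel: retention probability proportional to 1 / error.
        weights = 1.0 / (errors + 1e-12)
        parents = pop[rng.choice(pop_size, size=pop_size, p=weights / weights.sum())]
        children = parents.copy()
        # Single-point crossover on consecutive pairs of parents.
        for a in range(0, pop_size - 1, 2):
            if rng.random() < crossover_rate:
                cut = rng.integers(1, n_features)
                children[a, cut:] = parents[a + 1, cut:]
                children[a + 1, cut:] = parents[a, cut:]
        # Bit-flip mutation toggles each feature with a small probability.
        children ^= rng.random(children.shape) < mutation_rate
        children[~children.any(axis=1), 0] = True
        children[0] = best_mask  # elitism: keep the best mask found so far
        pop = children
    return best_mask, best_err
```

In practice, `fitness` would train a model on the masked input columns and return its RMSE on held-out data. Because each call trains a model, the population size multiplied by the number of generations largely determines the run time.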
k-fold cross-validation was not used during feature selection because it would have produced an unwieldy number of models. Feature selection already involves 600 sets of parameter values, and adding k-fold cross-validation to each evaluation would have produced many thousands of models and excessive run times. Moreover, the GA is used to select features rather than to achieve the best performance of each model; the latter is done during the final training of each algorithm.

Case Study
The case study building is the Chrystal Macmillan Building (CMB) at the University of Edinburgh, on the George Square campus in central Edinburgh. Constructed in 1956 and extensively refurbished around 2010, it is representative of the current building stock at the university and in the wider UK higher education sector. The building has a total floor area of 7445 m² and is in mixed use for teaching, research, and administration. It has 75 offices, six classrooms, six meeting rooms, and a coffee shop. It was originally part of the Old Medical School but now houses the School of Social and Political Science. The building receives heat from the campus district heating network, which is supplied by a gas-fired combined heat and power system.

Data Collection and Preparation
The University of Edinburgh's Energy Office kindly provided the daily electricity and heat consumption data at building level rather than as individual room loads. The data have daily resolution and span the period from 1 January 2010 to 26 April 2017. Figure 3 shows daily consumption over this seven-year period. The data from 1 January 2010 to 26 April 2015, around 75% of the data set, are used for training and are further split into training and validation subsets within the algorithms. The final two-year period, from April 2015 to April 2017, provides a final evaluation of performance. Because it played no part in the training process, it offers a reasonably objective measure. The statistical performance measures used in the remainder of the paper relate only to this independent test period.
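The chronological hold-out split described above can be sketched with Python's standard `datetime` module. The cutoff date is taken from the text. The sketch counts calendar days only and ignores missing records, so it says nothing about how many valid data points fall in each period.

```python
from datetime import date, timedelta


def daily_range(start, end):
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


days = daily_range(date(2010, 1, 1), date(2017, 4, 26))
cutoff = date(2015, 4, 26)  # last day of the training period

# Split by date, not at random, so the test period lies strictly after training.
train_days = [d for d in days if d <= cutoff]
test_days = [d for d in days if d > cutoff]
train_share = len(train_days) / len(days)
```

Counted in calendar days, the training period covers 1942 of 2673 days (roughly 73%), which is consistent with the "around 75%" stated above. The test period starts on 27 April 2015.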
The weather data are measured daily temperatures from the "Land and Marine Surface Stations Data (1853–current)" provided by the UK Met Office Integrated Data Archive System (MIDAS). All temperatures in MIDAS have been converted to Celsius and are stored with a precision of 0.1 °C. The specific data come from the Royal Botanic Gardens station, some 2.5 km from the building. This station takes two measurements per day, at 9 AM and 9 PM, at a height of 2 m. The higher of the two readings is labelled 'HighT' and the lower 'LowT'. Figure 4 shows the daily average temperature and total energy consumption for working days and weekend days in 2016, revealing very significant seasonal patterns.
The heat and electricity records contain occasional missing values and outliers (more than three standard deviations from the mean). One or two temperature data points are also missing because of equipment maintenance, communication system outages, and other causes. The data were preprocessed before use. This is a crucial step in ML, because the algorithms cannot handle missing values and outliers would mislead them [52]. During preprocessing, the outliers were removed and the gaps were filled by linear interpolation between neighbouring data points. The input datasets (temperature and schedules) were normalized to a common scale, without distorting differences in their ranges, using the z-score $z = (x - \mu)/\sigma$ for a variable $x$ with mean $\mu$ and standard deviation $\sigma$. Overall, 2376 data points, almost 90% of the total, were used in the training and validation process. Table 1 summarises the datasets.
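The preprocessing steps above (3σ outlier removal, linear interpolation, and z-score normalization) can be sketched in Python/NumPy. This is a minimal illustration under stated assumptions:

- The outlier threshold is computed in a single pass over the raw series, including the outliers themselves.
- The population standard deviation is used.
- `clean` and `zscore` are hypothetical helper names.

Consistent with the text, `clean` would be applied to the demand records and `zscore` to the input features.

```python
import numpy as np


def clean(series, n_sigma=3.0):
    """Treat values more than n_sigma standard deviations from the mean as missing,
    then fill every gap by linear interpolation between neighbouring points."""
    x = np.asarray(series, dtype=float).copy()
    valid = ~np.isnan(x)
    mu, sigma = x[valid].mean(), x[valid].std()
    outlier = valid & (np.abs(x - mu) > n_sigma * sigma)
    x[outlier] = np.nan  # outliers are handled exactly like missing records
    missing = np.isnan(x)
    idx = np.arange(len(x))
    x[missing] = np.interp(idx[missing], idx[~missing], x[~missing])
    return x


def zscore(x):
    """Standardise an input feature: z = (x - mu) / sigma."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()
```

One limitation of the 3σ rule is that, in a short series, a single extreme value inflates σ enough to hide itself. The rule is therefore more dependable on long daily records like those used here.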

Analysis and Performance Evaluation
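The input information includes the daily high and low temperatures (referred to as 'HighT' and 'LowT'), the historical heat demand (HHD) or historical electricity demand (HED) records, and a range of schedule information. The following measures are applied to the historical and modelled demand time series to evaluate the performance of the algorithms: the correlation coefficient (R), the root mean square error (RMSE), and the mean absolute percentage error (MAPE). They are defined as follows:

$$R = \frac{\sum_{i=1}^{M} \big(f(x_i) - \bar{f}\big)\big(y_i - \bar{y}\big)}{\sqrt{\sum_{i=1}^{M} \big(f(x_i) - \bar{f}\big)^2}\,\sqrt{\sum_{i=1}^{M} \big(y_i - \bar{y}\big)^2}}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M} \big(f(x_i) - y_i\big)^2} \qquad (14)$$

$$\mathrm{MAPE} = \frac{100\%}{M}\sum_{i=1}^{M} \left|\frac{y_i - f(x_i)}{y_i}\right|$$

where $M$ is the number of data points in the test data set, $f(x_i)$ is the modelled demand value, $y_i$ is the actual demand value, and $\bar{f}$ and $\bar{y}$ are their respective means.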
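The three measures above translate directly into a few lines of NumPy. This is an illustrative sketch, and the function name `evaluate` is hypothetical. MAPE is undefined when any actual value $y_i$ is zero, which is not a concern for building-level daily demand.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return R, RMSE and MAPE (in %) between actual y_i and modelled f(x_i)."""
    y = np.asarray(y_true, dtype=float)
    f = np.asarray(y_pred, dtype=float)
    r = float(np.corrcoef(f, y)[0, 1])                  # Pearson correlation coefficient
    rmse = float(np.sqrt(np.mean((f - y) ** 2)))        # root mean square error
    mape = float(100.0 * np.mean(np.abs((y - f) / y)))  # mean absolute percentage error
    return {"R": r, "RMSE": rmse, "MAPE": mape}
```

For example, actual values of 100, 200, 300 and 400 with predictions of 110, 190, 330 and 400 give an RMSE of √275 ≈ 16.58 and a MAPE of 6.25%.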

Feature Selection Results
The GA feature selection approach selects a different subset of the 20 available features for each model. Figure 5 shows the selection process for the three algorithms that use the GA, and Tables 2 and 3 show the GA selection results for the heat and electricity demand predictions.
In other words, we know they are useful, but we do not know exactly how useful they are to specific models. The GA selection results also show that different algorithms choose different 'agent schedule' features. For the heat demand model, all of the algorithms choose 'HHD', 'Monday', 'university closure', and at least one of the two temperatures as inputs. For SVR, KNN, and NB, the GA also selects the 'weekday' and 'weekend' features. LMSR chooses 'Monday' to 'Friday' but not 'Saturday' or 'Sunday', which likewise reflects a distinction between working and non-working days. None of the algorithms chose 'HED', 'spring vacation', 'winter vacation', or 'Sunday' as model inputs. For electricity demand, all of the algorithms select 'HED' and a large number of 'agent schedule' features as inputs, while none picked the 'HHD' or 'Sunday' features.

Table 3. Feature set selected by GA (or embedded F-test for Linear Model Stepwise Regression (LMSR)).
The selection results show that the lagged variable is vital for predicting the same energy type, but has only limited predictive power for the other type. The selection also suggests that temperature and historical records are the main drivers of heat demand, while some user behaviour, in the form of 'agent schedules', can improve the model's accuracy. The selected 'agent schedules' correspond to large fluctuations in occupancy. For instance, the models choose 'university closure' because it implies extremely low occupancy compared with other times. For electricity demand, daily historical electricity records and a richer collection of 'agent schedules' are the most important determinants, whereas the selection results indicate that temperature is of little importance.
Next, the performance of the four algorithms is compared for three different feature sets (Fx):

- F1: the optimal feature selection chosen by the GA;
- F2: all features, with no GA selection;
- F3: the optimal feature selection minus the lagged energy demand variable.

Table 4 summarises the results of the heat modelling across the four algorithms and the three feature sets, and Figure 6 shows sample time series for the classification and regression models. Several broad conclusions can be drawn from the results. Firstly, all of the algorithms successfully describe the heat consumption patterns, but the regression models are superior to the classification models, with lower RMSE, higher R, and generally lower MAPE. Secondly, the optimal feature selection (F1) produces better results for all algorithms than using all features (F2). Thirdly, the lagged variable is crucial to model accuracy, because removing it from the optimal feature set reduces prediction accuracy. Of the regression algorithms, SVR performs better than LMSR in the F1 and F3 cases. LMSR always applies its own embedded stepwise selection and therefore cannot use all features, so scenario F2 has no LMSR test. In the F1 case, the SVR model has clearly lower RMSE and MAPE values and a slightly higher R value than the LMSR model. In the F3 case, LMSR has a lower RMSE but a higher MAPE and a lower R.

Heat Consumption Modelling
The classification models represent demand only as a discrete class label rather than a continuous numerical value, so the error is often very large when the predicted class is wrong. As a result, Figure 6 shows that the classification models sometimes give extreme results in the F3 case. These produce high RMSE values compared with the regression algorithms and with the classifiers' own performance in F1. As with the regression algorithms, the classification models perform best in the F1 case, where the R values of NB and KNN are close to 1. The classification results show that excluding the lagged variable (F3) leaves features that mislead the models and degrade their performance.

Analysis of the different scenarios shows that F1 gives better accuracy for heat than F2, indicating successful feature selection. Figure 6 shows that including all features (F2) leads to predictions that overshoot actual peaks or undershoot troughs. This is understood to arise from the action of the BMS, which uses both a temperature sensor and occupancy detectors [5]. The temperature sensor is normally located at the main switch of the heating pipe, while motion detectors are sited one per floor [53]. Because of the location of the temperature sensor, the heat supply is not adjusted to reflect heat emitted by human activities or by electrical equipment. Only very large fluctuations in the schedule (as captured by the features selected by the GA) are enough to influence the occupancy detectors, change BMS behaviour, and alter consumption. The results show that ML regression algorithms can successfully model heat consumption.
Moreover, the F3 feature set performs worst of the three scenarios because the lagged variable, i.e., historical heat consumption data, is removed. Figure 6 shows that the NB and KNN models then only track the general trend of heat consumption. They assign days to 'high', 'medium', or 'low' classes but fail to reproduce the actual values; this is particularly clear in Figure 6a.
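This section does not define the demand classes used by the classifiers. The sketch below therefore assumes three bins ('low', 'medium', 'high') with hypothetical edges in kWh/day, each represented by its bin centre. It shows why a misclassified day produces a large error, and why even a correctly classified day carries a quantisation error that a regression model avoids.

```python
import numpy as np

# Hypothetical class boundaries (kWh/day) separating low, medium and high demand.
EDGES = np.array([1000.0, 2000.0])
# Hypothetical representative demand value for each class (the bin centre).
REPRESENTATIVE = np.array([500.0, 1500.0, 2500.0])


def to_class(demand):
    """Map continuous daily demand to class labels 0 (low), 1 (medium), 2 (high)."""
    return np.digitize(demand, EDGES)


def to_value(labels):
    """Convert class labels back to the representative demand of each class."""
    return REPRESENTATIVE[np.asarray(labels)]


actual = np.array([2400.0])
# A classifier that wrongly predicts 'low' is off by a large amount...
wrong_class_error = np.abs(to_value([0]) - actual)
# ...and even the correct class ('high') is off by the distance to its bin centre.
right_class_error = np.abs(to_value(to_class(actual)) - actual)
```

With these hypothetical bins, predicting 'low' for a 2400 kWh day gives an error of 1900 kWh, while the correct class still leaves an error of 100 kWh.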

Electricity Consumption Modelling
Compared with heat consumption, electricity consumption patterns are more consistent but also more sensitive to activity level. Consumption is higher in the working week than at weekends, and higher during the semester than during holiday periods. As with heat modelling, the regression algorithms are superior to the classification algorithms. The optimal feature set (F1) models electricity demand more accurately than F2 and F3. Unlike the optimal heat feature set, the optimal electricity feature set contains almost all 'agent schedule' features. In addition, the F1 results across all algorithms show that temperature has only a limited influence on electricity consumption and that activity level is the main driver.
Table 5 and Figure 7 show that LMSR and SVR give very similar results in F1 and F3. In all three cases, the classification algorithms perform worse on electricity demand than on heat demand across all statistical measures. In particular, the classification algorithms in F3 do not accurately assign an electricity consumption 'label' (the lowest R value in all cases). KNN performs better than NB in all three cases, with an R value of 0.873 in F1. Figure 7a,b shows that the KNN and NB models capture the overall trend of electricity consumption and give reliable indications of 'high', 'medium', and 'low' in F1 and F2. The disappointing results in F3 indicate that, without lagged electricity data, electricity consumption is hard to classify with the remaining features.

The pattern of performance across cases matches that for heat: F1 performs better than F2, and F3 performs worst. However, the MAPE and RMSE values do not change as dramatically, largely because electricity consumption follows a more consistent pattern.

Discussion and Conclusions
Previous work has applied ML to electricity and heat demand. However, these studies have either used only regression models or have not applied feature selection to large datasets that include daily historical electricity and heat consumption. Here, four different algorithms, different scenarios for user information, and the inclusion of data on the other energy type allowed a broad analysis of the benefits and limitations of the data-driven approach. Regression models perform better across the board and appear fit for purpose in modelling daily electricity and heat demand.
Feature selection improved the performance of all models for both electricity and heat demand. Without feature selection, the classification models perform more poorly overall, although some algorithms and cases come closer to the regression models for heat. This suggests that thermal loads are more easily classified than electrical loads. The reason may be that the BMS cannot respond directly to heat associated with electrical load or human activity. More likely, external temperature simply plays a much more dominant role in heat demand.
The optimal feature selection results also indicate that heat and electricity demand predictions need different information to perform well. Temperature and key schedule information describing the occupancy rate, together with the lagged demand variables, are important for model accuracy. Two results from the case study support this conclusion. First, all of the optimal feature sets contain the lagged variable, and these sets give the best results. Second, removing the lagged variable from the optimal set degrades the accuracy of both the electricity and heat models. The feature selection results also show that adding historical heat demand to the electricity models, and vice versa, did not help performance. This reflects the relative disconnect between the drivers of heat and electricity demand in this building. The distinction is likely to be smaller in buildings where heat is provided electrically. The 'agent schedule' approach used in this analysis is not what many would recognise as 'true' ABM; however, the value of information that reflects the behaviour of cohorts of users is clearly demonstrated.
The effectiveness of the data-driven approach has been demonstrated for the Chrystal Macmillan Building. Built in 1956 and recently refurbished, the building is reasonably representative of a large number of UK buildings that have been, or will be, refurbished. Data-driven models are particularly valuable for this kind of building. Because historical buildings are continually updated, constructing physical models requires estimates of material properties, which involve great uncertainty and are difficult to obtain [54]. The results show that data-driven models can simulate energy consumption without any knowledge of the building's physical condition.
This work leaves a number of questions unresolved. First, the ML model performs well for heat load. Further work could test whether a hybrid model improves predictive capability by adding building physical parameters, such as BMS parameters or U-values, to the data-driven model. Second, it is worth investigating whether more detailed schedule and power equipment information can improve the electricity demand classification model. Answers to these questions may allow the framework to be developed into a more complete toolkit in the future.
Finally, the proposed approach can be transferred to estimate the energy consumption of other buildings. For example, the demand model developed here will be applied to other University of Edinburgh buildings with similar uses and building controls to develop models of campus-level energy use. This would enable day-ahead forecasts of electricity and heat consumption, supporting efforts to reduce operational energy costs and CO₂ emissions. Similarly, a more accurate load curve that incorporates realistic schedule information could be used in long-term planning to simulate the effects of interventions in the building fabric and in energy supply options.

Funding: This work is funded by the EPSRC National Centre for Energy Systems Integration (grant number EP/P001173/1) and the School of Engineering, University of Edinburgh.

Acknowledgments:
The authors gratefully acknowledge the assistance of David Jack and Chris Litwiniuk of the University of Edinburgh for their advice and provision of data. We also gratefully acknowledge the two anonymous reviewers for their valuable suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.