Estimating the Heating Load of Buildings for Smart City Planning Using a Novel Artificial Intelligence Technique PSO-XGBoost

Abstract: In this study, a novel technique, namely PSO-XGBoost, was proposed to support smart city planning in estimating and controlling the heating load (HL) of buildings. Accordingly, the extreme gradient boosting machine (XGBoost) was first developed to estimate HL; then, the particle swarm optimization (PSO) algorithm was applied to optimize the performance of the XGBoost model. The classical XGBoost model, support vector machine (SVM), random forest (RF), Gaussian process (GP), and classification and regression tree (CART) models were also developed to predict the HL of building systems and compared with the proposed PSO-XGBoost model. In total, 837 buildings were considered and analyzed with many influential factors, such as glazing area distribution (GAD), glazing area (GA), orientation (O), overall height (OH), roof area (RA), wall area (WA), surface area (SA), and relative compactness (RC). Mean absolute percentage error (MAPE), root-mean-squared error (RMSE), variance account for (VAF), mean absolute error (MAE), and the determination coefficient (R2) were used as the statistical criteria for evaluating the performance of the above models. Color intensity, as well as a ranking method, were also used to compare and evaluate the models. The results showed that the proposed PSO-XGBoost model was the most robust technique for estimating the HL of building systems; the remaining models (i.e., XGBoost, SVM, RF, GP, and CART) yielded more mediocre performance in terms of RMSE, MAE, R2, VAF, and MAPE. Another finding of this study was that OH, RA, WA, and SA were the most critical parameters for the accuracy of the proposed PSO-XGBoost model; they should be given particular attention in smart city planning, as well as in the optimization of smart cities.


Introduction
Smart cities are the development goal of many countries around the world [1]. Intelligent systems have been widely researched and applied in smart cities to provide a better quality of life, as well as to bring higher economic efficiency [2][3][4][5][6]. One of the critical issues of smart cities is the efficient use of energy by buildings [7][8][9]. Of those, the energy for cooling or heating buildings is significant, since it accounts for a considerable share of total energy use [10]. In winter, the energy demand for the heating load (HL) is significant [11]. The ineffective use of HL not only results in economic losses but also threatens the environment. Among related works, one study compared a GWA technique with GSGP, ANN, EMARS, support vector regression (SVR), MLP, and random forest (RF) models; the same dataset (i.e., 768 experimental records) was used to develop the HL predictive models in their study. Their results indicated that the GWA technique can predict the HL of buildings more accurately than the other models (i.e., GSGP, ANN, EMARS, SVR, MLP, and RF). Similar works on predicting the HL of buildings can be found in the literature [22][23][24][25][26][27][28][29].
Although soft computing models for estimating the HL of building systems have been developed, they have not been evaluated comprehensively on subjective and objective factors, such as glazing area distribution (GAD), glazing area (GA), orientation (O), overall height (OH), roof area (RA), wall area (WA), surface area (SA), and relative compactness (RC). Furthermore, new smart systems with high efficiency are always the target of engineers and scientists aiming to optimize building systems, as well as smart city planning. Therefore, this study developed and proposed a new technique to estimate HL based on an evolutionary algorithm (particle swarm optimization, PSO) and an extreme gradient boosting (XGBoost) model, namely the PSO-XGBoost model. Careful consideration of GAD, GA, O, OH, RA, WA, SA, and RC was implemented in the present study for estimating the HL of building systems. Five other AI techniques, including XGBoost, support vector machine (SVM), random forest (RF), Gaussian process (GP), and classification and regression trees (CART), were also developed to predict the HL of building systems and compared with the proposed PSO-XGBoost model.
The structure of this study is organized as follows: Section 1 presents the motivation for this study and related works; Section 2 presents the details of data collection and the properties of the database used; Section 3 presents the background of the methods used; Section 4 proposes the framework of the new technique for estimating the HL of building systems (i.e., PSO-XGBoost); Section 5 introduces several performance indices for evaluating the accuracy of the developed models; the results and discussion are presented in Section 6; finally, conclusions and remarks are given in Section 7.

Experimental Database
To implement this study, a database of 768 buildings analyzed by Tsanas, Xifara [30] was used, with nine parameters: glazing area distribution (GAD), glazing area (GA), orientation (O), overall height (OH), roof area (RA), wall area (WA), surface area (SA), relative compactness (RC), and heating load (HL). Sixty-nine other buildings with similar parameters were also analyzed and investigated in Vietnam during the winter of 2018, using the Ecotect computer software. Ultimately, a total of 837 simulated buildings was used in this study for estimating the HL of buildings; 12 building shapes were surveyed with different RC values, as shown in Figure 1. The buildings have different dimensions and surface areas, but the same materials were used for each building.
For this aim, GAD, GA, O, OH, RA, WA, SA, and RC were considered as the input variables to estimate the HL of buildings, according to the recommendations of previous studies [30][31][32][33]. As introduced above, most buildings have the shapes of orthogonal polyhedra, as shown in Figure 1. The different building types were characterized through the RC, which is calculated as follows:

RC = 6V^(2/3) / A,

where V denotes the volume of the building (m3), and A denotes the surface area (i.e., SA) of the building (m2).
The SA parameter was calculated based on the floor, roof, and wall areas and the overall building height (i.e., OH). Four primary orientations were surveyed: east, west, south, and north. A numerical encoding procedure was applied to the orientations, with the values 1, 2, 3, and 4 for east, west, south, and north, respectively (Table 1). Six glazing area (i.e., GA) percentages were recorded: 0%, 10%, 15%, 25%, 40%, and 50%. Besides, five forms of glazing area distribution (i.e., GAD) were investigated: south, uniform, west, north, and east; these were encoded as 1, 2, 3, 4, and 5 for east, uniform, south, north, and west, respectively. The remaining parameters (i.e., RA and WA) were calculated based on their dimensions in the AutoCAD environment. To determine the HL of the buildings, the Ecotect computer software was used to simulate the energy efficiency of the 837 buildings. Box and whisker plots of the dataset are shown in Figure 2.

Background of the Methods Used
As stated above, this study performs HL estimation of buildings using six techniques, i.e., PSO-XGBoost, XGBoost, SVM, RF, GP, and CART. Since the details of SVM, RF, GP, and CART have been introduced in much of the previous literature [34][35][36][37][38][39][40][41][42], only brief descriptions of them are given in the present study. As the main objective of this study was to develop the new hybrid technique, i.e., PSO-XGBoost, the details of PSO and XGBoost are presented in this section.


Particle Swarm Optimization (PSO) Algorithm
PSO is a swarm algorithm inspired by the behavior of particles/social animals, such as fish or birds. It is a stochastic optimization method introduced and developed by Eberhart, Kennedy [43], and is classified as one of the metaheuristic techniques. The main idea of the PSO algorithm is better social information sharing among individuals in a crowd. Each individual acts as a particle in the swarm. The particles implement a searching procedure in a search space; during the search, they share information and experience to move to better locations [44]. Thus, PSO is also considered an evolutionary computation technique in the statistical community [44][45][46][47][48][49]. The PSO algorithm implements five steps for optimal searching: -Step 1: Initialize the population and the velocity of the particles. Subsequently, compute each particle's fitness and record the best positions found as the local best and global best.
-Step 2: Every particle flies around the search space with the velocity established in the first step. The speed depends on the local best and global best: the best position a particle has found so far corresponds to its local best, and the best position found by the whole swarm corresponds to the global best. In each loop, the velocity is updated according to the local best and global best:

v_j^(i+1) = w * v_j^(i) + c1 * r1 * (pbest_j − x_j^(i)) + c2 * r2 * (gbest − x_j^(i)),

where j indicates the particle; x_j^(i) and v_j^(i) denote the position and speed of the j-th particle at the i-th iteration; w represents the inertial weight coefficient; i is the number of repetitions; c1 and c2 are acceleration coefficients; and r1 and r2 symbolize random numbers in the interval [0, 1].
-Step 3: After the new velocity is calculated and updated, the particles fly in the search space with the new speed. For each new position, the fitness of the particles is determined and updated through a fitness function (i.e., RMSE).
-Step 4: Update the local best and global best for any better position with a lower RMSE. The local best can be updated as:

pbest_j = x_j^(i+1) if f(x_j^(i+1)) < f(pbest_j); otherwise, pbest_j is unchanged.

-Step 5: Check the stopping condition of the search. If the fitness of the best particle is satisfactory (i.e., the lowest RMSE), stop the search; otherwise, return to Step 2.
The pseudo-code of the PSO algorithm for the optimization process is shown in Figure 3 [50].
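The five steps above can be sketched as a minimal PSO in Python. This is an illustration only, not the implementation used in the study: the sphere function stands in for the RMSE fitness, and the bounds, swarm size, and coefficients are assumed values.

```python
import random

def pso(fitness, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimal particle swarm optimizer following Steps 1-5 above (minimizes `fitness`)."""
    lo, hi = bounds
    # Step 1: initialize positions, velocities, and the local/global best records
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Step 2: velocity update using the local best and global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Step 3: fly to the new position (clamped to the search box)
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            # Step 4: update the local best and, if needed, the global best
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    # Step 5: the iteration budget serves as the stopping condition here
    return gbest, gbest_fit
```

In the PSO-XGBoost framework described later, `fitness` would evaluate the RMSE of an XGBoost model trained with the hyper-parameters encoded in the particle's position.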

Extreme Gradient Boosting Machine (XGBoost)
Based on the ideas of the gradient boosting machine [51,52], Chen, He [53] improved and introduced the XGBoost algorithm as a robust decision-tree ensemble. Unlike the gradient boosting machine, XGBoost can run in parallel based on the constructed boosted trees, and it can handle complex data at high speed and accuracy. The XGBoost algorithm can be described as follows [54]: given a dataset with n examples and m features, D = {(x_i, y_i)} (|D| = n, x_i ∈ R^m, y_i ∈ R), K additive functions are used to predict the output values of a tree ensemble model:

ŷ_i = Σ_{k=1}^{K} f_k(x_i), f_k ∈ F,

where F = {f(x) = w_q(x)} is the space of regression trees; q denotes the structure of each tree (mapping an example to a leaf index); T denotes the number of leaves in the tree; and f_k is a function corresponding to an independent tree structure q and leaf weights w.
To reduce the errors of the ensemble trees, the objective function of the XGBoost model is

L^(t) = Σ_i l(y_i, ŷ_i^(t)) + Σ_k Ω(f_k),

where l is a differentiable convex loss function that measures the error between the predicted and measured values; y_i and ŷ_i are the measured and predicted values, respectively; t denotes the repetitions performed to minimize the errors; and Ω is the complexity penalty on the regression tree functions:

Ω(f) = γT + (1/2)λ‖w‖².
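The additive-tree idea can be illustrated with a toy gradient booster over depth-1 trees. This is a sketch of plain gradient boosting with squared loss, where each round fits the current residuals; XGBoost additionally uses the regularization term Ω, second-order gradient information, and parallel tree construction, none of which this sketch includes.

```python
def fit_stump(X, residuals):
    """Fit a one-split regression stump minimizing squared error on the residuals."""
    best = None
    for j in range(len(X[0])):                      # each feature
        for t in sorted({row[j] for row in X}):     # candidate thresholds
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda row: lv if row[j] <= t else rv

def boost(X, y, rounds=20, eta=0.3):
    """Additive model: each new stump fits the residuals (the gradient of squared loss)."""
    pred = [0.0] * len(y)
    trees = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_stump(X, resid)
        trees.append(tree)
        # eta plays the role of the shrinkage hyper-parameter discussed later
        pred = [pi + eta * tree(row) for pi, row in zip(pred, X)]
    return lambda row: sum(eta * t(row) for t in trees)
```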

Support Vector Machine (SVM)
SVM is a machine learning algorithm based on the principle of structural risk minimization, which allows it to generalize better from a limited number of samples; it was proposed by Cortes, Vapnik [55]. It can solve both classification and regression problems [56,57]. Since the target of this study is numeric, the support vector machine for regression was applied. In the general regression learning problem, the learning machine is given training data from which it attempts to learn the input-output relationship (dependency, mapping, or function) f(x), with a training dataset D = {(x(i), y(i)) ∈ R^n × R, i = 1, ..., l}. In SVM regression, however, the error of approximation is measured rather than the margin used in classification. A linear regression hyperplane is therefore obtained by minimizing Vapnik's ε-insensitivity loss function, shown in formula (7):

|y − f(x)|_ε = max(0, |y − f(x)| − ε).
From the above function, it can be seen that the slack variables ψ_i and ψ*_i are related to the Lagrange multipliers ν_i and ν*_i, and the trade-off between the approximation error and the weight vector norm W is controlled by the constant C. In the case of nonlinear regression, solving the learning problem is equivalent to maximizing the dual Lagrange function in input space (formula (8)), subject to its constraints. After the Lagrange multipliers ν_i and ν*_i are calculated, the optimal weight vector of the regression hyperplane can be expressed with formula (9).
In order to create the best nonlinear regression function, the most crucial device is the kernel function; several kernel functions have been proposed, as listed in Table 1. Table 1. Kernel functions.

[Table 1 lists kernel function types, including the two-layer neural kernel; the remaining flattened entries could not be recovered.]
Replacing x_i by the corresponding feature vector in a feature space, the optimal weighting vector of the kernel expansion, shown in formula (10), can be calculated with the Lagrange multipliers. In this way, the regression function combines the weighting vector and the kernel function, as in formula (11).
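A minimal sketch of the RBF kernel and the resulting kernel-expansion prediction follows. The function names, the support vectors, and the multiplier values are illustrative, not the study's code.

```python
import math

def rbf_kernel(x, z, sigma=1.0):
    """Radial basis function kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2 * sigma ** 2))

def svr_predict(x, support_vectors, coeffs, b, sigma=1.0):
    """Kernel expansion f(x) = sum_i (nu_i - nu_i*) * K(x_i, x) + b,
    where `coeffs` holds the differences of Lagrange multipliers."""
    return sum(c * rbf_kernel(sv, x, sigma) for sv, c in zip(support_vectors, coeffs)) + b
```

Sigma (σ) controls the kernel width; together with the cost C, it is exactly the pair tuned by grid search in the experiments reported later.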

Random Forest (RF)
Breiman [58] introduced RF, an ensemble machine learning method based on decision trees that performs well for both classification and regression purposes. The RF algorithm consists of many decision trees combined through bootstrap aggregation (bagging) [59,60]. To arrive at a final decision, the algorithm combines the results of various decision trees, and each tree is commonly trained with randomly chosen variables as well as randomly chosen data samples from the initial training database [61].
RF can be applied as follows to estimate the HL of buildings:
(a) The number of trees is specified, to ensure the richness of the forest.
(b) Bootstrap samples are drawn with replacement from the principal HL training database; the remaining observations, used for validation, are named the out-of-bag (OOB) data.
(c) For each bootstrap sample, a regression tree is grown without pruning at each node.
(e) Performance indices, including RMSE, R2, MAE, VAF, and MAPE, can be used to evaluate the predicted HL values on the OOB data.
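Steps (a) and (b) above, the bootstrap sampling and the out-of-bag split, can be sketched as follows (the function name and the seed are illustrative):

```python
import random

def bootstrap_oob(n, n_trees, seed=0):
    """For each tree, draw a bootstrap sample (with replacement) of all n indices;
    the indices never drawn form that tree's out-of-bag (OOB) validation set."""
    rng = random.Random(seed)
    samples, oob = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        samples.append(idx)
        oob.append(sorted(set(range(n)) - set(idx)))  # left-out observations
    return samples, oob
```

Each tree is then trained on its bootstrap sample and evaluated on its OOB indices, which is what step (e) measures with RMSE, R2, MAE, VAF, and MAPE.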

Gaussian Process (GP)
AI includes non-parametric models such as the GP. This non-parametric model consists of random variables, any finite collection of which follows a joint Gaussian distribution [62]. Given a mean function h(x) and a covariance function k(x, x′), a GP can be written as f(x) ~ GP(h(x), k(x, x′)). In the regression setting, the GP encodes the uncertainty about the function before training; the relation between the function and the data is expressed through it. Bayes' rule can be employed to update these beliefs about the function, and the posterior distribution can be calculated by Bayes' rule [63].
In previous works, the non-parametric GP model was not used for the prediction of HL. Therefore, in this paper, we studied the use of GP associated with the radial basis function, as described in Table 2.
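A compact sketch of GP regression with the RBF covariance follows: a zero mean function is assumed, the noise term is an assumed jitter, and a small Gaussian-elimination solver keeps the example dependency-free.

```python
import math

def rbf(a, b, sigma):
    """RBF covariance k(a, b) = exp(-(a - b)^2 / (2 * sigma^2)) for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small system A x = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior_mean(X, y, x_star, sigma=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP: m(x*) = k*^T (K + noise * I)^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j], sigma) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)                      # alpha = (K + noise I)^-1 y
    return sum(rbf(x_star, X[i], sigma) * alpha[i] for i in range(n))
```

With a tiny noise term, the posterior mean interpolates the training data, which is why σ (the kernel width) is the single accuracy-controlling parameter tuned for the GP model later in the paper.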

Classification and Regression Tree (CART)
As one of the most popular statistical methods, CART (classification and regression tree) has been widely employed to handle classification and regression problems [64]. Inspired by the growth process of trees, a CART tree generally consists of roots, leaves, branches, and nodes. By means of binary recursive partitioning, CART algorithms divide each sample set into two sub-sample sets, so there are two branches at each non-leaf node [65,66].
CART was first proposed and developed by Breiman et al. (1984). Unlike traditional statistical methods, CART is built from binary decision trees and is easy to interpret and understand. Especially for complex data with many significant variables, CART tends to show better prediction accuracy than earlier prediction methods. Its most prominent advantages are its keen ability to distinguish the importance of variables and its robustness to outliers.
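The binary recursive partitioning described above can be sketched as a small regression tree. The depth and leaf-size limits here play the role of the complexity control; this is an illustrative sketch, not the CART implementation used in the study.

```python
def build_tree(X, y, depth=0, max_depth=3, min_leaf=2):
    """Recursively grow a binary regression tree: each non-leaf node splits the
    sample set into two sub-sample sets by the split with the lowest squared error."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean                                   # leaf: predict the mean response
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            L = [i for i, row in enumerate(X) if row[j] <= t]
            R = [i for i, row in enumerate(X) if row[j] > t]
            if len(L) < min_leaf or len(R) < min_leaf:
                continue
            sse = (sum((y[i] - sum(y[k] for k in L) / len(L)) ** 2 for i in L)
                   + sum((y[i] - sum(y[k] for k in R) / len(R)) ** 2 for i in R))
            if best is None or sse < best[0]:
                best = (sse, j, t, L, R)
    if best is None:
        return mean
    _, j, t, L, R = best
    return (j, t,
            build_tree([X[i] for i in L], [y[i] for i in L], depth + 1, max_depth, min_leaf),
            build_tree([X[i] for i in R], [y[i] for i in R], depth + 1, max_depth, min_leaf))

def predict(tree, row):
    """Descend the binary tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree
```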

Proposing the PSO-XGBoost Framework for Estimating HL
In this section, the PSO-XGBoost model is proposed to predict the HL of building systems. At this stage, an initial XGBoost model was developed first; then, its hyper-parameters were optimized by the PSO algorithm. In the initial XGBoost model, seven hyper-parameters were considered and optimized: the subsample ratio of columns (δ), boosting iterations (k), minimum loss reduction (γ), max tree depth (d), shrinkage (η), subsample percentage (ς), and minimum sum of instance weight (µ). To determine the optimal values of these parameters, the particles fly in the search space and exchange their experiences. For each position, the fitness of the particles is calculated via a fitness function, i.e., the RMSE in Equation (13). For each set of hyper-parameter values, a corresponding RMSE value was computed, and the best-fit model corresponds to the lowest RMSE. The scheme of the development of the PSO-XGBoost model for estimating the HL of buildings is shown in Figure 4.
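The fitness function that PSO minimizes, hyper-parameters in and cross-validated RMSE out, can be sketched as follows. The `train_fn` stub stands in for training XGBoost with a given hyper-parameter vector; all names here are illustrative, not the study's code.

```python
def cv_rmse_fitness(train_fn, folds):
    """Build a PSO fitness function: hyper-parameter vector -> mean validation RMSE.

    `train_fn(params, train)` must return a `predict(x)` callable. In the study this
    would train an XGBoost model with the seven hyper-parameters named above, but any
    regressor stand-in works for this sketch."""
    def rmse(ys, yhat):
        return (sum((a - b) ** 2 for a, b in zip(ys, yhat)) / len(ys)) ** 0.5

    def fitness(params):
        scores = []
        for train, valid in folds:
            model = train_fn(params, train)
            scores.append(rmse([y for _, y in valid], [model(x) for x, _ in valid]))
        return sum(scores) / len(scores)   # lower is better; PSO minimizes this

    return fitness
```

Passing this `fitness` to a PSO routine over the seven-dimensional hyper-parameter box reproduces the loop in Figure 4: each particle position is a candidate hyper-parameter set, and the global best tracks the lowest RMSE found so far.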

Performance Evaluation Indices
To evaluate the quality of the PSO-XGBoost, XGBoost, SVM, RF, GP, and CART models, five performance indices were used: mean absolute percentage error (MAPE), root-mean-squared error (RMSE), variance account for (VAF), mean absolute error (MAE), and the determination coefficient (R2). The calculation of RMSE, R2, MAE, VAF, and MAPE is described in Equations (13)-(17):

RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²),
R2 = 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)²,
MAE = (1/n) Σ |y_i − ŷ_i|,
VAF = (1 − var(y − ŷ) / var(y)) × 100,
MAPE = (1/n) Σ |y_i − ŷ_i| / y_i,

where n stands for the number of instances, and ȳ, y_i, and ŷ_i denote the mean, measured, and predicted values of the response variable, respectively.
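The five indices can be computed directly. This is an illustrative implementation, not the study's code; note that MAPE is computed as a fraction here, consistent with the magnitudes quoted in this paper (e.g., MAPE = 0.024).

```python
import math

def metrics(y, yhat):
    """RMSE, R2, MAE, VAF (%), and MAPE (fraction) for measured y and predicted yhat."""
    n = len(y)
    mean_y = sum(y) / n
    errs = [a - b for a, b in zip(y, yhat)]
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mae = sum(abs(e) for e in errs) / n
    mape = sum(abs(e) / abs(a) for e, a in zip(errs, y)) / n
    ss_res = sum(e * e for e in errs)
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    r2 = 1 - ss_res / ss_tot
    # VAF compares the variance of the residuals with the variance of the measurements
    mean_e = sum(errs) / n
    var_e = sum((e - mean_e) ** 2 for e in errs) / n
    var_y = sum((a - mean_y) ** 2 for a in y) / n
    vaf = (1 - var_e / var_y) * 100
    return {"RMSE": rmse, "R2": r2, "MAE": mae, "VAF": vaf, "MAPE": mape}
```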

Results and Discussions
In this stage, 80% of the whole dataset was randomly selected to develop the HL forecasting models; the remaining 20% was used to test and re-evaluate the accuracy and performance of the developed models. A re-sampling method of 10-fold cross-validation was applied to reduce model error. Note that the same training/testing datasets, as well as the same re-sampling technique, were used for all models.

To assess the feasibility of the proposed PSO-XGBoost model in estimating the HL of building systems, the XGBoost model and the PSO algorithm were combined according to the framework introduced in Figure 4. The parameters of the PSO algorithm were set before performing the optimization of the XGBoost model, as shown in Table 2. The search for the optimal hyper-parameters of the XGBoost model was then performed; its progress is shown in Figure 5. As shown in Figure 5, the optimal PSO-XGBoost model was found with a swarm size of 400 and stopped at iteration 209 with the lowest RMSE (i.e., RMSE = 1.776). For comparison and an overall performance evaluation, the remaining models, including XGBoost, SVM, RF, GP, and CART, were also developed as previously introduced.

For the development of the SVM model, the radial basis function (RBF) was used as the kernel function to estimate the HL. Two hyper-parameters of the SVM model with the RBF kernel were selected, namely sigma (σ) and cost (C). A grid search was established to find the optimal values, with σ in the range [0, 1] (step 0.05) and C in the range [0.25, 5] (step 0.25). The "scale" method was applied to reduce the skewness of the data, and the ten-fold cross-validation technique was applied to improve the accuracy of the model. Ultimately, the optimal SVM model for estimating HL was defined with σ = 0.05 and C = 1.75, as shown in Figure 7. Note that the same training dataset (i.e., 672 experimental records) was used to develop the SVM model as was used for the PSO-XGBoost and XGBoost models.

For the development of the RF model, the number of trees in the forest (n) and the number of randomly selected predictors (mtry) were used to adjust its accuracy and performance. According to the recommendation of Nguyen, Bui [67], n was set to 2000 to ensure the richness of the forest. Since eight predictors were used in this study, mtry was tested in the range of 1 to 8. The performance of the RF models based on 2000 trees and the candidate numbers of randomly selected predictors is shown in Figure 8. As a result, the best RF model for predicting HL in this study was RF3 (i.e., mtry = 3). The same techniques and training dataset were used for the development of the RF model as for the previous models (i.e., PSO-XGBoost, XGBoost, SVM).

Similar to the SVM model, kernel functions can also be applied to develop the GP model. The RBF kernel was used, with σ as the only parameter controlling the accuracy of the GP model. A grid search, with the same techniques as for the previous models, was applied; the resulting optimal GP model for estimating the HL of buildings used σ = 0.009 (Figure 9).

Finally, an optimal CART model was developed based on the same techniques and training dataset as the previous models. Note that only the complexity parameter (ψ) was used to develop the CART model, with the grid search in the range [0, 0.1] (Figure 10).

After the HL predictive models were developed, 165 observations of the testing dataset were used to evaluate their performance through the statistical criteria, i.e., RMSE, R2, VAF, MAE, and MAPE, as shown in Table 3. The color intensity and ranking method were also applied to evaluate the models.
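The grid searches described above amount to exhaustive evaluation over a parameter lattice. A sketch follows, using the σ and C grids from the text; the quadratic placeholder fitness in the usage example stands in for cross-validated SVM training and is purely illustrative.

```python
def grid_search(fitness, sigmas, costs):
    """Exhaustive grid search: evaluate every (sigma, C) pair and keep the best
    (lowest score, e.g., cross-validated RMSE)."""
    best = None
    for s in sigmas:
        for c in costs:
            score = fitness(s, c)
            if best is None or score < best[0]:
                best = (score, s, c)
    return best  # (score, sigma, C)

# Grids matching the text: sigma in [0, 1] step 0.05, C in [0.25, 5] step 0.25
sigmas = [round(0.05 * i, 2) for i in range(0, 21)]
costs = [round(0.25 * i, 2) for i in range(1, 21)]
```

The same pattern covers the one-parameter searches for the GP (σ) and CART (ψ) models, with a single loop over the parameter grid.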
The results in Table 3 confirmed the strong predictive capability of the proposed AI techniques for the HL of building systems in the present study. Of those, the proposed PSO-XGBoost model was the most superior model in predicting the HL of buildings, with an RMSE of 1.124, R2 of 0.990, MAE of 0.615, VAF of 98.934, MAPE of 0.024, and a total ranking of 29. Figure 11 illustrates the predictability of the proposed PSO-XGBoost model on the testing dataset.

Compared with the classical XGBoost model (without optimization by the PSO algorithm), the proposed PSO-XGBoost model provided superior performance in estimating the HL of buildings. The results of the XGBoost model (i.e., RMSE = 1.651, R2 = 0.977, MAE = 0.720, VAF = 97.664, MAPE = 0.028, total ranking of 17) highlight the significant optimization capability of the PSO algorithm in this study. Observing Table 3 closely, it can be seen that the XGBoost model provided lower performance than the RF model (RMSE = 1.589, R2 = 0.978, MAE = 0.557, VAF = 97.835, MAPE = 0.026, total ranking of 25). However, the XGBoost model became even more potent than the RF model when optimized by the PSO algorithm (i.e., PSO-XGBoost). The remaining models provided lower performance than the proposed PSO-XGBoost model. Figures 12-16 illustrate the accuracy of the XGBoost, SVM, RF, GP, and CART models, respectively, on the testing dataset.
Considering the advantages and disadvantages of the other methods (i.e., XGBoost, SVM, RF, GP, and CART), the development of these models was more straightforward than that of the proposed PSO-XGBoost model, especially the GP and CART models, which use only one parameter. The RF and SVM models are somewhat more complicated, using two parameters to build the predictive models. Most notably, the XGBoost model used seven parameters (i.e., k, d, η, γ, δ, ς, and µ). The higher the number of parameters, the longer the processing and model-construction time; this is one of the disadvantages of complex models. Besides, although the construction of the single models (i.e., XGBoost, SVM, RF, GP, and CART) was more straightforward than that of the proposed PSO-XGBoost model, their performance was proven to be lower.
Herein, the proposed PSO-XGBoost model performed very well in estimating the HL of building systems, with very high accuracy based on the input variables. However, since the number of input variables used in the present study was high, a thorough consideration of their relationships, as well as their importance, is necessary. Sensitivity indices based on the Csiszar f-divergence method were used to determine the significance of the inputs [68][69][70]. This technique implements a density-based sensitivity analysis: the influence of each input variable is established in terms of the difference between the unconditional density function of the output and the density function of the output obtained with that input variable fixed. The difference between the density functions was measured with Csiszar f-divergences, and the evaluation was performed through kernel density estimation. The results, shown in Figure 17, revealed that OH, RA, WA, and SA were the most critical variables in estimating the HL of building systems.

Figure 17. The importance of the input variables used in this study.

Conclusions and Remarks
Optimizing and designing the HL systems of buildings is one of the crucial tasks in smart cities. Effective use of HL enables buildings to be more energy efficient, reduces economic losses, and reduces adverse environmental impacts. This study proposed a new technique (PSO-XGBoost) to predict the HL of building systems with high reliability (RMSE = 1.124, R2 = 0.990, MAE = 0.615, VAF = 98.934, MAPE = 0.024). Based on the obtained results of this study, the following conclusions and remarks are drawn:
-The AI techniques in this study, including PSO-XGBoost, XGBoost, SVM, RF, GP, and CART, are strong candidates for estimating the HL of building systems in practice. They could predict the HL of building systems with high reliability, especially the proposed PSO-XGBoost model.
-The proposed PSO-XGBoost model is a robust technique that can accurately predict the HL of building systems with promising results (RMSE = 1.124, R2 = 0.990, MAE = 0.615, VAF = 98.934, MAPE = 0.024). It can be used as an alternative tool to experimental measurements. Furthermore, building design optimization methods can also be applied based on the proposed PSO-XGBoost model to minimize heat loss in buildings.
-Although the SVM, RF, GP, and CART models yielded acceptable performance in this study, further research is needed to improve their accuracy in estimating the HL of building systems, in particular by combining them with optimization algorithms.
-OH, RA, WA, and SA are the input variables with the most influence on the accuracy of the HL forecasting model for building systems. They should be carefully collected and used as essential variables in the development of HL forecasting models.
Although the results of this study were promising for evaluating and predicting the HL of building systems, further research is needed in future work, such as improving the accuracy of the other models (i.e., XGBoost, SVM, RF, GP, and CART) or building novel hybrid artificial intelligence systems based on these models and optimization algorithms. The optimization of building design in the pursuit of energy efficiency is also one of the future challenges for engineers; it can be conducted based on the models of this study.
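For reference, the five evaluation criteria reported throughout this study can be computed as follows. This is a minimal sketch with illustrative data: VAF is expressed in percent, and MAPE is taken as a fraction, consistent with the reported value of 0.024.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute RMSE, MAE, R2, VAF (%), and MAPE (fraction) for a set of
    measured heating loads y_true and model predictions y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    vaf = (1.0 - np.var(err) / np.var(y_true)) * 100.0
    mape = np.mean(np.abs(err) / np.abs(y_true))
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "VAF": vaf, "MAPE": mape}

# Illustrative usage with made-up heating-load values (not study data):
m = evaluate([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 30.5, 39.0])
```

A model such as PSO-XGBoost is preferred when RMSE, MAE, and MAPE are low while R2 and VAF are close to 1 and 100, respectively.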