Multi-Objective Plum Tree Algorithm and Machine Learning for Heating and Cooling Load Prediction

Abstract: The prediction of heating and cooling loads using machine learning algorithms has been considered frequently in the research literature. However, many of the studies considered the default values of the hyperparameters. This manuscript addresses both the selection of the best regressor and the tuning of the hyperparameter values using a novel nature-inspired algorithm, namely, the Multi-Objective Plum Tree Algorithm. The two objectives that were optimized were the averages of the heating and cooling predictions. The three algorithms that were compared were the Extra Trees Regressor, the Gradient Boosting Regressor, and the Random Forest Regressor of the sklearn machine learning Python library. We considered five hyperparameters which were configurable for each of the three regressors. The solutions were ranked using the MOORA method. The Multi-Objective Plum Tree Algorithm returned a root mean square error value for heating equal to 0.035719 and a root mean square error for cooling equal to 0.076197. The results are comparable to the ones obtained using standard multi-objective algorithms such as the Multi-Objective Grey Wolf Optimizer, Multi-Objective Particle Swarm Optimization, and NSGA-II. The results also compare favorably to those of previous studies which considered the same experimental dataset.


Introduction
Energy conservation and emission reduction have recently received a lot of attention in the context of the increase in energy consumption [1]. The prediction of heating and cooling loads could lead to a rational use of renewable energy to replace thermal energy systems which are based on fossil fuels. Moreover, reducing heating and cooling loads can help to decarbonize the building sector [2]. Therefore, the application of machine learning techniques has become important in the context of heating and cooling load prediction. Even though many residential buildings are usually equipped with electricity meters which provide almost complete statistics, the data are often not enough to develop complex machine learning models for the prediction of heating and cooling loads. Information about characteristics which are particular to each building, such as the wall area, the surface area, and the roof area, can complement the metered data in order to develop algorithms which are more accurate and which better address the particularities of buildings.
Buildings are responsible for approximately 40% of total global energy consumption [3]. The improvement of energy efficiency and the conservation of more energy have become essential in recent years [4,5] due to the adverse effects of high energy consumption on the environment. The estimation of heating and cooling loads depends on the characteristics of the structure. To construct energy-efficient buildings, it is helpful to develop conceptual systems that anticipate the cooling load in the residential building sector [6].
Since energy resources present limitations and have an important role in the economic development of countries, the reduction in energy consumption represents a necessity [7,8]. The modeling of heating and cooling loads represents the cornerstone of energy-efficient building design. Furthermore, as stated in [9], energy efficiency leads to both environmental and financial benefits [10]. Moreover, energy efficiency directly impacts economic competitiveness and sustainable development [11]. These facts underscore the significance of the work presented in the current manuscript.
The accurate prediction of energy consumption and the determination of the factors that influence heating energy consumption are important [12] in the context of the substantial increase in energy consumption in the case of residential buildings [13]. Therefore, the use of advanced machine learning algorithms for energy consumption prediction presents great interest for researchers. As can be seen in [14], methods based on machine learning for the prediction of energy consumption have advanced significantly in recent years. The development and examination of machine learning algorithms which can learn from patterns in the data and make predictions presented a lot of interest for many scholars and scientists [15].
Several statistics, such as the ones presented by the authors of [16], estimate that by 2040, there will be an increase of nearly 25% in global energy demand. Also, to create a sustainable and healthy economy, it is important to measure the economic and environmental effects of energy production [17].
The optimization of building energy prediction represents an important research area because of its potential to improve the efficiency of energy management systems [18]. Many studies have shown that the air conditioning system consumes up to 38% of the total energy in the building sector [19,20].
As can be seen, the energy efficiency research domain will be of great interest to the research community in the years to come, and the application of novel machine learning and artificial intelligence techniques can lead to a significant improvement in the existing techniques used for energy consumption prediction.
The solution presented in the manuscript is specific to the heating and cooling load prediction optimization problem, as it aims to improve the prediction results for two objectives, namely, the heating load and the cooling load. Two particularities of the proposed solution were the consideration of three machine learning algorithms which had common hyperparameters and returned good predictions for energy data characterized by a small number of features, and the development of an objective function which considered a 10-fold cross-validation and averaged the heating and cooling prediction results.
The main contributions of the work presented in this paper are as follows: (1) A critical review of the application of machine learning methods for the prediction of heating and cooling loads; (2) The introduction of a novel algorithm called the Multi-Objective Plum Tree Algorithm (MOPTA) which adapts the original Plum Tree Algorithm [21] to multi-objective optimization problems; (3) The ranking of the solutions using the MOORA method [22]; (4) The adaptation of the MOPTA to the hyperparameter optimization and the optimal regressor selection for a machine learning methodology used to predict heating and cooling loads, using the Energy Efficiency Dataset of the UCI Machine Learning Repository as experimental support [23,24]; (5) The development of an objective function that considers the averages of the heating and cooling RMSE results; (6) The comparison and validation of the obtained results with the ones obtained by the Multi-Objective Grey Wolf Optimizer (MOGWO) [25], Multi-Objective Particle Swarm Optimization (MOPSO) [26], and NSGA-II [27].
The manuscript is structured as follows: Section 2 presents the research background, Section 3 presents the MOPTA-based machine learning methodology for the optimization of heating and cooling load prediction, Section 4 presents the results, Section 5 compares the obtained results with the ones from previous studies, and Section 6 shows the conclusions.

Research Background
This section reviews representative recent studies that considered the application of machine learning for the prediction of heating and cooling loads.
In [28], the authors approached the prediction of the energy efficiency of buildings using machine learning techniques considering a small data approach. The method proposed by them considered the Support Vector Regression and the K-means algorithms. The prediction of heating and cooling parameters was considered as two separate tasks. The dataset was split at 75%:25% and the metrics used to evaluate the performance were mean square error and mean absolute error. Their proposed method was better in terms of mean square error and mean absolute error than other methods, such as the classical Support Vector Regression with RBF kernel.
The review from [29] presented a selection of representative studies that used data-driven techniques, such as machine learning and artificial intelligence, for the prediction of the cooling and heating loads of residential buildings. The review considered various techniques, such as ensemble learning, Artificial Neural Networks, Support Vector Machines, probabilistic models, and statistical models. The review also covered recent studies that used the same experimental dataset as the one used in our manuscript. For example, the approach presented in [30] used an ensemble machine learning model based on three Random Forest models that achieved an R² of 0.999 for the heating load prediction and an R² of 0.997 for the cooling load prediction using a 10-fold cross-validation approach. On the other hand, the authors of [31] used an approach based on the Tri-Layered Neural Network and Maximum Relevance Minimum Redundancy that led to a 0.289 mean absolute error for the heating load and a 0.535 mean absolute error for the cooling load, respectively.
The approach presented in [32] considered a novel method for energy consumption estimation using Support Vector Machine and Random Forest. The Owl Search Algorithm [33] was used to improve the performance of these two algorithms. The root mean square error values returned by the approaches based on Support Vector Machine and Random Forest were 0.85 and 1.29 for heating and 1.02 and 1.65 for cooling, respectively.
The approach presented in [34] compared four algorithms, namely, the Linear Regression, the Decision Tree, the Random Forest, and the XGBoost. Like our approach, the data were split randomly into 80% training data and 20% testing data. The hyperparameters were optimized using Bayesian Optimization [35]. The best results in terms of root mean square error for the testing data were obtained by the XGBoost algorithm, as follows: 0.3797 for the heating load and 0.7578 for the cooling load.
The mean square error results obtained by the authors of [36] were 0.201 for the heating load prediction and 2.56 for the cooling load forecast. Another approach, the one presented in [37], which used the Multilayer Perceptron and Support Vector Regression algorithms, returned 0.4832 and 0.8853 root mean square error values for the heating load prediction and 2.626 and 1.7389 root mean square error values for the cooling load prediction. On the other hand, the approach presented in [38], based on the Gated Recurrent Unit, returned 0.0166 and 0.0247 root mean square error values for the heating and cooling load predictions, respectively, when hold-out was used, and 0.01 root mean square error values for both heating and cooling load predictions when 10-fold validation was used.
The authors of [39] considered an approach based on a Multi-Objective Optimization method for the tuning of the hyperparameters of a Random Forest model used for the prediction of the heating and cooling loads. The two objectives that were optimized were the averages of the heating and cooling load predictions. Compared to their approach, our method also predicts which regressor to use as part of the multi-objective optimization process.

Multi-Objective Plum Tree Algorithm (MOPTA) Machine Learning Methodology for Heating and Cooling Load Prediction
The original version of the Plum Tree Algorithm was introduced in [21] with the following sources of inspiration:
• The plum trees flowering at the beginning of spring;
• The transformation of the pollinated flowers into plums;
• The dropping of a percentage of the plums before maturity due to various reasons;
• The continuity of the lives of the plums after the harvest for a couple of weeks.
The PTA presents similarities with other bio-inspired algorithms, such as Chicken Swarm Optimization [40], Particle Swarm Optimization, Grey Wolf Optimizer [41], and Crow Search Algorithm [42], which influenced how particular mathematical parts of the algorithm were modeled.
Table 1 summarizes the PTA's configurable parameters. The PTA starts with the initialization of N flowers in a search space with D dimensions, such that the values are selected randomly from the range [X_min, X_max]. Then, N plums are initialized with the values of the flowers. The OF is used to calculate the fitness values of the flowers and of the plums. The plum_gbest is set to the position of the plum that has the best fitness value.
Then, the PTA runs the following instructions I times. At the beginning of each iteration, the positions of the following two plums are computed:
• plum_ripe, the plum with the best fitness value;
• plum_unripe, the plum with the second-best fitness value.
For each flower_i^k, where k is the iteration number and i = 1, N, a random number r from the range [0, 1] is selected. Three cases, one for each phase, are considered further:

• Fruitiness Phase (r ≥ FT): The positions of the flowers are updated using a formula in which rand(FR_min, FR_max) denotes a random number from [FR_min, FR_max].
• Ripeness Phase (FT > r ≥ RT): The positions of the flowers are updated using a formula in which r_1 and r_2 are random numbers from [0, 1].
• Storeness Phase (RT > r): The positions of the flowers are updated using a perturbation drawn from N(0, σ²), a Gaussian distribution with mean 0 and standard deviation σ.
Then, the positions of the flowers are updated to be in [X_min, X_max], for each dimension j = 1, D. After the positions of the flowers are updated, each plum_i^k, where k is the iteration number and i = 1, N, is updated to the better of the plum and its corresponding flower. At the end of each iteration, the position of plum_gbest is updated to the position of the plum with the best fitness.
Finally, when all iterations are completed, the PTA returns the value of plum_gbest.
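Since the update equations themselves are given in [21] and are not reproduced above, the overall control flow can be illustrated with the following skeleton. The three phase updates are placeholders (simple moves toward the ripe and unripe plums and a Gaussian perturbation), not the actual PTA formulas; the function name pta_sketch, its parameter defaults, and the 0.1 perturbation scale are illustrative assumptions:

```python
import numpy as np

def pta_sketch(of, d, n=20, iters=50, xmin=0.0, xmax=1.0, ft=0.6, rt=0.3, seed=0):
    """Control-flow skeleton of the single-objective PTA.

    The three phase updates below are PLACEHOLDERS, not the formulas from [21].
    """
    rng = np.random.default_rng(seed)
    flowers = rng.uniform(xmin, xmax, size=(n, d))  # initialize N flowers
    plums = flowers.copy()                          # plums start at the flowers
    fitness = np.array([of(p) for p in plums])
    for _ in range(iters):
        order = np.argsort(fitness)
        ripe, unripe = plums[order[0]].copy(), plums[order[1]].copy()
        for i in range(n):
            r = rng.random()
            if r >= ft:          # fruitiness phase (placeholder move)
                flowers[i] += rng.random() * (ripe - flowers[i])
            elif r >= rt:        # ripeness phase (placeholder move)
                flowers[i] += rng.random() * (unripe - flowers[i])
            else:                # storeness phase (placeholder Gaussian move)
                flowers[i] += rng.normal(0.0, 0.1, size=d)
            flowers[i] = np.clip(flowers[i], xmin, xmax)  # keep in [X_min, X_max]
            f = of(flowers[i])
            if f < fitness[i]:   # a plum keeps the better of plum and flower
                plums[i], fitness[i] = flowers[i].copy(), f
    best = int(np.argmin(fitness))
    return plums[best], float(fitness[best])  # plum_gbest and its fitness
```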

Heating and Cooling Load Prediction
The Energy Efficiency Dataset was split randomly using a 5-fold cross-validation. For each split out of the five splits, the testing data were represented by one different fold, while the training data were represented by the remaining folds. The training data were standardized using the Z-score for the values of each column, while the testing data were standardized using the mean and the standard deviation values, which were computed for the training data.
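The fold-wise Z-score standardization described above can be sketched as follows; the synthetic 768 x 8 matrix only stands in for the actual feature table:

```python
import numpy as np
from sklearn.model_selection import KFold

# Synthetic stand-in for the 768-sample, 8-feature Energy Efficiency matrix.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(768, 8))

standardized_splits = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    mu = X_train.mean(axis=0)     # statistics computed on the training folds only
    sigma = X_train.std(axis=0)
    # The testing fold reuses the training mean and standard deviation.
    standardized_splits.append(((X_train - mu) / sigma, (X_test - mu) / sigma))
```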
The algorithms that were used were the Extra Trees Regressor (ETR), the Gradient Boosting Regressor (GBR), and the Random Forest Regressor (RFR). The regressors were configured with the hyperparameter values described by the plums. The eight metrics which were used to evaluate the results were the averages of the RMSE, R², MAE, and MAPE across the 5 folds for heating and cooling load prediction.
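A minimal illustration of how the four per-fold metrics could be computed and averaged with sklearn; the helper name fold_metrics and the dummy fold data are illustrative:

```python
import numpy as np
from sklearn.metrics import (mean_squared_error, r2_score,
                             mean_absolute_error, mean_absolute_percentage_error)

def fold_metrics(y_true, y_pred):
    """RMSE, R², MAE, and MAPE for a single fold."""
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "r2": float(r2_score(y_true, y_pred)),
        "mae": float(mean_absolute_error(y_true, y_pred)),
        "mape": float(mean_absolute_percentage_error(y_true, y_pred)),
    }

# Averaging across folds (illustrated here with two dummy folds).
folds = [fold_metrics(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])),
         fold_metrics(np.array([0.5, 1.5, 2.5]), np.array([0.4, 1.6, 2.4]))]
avg = {k: float(np.mean([f[k] for f in folds])) for k in folds[0]}
```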

MOPTA Multi-Objective Fitness Function
Each position of a plum corresponds to an algorithm and its hyperparameter configuration. To apply the multi-objective fitness function to a plum, it is necessary to convert the values that describe the position of the plum to integers first. This is done using the floor function, which takes a real value as input and returns the greatest integer value that is less than or equal to it.
Figure 1 illustrates the high-level view of the multi-objective fitness function. Compared to the approach presented in [39], we did not consider the bootstrap parameter, as we aimed to use a set of hyperparameters that can be configured for all algorithms. However, compared to that approach, we added a new dimension that describes the algorithm, namely GBR, RFR, or ETR, such that 0 corresponds to GBR, 1 to RFR, and 2 to ETR, respectively.
The inputs of the fitness function are the converted position of the plum and the train data. The first dimension describes the algorithm, while the other five dimensions describe the values of the hyperparameters.
We performed a 10-fold cross-validation on the train data, and we computed the average RMSE values. In each partition, the test data are represented by one fold, and the train data are represented by the other nine folds. The configured selected algorithm was applied twice in each partition, depending on the prediction type. The first time, it was used to predict the heating load, while the second time, it was used to predict the cooling load.
The output of the fitness function is represented by the following two values:
• The average Heating RMSE, denoted as RMSE_H;
• The average Cooling RMSE, denoted as RMSE_C.
The MOPTA aims to obtain minimal values for both objectives.
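A condensed sketch of this two-objective fitness function is given below. Only two hyperparameters (n_estimators and max_depth) are decoded for brevity; the actual methodology decodes five common hyperparameters according to Table 3, so the exact plum layout here is an assumption:

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              ExtraTreesRegressor)
from sklearn.model_selection import KFold

REGRESSORS = {0: GradientBoostingRegressor, 1: RandomForestRegressor,
              2: ExtraTreesRegressor}

def fitness(plum, X, y_heat, y_cool, n_folds=10):
    """Return (average heating RMSE, average cooling RMSE) for a plum position.

    The first dimension selects the regressor (0=GBR, 1=RFR, 2=ETR); the
    remaining dimensions are hyperparameter values. Only n_estimators and
    max_depth are decoded here, which is a simplification of the methodology.
    """
    pos = np.floor(plum).astype(int)  # convert the continuous position to integers
    model_cls = REGRESSORS[pos[0]]
    params = {"n_estimators": pos[1], "max_depth": pos[2], "random_state": 42}
    rmse_h, rmse_c = [], []
    for tr, te in KFold(n_splits=n_folds, shuffle=True, random_state=0).split(X):
        # The selected algorithm is applied twice: once per prediction type.
        for y, acc in ((y_heat, rmse_h), (y_cool, rmse_c)):
            model = model_cls(**params).fit(X[tr], y[tr])
            err = model.predict(X[te]) - y[te]
            acc.append(float(np.sqrt(np.mean(err ** 2))))
    return float(np.mean(rmse_h)), float(np.mean(rmse_c))
```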

Multi-Objective Adaptations of PTA
The multi-objective adaptations of the PTA considered in this manuscript are similar to the ones we used in [25], and they are based on the method presented in [43]. The major adaptations introduced by the MOPTA are the application of an external archive for the saving and retrieval of the solutions that are Pareto-optimal and the use of this archive for obtaining the values of the ripe and the unripe plums.
The dominance relations between Plum_1, with the cost (RMSE_H1, RMSE_C1), and Plum_2, with the cost (RMSE_H2, RMSE_C2), are defined as follows: (1) If RMSE_H1 ≤ RMSE_H2 and RMSE_C1 ≤ RMSE_C2, and at least one of the relations RMSE_H1 < RMSE_H2 and RMSE_C1 < RMSE_C2 is true, then Plum_1 dominates Plum_2; (2) If RMSE_H2 ≤ RMSE_H1 and RMSE_C2 ≤ RMSE_C1, and at least one of the relations RMSE_H2 < RMSE_H1 and RMSE_C2 < RMSE_C1 is true, then Plum_2 dominates Plum_1; (3) If neither (1) nor (2) is true, then Plum_1 and Plum_2 are non-dominated.
A set that contains two solutions is non-dominated if neither solution dominates the other one.
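These dominance rules translate directly into code; the helper names dominates and non_dominated are, of course, illustrative:

```python
def dominates(a, b):
    """True if cost vector a = (rmse_h, rmse_c) dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(costs):
    """Keep only the cost vectors that are dominated by no other vector."""
    return [c for c in costs if not any(dominates(o, c) for o in costs if o is not c)]
```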


Plum Matrix Grid Computation
The input is represented by a set of plums plum_1, . . ., plum_Nplums, where N_plums is the total number of plums.
step 1: The costs of the plums are computed using the multi-objective fitness function, resulting in the matrix of cost pairs (RMSE_Hi, RMSE_Ci), i = 1, N_plums.
step 2: The minimum and the maximum cost values of RMSE_H are computed using the cost matrix and the grid's inflation parameter.
step 3: Similarly, the minimum and the maximum cost values of RMSE_C are computed.
step 4: The plum matrix Grid is defined using the number of grids ng, such that the x-axis presents the endpoints of the RMSE_H minimization objective, and the y-axis presents the endpoints of the RMSE_C minimization objective.
step 5: The index of plum_i, with the cost (RMSE_Hi, RMSE_Ci), i = 1, N_plums, is computed from the Grid cell into which its cost falls.
The output is represented by the plum indices set index(plum_1), . . ., index(plum_Nplums).
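One plausible reading of steps 1-5, assuming the common archive-grid construction in which the per-objective cost range is inflated and split into ng equal cells (the default values for ng and the inflation parameter alpha are assumptions, not the paper's settings):

```python
import numpy as np

def plum_grid_indices(costs, ng=7, alpha=0.1):
    """Map each (RMSE_H, RMSE_C) cost to a cell index of an ng x ng grid.

    The grid bounds are the per-objective min/max costs, inflated by alpha,
    mirroring the usual archive-grid construction.
    """
    costs = np.asarray(costs, dtype=float)
    lo, hi = costs.min(axis=0), costs.max(axis=0)   # steps 2-3: cost extremes
    pad = alpha * (hi - lo)                         # grid inflation
    lo, hi = lo - pad, hi + pad
    # Step 5: cell coordinates along each axis, clipped into [0, ng - 1].
    cell = np.clip(((costs - lo) / (hi - lo) * ng).astype(int), 0, ng - 1)
    return cell[:, 0] * ng + cell[:, 1]             # flatten (x, y) to one index
```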

Plum Selection Methodology
The ripe and unripe plums were selected using the archive of plums, archive_plums. The selection of these two plums was performed at each iteration of the MOPTA for each plum.
Figure 3 presents a high-level view of the methodology for the plum selection.

The set OccIndex of occupied indices for archive_plums is calculated as follows: OccIndex(archive_plums) = Set(index(plum_1), . . ., index(plum_as)) (18) such that the function Set converts the numbers received as input into a list of unique numbers sorted in increasing order. Suppose that there are M cells that are occupied, and each one of them is defined by the cell index cindex. Then, the following vector that stores the cell count of each plum is defined: OccCnt(archive_plums) = {ccnt_1, . . ., ccnt_M} (20) such that for each i = 1, M, the value ccnt_i represents how many plums are present at the location cindex_i. Then, a random number r is selected from {1, . . ., M} using a roulette wheel selection mechanism. The set Selected(archive_plums), defined over i = 1, as, where as is the archive size, was used to select the ripe or the unripe plum randomly, considering a uniform probability.
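The roulette wheel step can be sketched as follows; the assumption that the selection probability is inversely proportional to the cell count (so that leaders are drawn from less crowded cells) follows the usual archive-based selection heuristic rather than the exact formula, which is not reproduced above:

```python
import numpy as np

def roulette_select(cell_counts, rng):
    """Pick an occupied cell, favouring LESS crowded cells.

    Assumes selection probability inversely proportional to the cell count,
    the usual archive-based leader-selection heuristic.
    """
    weights = 1.0 / np.asarray(cell_counts, dtype=float)
    probs = weights / weights.sum()
    return int(rng.choice(len(cell_counts), p=probs))
```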

Plum Removal Methodology
The size of archive_plums was adjusted during each iteration of the algorithm if it was greater than the maximum archive size (mas). Figure 4 presents a high-level view of the methodology for the removal of the plums.

Therefore, a number of (as − mas) plums were removed from the archive using steps similar to the ones presented in the plum selection methodology, with the following two adaptations:
Adaptation 1: the plum removal probability, which was used instead of the P_plum probability, was defined using a formula in which the parameter ζ describes the plum selection pressure.
Adaptation 2: the removal of the selected plum from archive_plums, as the last step of the methodology.
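Since the removal probability formula is not reproduced above, the sketch below assumes the textbook form in which the probability of removing from a cell grows with its crowding, raised to the selection pressure ζ:

```python
import numpy as np

def removal_probabilities(cell_counts, zeta=2.0):
    """Per-cell removal probability, biased toward CROWDED cells.

    Assumes the common form p_i = ccnt_i**zeta / sum_j ccnt_j**zeta, where
    zeta is the plum selection pressure described in the text.
    """
    w = np.asarray(cell_counts, dtype=float) ** zeta
    return w / w.sum()
```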

MOPTA for Heating and Cooling Prediction
Algorithm 1 presents the MOPTA for heating and cooling prediction. The input parameters of the MOPTA consist of the standard input parameters of the PTA, which are presented in Table 1, and the following additional parameters. The OF is a multi-objective function that returns the average values of the RMSE_H and RMSE_C of the ML algorithm trained and validated according to the position of the plum. The range [X_min, X_max] was adapted such that the first dimension describes the selected regressor, while the other dimensions describe the limits of the hyperparameters considered in the training of the ML algorithm.
The output of the MOPTA is archive plums , which consists of the non-dominant plums after I iterations.
The N flowers were initialized with random values from [X_min, X_max] in the D-dimensional search space (line 3), while the N plums were initialized to the positions of the N flowers in line 4. Then, both the positions of the plums and of the flowers were adapted to arrays of integers in line 5 using the floor function. The multi-objective fitness function OF presented in Section 3.2 was applied to each plum_i (i = 1, . . ., N) (line 6).
Using the conditions presented at the beginning of Section 3.3, the dominance relation was determined in line 7 of the algorithm. The archive_plums of non-dominated plums was created in line 8. Then, using the steps presented in Section 3.3.1, the grid matrix and the indices of the plums were computed.
The instructions from lines 11-33 were repeated for I iterations. For each flower_i (i = 1, . . ., N), the instructions from lines 12-16 were performed. The current size of the archive as was updated in line 12 to the total number of plums from archive_plums. The values of plum_ripe and plum_unripe were computed in line 13 using the values of as, π, and grids and the plum selection methodology presented in Section 3.3.2.
Initially, the ripe and the unripe plums were selected randomly from archive plums .If as > 1, then plum ripe was selected from archive plums − plum unripe following the steps from Section 3.3.2.
The positions flower_i (i = 1, . . ., N) were updated in line 15 using Equations (3)-(6) for the three phases: the fruitiness phase, the ripeness phase, and the storeness phase. The equations for the storeness phase were adapted for the multi-objective optimization using a procedure adapted after the one from [44]. Equation (6) was adapted to a form based on a function F defined over the fitness pair (RMSE_H, RMSE_C) = OF(plum).
Then, the positions of the flowers were updated to be in [X_min, X_max] (line 16). The instructions from lines 19-24 were performed for each plum_i (i = 1, . . ., N). First, the plum_i and the corresponding flower_i were updated to arrays of integers using the floor function (line 19). Then, the OF was used to compute the fitness values Plum_i and Flower_i of plum_i and flower_i, respectively (line 20). If Flower_i dominated Plum_i, then the position plum_i and the fitness value Plum_i were updated to flower_i and Flower_i, respectively.
The plum dominance relation was determined again in line 26. The non-dominated plums nplums were computed in line 27. Then, in lines 28-29, the plums from archive_plums were appended to nplums, and the nplums were updated in line 30.
The matrix grids was computed in line 31, while the value as was computed in line 32. If the value as was greater than mas, then (as − mas) plums were removed from the archive, according to the methodology presented in Section 3.3.3.
Finally, the MOPTA returned the archive plums as output in line 35.

Solution Ranking Using MOORA
The solutions which were returned by the MOPTA were ranked using an adaptation of MOORA [22,45].
The matrix D was defined as the m × 2 matrix whose rows are the cost pairs (RMSE_Hi, RMSE_Ci), where m is the size of the plums archive and RMSE_Hi, RMSE_Ci represent the RMSE values predicted by the model trained according to the position of the i-th plum, where i = 1, . . ., m.
The values of D were normalized by dividing each entry by the square root of the sum of the squares of its column. The MOORA scores of the plums from the archive of size m were finally computed as the sum of the two normalized RMSE values of each plum. The most dominant plum was the one with the lowest MOORA score.
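The normalization and scoring steps can be condensed as follows (moora_rank is an illustrative helper name):

```python
import numpy as np

def moora_rank(costs):
    """Rank archive solutions with MOORA: vector-normalize each objective
    column, sum the (minimization) objectives, lowest score wins."""
    D = np.asarray(costs, dtype=float)
    D_norm = D / np.sqrt((D ** 2).sum(axis=0))   # column-wise vector normalization
    scores = D_norm.sum(axis=1)                  # both objectives are minimized
    return scores, int(np.argmin(scores))        # scores and the best plum index
```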

MOPTA Methodology for Heating and Cooling Prediction
Figure 5 presents the high-level view of the MOPTA methodology which was used for the prediction of the heating and cooling loads.

Results
The experiments were performed in Python version 3.12.3, using the sklearn library, on a machine with the following properties:
All the computations were CPU-based.

Energy Efficiency Dataset
The Energy Efficiency Dataset used in the experiments was characterized by 768 samples, eight attributes, and two responses. The dataset was obtained considering 12 building shapes, simulated in Ecotect. Table 2 presents the summary of the features. The dataset was split randomly into five folds of an approximately equal size, such that the Testing Data were represented by one of the folds while the Training Data were represented by the other four folds.

Hyperparameters Configuration
Table 3 presents the ranges of the hyperparameters used in the experiments. The values were inspired by the ones used by the authors of [39].

MOPTA Configuration Parameters
Table 4 presents the MOPTA configuration parameters used in our experiments. As a remark, in the case of X_max, the table also adds the value 1 to the upper limit, since the search space is represented by continuous values. However, if the upper limit is obtained, then the value 1 is subtracted from that value. Figure 6 presents this adjusting transformation more clearly for the first dimension. As can be seen in the figure, the values from [0, 1) are adjusted to 0, the values from [1, 2) are adjusted to 1, and the values from [2, 3] are adjusted to 2, respectively.
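For the first dimension, the adjusting transformation amounts to a floor with the upper endpoint folded back, which can be written as (adjust is an illustrative helper name):

```python
import numpy as np

def adjust(x, upper=2):
    """Floor a continuous first-dimension value from [0, 3] onto {0, 1, 2}.

    If the upper limit 3 is reached, 1 is subtracted, folding it back to 2.
    """
    return min(int(np.floor(x)), upper)
```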


MOPTA Prediction Results
Table 5 presents the results obtained by the MOPTA for each of the five folds and the mean results. As can be seen in the table, in the case of the MAPE metric, the result was negative for Fold 4. The negative value is justified by the fact that after the standardization operation, the labels had both positive and negative values. In all five cases, the selected algorithm was the GBR.

Comparison to the Prediction Results Obtained Using the Default Parameters
Table 6 compares the results obtained by the MOPTA approach to the ones obtained by each of the algorithms GBR, RFR, and ETR when the default values were used. We considered these three algorithms in the comparison because they are used by the MOPTA as part of the optimization process. Moreover, the best solution returned by the MOPTA describes which of the three algorithms is applied and the optimal values of the hyperparameters. In essence, each MOPTA result in this table corresponds to one of the three algorithms, depending on the value of the first dimension of the best plum, tuned according to the values of the remaining dimensions of the best plum. To obtain reproducible results, each of the algorithms was initialized with a random_state equal to 42. The obtained results show that the MOPTA RMSE results were better than the ones returned by the GBR, the RFR, and the ETR, both for the heating predictions and the cooling predictions in all cases.
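The default-parameter baseline used in this comparison can be sketched as follows; the feature and target arrays are synthetic stand-ins, and only one response is fitted at a time since GradientBoostingRegressor expects a single target:

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor,
                              GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data with the approximate fold sizes of the dataset.
rng = np.random.default_rng(0)
X_train, X_test = rng.random((614, 8)), rng.random((154, 8))
y_train, y_test = rng.random(614), rng.random(154)

# Default hyperparameters; only random_state is fixed for reproducibility.
for name, model in [("GBR", GradientBoostingRegressor(random_state=42)),
                    ("RFR", RandomForestRegressor(random_state=42)),
                    ("ETR", ExtraTreesRegressor(random_state=42))]:
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: RMSE = {rmse:.4f}")
```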

Comparison to Other Multi-Objective Optimization Approaches
The results obtained using the MOPTA were compared to the ones obtained by the MOGWO, the MOPSO, and the NSGA-II. We considered these algorithms in the comparison because a part of the mathematical equations of the Plum Tree Algorithm was inspired by the Grey Wolf Optimizer and Particle Swarm Optimization, while the Genetic Algorithms, which are at the base of the NSGA-II, are among the most popular evolutionary algorithms. Moreover, Particle Swarm Optimization and the Grey Wolf Optimizer are among the most popular swarm intelligence algorithms. Therefore, we considered the multi-objective implementations of these three benchmark algorithms to validate our results. In the case of the MOGWO algorithm, we used the implementation from our previous work [25], as it was used for this type of problem. The MOPSO and NSGA-II were also used in [25] to validate our results. For NSGA-II, we considered the implementation from the DEAP (Distributed Evolutionary Algorithms in Python) framework [46]. Some of the configuration parameters have the same values as in the case of the MOPTA, while others were specific to each algorithm. The common configuration parameters for all four algorithms were the number of iterations I, the number of dimensions D, the population size N, the minimum and maximum position values X min and X max , and the objective function OF.
The configuration parameters that were common to the MOPTA, the MOGWO, and the MOPSO were the maximum archive size mas, the number of grids ng, the grid's inflation parameter ϵ, the pressure parameter π, and the selection pressure parameter ζ.
Table 7 presents the specific configuration parameter values for each algorithm. The MOGWO returned the best RMSE for cooling for Fold 1 and Fold 2 and the best RMSE for heating for Fold 2 and Fold 3. The MOPSO returned the best RMSE for heating for Fold 1 and the best RMSE for cooling for Fold 4. The NSGA-II returned the best RMSE for cooling for Fold 4. The MOPTA returned the best RMSE for heating for all folds except Fold 1, and the best RMSE for cooling for Fold 1, Fold 2, and Fold 5. Also, the MOPTA obtained the best mean RMSE values both for the heating predictions and for the cooling predictions. Another remark is that all of the multi-objective algorithms selected the GBR, even though, as can be seen in Table 6, the GBR does not always return the best RMSE results compared to the RFR and the ETR when the default parameter values are used.
Table 9 describes how many times each algorithm was the best with respect to the five folds. We can see that in seven cases, the MOPTA was the best. The second algorithm, the MOGWO, was the best in only four cases.

Computational Load Analysis
This section presents a computational load analysis of the algorithms used in our experiments from the perspective of the running time. Table 10 summarizes the total running time, expressed in milliseconds, for each algorithm across all five folds. As can be seen in the table, the GBR, the RFR, and the ETR had the best running times. The running time of the MOPTA, which was approximately 19.5 h, was almost double that of the MOGWO and MOPSO algorithms. The running time of the NSGA-II was almost 12 times better than that of the MOPTA.
However, we also want to point out that a grid search through all combinations of hyperparameters, namely 3 × 401 × 91 × 10 × 9 × 8 = 78,820,560, assuming an average of around 200 ms per experiment (a value slightly less than the 297 ms of the GBR algorithm when run with the default parameters), would need around 15,764,112,000 ms to complete, or around 182 days. With respect to these remarks, we can conclude that the MOPTA has a much better running time than the standard grid search.
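The grid-search estimate above can be checked with a few lines; the 200 ms per experiment is the paper's assumed average, not a measured value:

```python
import math

# Grid sizes: the regressor choice plus the five tuned hyperparameters,
# as given in the text.
grid_sizes = [3, 401, 91, 10, 9, 8]
combinations = math.prod(grid_sizes)
print(combinations)  # -> 78820560

ms_per_experiment = 200  # assumed average running time per experiment
total_ms = combinations * ms_per_experiment
days = total_ms / (1000 * 60 * 60 * 24)
print(f"{total_ms} ms = about {days:.0f} days")  # about 182 days
```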

Robustness and Convergence Analysis
In this section, we discuss the robustness and the convergence of the MOPTA, and we compare the results to the ones corresponding to the MOGWO, the MOPSO, and the NSGA-II.
To compute the robustness values, we consider the heating and cooling RMSE results obtained for each fold and calculate the standard deviation. Table 11 summarizes the comparison of the standard deviation (std) values obtained for each multi-objective optimization algorithm. If we consider that lower variability means better robustness, then the MOPSO returned the best result for the heating standard deviation, while the MOPTA returned the best result for the cooling standard deviation.
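The robustness measure used here is simply the standard deviation of the per-fold RMSE values; the numbers below are illustrative, not the paper's actual results:

```python
import statistics

# Hypothetical per-fold RMSE values for one objective (illustration only).
fold_rmse = [0.036, 0.034, 0.037, 0.035, 0.036]

# Lower standard deviation across the five folds means lower variability,
# i.e., better robustness under this criterion.
std = statistics.pstdev(fold_rmse)  # population std over the five folds
print(f"std = {std:.6f}")
```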
For the analysis of the convergence of the MOPTA, we identified, for each fold, the first iteration that returned the best value. We performed similar calculations for the other multi-objective algorithms. Table 12 summarizes the convergence analysis results. The MOPTA converged relatively fast compared to the other algorithms, except for Fold 5, where it obtained the best result in Iteration 20. In the case of the other algorithms, the best results were obtained after more iterations, except for the MOPSO on Fold 5, where the best result was obtained in Iteration 2.

Discussions
This section compares our results to the ones obtained by recent studies in the literature. Table 13 presents a summary of the result comparison, building upon the comparison results presented in [21]. The articles presented in the table were selected so that a 5-fold cross-validation was used and the models performed two predictions, one for the heating load and the other for the cooling load.
The results presented in [47,48] are better than the ones presented in [50], but they are not directly comparable because no standardization was performed. For articles [31,49,50], we presented only the best values. In the case of article [34], which used a 5-fold cross-validation like our approach, we presented only the best results. Even though the results from [34] are not directly comparable, as we also used standardization, they were better than the ones from [31]. However, the root mean square error results of the current approach based on the Multi-Objective Plum Tree Algorithm were significantly better than the results obtained by the Plum Tree Algorithm-based ensemble.
Even though they are not directly comparable to our results because of different preprocessing configurations or cross-validation settings, the recent studies based on the latest deep learning methods returned promising results. For example, the approach presented in [51], based on deep neural networks, returned a root mean square error equal to 0.0137. However, compared to our approach, the problem was converted into an image processing problem by transforming the data into image datasets. By rounding the heating and cooling load values to the closest integer, the problem was converted into a multi-class classification task.

Conclusions
The manuscript presented a novel approach based on the Multi-Objective Plum Tree Algorithm for the prediction of heating and cooling loads. The dataset used for the testing and validation of the approach was the Energy Efficiency Dataset from the UCI Machine Learning Repository. The solutions were ranked using the MOORA method. The results are better than the ones returned by the individual predictors Gradient Boosting Regressor, Random Forest Regressor, and Extra Trees Regressor, respectively. The results were also comparable to the ones returned by other multi-objective optimization approaches, namely the Multi-Objective Grey Wolf Optimizer, the Multi-Objective Particle Swarm Optimization, and the NSGA-II. Compared to the results obtained in one of the previous research studies, where the Plum Tree Algorithm was used to tune an ensemble of predictors, the results were better. Also, the Multi-Objective Plum Tree Algorithm results were compared to the ones from the literature. The following directions are proposed for future research work:
• The improvement of the performance of the proposed algorithm through hybridization or the use of concepts such as Levy flights;
• The comparison of the results obtained by the Multi-Objective Plum Tree Algorithm to the ones obtained using other multi-objective optimization approaches that were not considered in the current paper;
• The application of the Multi-Objective Plum Tree Algorithm to more engineering problems from the field of energy efficiency prediction;
• The adaptation of the prediction methodology presented in the manuscript to a larger class of energy-consuming buildings, such as the ones belonging to heavy industry, considering more data about the characteristics of the buildings, which complement characteristics such as the surface area, the overall height, and the orientation.

Figure 2
Figure 2 presents the methodology for the computation of the plum matrix grid.

• mas - the maximum archive size of the repository that contains the nondominated solutions;
• ng - the number of grids per objective;
• ϵ - the grid's inflation parameter;
• π - the pressure parameter used during the plum selection;
• ζ - the plum selection pressure parameter used during the plum removal.

Figure 5.
Figure 5. MOPTA methodology for heating and cooling load prediction. The input of the methodology was represented by the Energy Efficiency Dataset. The data were split into Training Data and Testing Data considering a 5-Fold Cross-Validation approach, such that five splits were performed. Each time, one fold was used for testing and the remaining ones for training. Then, the Standardized Training Data and the Standardized Testing Data were obtained. The MOPTA was run using the Standardized Training Data as input, with a 10-fold cross-validation to evaluate the plums. The archive returned by the algorithm was evaluated using MOORA, and the plum with the best MOORA score was further considered to evaluate the predictions. The predictions were evaluated using the MAPE, RMSE, R², and MAE metrics.
(1) If RMSE_H1 ≤ RMSE_H2 and RMSE_C1 ≤ RMSE_C2, and at least one of the relations RMSE_H1 < RMSE_H2 and RMSE_C1 < RMSE_C2 is true, then Plum_1 dominates Plum_2;
(2) If RMSE_H1 ≥ RMSE_H2 and RMSE_C1 ≥ RMSE_C2, and at least one of the relations RMSE_H1 > RMSE_H2 and RMSE_C1 > RMSE_C2 is true, then Plum_2 dominates Plum_1;
(3) If neither (1) nor (2) holds, then Plum_1 and Plum_2 are non-dominated.
The grid step for the heating objective uses the grid's inflation parameter ϵ: ∆RMSE_H = ϵ × (max RMSE_H − min RMSE_H), where the minimum and the maximum cost values are computed using the Cost matrix, e.g., min(RMSE_C1, ..., RMSE_C_Nplums) and max(RMSE_C1, ..., RMSE_C_Nplums) for the cooling objective (12).
MOPTA pseudocode (recoverable fragment):
1: Input I, D, N, FT, RT, FR_min, FR_max, ε, X_min, X_max, OF, mas, ng, ϵ, π, ζ
2: Output archive_plums
3: initialize N flowers in the D-dimensional space with values from [X_min, X_max];
4: initialize N plums to the positions of the N flowers;
5: adapt the positions plum_i and flower_i (i = 1, ..., N) to arrays of integers;
6: RMSE_H_i, RMSE_C_i = OF(plum_i) (i = 1, ..., N);
9: GetGrids(archive_plums, ng, ϵ);
10: for iter = 1 to I do
11:   for i = 1 to N do
12:     as = SizeOf(archive_plums);
13:     determine plum_ripe and plum_unripe from archive_plums using as, π, grids;
14:     update r to a random number from [0, 1];
15:     update flower_i according to FT, RT, FR_min, FR_max, r;
16:     adjust the flowers to be in the range [X_min, X_max];
19:     adapt plum_i and flower_i to arrays of integers;
20:     use OF to get the fitness values Plum_i and Flower_i of plum_i and flower_i;
21:     if dominates(Flower_i, Plum_i) then ...
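The dominance test in conditions (1)-(3) can be sketched directly, with each plum's fitness given as an (RMSE heating, RMSE cooling) pair; the function name mirrors the `dominates` call in the pseudocode:

```python
def dominates(a, b):
    """Return True if solution a dominates solution b in the minimization
    sense: a is no worse on both objectives and strictly better on at
    least one. a and b are (rmse_heating, rmse_cooling) tuples."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better

p1, p2 = (0.035, 0.076), (0.040, 0.080)
print(dominates(p1, p2))  # -> True: p1 is better on both objectives
print(dominates(p2, p1))  # -> False
print(dominates(p1, p1))  # -> False: equal solutions are non-dominated
```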

Table 2 .
Energy Efficiency Dataset features summary.

Table 6 .
Comparison of the MOPTA results to the ones obtained by each algorithm when the default parameters were used.

Table 7 .
Specific configuration parameter values.

Table 8 compares the MOPTA results to the ones obtained by the other three multi-objective optimization algorithms.

Table 8 .
Comparison of the MOPTA results to the ones obtained by the MOGWO, the MOPSO, and the NSGA-II.

Table 9 .
Summary of the algorithm ranking.

Table 10 .
Running time comparison.

Table 11 .
Standard deviation results comparison summary.

Table 12 .
Convergence analysis results summary.

Table 13 .
Comparison to the literature results.