Abstract
A hybrid Artificial Intelligence (AI) framework centered on metamodeling, integrating simulation data with hybrid data-driven techniques, was implemented to improve the predictive accuracy and optimization of thermal load projections in three distinct climates in Morocco. First, 13 machine learning (ML) models were assessed for predicting heating and cooling loads. The best-performing models from this stage were then carried forward to a second phase aimed at identifying the optimal combinations of inputs for thermal load prediction. In this phase, an Integral Feature Selection (IFS) method was employed in conjunction with the best ML models, and the resulting models were evaluated using advanced statistical measures. The results reveal that, for each climate, numerous high-accuracy prediction pathways exist for thermal load prediction, with R2 values exceeding 0.99. These results outperform those reported by other researchers for thermal load prediction in Low-Energy Buildings (LEBs).
1. Introduction
The need for sustainable and energy-efficient building strategies is becoming increasingly urgent, especially in regions like Morocco, which spans diverse climatic zones—from Mediterranean to semi-arid and cold mountainous regions. In line with its low carbon transition strategy, Morocco has pledged to reduce national greenhouse gas emissions by 40% by 2030 and 77% by 2050 [,]. Within this context, Low-Energy Buildings (LEBs) are crucial for meeting these goals. However, a persistent Energy Performance Gap (EPG)—the discrepancy between predicted and actual energy use—remains a challenge, particularly in residential buildings where geometry, insulation, and local climate play pivotal roles [,]. Bridging this gap necessitates accurate thermal load predictions using advanced methodologies tailored to Morocco’s climatic and socio-economic realities [].
Artificial intelligence (AI), and more specifically machine learning (ML), has shown great promise in improving the precision of heating and cooling load predictions for LEBs. Hybrid ML models such as XGBoost and LightGBM have demonstrated high predictive performance (R2 > 0.99) by integrating optimized feature selection techniques and climate-sensitive inputs []. Transformer-based architectures have also proven capable of multi-step forecasting with remarkably low errors in complex environments [], while Physics-Informed Neural Networks (PINNs) provide robust predictions in thermodynamic applications, even with sparse datasets []. These methods are further supported by interpretable models like decision trees, which help identify critical predictors (e.g., glazing area, compactness) and align well with Morocco’s material efficiency and solar integration objectives [,].
The use of hybrid AI techniques is growing in building energy prediction. For instance, [] combined Support Vector Regression (SVR) and XGBoost with six metaheuristic algorithms and identified SBO-XGBoost as the best-performing model for both heating (R2 = 0.9380) and cooling (R2 = 0.9583) loads. In [], the authors evaluated four ML models, namely the Multi-Layer Perceptron (MLP), Extreme Learning Machine (ELM), Radial Basis Function (RBF), and Response Surface Methodology (RSM), and found ELM to outperform the others (R2 = 0.9850 for heating and R2 = 0.9916 for cooling). Likewise, [] used an enhanced SVR combined with Feature Selection (FS) to predict cooling loads in public buildings, relying on dispersion metrics for evaluation. In Morocco-specific research, [] compared Artificial Neural Network (ANN) and Generalized Linear Model (GLM) models for heating load prediction across six climatic zones and found ANN to be more accurate (R2 = 0.95), although it suffered from stochastic variability.
An extensive review in [] highlights two critical gaps: (1) the underrepresentation of building-specific characteristics—such as geometry or orientation—as predictors, and (2) the lack of clear taxonomies distinguishing between ML, deep learning, and forecasting methods. Most models in the literature rely heavily on historical energy consumption data, overlooking the predictive value of architectural design variables. Furthermore, while numerous studies use AI to predict thermal loads using various features, none systematically compare all possible combinations of predictor variables to find optimal input configurations.
In contrast, the present study offers a holistic and comparative approach to thermal load prediction in LEBs using hybrid AI models. It investigates how different predictor variables interact across three distinct Moroccan climates, with the goal of optimizing both model accuracy and computational efficiency. This work integrates Integral Feature Selection (IFS) techniques with several advanced ML algorithms to discover the most relevant input features and ideal model structures.
The primary aim of this study is not merely to achieve accurate predictions of thermal loads using AI models, but to leverage these predictions to inform and optimize energy-efficient design decisions and operational strategies for LEBs. By identifying the most influential building and environmental parameters through hybrid IFS–ML frameworks, the findings provide actionable insights for architects, engineers, and energy managers. These insights support more informed design choices—such as optimized window-to-wall ratios, insulation levels, and orientation—which can contribute to substantial reductions in heating and cooling demand. While this study does not directly quantify the exact energy savings from each intervention, it lays the foundation for integrated building performance simulations and future work that links predictive accuracy to real-world energy savings and Heating, Ventilation and Air-Conditioning (HVAC) system optimization.
The main objectives and contributions of this study are:
- To analyze the partial dependence and relative importance of predictor variables for heating and cooling loads;
- To evaluate and compare various hybrid IFS–ML models for accurate thermal load prediction;
- To identify the most influential predictor combinations for enhanced model performance across diverse climates;
- To contribute actionable insights toward improving building energy management strategies in Morocco.
The structure of the paper is as follows:
- Section 2 presents the methodology, including the building case study, climate data, ML models, and optimization framework;
- Section 3 reports on the performance evaluation of the models and discusses the results;
- Section 4 concludes the study and outlines its limitations and directions for future research.
2. Materials and Methods
2.1. Weather Data and Locations
This study used typical year data for three distinct Moroccan climates: Ifrane, Meknes, and Marrakech. These cities were selected based on their diverse climatic characteristics, ranging from cold to semi-arid conditions. Ifrane experiences a humid and temperate climate, characterized by higher rainfall in winter than in summer, classified as Csb on the Köppen–Geiger climate map, with an average annual temperature of 15 °C. Meknes exhibits a Mediterranean climate with moderate, rainy winters and hot, dry summers. The temperature ranges from 30 °C to 44 °C in the warmest month and from 0 °C to 7 °C in the coldest month. On the other hand, Marrakech has a semi-arid climate, featuring an average annual temperature of 20 °C and an average annual rainfall of 281 mm, which is lower than the average for the Mediterranean climatic zone. Table 1 provides details on specific climatic characteristics such as dry bulb temperature (DBT), cooling degree days (CDD), and heating degree days (HDD).
Table 1.
Locations and climate characteristics.
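Table 1 reports heating and cooling degree days (HDD, CDD). The base temperatures and averaging convention behind those values are not stated here, so the following minimal Python sketch assumes the common daily-mean method with hypothetical base temperatures of 18 °C (heating) and 21 °C (cooling). It only illustrates how such indicators can be derived from the hourly dry bulb temperatures of a typical-year file.

```python
import numpy as np

def degree_days(hourly_dbt_c, base_heat_c=18.0, base_cool_c=21.0):
    """Return (HDD, CDD) in °C·day from hourly dry bulb temperatures.

    Assumption: daily-mean method; the base temperatures are placeholders,
    not necessarily those used for Table 1.
    """
    t = np.asarray(hourly_dbt_c, dtype=float)
    if t.size % 24 != 0:
        raise ValueError("expected a whole number of days of hourly data")
    daily_mean = t.reshape(-1, 24).mean(axis=1)  # one mean temperature per day
    hdd = float(np.clip(base_heat_c - daily_mean, 0.0, None).sum())
    cdd = float(np.clip(daily_mean - base_cool_c, 0.0, None).sum())
    return hdd, cdd

# Two synthetic days: a constant 10 °C day and a constant 30 °C day
temps = [10.0] * 24 + [30.0] * 24
hdd, cdd = degree_days(temps)
print(hdd, cdd)  # 8.0 9.0
```

For a full 8760 h typical-year file, the same function aggregates 365 daily means.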
2.2. Building Description and Simulation
This study examined a multi-story building whose geometric model is shown in Figure 1. The building model, proposed by the National Agency for the Development of Renewable Energy and Energy Efficiency (ADEREE), has four levels with two apartments on each floor. On weekdays, each flat is occupied by 5 people from 05:00 p.m. to 07:30 a.m. and by 2 people during the remaining time; on weekends, each flat is occupied by 5 people. The overall window-to-wall ratio is 21%, and the net floor area is 588 m². The windows are single-glazed, with a heat-transfer coefficient (U-value) of 5.74 W/(m²·K) and a solar heat gain coefficient (SHGC) of 0.87. External shading devices (50%) are used in summer from 07:30 a.m. to 05:00 p.m. Additional information about the building, architectural plan, floor areas, and external surfaces can be found in []. The building was simulated in TRNSYS 18 for 8760 h (one year) with a 1 h time step. The assumptions used to calculate the heating and cooling loads are given in [,].
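The weekday/weekend occupancy pattern described above can be written as a simple schedule function. The Python sketch below is illustrative only: the day indexing (0 = Monday), the half-hour resolution, and the averaging of the mixed 07:00 to 08:00 step for a 1 h time step are assumptions, not the TRNSYS schedule actually used.

```python
FLATS = 8  # four levels with two apartments each

def occupants_per_flat(day_of_week: int, hour: float) -> int:
    """Occupants of one flat; day_of_week 0=Monday ... 6=Sunday, hour in [0, 24)."""
    if not 0 <= day_of_week <= 6 or not 0.0 <= hour < 24.0:
        raise ValueError("day_of_week must be 0-6 and hour in [0, 24)")
    if day_of_week >= 5:  # weekend: 5 people all day
        return 5
    # weekday: 5 people from 17:00 until 07:30, 2 people otherwise
    return 5 if hour >= 17.0 or hour < 7.5 else 2

def hourly_mean_occupants(day_of_week: int, hour_start: int) -> float:
    """Average occupancy of one flat over a 1 h time step (two half-hour slots)."""
    slots = (hour_start, hour_start + 0.5)
    return sum(occupants_per_flat(day_of_week, h) for h in slots) / 2

# Building total at noon on a Monday, and the mixed 07:00 to 08:00 step
print(FLATS * occupants_per_flat(0, 12.0))  # 16
print(hourly_mean_occupants(0, 7))          # 3.5
```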
Figure 1.
Model simulation—(a) perspective, (b) east view, (c) south view, (d) north view, (e) west view.
The case study was based on a single prototypical residential LEB, selected to ensure a consistent basis for model comparison across different climatic zones. This reference building aligns with widely accepted design practices and performance standards in energy-efficient construction. The rationale for this selection lies in the need to isolate and analyze the effects of climate and predictor variables without introducing confounding variability from differing building geometries or operational profiles. Despite this focused approach, the framework developed is inherently flexible and can be applied to various building types, provided appropriate input data are available. Future studies will aim to validate the generalizability of the method across multiple building archetypes and use cases.
Table 2 outlines the characteristics of the envelope materials, with the thermo-physical properties sourced from the TRNSYS library. The set-point temperature is maintained at 20 °C in winter and 26 °C in summer, following the Moroccan standard (NM ISO 7730) [,].
Table 2.
Building construction materials.
2.3. Predictor Variables
Ten input parameters pertaining to the building envelope and HVAC system were chosen as predictor variables. Several of these parameters have been investigated in recent research in light of anomalies observed in the current national thermal regulation. These parameters include the heat transfer coefficient of external walls, the coating of opaque elements, air change rate, window-to-wall ratio, and type of glazing. The ranges of variation for these input factors are detailed in Table 3.
Table 3.
Predictor variables and their ranges of variation.
2.4. Methodology
The initial step aims to improve the efficiency of building energy optimization by simultaneously reducing computation time and increasing accuracy. To this end, a MATLAB 2023a script generates 1000 quasi-random samples of the predictor variables listed in Table 3, which serve as the initial sample. TRNSYS, coupled with MATLAB via TRNSYS Type155, is then used to run the building energy simulation for each configuration and climate.
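The quasi-random generator used by the MATLAB code is not specified. As an illustration only, the Python sketch below draws a low-discrepancy Halton sequence and rescales it to per-variable bounds; the bounds shown are placeholders, not the ranges of Table 3. In practice, `scipy.stats.qmc` (for example its `Sobol` or `Halton` classes) provides tested implementations.

```python
import numpy as np

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]  # one prime base per dimension

def radical_inverse(i: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-b digits of i about the radix point."""
    result, weight = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        result += digit * weight
        weight /= base
    return result

def halton(n_samples: int, dim: int) -> np.ndarray:
    """n_samples x dim Halton points in [0, 1)^dim, skipping the all-zero first point."""
    if dim > len(PRIMES):
        raise ValueError("add more prime bases for dim > 10")
    return np.array([[radical_inverse(i, b) for b in PRIMES[:dim]]
                     for i in range(1, n_samples + 1)])

def scale_to_bounds(unit: np.ndarray, lower, upper) -> np.ndarray:
    """Map unit-hypercube samples to [lower, upper] column by column."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return lower + unit * (upper - lower)

# 1000 samples of 10 predictors; placeholder bounds (NOT the Table 3 ranges)
lower = np.zeros(10)
upper = np.full(10, 2.0)
X = scale_to_bounds(halton(1000, 10), lower, upper)
print(X.shape)  # (1000, 10)
```

Low-discrepancy samples cover the input space more evenly than plain random draws of the same size, which is the usual motivation for quasi-random designs in simulation-based metamodeling.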
The flowchart depicting the adopted methodology is illustrated in Figure 2.
Figure 2.
The main steps in our methodology.
The main steps are summarized as follows:
- The dataset, comprising the predictor variables and the corresponding heating and cooling loads, was pre-processed by scaling and normalization and then randomly split into a training dataset (70%) and a test dataset (30%);
- Thirteen different ML models were evaluated for predicting thermal loads (heating, cooling, and total loads): artificial neural network (ANN), decision trees (DT), Support Vector Machine (SVM), Extreme Learning Machine (ELM), Extreme Gradient Boosting (XGBoost), random forest (RF), Tree Bagger (TreeBag), Generalized Linear Regression (GLR) model, Gaussian Process Regression (GR), Linear Regression (LR), Generalized Additive Model (GAM), Kernelized Ridge Regression (KRR) model, and Linear Ridge Regression (LRR);
- Following the selection of the best ML models in step 2, a comprehensive statistical analysis was conducted to identify the optimal combinations of predictor variables for predicting heating and cooling loads, using the proposed hybrid IFS–ML approach;
- Based on the best combinations of predictor variables, the thermal loads were predicted using the corresponding best IFS–ML models.
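A minimal Python sketch of the pre-processing and 70/30 random split in the first step, assuming min-max scaling (the exact scaling used is not specified). One deliberate difference: the scaling parameters are fitted on the training part only, a common precaution against test-set information leaking into training, whereas the text describes scaling before the split. The data below are synthetic.

```python
import numpy as np

def split_70_30(X, y, test_frac=0.3, seed=0):
    """Random train/test split by shuffled indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_frac * len(X)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def fit_minmax(X_train):
    """Column-wise minimum and range; constant columns get range 1 to avoid division by zero."""
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def apply_minmax(X, lo, span):
    return (X - lo) / span

# Synthetic stand-in for 1000 simulated configurations with 10 predictors
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 5.0, size=(1000, 10))
y = X @ rng.uniform(-1.0, 1.0, size=10)

X_tr, X_te, y_tr, y_te = split_70_30(X, y)
lo, span = fit_minmax(X_tr)
X_tr_s, X_te_s = apply_minmax(X_tr, lo, span), apply_minmax(X_te, lo, span)
print(X_tr_s.shape, X_te_s.shape)  # (700, 10) (300, 10)
```

Test rows scaled this way may fall slightly outside [0, 1], which is expected when the training range does not cover every test value.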
2.5. Hybrid AI Models and Evaluation Metrics
This section provides a short description of each hybrid AI technique employed. Readers can find more details about each technique in the given references.
2.5.1. Employed ML Models
The ML models employed for predicting thermal loads are summarized in Table 4. Grid search and Bayesian optimization were used to find the optimal hyperparameter settings for each model [,].
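The search spaces used for tuning are not given in this section. As a hedged illustration of the grid-search part only (Bayesian optimization is not shown), the Python sketch below tunes the regularization strength `alpha` and RBF width `gamma` of a kernel ridge regression (KRR, one of the 13 models) by 5-fold cross-validated RMSE. The grids and the synthetic data are assumptions.

```python
import numpy as np
from itertools import product

def rbf_kernel(A, B, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit_predict(X_tr, y_tr, X_te, alpha, gamma):
    """Kernel ridge regression: solve (K + alpha I) c = y, then predict K_te c."""
    K = rbf_kernel(X_tr, X_tr, gamma)
    coef = np.linalg.solve(K + alpha * np.eye(len(X_tr)), y_tr)
    return rbf_kernel(X_te, X_tr, gamma) @ coef

def grid_search_krr(X, y, alphas, gammas, n_folds=5, seed=0):
    """Return (mean CV RMSE, alpha, gamma) of the best grid point."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    best = None
    for alpha, gamma in product(alphas, gammas):  # every point of the grid
        rmses = []
        for k in range(n_folds):
            val = folds[k]
            trn = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            pred = krr_fit_predict(X[trn], y[trn], X[val], alpha, gamma)
            rmses.append(np.sqrt(np.mean((pred - y[val]) ** 2)))
        score = float(np.mean(rmses))
        if best is None or score < best[0]:
            best = (score, alpha, gamma)
    return best

# Synthetic, smooth nonlinear response of 3 normalized inputs
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 - X[:, 2]
alphas, gammas = [1e-3, 1e-1, 10.0], [0.1, 1.0, 10.0]
best_rmse, best_alpha, best_gamma = grid_search_krr(X, y, alphas, gammas)
print(best_rmse, best_alpha, best_gamma)
```

Grid search evaluates every grid point exhaustively; Bayesian optimization instead picks the next point from a surrogate model of the score, which scales better when there are many hyperparameters.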
Table 4.
Summary of the employed ML models.
2.5.2. Integral Feature Selection
In [], the authors introduced a novel method within the Integral Variable Selection (IVS) framework. This method identifies the optimal combinations of predictor variables for modeling, prediction, and forecasting tasks by systematically evaluating all possible combinations of input variables and selecting the most efficient and effective set, with the aim of maximizing the prediction accuracy of the output variable (objective function). The number of possible input combinations of a given size is determined by the "n choose k" formula, represented by the binomial coefficient in Equation (1):

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} \qquad (1)$$
where n represents the total number of input variables, and k represents the number of variables to be selected in each combination.
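As a quick check of Equation (1) for the ten predictor variables of Section 2.3, the Python snippet below counts the subsets of each size with `math.comb` and sums them over k = 1, ..., n. The total equals 2^n − 1, the 1023 combinations explored by the IFS search.

```python
from math import comb

n = 10  # number of predictor variables
per_size = {k: comb(n, k) for k in range(1, n + 1)}  # C(n, k) for each subset size
total = sum(per_size.values())                       # all non-empty subsets

print(per_size)
# {1: 10, 2: 45, 3: 120, 4: 210, 5: 252, 6: 210, 7: 120, 8: 45, 9: 10, 10: 1}
print(total, 2 ** n - 1)  # 1023 1023
```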
The proposed method comprises several steps, as illustrated in Figure 3. After initialization, the input and output data are imported, split into training and testing–validation sets (70/30 split), and pre-processed using normalization and autonomous anomaly detection techniques. The total number of possible input combinations is then computed using Equation (1). The method enters a first loop over the combination size, K1; for each size, the number of combinations is computed, and a second loop, indexed by K2, runs over each combination of that size. The combnk(V, K) function generates a matrix with K columns, in which each row is one combination of K variables drawn from V. For each combination, the ML model is loaded, the corresponding output parameter is predicted, and the predicted values are saved before moving to the next iteration. Once predictions are available for all possible combinations, a second algorithm performs a statistical analysis to identify the input combinations that yield the highest prediction accuracy. The algorithm terminates after determining the optimal combinations of predictor variables.
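The exhaustive search at the core of the IFS procedure can be sketched in Python as two nested loops, with `itertools.combinations` playing the role of MATLAB's `combnk`. This is a simplified sketch under stated assumptions: ordinary least squares stands in for the SVM/ELM models actually used, test-set R² stands in for the full statistical ranking, and the anomaly-detection step is omitted. The enumeration order here need not match the combination numbering used in the paper's figures.

```python
import numpy as np
from itertools import combinations

def r2_score(y, y_hat):
    """Coefficient of determination 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def ols_fit_predict(X_tr, y_tr, X_te):
    """Stand-in model: least squares with intercept (not the paper's SVM/ELM)."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ coef

def integral_feature_selection(X_tr, y_tr, X_te, y_te, fit_predict, r2_min=0.99):
    """Score every non-empty subset of columns; return all results and those with R2 >= r2_min."""
    n = X_tr.shape[1]
    results = []
    for k in range(1, n + 1):                     # outer loop: subset size
        for subset in combinations(range(n), k):  # inner loop: each subset of that size
            cols = list(subset)
            y_hat = fit_predict(X_tr[:, cols], y_tr, X_te[:, cols])
            results.append((float(r2_score(y_te, y_hat)), subset))
    results.sort(key=lambda item: item[0], reverse=True)  # best first
    kept = [item for item in results if item[0] >= r2_min]
    return results, kept

# Synthetic data: 5 candidate inputs, only columns 0 and 2 drive the output
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(400, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(0.0, 0.01, 400)
results, kept = integral_feature_selection(X[:280], y[:280], X[280:], y[280:], ols_fit_predict)
print(len(results), len(kept), results[0][1])
```

Because every subset that contains the two informative inputs reaches nearly the same R², the search returns several high-accuracy combinations rather than a single winner, which mirrors the behavior discussed in Section 3.3.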
Figure 3.
Flowchart for the employed IFS method [].
2.5.3. Statistical Accuracy Assessment
In [], the authors introduced a modified version of the performance score to rank the effectiveness of the applied ML and hybrid IFS–ML models. The performance score (φ), defined by Equation (2), is used for this evaluation, where higher values of φ indicate poorer model performance.
The performance indicators in Equation (2) include the Mean Bias Error (MBE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), standard deviation (σ), and the Coefficient of Determination (R2). These indicators are detailed in Equations (3)–(7).
Here, $\hat{y}_i$, $y_i$, and $\bar{y}$ represent the ith predicted value, the ith measured value, and the mean value, respectively, while K denotes the total number of measurements.
This paper gives only brief information on each statistical indicator; readers are referred to [] for more details.
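Because sign and normalization conventions for these indicators vary between studies, the following Python sketch uses common textbook forms and is only indicative. The MBE sign convention (predicted minus measured), MAPE expressed in percent, and σ taken as the sample standard deviation of the prediction errors are assumptions. The modified performance score φ of [] is deliberately not reconstructed.

```python
import numpy as np

def accuracy_metrics(y_meas, y_pred):
    """Textbook MBE, RMSE, MAPE (%), error standard deviation, and R2."""
    y = np.asarray(y_meas, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y_hat - y  # assumption: predicted minus measured
    return {
        "MBE": err.mean(),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE_%": 100.0 * np.mean(np.abs(err / y)),  # undefined if any y == 0
        "sigma": err.std(ddof=1),                    # assumption: std of the errors
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
    }

m = accuracy_metrics([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 31.0, 39.0])
for key, value in m.items():
    print(f"{key}: {value:.4f}")
```

In this toy example, the errors alternate in sign, so MBE is zero even though RMSE is not. This is why a bias indicator is usually reported alongside a dispersion indicator.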
3. Results and Discussion
3.1. Sensitivity Analysis: Evaluating Predictor Variable Impacts on Thermal Loads
Understanding the influence of predictor variables on heating and cooling loads is crucial for optimizing building energy management strategies. In this section, we carry out a correlation-based sensitivity analysis, using the Pearson correlation coefficient (r) and the coefficient of determination (R2), to assess the impact of individual predictor variables on the heating (QHEAT) and cooling (QCOOL) thermal loads. Recent studies, such as [], demonstrate the effectiveness of R2 in quantifying the relative importance of factors like building geometry and material properties, while advanced sensitivity frameworks [] highlight methodologies to address uncertainties in load prediction. This analysis is conducted across different climatic conditions, offering insights into the variable dependencies that contribute significantly to the energy dynamics of LEBs. The results are shown in Figure 4, Figure 5 and Figure 6.
Figure 4.
Matrix of correlation between the considered predictor variables and thermal loads for Meknes.
Figure 5.
Matrix of correlation between the considered predictor variables and thermal loads for Ifrane.
Figure 6.
Matrix of correlation between predictor variables and thermal loads for Marrakech.
As a key metric, the coefficient of determination R2 provides a quantitative measure of the proportion of variance in thermal loads explained by each predictor variable. By scrutinizing R2 values, we aim to identify the most influential factors shaping QHEAT and QCOOL under diverse climate scenarios. Recent studies, such as [], demonstrate that LightGBM models achieve R2 values as high as 0.9959 when evaluating features like glazing distribution and insulation efficiency across climatic regions, outperforming methods like random forest (RF) and long short-term memory (LSTM) networks. Similarly, deep learning frameworks incorporating bidirectional gated recurrent units (Bi-GRU) show enhanced predictive accuracy for ultra-short-term heating loads, with R2 values reflecting robust performance in time-shifted feature analysis []. This comprehensive examination allows us to discern the relative importance of each predictor variable and elucidate their unique contributions to the overall predictive accuracy of the ML models. For instance, interpretable classifiers like decision trees and rule induction (RI) leverage R2 to prioritize factors such as relative compactness and glazing area in residential buildings [].
The following paragraphs present a detailed breakdown of the Pearson correlation coefficients (r) for each predictor variable, revealing the nuances of their impacts on heating and cooling loads. This sensitivity analysis not only refines our understanding of the complex interplay between predictors and thermal loads, but also lays the foundation for informed decision-making in designing energy-efficient solutions tailored to specific climates.
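The analysis relies on two related quantities: the Pearson coefficient r shown in the correlation matrices and the coefficient of determination R2. For a single predictor fitted by least squares with an intercept, R2 equals r². Thus r = 0.93 for X6 in Meknes corresponds to about 86% of the variance of QHEAT being explained by X6 alone. The short Python check below verifies this identity on synthetic data (the variables are illustrative stand-ins, not the simulation results). A full correlation matrix such as those in Figures 4 to 6 can be obtained with `np.corrcoef(data, rowvar=False)`.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.5, 2.0, 500)              # synthetic stand-in for a predictor such as X6
y = 40.0 * x + rng.normal(0.0, 5.0, 500)    # synthetic stand-in for QHEAT

r = np.corrcoef(x, y)[0, 1]                 # Pearson correlation coefficient
slope, intercept = np.polyfit(x, y, 1)      # single-predictor least squares fit
y_hat = slope * x + intercept
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"r = {r:.3f}, r^2 = {r**2:.3f}, R2 = {r2:.3f}")  # r^2 and R2 coincide
print(round(0.93 ** 2, 2))                   # 0.86
```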
In a Mediterranean climate such as that of Meknes (Figure 4), QHEAT is strongly correlated primarily with X6 (r = 0.93). X1, X7, and X2 exhibit very weak correlations, while the other predictor variables show almost no correlation with QHEAT. For QCOOL, a moderate correlation is observed with X8 and X7, a weak correlation with X6 and X2, and an even weaker one with X9 and X4. The other variables show no correlation.
In a cold climate such as that of Ifrane (Figure 5), the results are similar to those for Meknes: QHEAT is strongly correlated with X6. X1, X7, X2, and X3 exhibit very weak correlations, while the other predictor variables show no correlation with QHEAT. For QCOOL, a moderate correlation is observed with X8, followed by X7; a weak correlation is present with X6 and X2, and an even weaker one with X4 and X9. The other variables show no correlation.
In a semi-arid climate like that of Marrakech (Figure 6), X6 demonstrates a relatively strong correlation with QHEAT (r = 0.90). X1, X2, X7, and X3 exhibit low to very low correlations, respectively. Other variables show no significant correlation. Regarding QCOOL, a moderate correlation is identified with X7 and X8, a minor correlation is present with X2 and X6, and even less is seen with X4 and X9. Other variables indicate a lack of correlation.
In summary, across the three climates, QHEAT is strongly correlated with the air change rate (X6) (average r = 0.92). X1, X7, X2, and, to a lesser extent, X3 show low to very low correlations, and no significant correlation is found for the other variables. For QCOOL, the east window-to-wall ratio (X8) shows a moderate correlation (average r = 0.60), followed by X7 (average r = 0.52). X2, X6, X4, and X9 exhibit low to very low correlations, respectively, while the others show no notable correlation.
The inclusion of all available predictor variables at the outset was intentional and consistent with a comprehensive feature evaluation. The purpose of the sensitivity analysis was threefold:
- To quantify and rank the relative importance of each predictor variable with respect to its contribution to thermal load prediction, in the form of a filter feature selection analysis based solely on the Pearson correlation between variables;
- To identify the predictor variables with a strong impact on thermal loads;
- To support and validate the subsequent application of the Integral Feature Selection (IFS) method, which systematically eliminates redundant or non-informative features and searches for the combinations of predictor variables that improve model performance and reduce complexity.
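The filter step in the first bullet can be sketched as ranking the predictors by the absolute value of their Pearson correlation with the target. The Python function below is illustrative: the names X1 to X3 and the synthetic response are hypothetical, and a constant (zero-variance) column would yield NaN and should be removed beforehand. Such a univariate ranking cannot capture interactions between predictors.

```python
import numpy as np

def rank_by_abs_pearson(X, y, names):
    """Return (name, r) pairs sorted by decreasing |r| (univariate filter ranking)."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(r))  # strongest linear association first, sign ignored
    return [(names[j], float(r[j])) for j in order]

rng = np.random.default_rng(7)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = 0.5 * X[:, 0] - 5.0 * X[:, 1] + rng.normal(0.0, 0.1, 500)  # X2 dominates, negative sign
ranking = rank_by_abs_pearson(X, y, ["X1", "X2", "X3"])
for name, r in ranking:
    print(f"{name}: r = {r:+.2f}")
```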
By including all potential variables, even those with seemingly weak direct correlations, we ensured that no latent interactions or higher-order effects were prematurely excluded. In some machine learning algorithms, especially nonlinear ones, variables that are weakly correlated individually may still contribute to model accuracy through interactions. Consequently, the sensitivity analysis results helped justify the exclusion of certain features during the optimization phase and provided insights into the relative impact of each variable, which ultimately guided the development of a more robust and interpretable predictive model.
3.2. ML Models: Accurate Predictions of Heating and Cooling Loads
This section presents the results of applying ML algorithms to the prediction of heating and cooling loads in buildings. It assesses the effectiveness of well-known ML models for energy consumption prediction, and the results offer insights into their performance and their potential to improve heating and cooling load prediction for LEBs.
In Figure 7, Figure 8 and Figure 9, the predictive capabilities of each ML model are evaluated for both heating and cooling loads in three different climates. This assessment uses a combination of inputs that includes all predictor variables. The ranking of the ML models is determined according to the performance score. Additional analyses are conducted considering MAPE, σ, and R2.
Figure 7.
A comparative analysis of ML models in predicting thermal loads within a Mediterranean climate (Meknes City).
Figure 8.
A comparative analysis of ML models in predicting thermal loads within a cold climate (Ifrane City).
Figure 9.
A comparative analysis of ML models in predicting thermal loads within a hot climate (Marrakech City).
For the Mediterranean climate represented by Meknes City, the results show that during the training phase, the XGBoost model achieves the best performance in predicting both heating and cooling loads. In the testing phase, the ELM model demonstrates superior performance for heating load prediction, while the SVM model outperforms others in predicting cooling loads. Moreover, based on the high R2 values obtained, alternative models like SVM display strong potential, exhibiting near-perfect correlations for both thermal loads along with low dispersion indicators. Further details are illustrated in Figure 7.
The results for Ifrane (cold climate) show that, during the training phase, the XGBoost model performs best for predicting thermal loads. In the testing phase, the SVM model is the most effective, with a near-perfect correlation and very low values for the dispersion indicators considered in this study. Additional insights can be obtained from Figure 8.
Similar to the findings in a cold climate, the results for Marrakech, representing a semi-arid climate, reveal that, during the training phase, the XGBoost model excels in predicting both heating and cooling thermal loads. However, the SVM model is the most effective in the testing phase. Additional insights can be derived from Figure 9.
The XGBoost model predicts heating and cooling thermal loads best across the three climates during the training phase. In the testing phase, however, the SVM model is the most effective for both heating and cooling loads in all climates except the Mediterranean one, where the ELM model slightly exceeds the SVM model in predicting heating loads, while the SVM model outperforms all ML models in predicting cooling loads.
As a summary of the most accurate predictions of heating and cooling loads, we conclude that the XGBoost model consistently performed best during the training phase for all climates. In testing, model performance varied by region:
- Mediterranean climate (Meknes)—ELM was most effective for heating load prediction, while SVM led in cooling load prediction;
- Cold climate (Ifrane)—SVM outperformed others with near-perfect accuracy;
- Semi-arid climate (Marrakech)—SVM again showed superior performance.
Overall, while XGBoost excelled in training, SVM emerged as the most robust model during testing, especially for Ifrane and Marrakech. In Meknes, ELM slightly outperformed SVM for heating loads, but SVM remained dominant for cooling predictions.
In the following subsection, only the top-performing model from the testing phase—specific to each thermal load and each city—will be utilized to identify the optimal combinations of predictor variables, aimed at enhancing both prediction accuracy and model simplicity.
3.3. Optimizing Thermal Load Predictions: Best Hybrid IFS–ML Models
In pursuing accurate and efficient thermal load predictions for LEBs, integrating Integral Feature Selection (IFS) with machine learning (ML) models is a promising avenue. This section presents the identification and validation of the best-performing hybrid IFS–ML models. While traditional input variable selection (IVS) methods often provide only a single “optimal” combination of predictors, the IFS methodology systematically explores all possible combinations (in our case study, 2¹⁰ − 1 = 1023 combinations) to identify multiple high-performing feature subsets, which supports robustness and generalizability.
The primary objective of this section is to show how hybrid models can identify the most influential predictors for robust heating and cooling load prediction. By applying IFS to exhaustively search the predictor space and feeding the selected variables into the top-performing model from the previous subsection, we aim to obtain a highly accurate yet interpretable modeling framework.
The optimal combinations of predictors for heating and cooling loads across the three Moroccan climates are illustrated in Figure 10 and Figure 11. Only the combinations achieving R2 ≥ 0.99 were retained as optimal.
Figure 10.
The best combinations of input for predicting heating loads.
Figure 11.
The best combinations of input for predicting cooling loads.
In a Mediterranean climate (Meknes), 63 optimal combinations were found for heating load prediction. For the cold climate (Ifrane), there were 93 combinations, and for the semi-arid climate (Marrakech), 64 combinations were identified. Remarkably, the best combination across all climates included variables X1, X2, X6, and X7, indicating a core set of features critical to accurate load estimation.
For cooling loads, 12 optimal combinations were identified for Meknes and Ifrane, and 11 for Marrakech. While Mediterranean and cold climates shared the same best combinations, the semi-arid climate slightly differed in its top-ranked set. Specifically, for Meknes and Ifrane, the best predictor combination included X2, X3, X4, X6, X7, X8, and X9, while for Marrakech it included X1, X2, X4, X6, X7, X8, and X9.
A statistical validation of the top-ranked combinations was performed (Figure 12 and Figure 13). The results show that the R2 consistently exceeded 0.998, while MAPE and σ values were exceptionally low, confirming the reliability and superiority of the proposed approach compared to benchmarks in the literature.
Figure 12.
A statistical examination of the best combinations of input for predicting heating loads.
Figure 13.
A statistical examination of the best combinations of input for predicting cooling loads.
For heating loads, the top-ranked combination was 955 for Meknes and Ifrane and 1001 for Marrakech, achieving R2 = 0.998, MAPE = 0.284 kWh/m2/year, and σ = 1.115 kWh/m2/year. For cooling loads, the best-performing combinations were 1013 for Meknes and Ifrane and 1023 for Marrakech, with R2 = 0.998, MAPE = 0.263 kWh/m2/year, and σ = 1.131 kWh/m2/year. As noted for heating loads, the first-best combination in all three climates includes X1, X2, X6, and X7. These findings affirm the effectiveness of the IFS–ML hybrid model in identifying the most relevant variable combinations for thermal load prediction across diverse climates.
Additionally, Figure 12 and Figure 13 present a comprehensive statistical analysis of all optimal input combinations used for predicting heating and cooling thermal loads. The results highlight that the employed performance metrics—R2, MAPE, and σ—consistently approach their ideal values (with R2 nearing 1, and MAPE and σ approaching zero). This demonstrates that highly accurate thermal load predictions can be achieved using multiple combinations of predictor variables, rather than relying on a single “best” configuration. These outcomes outperform those reported in previous studies on thermal load prediction for LEBs, confirming the robustness and versatility of the proposed hybrid IFS–ML framework.
When we compare our findings with those of other research publications, we may conclude the following:
- The incorporation of IFS significantly improved prediction accuracy by isolating critical predictors such as building geometry and material properties. This aligns with recent studies emphasizing feature optimization, including hybrid models combining metaheuristic algorithms (e.g., Particle Swarm Optimization) with XGBoost and SVR [] and interpretable classifiers prioritizing variables like glazing area and relative compactness []. Our results extend these approaches by demonstrating that systematic feature engineering (via IFS) reduces overfitting while maintaining robustness across climates;
- The XGBoost model emerged as the optimal predictor during training across all climates, corroborating its dominance in long-term load prediction tasks reported in prior work []. However, in Mediterranean climates, the SVM model outperformed others for cooling loads, while ELM showed niche superiority for heating loads. This climate-specific divergence contrasts with studies that prioritize general model performance (e.g., LightGBM achieving R2 = 0.9959 globally []), highlighting the need for regionally tailored frameworks—a gap underexplored in recent literature [];
- Our models achieved near-perfect Pearson correlation (close to 1) and low dispersion, surpassing benchmarks set by state-of-the-art techniques such as hybrid CNN architectures (MAE < 2 MW []) and LightGBM ensembles (CVRMSE = 5.25% []). Unlike studies focusing on single-model superiority (e.g., TPE-LightGBM with R2 = 0.9981 []), we identified multiple predictor combinations that maximize accuracy, offering flexibility for diverse design scenarios;
- By combining feature selection, model optimization, and climate adaptability, this work contributes to operationalizing hybrid AI systems for sustainable architecture, a priority underscored in recent frameworks that integrate ML with climate models []. Our methodology also answers calls for interpretable, actionable tools to guide HVAC optimization and envelope design [,].
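The benchmarks cited above rely on the Pearson correlation coefficient and CV(RMSE), which are defined differently from R² and MAPE. The sketch below shows the common textbook forms of these two metrics. It assumes that CV(RMSE) divides by n; some guidelines (e.g., ASHRAE Guideline 14) use n - p, where p is the number of model parameters. The function names and sample values are illustrative only.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cv_rmse(observed, predicted):
    """CV(RMSE) in percent: RMSE (divided by n, an assumption) over the mean observation."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / statistics.fmean(observed)


# Illustrative values only, not results from the study.
obs = [100.0, 50.0, 80.0]
pred = [99.0, 51.0, 80.0]
print(pearson_r(obs, pred), cv_rmse(obs, pred))
```

Because CV(RMSE) is normalized by the mean load, it can be compared across datasets with different load magnitudes, whereas absolute errors such as an MAE quoted in MW depend on the scale of the buildings studied.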
4. Conclusions, Limitations, and Future Directions
4.1. Conclusions
This study proposed a hybrid AI-based framework for accurate thermal load prediction in Low-Energy Buildings (LEBs) across three distinct Moroccan climates. Thirteen machine learning models were first evaluated using a comprehensive set of input features. XGBoost performed best during training, whereas SVM emerged as the most robust model in the testing phase, especially for the cold and semi-arid climates. In the Mediterranean climate, the ELM model slightly outperformed SVM in heating load prediction.
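A model that ranks first in training need not rank first on held-out data, which is why selection is usually based on test performance. As a generic illustration, and not a reproduction of the XGBoost and SVM results, the sketch below compares two stand-in models on synthetic data: a 1-nearest-neighbour regressor, which reproduces its training targets exactly, and a straight-line least-squares fit. The data, models, and 50/50 split are assumptions chosen only to show how a perfect training score can coexist with a lower test score.

```python
import numpy as np


def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def nearest_neighbour_predict(x_train, y_train, x_query):
    """1-NN regression: copy the target of the closest training point (memorizes training data)."""
    idx = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]


def linear_predict(x_train, y_train, x_query):
    """Ordinary least-squares line fitted with np.polyfit (degree 1)."""
    slope, intercept = np.polyfit(x_train, y_train, 1)
    return slope * x_query + intercept


def train_test_scores(predict, x_tr, y_tr, x_te, y_te):
    """Return (training R^2, test R^2) for a fit-and-predict function."""
    return r2(y_tr, predict(x_tr, y_tr, x_tr)), r2(y_te, predict(x_tr, y_tr, x_te))


# Synthetic data (assumption): a linear trend plus Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=400)
y = 2.0 * x + rng.normal(scale=1.0, size=400)
x_tr, y_tr, x_te, y_te = x[:200], y[:200], x[200:], y[200:]

for name, model in [("1-NN", nearest_neighbour_predict), ("linear", linear_predict)]:
    train_r2, test_r2 = train_test_scores(model, x_tr, y_tr, x_te, y_te)
    print(f"{name}: train R2={train_r2:.3f}, test R2={test_r2:.3f}")
```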
Building upon these results, an Integral Feature Selection (IFS) approach was employed in conjunction with the top-performing models to identify the most influential predictor combinations. This not only enhanced the accuracy of thermal load prediction, but also contributed to reducing model complexity.
The findings demonstrate the potential of integrating advanced ML models with feature selection techniques to support energy-efficient building design and operation. The proposed framework outperforms the previously published approaches it was compared with and offers a scalable, climate-adaptive solution for optimizing heating and cooling load predictions in LEBs. Future work will focus on incorporating additional parameters, such as window thermal performance, and on extending the approach to real-time prediction scenarios.
The practical value of this research lies in its ability to provide architects, building engineers, and energy consultants with a robust decision-support framework for optimizing thermal load predictions in LEBs. By identifying the most relevant input features through the IFS method and selecting the best-performing machine learning models tailored to specific climates, our approach supports the early design phase of buildings, where decisions on envelope parameters influence energy performance.
Additionally, the proposed framework proved applicable across diverse climatic zones, as demonstrated for three representative Moroccan cities. This adaptability suggests that it can be transferred to other regions with similar climate profiles, supporting national and regional energy efficiency goals.
4.2. Study Limitations
While the proposed hybrid AI framework demonstrated strong performance in predicting thermal loads across different climates, several limitations must be acknowledged, as follows:
- First, the analysis did not include certain detailed building parameters such as the thermal performance of exterior windows, which may influence results in real-world applications;
- Second, the models were developed using simulated data, which, while controlled and consistent, may not capture all the variability present in actual building operation;
- Third, the framework has not yet been tested in real-time or online prediction environments, which are critical for practical implementation in Building Energy Management Systems (BEMS);
- Lastly, while the study included three diverse climates in Morocco, the generalizability to other regions requires further validation.
4.3. Future Directions
Building on the promising results of this study, several future research directions are envisioned, as follows:
- Incorporation of additional building parameters. Future work should include more detailed characteristics of building components—particularly the thermal performance of windows, shading devices, and occupancy schedules—to better reflect real-world thermal dynamics;
- Validation with real-world data. While this study relied on simulation data for model training and evaluation, validating the proposed framework using real-world measurements from monitored buildings would enhance its reliability and practical applicability;
- Dynamic and real-time prediction. Integrating the framework into BEMS for real-time thermal load prediction and control represents a valuable extension, especially for smart buildings and grid-responsive operations;
- Cross-regional generalization. Although the study focused on three Moroccan climates, extending the framework to other geographical regions with different climate patterns and building typologies will further test its adaptability and scalability;
- Integration with multi-objective optimization. Future research may explore combining predictive models with optimization algorithms (e.g., genetic algorithms, NSGA-II) to support the design of buildings that balance energy efficiency, cost, and thermal comfort;
- Use of deep learning and hybrid architectures. Further investigation into deep learning models (e.g., CNNs, LSTMs, Transformers) and hybrid architectures that can automatically learn temporal and spatial patterns in energy data may enhance predictive performance.
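Balancing energy efficiency, cost, and thermal comfort, as envisioned in the multi-objective direction above, rests on Pareto dominance: a design is kept only if no other design is at least as good in every objective and strictly better in at least one. The minimal sketch below filters a list of designs down to that non-dominated set, assuming that all objectives are minimized. The design tuples (heating load, cost, discomfort hours) are made up for illustration. NSGA-II builds on this idea with non-dominated sorting into successive fronts, crowding distance, and genetic operators, none of which is shown here.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the non-dominated points in input order (simple O(n^2) pairwise check)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical designs: (heating load kWh/m2/year, cost, discomfort hours).
designs = [(120.0, 300.0, 5.0), (100.0, 350.0, 6.0), (130.0, 320.0, 7.0), (90.0, 400.0, 4.0)]
print(pareto_front(designs))
```

In this toy set, the third design is dropped because the first one is better in all three objectives; the remaining three represent genuine trade-offs among load, cost, and comfort.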
Author Contributions
Conceptualization, Y.E.M. and M.T.U.; methodology, Y.E.M.; software, Y.E.M.; formal analysis, Y.E.M. and M.T.U.; investigation, Y.E.M. and M.T.U.; resources, M.T.U.; writing—original draft preparation, Y.E.M. and M.T.U.; writing—review and editing, Y.E.M. and M.T.U.; visualization, Y.E.M.; supervision, Y.E.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
(a) The dataset, models, or codes supporting this study’s findings are available from the corresponding author upon a reasonable request. (b) All data, models and code generated or used during this study appear in the submitted article.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
| --- | --- |
| ACH | Air change rate (h⁻¹) |
| ANN | Artificial neural network |
| CDD | Cooling degree days |
| HDD | Heating degree days |
| ML | Machine learning |
| N | North (building orientation) |
| R² | Coefficient of Determination |
| σ | Standard deviation |
| X | Vector of design variables |
| QCOOL | Cooling load |
| QHEAT | Heating load |
| HVAC | Heating, Ventilation and Air Conditioning |
| WWR | Window-to-Wall Ratio |
| MBE | Mean Bias Error |
| RMSE | Root Mean Square Error |
| MAPE | Mean Absolute Percentage Error |
| φ | Performance score |
| DT | Decision Trees |
| SVM | Support Vector Machine |
| ELM | Extreme Learning Machine |
| XGBoost | Extreme Gradient Boosting |
| RF | Random Forest |
| TreeBag | Tree Bagger |
| GLR | Generalized Linear Regression |
| GR | Gaussian Process Regression |
| LR | Linear Regression |
| GAM | Generalized Additive Model |
| KRR | Kernel Ridge Regression |
| LRR | Linear Ridge Regression |
| IFS | Integral Feature Selection |
| IVS | Input Variable Selection |
| LEBs | Low-Energy Buildings |
| AI | Artificial Intelligence |
| SVR | Support Vector Regression |
| MLP | Multi-Layer Perceptron |
| RBF | Radial Basis Function |
| RSM | Response Surface Methodology |
References
- Slimani, J.; Kadrani, A.; El Harraki, I.; Ezzahid, E. Towards a sustainable energy future: Modeling Morocco’s transition to renewable power with enhanced OSeMOSYS model. Energy Convers. Manag. 2024, 317, 118857.
- El Hafdaoui, H.; Khallaayoun, A.; Ouazzani, K. Long-term low carbon strategy of Morocco: A review of future scenarios and energy measures. Results Eng. 2024, 21, 101724.
- Bai, Y.; Yu, C.; Pan, W. Systematic examination of energy performance gap in low-energy buildings. Renew. Sustain. Energy Rev. 2024, 202, 114701.
- Smouh, S.; Gargab, F.Z.; Ouhammou, B.; Mana, A.A.; Saadani, R.; Jamil, A. A New Approach to Energy Transition in Morocco for Low Carbon and Sustainable Industry (Case of Textile Sector). Energies 2022, 15, 3693.
- Abdou, N.; El Mghouchi, Y.; Jraida, K.; Hamdaoui, S.; Hajou, A.; Mouqallid, M. Prediction and optimization of heating and cooling loads for low energy buildings in Morocco: An application of hybrid machine learning methods. J. Build. Eng. 2022, 61, 105332.
- Yu, D.; Liu, T.; Wang, K.; Li, K.; Mercangöz, M.; Zhao, J.; Lei, Y.; Zhao, R. Transformer based day-ahead cooling load forecasting of hub airport air-conditioning systems with thermal energy storage. Energy Build. 2024, 308, 114008.
- Suh, Y.; Chandramowlishwaran, A.; Won, Y. Recent progress of artificial intelligence for liquid-vapor phase change heat transfer. npj Comput. Mater. 2024, 10, 65.
- Dasi, H.; Ying, Z.; Ashab, M.F.B. Proposing hybrid prediction approaches with the integration of machine learning models and metaheuristic algorithms to forecast the cooling and heating load of buildings. Energy 2024, 291, 130297.
- Afzal, S.; Shokri, A.; Ziapour, B.M.; Shakibi, H.; Sobhani, B. Building energy consumption prediction and optimization using different neural network-assisted models; comparison of different networks and optimization algorithms. Eng. Appl. Artif. Intell. 2024, 127, 107356.
- Liu, H.; Yu, J.; Dai, J.; Zhao, A.; Wang, M.; Zhou, M. Hybrid prediction model for cold load in large public buildings based on mean residual feedback and improved SVR. Energy Build. 2023, 294, 113229.
- El Alaoui, M.; Rougui, M.; Lamrani, A.; Mouhat, O. Building energy prediction using artificial neural networks and analysis of covariance in the six thermal zones of Morocco. Mater. Today Proc. 2023.
- Al-Shargabi, A.A.; Almhafdy, A.; Ibrahim, D.M.; Alghieth, M.; Chiclana, F. Buildings’ energy consumption prediction models based on buildings’ characteristics: Research trends, taxonomy, and performance measures. J. Build. Eng. 2022, 54, 104577.
- Sick, F.; Schade, S.; Mourtada, A.; Uh, D.; Grausam, M. Dynamic Building Simulations for the Establishment of a Moroccan Thermal Regulation for Buildings. J. Green Build. 2014, 9, 145–165.
- Abdou, N.; EL Mghouchi, Y.; Hamdaoui, S.; EL Asri, N.; Mouqallid, M. Multi-objective optimization of passive energy efficiency measures for net-zero energy building in Morocco. Build. Environ. 2021, 204, 108141.
- Wu, J.; Chen, X.-Y.; Zhang, H.; Xiong, L.-D.; Lei, H.; Deng, S.-H. Hyperparameter Optimization for Machine Learning Models Based on Bayesian Optimization. J. Electron. Sci. Technol. 2019, 17, 26–40.
- Liashchynskyi, P.; Liashchynskyi, P. Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS 2019. arXiv 2019, arXiv:1912.06059.
- Chakraborty, K.; Mehrotra, K.; Mohan, C.K.; Ranka, S. Forecasting the behavior of multivariate time series using neural networks. Neural Netw. 1992, 5, 961–970.
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
- Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Abellán, J.; Masegosa, A.R. Bagging Decision Trees on Data Sets with Classification Noise. In Proceedings of the Foundations of Information and Knowledge Systems, Sofia, Bulgaria, 15–19 February 2010; Link, S., Prade, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 248–265.
- Lesaffre, E.; Marx, B.D. Collinearity in generalized linear regression. Commun. Stat.-Theory Methods 1993, 22, 1933–1952.
- Najibi, F.; Apostolopoulou, D.; Alonso, E. Enhanced performance Gaussian process regression for probabilistic short-term solar output forecast. Int. J. Electr. Power Energy Syst. 2021, 130, 106916.
- Maulud, D.; Abdulazeez, A.M. A Review on Linear Regression Comprehensive in Machine Learning. JASTT 2020, 1, 140–147.
- Hastie, T.J. Generalized Additive Models. In Statistical Models in S; Routledge: London, UK, 1992.
- Vovk, V. Kernel Ridge Regression. In Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik; Schölkopf, B., Luo, Z., Vovk, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 105–116. ISBN 978-3-642-41136-6.
- Liu, X.-Q.; Gao, F. Linearized Ridge Regression Estimator in Linear Regression. Commun. Stat.-Theory Methods 2011, 40, 2182–2192.
- El Mghouchi, Y.; Chham, E.; Zemmouri, E.M.; El Bouardi, A. Assessment of different combinations of meteorological parameters for predicting daily global solar radiation using artificial neural networks. Build. Environ. 2019, 149, 607–622.
- Udristioiu, M.T.; EL Mghouchi, Y.; Yildizhan, H. Prediction, modelling, and forecasting of PM and AQI using hybrid machine learning. J. Clean. Prod. 2023, 421, 138496.
- Badescu, V. Assessing the performance of solar radiation computing models and model selection procedures. J. Atmos. Sol.-Terr. Phys. 2013, 105–106, 119–134.
- Mehdizadeh Khorrami, B.; Soleimani, A.; Pinnarelli, A.; Brusco, G.; Vizza, P. Forecasting heating and cooling loads in residential buildings using machine learning: A comparative study of techniques and influential indicators. Asian J. Civ. Eng. 2024, 25, 1163–1177.
- Zhu, L.; Zhang, J.; Gao, Y.; Tian, W.; Yan, Z.; Ye, X.; Sun, Y.; Wu, C. Uncertainty and sensitivity analysis of cooling and heating loads for building energy planning. J. Build. Eng. 2022, 45, 103440.
- Chen, Y.; Ye, Y.; Liu, J.; Zhang, L.; Li, W.; Mohtaram, S. Machine Learning Approach to Predict Building Thermal Load Considering Feature Variable Dimensions: An Office Building Case Study. Buildings 2023, 13, 312.
- Lv, R.; Yuan, Z.; Lei, B.; Zheng, J.; Luo, X. Building thermal load prediction using deep learning method considering time-shifting correlation in feature variables. J. Build. Eng. 2022, 61, 105316.
- Abdel-Jaber, F.; Dirks, K.N. Thermal Load Prediction in Residential Buildings Using Interpretable Classification. Buildings 2024, 14, 1989.
- Wang, Z.; Hong, T.; Piette, M.A. Building thermal load prediction through shallow machine learning and deep learning. Appl. Energy 2020, 263, 114683.
- Chen, Y.; Ye, Y.; Chen, Z.; Liu, J.; Su, L.; Ji, Y.; Li, W. Performance Comparison for Building Thermal Load Prediction of Office Buildings Using Machine Learning Methods. SSRN 2021.
- Zhao, A.; Mi, L.; Xue, X.; Xi, J.; Jiao, Y. Heating load prediction of residential district using hybrid model based on CNN. Energy Build. 2022, 266, 112122.
- Slater, L.J.; Arnal, L.; Boucher, M.-A.; Chang, A.Y.-Y.; Moulds, S.; Murphy, C.; Nearing, G.; Shalev, G.; Shen, C.; Speight, L.; et al. Hybrid forecasting: Blending climate predictions with AI models. Hydrol. Earth Syst. Sci. 2023, 27, 1865–1889.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).