Synthetic Residential Building Energy-Consumption Dataset Generation Through Parametric Simulation for Hot–Arid Egypt
Abstract
1. Introduction
Research Contributions
- An interoperable parametric-to-simulation workflow for early-stage residential energy analysis. The study formalises an integrated pipeline that connects Rhino/Grasshopper parametric modelling to EnergyPlus-based annual simulation (via DIVA for Grasshopper), enabling automated generation of consistent design alternatives and their energy labels.
- An open, labelled residential energy dataset for a hot–arid Egyptian context. The dataset contains 12,000 simulation cases for New Cairo/Cairo boundary conditions and is publicly released through Zenodo. It is explicitly structured as input–output pairs linking conceptual design variables (geometry, orientation, façade WWR, glazing properties, setpoints, and discrete envelope options) to annual end-use outputs (heating, cooling, lighting, and equipment), supporting reproducible benchmarking and downstream surrogate/ML research.
- A documented scope with consistent boundary conditions to isolate early-stage effects. To ensure comparability across alternatives, the dataset is generated under controlled assumptions (a single Cairo/New Cairo EPW, a residential operational profile, and fixed baseline internal gains and HVAC-related coefficients). Therefore, the observed energy variance is attributable primarily to the declared early-stage variables.
- A quality-assurance layer for dataset credibility and reuse. The study introduces systematic verification and plausibility benchmarking, including parameter-bound checks, geometry sanity checks, run-completeness/error screening, output integrity checks, and confirmation of expected physical trends for cooling-dominated hot–arid conditions.
- Empirical evidence on the relative influence of early-stage variables in hot–arid housing. Beyond releasing the dataset, the paper reports sensitivity/interpretability signals showing that cooling setpoint and building dimensions are the dominant drivers of annual energy variance, followed by glazing solar gains (SHGC). At the same time, other envelope-related parameters exhibit smaller effects within the tested, code-aligned ranges.
2. Literature Review
- It enables systematic exploration of design spaces by varying key parameters such as building dimensions, orientation, window-to-wall ratios, and basic envelope properties.
- It enables the generation of comprehensive energy consumption databases that capture the relationships between design decisions and performance outcomes.
- It provides visual feedback directly within the design environment, helping architects understand performance implications without switching between different software platforms.
- It supports data-driven decision-making during the critical early phases when the potential for cost-effective optimisation is highest.
Critical Synthesis and Research Gap
3. Research Methodology
3.1. Data Specifications
3.2. Data Generation, Analysis, and Results
- Conceptual design phase: defines the baseline building configuration and the initial set of design variables to be explored;
- Parametric simulation: generates design alternatives by systematically varying key parameters within the modelling environment;
- Energy modelling: translates each design alternative into an EnergyPlus-ready representation, including envelope, internal loads, and operational settings;
- Simulation workflow: executes batch simulations and manages input/output handling, quality checks, and extraction of performance metrics;
- Energy consumption dataset: compiles the labelled input–output pairs into a consistent dataset suitable for analysis and data-driven modelling.
3.2.1. Conceptual Design Phase
3.2.2. Parametric Simulation
3.2.3. Energy Modelling
3.2.4. Simulation Workflow
- Stage 1: Construction Material Assembly
- Wall Assembly: Three types of wall constructions are analysed, each designed to represent typical configurations in the Egyptian residential sector:
- Wall Type 1 (Single Wall 125 mm): Composed of one layer of red brick, two layers of cement mortar, and two layers of plaster.
- Wall Type 2 (Double Wall 250 mm): Includes a double layer of red bricks, two layers of cement mortar, and two layers of plaster.
- Wall Type 3 (Double Red Brick Wall with Air Gap): Features two single red brick layers separated by an air gap, cement mortar, and plaster layers.
- Roof Assembly: The roof, a critical component of the building envelope, is designed to minimise heat gain. Two types of roof constructions are specified:
- Roof Type 1 (Slab 150 mm): Consists of a cement tile layer, a cement mortar layer, a clean sand layer, an insulation bitumen layer, and a reinforced concrete layer.
- Roof Type 2 (Roof Floor Slab 200 mm): Like Roof Type 1 but with an increased slab thickness, enhancing its thermal resistance.
- Slab on Grade: The slab on grade is another essential element for controlling heat transfer between the building and the ground. Two types are highlighted:
- Slab on Grade Type 1 (SOG 150 mm): Comprises a cement tile layer, a cement mortar layer, a clean sand layer, and a reinforced concrete layer.
- Slab on Grade Type 2 (SOG 200 mm): Like Type 1 but with increased thickness for improved insulation properties.
- Stage 2: Building Geometry Modelling
- Stage 3: Thermal Simulation Analysis
- Stage 4: Dataset Generation
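
The discrete envelope options listed under Stage 1 can be enumerated as in the following minimal sketch. The labels and the function name `envelope_combinations` are illustrative assumptions rather than identifiers from the authors' Grasshopper definition; the sketch only shows that three wall types, two roof types, and two slab-on-grade types give twelve envelope combinations.

```python
from itertools import product

# Discrete envelope options described in Stage 1 (labels are illustrative).
WALL_TYPES = ("Wall Type 1", "Wall Type 2", "Wall Type 3")
ROOF_TYPES = ("Roof Type 1", "Roof Type 2")
SOG_TYPES = ("SOG Type 1", "SOG Type 2")


def envelope_combinations():
    """Return every (wall, roof, slab-on-grade) combination as a tuple."""
    return list(product(WALL_TYPES, ROOF_TYPES, SOG_TYPES))


combos = envelope_combinations()
print(len(combos))  # 3 walls x 2 roofs x 2 slabs = 12
```

Each combination can then be paired with sampled continuous variables to define one simulation case.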
3.3. Verification and Benchmarking
3.3.1. Input and Configuration Verification (Pre-Simulation)
3.3.2. Simulation Execution Verification (During and Post-Run)
3.3.3. Output Extraction Checks and Dataset Integrity (Post-Processing)
3.3.4. Plausibility Benchmarking Using Expected Physical Trends (Internal Benchmarking)
3.3.5. External Benchmarking
3.4. Data Significance
4. Methodological Contributions
- Workflow Integration: The study bridges the gap between architectural design and energy performance analysis by creating a direct link between Rhino/Grasshopper and EnergyPlus. This feature allows designers to receive immediate performance feedback without switching between separate software platforms, addressing a well-documented barrier to energy-conscious design.
- Scalable Simulation Framework: Through custom Python scripting and automation, the methodology overcomes the practical limits of manual simulation by executing 12,000 distinct simulations. The resulting dataset captures interactions between 18 key design parameters, providing a high-resolution view of how architectural decisions affect energy outcomes in hot climates.
- Empirical Validation of Parameter Significance: The sensitivity analysis provides quantitative evidence that cooling setpoints (sensitivity index: 0.112) and building dimensions (length: 0.081; depth: 0.085) dominate the variance in energy consumption in Egyptian residential buildings, while envelope properties exert a relatively minor influence. These findings challenge conventional assumptions about the importance of thermal mass in hot climates.
5. Study Discussion
5.1. Sensitivity Analysis
5.2. Physical Mechanisms and Design Implications of the Results
5.3. Data Quality, Bias, and Uncertainty Considerations
5.4. Sampling Quality and Interaction-Bias Diagnostics
5.4.1. Marginal Balance
5.4.2. Interaction-Bias Screening
5.5. Theoretical and Practical Implications
- Architects: the framework enables real-time energy evaluation of design alternatives. It is particularly useful for massing, orientation, and fenestration decisions. The finding that the window-to-wall ratio affects cooling loads more significantly than wall insulation may shift design priorities.
- Building code developers can use the sensitivity results to prioritise energy-efficiency measures. The strong influence of cooling setpoints suggests potential energy savings. These savings may be achieved through smart thermostat regulations or passive cooling strategies.
- Machine-learning researchers: the dataset provides curated training data for surrogate models. This supports energy prediction in understudied regions where data scarcity is common.
5.6. Comparison with Optimisation-Driven and Conventional Simulation Workflows
5.7. Extension to Other Building Typologies
5.8. Comparison with Existing Studies
5.9. Modelling Assumptions, Simplifications, and Expected Systematic Effects
6. Study Limitations and Future Recommendations
6.1. Study Limitations
6.2. Future Recommendations
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Awan, A.; Kocoglu, M.; Subhan, M.; Utepkaliyeva, K.; Hossain, M.E. Assessing energy efficiency in the built environment: A quantile regression analysis of CO2 emissions from buildings and manufacturing sector. Energy Build. 2025, 338, 115733.
- Kaloop, M.R.; Ahmad, F.; Samui, P.; Elbeltagi, E.; Hu, J.W.; Wefki, H. Predicting energy consumption of residential buildings using metaheuristic-optimised artificial neural network technique in early design stage. Build. Environ. 2025, 274, 112749.
- Elbeltagi, E.; Wefki, H. Predicting energy consumption for residential buildings using ANN through parametric modelling. Energy Rep. 2021, 7, 2534–2545.
- Pena, M.L.C.; Carballal, A.; Rodríguez-Fernández, N.; Santos, I.; Romero, J. Artificial intelligence applied to conceptual design. A review of its use in architecture. Autom. Constr. 2021, 124, 103550.
- Naboni, R.; Paoletti, I. Advanced Customization in Architectural Design and Construction; Springer International Publishing: Cham, Switzerland, 2015.
- Anton, I.; Tănase, D. Informed geometries. Parametric modelling and energy analysis in early stages of design. Energy Procedia 2016, 85, 9–16.
- Albaik, M.; Muhsen, R. Optimising Building Performance: A Grasshopper Modelling Case Study of the King Hussein Mosque. IEEE Access 2025, 13, 47244–47259.
- Sarkar, D. Application of Grasshopper Optimisation Algorithm for Design and Development of Net Zero Energy Residential Building in Ahmedabad, India. In Proceedings of the 2024 International Conference on Sustainable Energy: Energy Transition and Net-Zero Climate Future (ICUE), Ahmedabad, India, 21–23 October 2024; pp. 1–7.
- Bao, X.; Zhang, J. Multi-objective decision optimisation design for building energy-saving retrofitting design based on improved grasshopper optimisation algorithm. Int. J. Renew. Energy Dev. 2024, 13, 1058–1067.
- de Sousa Freitas, J.; Cronemberger, J.; Soares, R.M.; Amorim, C.N.D. Modelling and assessing BIPV envelopes using parametric Rhinoceros plugins, Grasshopper and Ladybug. Renew. Energy 2020, 160, 1468–1479.
- Elbeltagi, E.; Wefki, H.; Abdrabou, S.; Dawood, M.; Ramzy, A. Visualised strategy for predicting buildings’ energy consumption during early design stage using parametric analysis. J. Build. Eng. 2017, 13, 127–136.
- Peng, J.; Yang, Y.; Fu, X.; Hou, Y.; Ding, Y. Grasshopper platform-assisted design optimization of Fujian rural earthen buildings considering low-carbon emissions reduction. Sci. Rep. 2024, 14, 18229.
- Gavaldà-Torrellas, O.; Monsalvete, P.; Ranjbar, S.; Eicker, U. The Urban Building Energy Retrofitting Tool: An Open-Source Framework to Help Foster Building Retrofitting Using a Life Cycle Costing Perspective—First Results for Montréal. Smart Cities 2025, 8, 17.
- Gaterell, M.R.; McEvoy, M.E. The impact of climate change uncertainties on the performance of energy efficiency measures applied to dwellings. Energy Build. 2005, 37, 982–995.
- Khan, H. Microclimatic architectural design by interfacing Grasshopper and Dynamo with Rhino and Revit. Meas. Sens. 2024, 33, 101143.
- Sadeghipour Roudsari, M.; Pak, M.; Viola, A. Ladybug: A parametric environmental plugin for Grasshopper to help designers create an environmentally-conscious design. In Proceedings of the Building Simulation 2013: 13th Conference of IBPSA, Chambery, France, 25–28 August 2013; pp. 3128–3135.
- Ramirez, J.P.D.; Nagarsheth, S.H.; Ramirez, C.E.D.; Henao, N.; Agbossou, K. Synthetic dataset generation of energy consumption for a residential apartment building in cold weather, considering the building’s ageing. Data Brief 2024, 54, 110445.
- Wu, C.; Pan, H.; Luo, Z.; Liu, C.; Huang, H. Multi-objective optimization of residential building energy consumption, daylighting, and thermal comfort based on BO-XGBoost-NSGA-II. Build. Environ. 2024, 254, 111386.
- Waqas, H.; Shang, J.; Munir, I.; Ullah, S.; Khan, R.; Tayyab, M.; Mousa, B.G.; Williams, S. Enhancement of the energy performance of an existing building using a parametric approach. J. Energy Eng. 2023, 149, 04022057.
- Alammar, A.; Jabi, W. Generation of a Large Synthetic Database of Office Tower’s Energy Demand Using Simulation and Machine Learning. In Proceedings of the International Symposium on Formal Methods in Architecture, Singapore, 25–27 May 2022; Springer Nature: Singapore, 2022; pp. 479–500.
- Peronato, G.; Kämpf, J.H.; Rey, E.; Andersen, M. Integrating urban energy simulation in a parametric environment: A Grasshopper interface for CitySim. In Proceedings of the PLEA 2017: 33rd PLEA International Conference on Passive and Low Energy Architecture, Edinburgh, UK, 2–5 July 2017; Available online: https://arodes.hes-so.ch/record/7711?v=pdf (accessed on 11 May 2025).
- Wang, X.; Teigland, R.; Hollberg, A. Identifying influential architectural design variables for early-stage building sustainability optimisation. Build. Environ. 2024, 252, 111295.
- Olu-Ajayi, R.; Alaka, H.; Sulaimon, I.; Sunmola, F.; Ajayi, S. Building energy consumption prediction for residential buildings using deep learning and other machine learning techniques. J. Build. Eng. 2022, 45, 103406.
- Mendes, V.F.; Cruz, A.S.; Gomes, A.P.; Mendes, J.C. A systematic review of methods for evaluating the thermal performance of buildings through energy simulations. Renew. Sustain. Energy Rev. 2024, 189, 113875.
- Cavieres, A.; Gentry, R.; Al-Haddad, T. Knowledge-based parametric tools for concrete masonry walls: Conceptual design and preliminary structural analysis. Autom. Constr. 2011, 20, 716–728.
- Lee, K.S.; Han, K.J.; Lee, J.W. Feasibility study on parametric optimization of daylighting in building shading design. Sustainability 2016, 8, 1220.
- Samuelson, H.; Claussnitzer, S.; Goyal, A.; Chen, Y.; Romo-Castillo, A. Parametric energy simulation in early design: High-rise residential buildings in urban contexts. Build. Environ. 2016, 101, 19–31.
- Wefki, H.; Elbeltagi, E.; Abdrabou, S.; Dawood, M.; Ramzy, A. Conceptual Design for Sustainable Buildings Considering Energy Consumption Using Simulation and ANN. Ph.D. Thesis, Mansoura University, Mansoura, Egypt, 2017.
- Attia, S.; Wanas, O. The Database of Egyptian Building Envelopes (DEBE): A Database for Building Energy Simulations. In Proceedings of the SimBuild Conference 2012: 5th Conference of IBPSA-USA, Madison, WI, USA, 1–3 August 2012; pp. 96–103.
- ESTIW. The Egyptian Specifications for Thermal Insulation Work Items; No. 176/1998; Ministry of Housing: Cairo, Egypt, 2017.
- Attia, S.; Gratia, E.; De Herde, A.; Hensen, J.L. Simulation-based decision support tool for early stages of zero-energy building design. Energy Build. 2012, 49, 2–15.
- Ihm, P.; Krarti, M. Design optimization of energy efficient residential buildings in Tunisia. Build. Environ. 2012, 58, 81–90.
- Assad, M.N. Towards Promoting Sustainable Construction in Egypt: A Life-Cycle Cost Approach. Master’s Thesis, The American University in Cairo, Cairo, Egypt, 2021. Available online: https://fount.aucegypt.edu/retro_etds/2445/ (accessed on 3 April 2025).
- Alajmi, A.F. Quantifying energy use intensity and peak demand in a hot-arid residential building: Insights from four years of high-resolution monitoring. Energy Rep. 2025, 14, 2204–2216.







| Subject | Building Performance Analysis and Energy Engineering |
|---|---|
| Specific subject area | Energy consumption and efficiency in residential buildings |
| Data type | Synthetic dataset stored in .xlsx files. The dataset is simulation-derived to support early-stage design exploration, where measured consumption data and complete building metadata are typically unavailable. |
| How the data were acquired | Rhino/Grasshopper was used to run 12,000 simulations, and the results were stored in XLSX files. The simulations varied design parameters such as building orientation, dimensions, materials used, and climate conditions. The simulated site is New Cairo City, Egypt (30.0363° N, 31.4758° E), and the data are available in the Zenodo repository. |
| Data format | Raw |
| Experimental factors | The simulations included diverse scenarios for building orientation, dimensions (width, depth, and height), material properties, and climatic conditions (indoor and outdoor). |
| Data source location | New Cairo City, Cairo, Egypt Geographical Coordinates—30.0363° N, 31.4758° E |
| Dataset access | Repository name: Zenodo Data identification number: 10.5281/zenodo.13622940 Direct URL to data: https://doi.org/10.5281/zenodo.13622940 (accessed on 22 February 2026). Instructions for accessing these data: none |
| Value of data | Useful for analysing the effect of different design factors on residential building energy use. Beneficial for designers, architects, engineers, and researchers working on energy optimisation. Supports the creation of energy optimisation and performance assessment models. Can be used for training deep-learning models and predicting future energy consumption patterns. |
| Parameter | Description |
|---|---|
| Wall Type | Different wall types |
| Roof Type | Different roof types |
| Slab-on-Grade (S.O.G) Type | Different S.O.G types |
| Building Length | Building dimensions, different lengths (m) |
| Building Width | Building dimensions, different widths (m) |
| Building Height | Building dimensions, different heights (m) |
| Building Orientation | Building orientations from the North in degrees |
| South Window-to-Wall Ratio (WWR) | Window-to-Wall ratio in (%) for South façade |
| East Window-to-Wall Ratio (WWR) | Window-to-Wall ratio in (%) for East façade |
| North Window-to-Wall Ratio (WWR) | Window-to-Wall ratio in (%) for North façade |
| West Window-to-Wall Ratio (WWR) | Window-to-Wall ratio in (%) for West façade |
| Glass U-value 1 | Thermal transmittance, W/(m²·K) |
| Glass SHGC 2 | Solar Heat Gain Coefficient (SHGC) |
| Glass VT 3 | Visible transmittance (VT) in (%) |
| Heating Setpoint Temperature | Indoor heating comfort temperature (°C) |
| Cooling Setpoint Temperature | Indoor cooling comfort temperature (°C) |
| Model Parameter | Input Information |
|---|---|
| Weather Data | Location, Latitude, and Longitude, and Temperatures |
| Building Geometry | Building shape, Building orientation, Principal building function, Total floor area, and Floor-to-floor height. |
| Envelope | Window-to-wall ratio, Glass (SHGC, U-value, VT), Wall, Roof, Slab on Grade, Thermal zoning, and Infiltration assumptions. |
| Internal Loads | Anticipated building occupancy, Lighting power density, and Plug-load density. |
| HVAC Equipment | Systems type (heating and cooling), distribution type, capacity, efficiency, and schedules of operation and control. |
| Item | Conductivity [W/(m·K)] | Density [kg/m³] | Specific Heat [J/(kg·°C)] |
|---|---|---|---|
| Red Brick | 0.60 | 1790.00 | 840.00 |
| Cement Mortar | 1.00 | 1570.00 | 896.00 |
| Plaster | 0.16 | 600.00 | 1000.00 |
| Reinforced Concrete | 1.44 | 2460.00 | 1000.00 |
| Cement Tiles | 1.50 | 2100.00 | 1000.00 |
| Sand | 0.33 | 1520.00 | 800.00 |
| Bitumen Damp Insulation | 0.15 | 1055.00 | 1000.00 |
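
The conductivities above can be turned into a layer-by-layer thermal resistance using R = d/k for each layer. The following is a minimal sketch under stated assumptions: the layer thicknesses are hypothetical placeholders (the source gives only the overall 125 mm designation for Wall Type 1), surface film resistances are ignored, and the helper names are illustrative.

```python
# Conductivities in W/(m·K), copied from the material table above.
CONDUCTIVITY = {
    "red_brick": 0.60,
    "cement_mortar": 1.00,
    "plaster": 0.16,
}


def layer_resistance(thickness_m, conductivity):
    """Thermal resistance of one homogeneous layer, in m²·K/W (R = d / k)."""
    return thickness_m / conductivity


def assembly_resistance(layers):
    """Sum the resistances of (material, thickness_m) layers in series."""
    return sum(layer_resistance(d, CONDUCTIVITY[m]) for m, d in layers)


# Hypothetical thicknesses for Wall Type 1 (ASSUMED, not from the source):
# one brick layer, two mortar layers, two plaster layers.
wall_type_1 = [
    ("plaster", 0.02),
    ("cement_mortar", 0.02),
    ("red_brick", 0.125),
    ("cement_mortar", 0.02),
    ("plaster", 0.02),
]

r_total = assembly_resistance(wall_type_1)
u_value = 1.0 / r_total  # conduction-only U-value, W/(m²·K)
print(round(r_total, 3), round(u_value, 3))
```

Because the thicknesses are placeholders, the printed values illustrate the calculation only and are not properties of the assemblies used in the dataset.
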
| Parameter | Possibility | Min. | Max. |
|---|---|---|---|
| Building dimension | Length | 10 m | 30 m |
| Building dimension | Depth | 10 m | 30 m |
| Building dimension | Height | 3 m | 15 m |
| Building orientation | | 0° | 360° |
| Window-to-wall ratio | North | 0% | 80% |
| Window-to-wall ratio | South | 0% | 80% |
| Window-to-wall ratio | East | 0% | 80% |
| Window-to-wall ratio | West | 0% | 80% |
| Glazing type | U-value | 0 | 1.2 |
| Glazing type | SHGC | 0 | 1 |
| Glazing type | VT | 0 | 1 |
| Temperature set point | Cooling | 18 °C | 28 °C |
| Temperature set point | Heating | 8 °C | 12 °C |
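
A case generator over these ranges might look like the sketch below. It assumes independent uniform sampling of every variable as a continuous quantity, which is a simplification (the released dataset may use discrete levels for some variables, such as setpoints). The dictionary keys, the function names, and the fixed seed are illustrative assumptions, not the authors' identifiers.

```python
import random

# (min, max) bounds copied from the parameter-range table above.
RANGES = {
    "length_m": (10.0, 30.0),
    "depth_m": (10.0, 30.0),
    "height_m": (3.0, 15.0),
    "orientation_deg": (0.0, 360.0),
    "wwr_north": (0.0, 0.8),
    "wwr_south": (0.0, 0.8),
    "wwr_east": (0.0, 0.8),
    "wwr_west": (0.0, 0.8),
    "glass_u_value": (0.0, 1.2),
    "glass_shgc": (0.0, 1.0),
    "glass_vt": (0.0, 1.0),
    "cooling_sp_c": (18.0, 28.0),
    "heating_sp_c": (8.0, 12.0),
}


def sample_case(rng):
    """Draw one design alternative uniformly within the declared bounds."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}


def within_bounds(case):
    """Parameter-bound check: every value lies inside its declared range."""
    return all(RANGES[k][0] <= v <= RANGES[k][1] for k, v in case.items())


rng = random.Random(42)  # fixed seed so the sketch is reproducible
cases = [sample_case(rng) for _ in range(5)]
print(all(within_bounds(c) for c in cases))  # True
```

The same `within_bounds` check can be reused as a pre-simulation verification step on any externally supplied case.
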
| Parameter | Possibility | Parameter Value |
|---|---|---|
| Building envelope | Wall | Type 1, Type 2, Type 3 |
| Building envelope | Roof | Type 1, Type 2 |
| Building envelope | SOG | Type 1, Type 2 |
| Lighting Load 1 | | 7.3 W/m² |
| Equipment Load 2 | | 7.0 W/m² |
| Setting | Attribute |
|---|---|
| Weather File | EGY_Cairo.Intl.Airport.623660_ETMY.epw |
| Simulation type | Residential |
| Run Period | Annual |
| Time Steps per Hour | 6 |
| Outputs | - Heating Energy Consumption (Annual) - Cooling Energy Consumption (Annual) - Lights Energy (Annual) - Equipment Energy (Annual) |
| Setting | Attribute |
|---|---|
| Number of People (people/m2) | 0.033 people/m2 [31] |
| Lighting Load (W/m2) | 7.3 W/m2 [31] |
| Equipment Load (W/m2) | 7.0 W/m2 [31] |
| Cooling COP | 3.0 [32] |
| Heating COP | 4.0 [33] |
| Infiltration Rate | 0.7 L/s/m2 [32] |
| Fresh Air | 20 m3/h/person [31] |
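
To make the fixed settings concrete, the sketch below scales the per-area densities to a hypothetical 20 m × 20 m single-storey floor plate and converts a thermal cooling load to electricity by dividing by the COP. Both the floor area and the assumption that the COP acts as a simple divisor on the thermal load are illustrative and are not confirmed by the source; the function names are hypothetical.

```python
# Baseline densities and coefficients from the settings table above.
PEOPLE_PER_M2 = 0.033
LIGHTING_W_PER_M2 = 7.3
EQUIPMENT_W_PER_M2 = 7.0
FRESH_AIR_M3_PER_H_PER_PERSON = 20.0
COOLING_COP = 3.0


def baseline_loads(floor_area_m2):
    """Scale the per-area baseline densities to a given floor area."""
    occupants = PEOPLE_PER_M2 * floor_area_m2
    return {
        "occupants": occupants,
        "lighting_w": LIGHTING_W_PER_M2 * floor_area_m2,
        "equipment_w": EQUIPMENT_W_PER_M2 * floor_area_m2,
        "fresh_air_m3_per_h": occupants * FRESH_AIR_M3_PER_H_PER_PERSON,
    }


def cooling_electricity_kwh(thermal_load_kwh, cop=COOLING_COP):
    """ASSUMPTION: electricity use = thermal cooling load / COP."""
    return thermal_load_kwh / cop


loads = baseline_loads(20.0 * 20.0)  # hypothetical 400 m² floor plate
for name, value in loads.items():
    print(f"{name}: {value:.1f}")
```
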
| Parameters |
|---|
| Building dimensions (Thermal zone) |
| Building orientation |
| WWR (South, North, West, East) |
| U-value for glass |
| Solar Heat Gain Coefficient (SHGC) for glass |
| Visible Transmittance (VT) for glass |
| Cooling Set Point Temperature |
| Heating Set Point Temperature |
| Parameter | Change | Estimated effect (kWh/Year) | Confidence interval (kWh/Year) | p-Value |
|---|---|---|---|---|
| Cooling setpoint (Cooling_SP) | +1 °C | −4922.42 | [−4988.33, −4856.51] | <1 × 10⁻¹⁶ |
| Cooling setpoint (Cooling_SP) | 18 → 28 °C (Δ10 °C) | −49,224.22 | [−49,883.34, −48,565.11] | <1 × 10⁻¹⁶ |
| Glazing SHGC | +0.10 | +2689.96 | [+2622.74, +2757.18] | <1 × 10⁻¹⁶ |
| Mean façade WWR | +0.10 | +2584.68 | [+2419.53, +2749.84] | 1.29 × 10⁻²⁰⁶ |
| Glazing U-value | +0.10 W/(m²·K) | +294.87 | [+236.19, +353.55] | 6.88 × 10⁻²³ |
| Building length | +1 m | +2055.35 | [+2021.74, +2088.96] | <1 × 10⁻¹⁶ |
| Building depth | +1 m | +2083.59 | [+2050.28, +2116.89] | <1 × 10⁻¹⁶ |
| Building height | +1 m | +2429.04 | [+2367.18, +2490.91] | <1 × 10⁻¹⁶ |
| S.O.G type (1 vs. 0) | switch 0 → 1 | +1159.66 | [+797.46, +1521.86] | 3.49 × 10⁻¹⁰ |
| Wall type (2 vs. 0) | switch 0 → 2 | −575.70 | [−964.14, −187.25] | 3.68 × 10⁻³ |
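
Because each row reports an effect per step, the effects are easier to compare once scaled to the full sampled range of each variable. The sketch below does this arithmetic under a linearity assumption (the same assumption implied by the Δ10 °C row being ten times the +1 °C row). The range widths are taken from the sampling ranges reported elsewhere in the paper, and the names are illustrative.

```python
# name: (effect per step in kWh/year, step size, sampled range width)
EFFECTS = {
    "cooling_setpoint": (-4922.42, 1.0, 28.0 - 18.0),
    "building_length": (2055.35, 1.0, 30.0 - 10.0),
    "building_depth": (2083.59, 1.0, 30.0 - 10.0),
    "building_height": (2429.04, 1.0, 15.0 - 4.0),
    "glazing_shgc": (2689.96, 0.10, 0.99 - 0.01),
    "glazing_u_value": (294.87, 0.10, 1.20 - 0.01),
}


def full_range_effect(effect_per_step, step, width):
    """Linear extrapolation of a per-step effect across the whole range."""
    return effect_per_step * width / step


# Rank variables by the magnitude of their full-range effect.
ranked = sorted(
    ((name, full_range_effect(*spec)) for name, spec in EFFECTS.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, effect in ranked:
    print(f"{name}: {effect:,.1f} kWh/year")
```

Under these assumptions the ordering places the cooling setpoint first, followed by depth and length, then height and SHGC, with the glazing U-value an order of magnitude smaller.
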
| Input Variable | Sampling Type | Target Range/Levels | Coverage Metric | | | |
|---|---|---|---|---|---|---|
| Length | Uniform random | 10–30 | 10-bin uniformity | 5.08 | 0.027 | 0.017 |
| Depth | Uniform random | 10–30 | 10-bin uniformity | 5.5 | 0.031 | 0.018 |
| Height | Uniform random | 4–15 | 10-bin uniformity | 5.25 | 0.031 | 0.027 |
| Orientation | Uniform random | 0–360 | 10-bin uniformity | 4.83 | 0.028 | 0.016 |
| WWR—South | Uniform random | 0.0000–0.7999 (≈0–80%) | 10-bin uniformity | 3.75 | 0.024 | 0.018 |
| WWR—East | Uniform random | 0.0000–0.7999 (≈0–80%) | 10-bin uniformity | 4.92 | 0.027 | 0.012 |
| WWR—North | Uniform random | 0.0000–0.8000 (≈0–80%) | 10-bin uniformity | 4.58 | 0.027 | 0.011 |
| WWR—West | Uniform random | 0.0001–0.7999 (≈0–80%) | 10-bin uniformity | 3.75 | 0.02 | 0.019 |
| Glazing U-value | Uniform random | 0.01–1.2 | 10-bin uniformity | 11.75 | 0.072 | 0.016 |
| Glazing SHGC | Uniform random | 0.01–0.99 | 10-bin uniformity | 10.42 | 0.055 | 0.027 |
| Glazing VT | Uniform random | 0.01–0.99 | 10-bin uniformity | 11.83 | 0.047 | 0.019 |
| Cooling setpoint | Uniform random | 18–28 | Level balance (n = 11) | 5.6 | 0.031 | 0.013 |
| Heating setpoint | Uniform random | 8–12 | Level balance (n = 5) | 3.42 | 0.024 | 0.017 |
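
A 10-bin uniformity check of the kind named in the table can be sketched as follows. The specific statistic shown here (maximum relative deviation of bin counts from the expected count) is an assumption chosen for illustration and is not necessarily the metric reported in the unnamed columns above; the sample itself is synthetic.

```python
import random


def bin_counts(values, lo, hi, bins=10):
    """Count values per equal-width bin; values are assumed to lie in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # v == hi goes in last bin
        counts[idx] += 1
    return counts


def max_relative_deviation(counts):
    """Largest |observed - expected| / expected over all bins."""
    expected = sum(counts) / len(counts)
    return max(abs(c - expected) / expected for c in counts)


# Synthetic stand-in for the 'Length' column (12,000 uniform draws).
rng = random.Random(0)
lengths = [rng.uniform(10.0, 30.0) for _ in range(12000)]

counts = bin_counts(lengths, 10.0, 30.0)
print(counts, round(max_relative_deviation(counts), 3))
```

A small deviation indicates that no part of the range is noticeably over- or under-represented in the sample.
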
| Approach | Primary Goal | How Simulations Are Selected | Typical Output | Strength/Best Use-Case |
|---|---|---|---|---|
| Proposed workflow (this study) | Generate a large, structured dataset for ML training and analysis | Broad sampling of parameter space (e.g., uniform random sampling across ranges) | Dataset of inputs + EnergyPlus outputs across many configurations | Best when the goal is dataset availability and design-space coverage for predictive modelling |
| Conventional simulation (case-based) | Evaluate a small number of design alternatives | Manually defined scenarios; limited runs | Detailed results for a few cases | Best for project-specific analysis; limited suitability for ML training due to small sample size |
| NSGA-II (multi-objective optimisation) | Find Pareto-optimal designs under multiple objectives | Iterative evolutionary search based on objective evaluation | Pareto front/optimal candidate solutions | Best for optimisation and trade-off exploration, not primarily intended for producing general-purpose datasets |
| BO-XGBoost-assisted optimisation (surrogate + search) | Accelerate optimisation using surrogate models | Iterative sampling guided by Bayesian optimisation and surrogate learning | Pareto front and surrogate model | Best when simulation is expensive and the aim is faster convergence to good designs; the dataset is typically optimisation-focused rather than broadly representative. |
| Dataset | Dataset Type | Typical Scope | Geography/Climate Coverage | Temporal Resolution | Input Variables (Design/Metadata) | Output Variables |
|---|---|---|---|---|---|---|
| Present study. https://doi.org/10.5281/zenodo.13622940 (accessed on 22 February 2026) | Simulation-derived (EnergyPlus via Grasshopper) | 12,000 residential parametric cases | Single hot–arid boundary condition (Cairo/New Cairo EPW) | Annual end-use outputs | Explicit early-design variables (geometry, orientation, WWR, glazing properties, setpoints, and discrete envelope options) | Annual heating, cooling, lighting, and equipment |
| Building Data Genome Project 2 (BDG2). https://github.com/buds-lab/building-data-genome-project-2. (accessed on 26 January 2026) | Measured meter time-series | 3053 meters from 1636 buildings | Portfolio-based (multi-building; not a controlled single climate boundary condition) | Hourly (2016–2017) | Building-level metadata; limited explicit parametric geometry/envelope variables compared with early-design sweeps | Multiple meter types (electricity, heating/cooling water, steam, etc.) |
| ASHRAE Great Energy Predictor III (GEPIII). https://www.kaggle.com/c/ashrae-energy-prediction. (accessed on 26 January 2026) | Measured meter data for ML benchmarking | >1000 buildings; multiple meter types | Portfolio-based (multi-site); includes weather + building metadata | Hourly meter readings (multi-year) | Metadata + weather; not organised as a parametric early-design variable sweep | Metered usage for chilled water, electric, hot water, and steam |
| ResStock/End-Use Load Profiles (U.S.). https://resstock.nrel.gov/datasets. (accessed on 28 January 2026) | Simulation-derived stock model (calibrated/validated) | U.S. residential building stock (portfolio/stock) | U.S. climate regions (multi-climate) | 15-min calibrated load profiles (EULP) | Stock/characteristic variables; not primarily conceptual massing variables under a single controlled boundary condition | End-use load profiles (time-series) |
| ComStock/End-Use Load Profiles (U.S.). https://comstock.nrel.gov/ and https://natlabrockies.github.io/ComStock.github.io/docs/data.html (accessed on 22 February 2026) | Simulation-derived commercial stock model | U.S. commercial building stock | U.S. climate regions (multi-climate) | 15-min calibrated load profiles (EULP) | Stock/typology descriptors; not structured as an early-stage parametric geometry/envelope sweep | End-use load profiles (time-series) |
| Assumption | Observed in Dataset (N = 12,000) | Systematic Influence/Interpretation |
|---|---|---|
| Dataset scale and balance (envelope stratification) | 12,000 cases; 12 envelope combos (3 walls × 2 roofs × 2 S.O.G); exactly 1000 cases per combo; Wall codes: 0/1/2 = 4000 each; Roof codes: 0/1 = 6000 each; S.O.G: 0/1 = 6000 each | Prevents training bias toward one construction category; supports fair ML benchmarking across envelope classes. |
| Conceptual massing geometry | Length 10–30 (mean 20.01); Depth 10–30 (mean 20.08); Height 4–15 (mean 9.47); Orientation 0–360 (361 discrete values) | Captures early-stage “massing-level” effects; outputs reflect conceptual abstraction (not room-level zoning). Therefore, labels should not be interpreted as multi-zone detailed design truth. |
| Façade WWR (four sides) | South ~0.00003–0.79988; East ~0.000004–0.79993; North (“Nourth”) ~0.000035–0.79996; West ~0.000058–0.79992 | WWR strongly drives solar gains → cooling (especially in hot–arid climates); interactions with orientation and SHGC are expected and are a “real signal” in ML training. |
| Glazing properties | UValue 0.01–1.20 (120 discrete levels); SHGC 0.01–0.99 (99 levels); VT 0.01–0.99 (99 levels) | These bounds define what the model can learn; outside these ranges, ML predictions become extrapolation. SHGC changes are expected to systematically shift cooling demand through solar-gain control. |
| Thermostat setpoints (as labels are conditioned on them) | Cooling_SP: 18–28 °C (mean 22.97); Heating_SP: 8–12 °C (mean 9.99) | Setpoints directly shift delivered energy totals. ML predictions are only valid under the setpoint ranges used. |
| Energy label definition (target variable) | pEUI min 6903.46, max 214,819.62, mean 51,612.50, median 45,336.51; P5–P95: 18,484.28–105,763.72. | Highlights label scale and outliers; supports sanity checks and helps future users choose normalisation/log transforms for ML. |
| Fixed boundary conditions (not stored as columns; constant across all runs) | Climate (single EPW), residential schedules, internal gains, infiltration/ventilation, and HVAC efficiency proxies are held constant. | These constants systematically shift absolute energy magnitudes; ML models should be interpreted as conditional on these fixed assumptions, not universal across other climates/schedules/HVAC efficiencies. |
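
The envelope stratification in the first row (12 combinations with exactly 1000 cases each) can be verified with a simple count. In the sketch below the rows are generated synthetically to mirror the stated design, and the column names `wall`, `roof`, and `sog` are hypothetical; a real check would read the codes from the released XLSX files instead.

```python
from collections import Counter
from itertools import product


def combo_counts(rows):
    """Count cases per (wall, roof, slab-on-grade) code combination."""
    return Counter((r["wall"], r["roof"], r["sog"]) for r in rows)


def is_balanced(counts, expected_combos, per_combo):
    """True if every expected combination appears exactly per_combo times."""
    return len(counts) == expected_combos and all(
        n == per_combo for n in counts.values()
    )


# Synthetic rows mirroring the stated design: 3 walls x 2 roofs x 2 S.O.G.
rows = [
    {"wall": w, "roof": r, "sog": s}
    for w, r, s in product(range(3), range(2), range(2))
    for _ in range(1000)
]

counts = combo_counts(rows)
print(len(rows), is_balanced(counts, expected_combos=12, per_combo=1000))
```
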
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wefki, H.; Elbeltagi, E.; Elnabwy, M.T.; ElAgroudy, M. Synthetic Residential Building Energy-Consumption Dataset Generation Through Parametric Simulation for Hot–Arid Egypt. Buildings 2026, 16, 976. https://doi.org/10.3390/buildings16050976

