Improving Building Heat Load Forecasting Models with Automated Identification and Attribution of Day Types
Abstract
1. Introduction
2. Literature Review
3. Methodology
- Step 1: Matching of data sources and pre-processing of the datasets.
- Step 2: Identification of day types (DTs) by clustering intra-day HL patterns.
- Step 3: Development of a classification model that attributes a specific DT to each day, based only on exogenous information.
- Step 4: Development of hourly HL prediction models for each DT.
- Step 5: Evaluation of the prediction efficiency using error metrics.
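Step 2 can be illustrated with a minimal sketch. The snippet below groups normalized 24-hour HL profiles with a plain k-means routine. The choice of k-means, the normalization by daily total, the seeding rule, and the four synthetic days are assumptions made for illustration only; they are not taken from the study.

```python
from math import dist


def normalize(profile):
    """Scale a daily profile by its daily total so that only its shape matters."""
    total = sum(profile)
    return [value / total for value in profile] if total else list(profile)


def kmeans(profiles, k, iterations=50):
    """Plain k-means; the first k profiles seed the centroids (deterministic)."""
    centroids = [list(p) for p in profiles[:k]]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each day goes to the nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in profiles]
        # Update step: each centroid becomes the mean profile of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic hourly loads (hypothetical): two flat days and two days with a daytime peak.
days = [
    [10.0] * 24,
    [5.0] * 7 + [20.0] * 10 + [5.0] * 7,
    [12.0] * 24,
    [6.0] * 7 + [22.0] * 10 + [6.0] * 7,
]
day_types, centroids = kmeans([normalize(d) for d in days], k=2)
print(day_types)
```

Each label in `day_types` plays the role of a DT. In Step 3, a classifier would then learn to predict that label from exogenous inputs alone.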
3.1. Data Source and Definition of Buildings
- Building A (Residential): Characterized by continuous SH and DHW demand throughout the year.
- Building B (Educational): Includes both SH and DHW loads year-round. Unlike Building A, its HL profile is non-linear, with multiple trends emerging under low outdoor temperature conditions.
- Building C (Commercial): Exhibits very low HLs during summer months, while in winter, two distinct load trends are observed under low outdoor temperatures.
3.2. Data Preprocessing
3.3. Day Type Identification and Attribution
3.4. Heat–Load Prediction Models
- The so-called Q–T algorithm [8]: A piecewise linear regression model that characterizes HLs (Q) as the maximum of two components: a baseload, independent of ambient conditions (e.g., DHW demand), and a variable load, defined as a negative linear relationship with outdoor temperature (T) and solar irradiation.
- Multivariate Linear Regression (MVLR): A linear regression model in which HLs are predicted directly from climatic variables.
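The difference between the two linear formulations can be sketched in a few lines. The functional form of `qt_load` is inferred only from the description above (maximum of a baseload and a load that decreases with temperature and irradiation). All function names, parameter names, and coefficient values are hypothetical and do not reproduce the fitted models in [8].

```python
def qt_load(t_out, g_t, baseload, intercept, k_temp, k_solar):
    """Q-T style load: the larger of a constant baseload and a weather-driven load.

    The weather-driven part falls linearly as outdoor temperature (t_out) and
    solar irradiation (g_t) rise; k_temp and k_solar are assumed non-negative.
    """
    variable_load = intercept - k_temp * t_out - k_solar * g_t
    return max(baseload, variable_load)


def mvlr_load(t_out, g_t, b0, b_temp, b_solar):
    """Plain multivariate linear regression: no floor is applied to the result."""
    return b0 + b_temp * t_out + b_solar * g_t


# Hypothetical coefficients, chosen only to show the behavior of each form.
cold_hour = qt_load(t_out=-10.0, g_t=0.0, baseload=20.0, intercept=100.0, k_temp=4.0, k_solar=0.02)
mild_hour = qt_load(t_out=25.0, g_t=500.0, baseload=20.0, intercept=100.0, k_temp=4.0, k_solar=0.02)
mild_hour_mvlr = mvlr_load(t_out=25.0, g_t=500.0, b0=100.0, b_temp=-4.0, b_solar=-0.02)
print(cold_hour, mild_hour, mild_hour_mvlr)
```

With these illustrative numbers, the Q–T form falls back to the baseload in the mild, sunny hour, whereas the unconstrained linear form extrapolates below zero.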
- Climatic data: hourly outdoor temperature and solar irradiation on the horizontal plane, consistent with the significance analysis in [8].
- Calendar data: hour of the day, day of the week, month, holiday indicator, and attributed Day Type (DT, as defined in Section 3.3). Depending on the model, this information is either used to generate separate models for calendar-segmented subsets (e.g., Q–T model for Mondays at 8 h) or incorporated directly as additional input variables (e.g., day of the week included alongside climate data in RF).
- Without DT attribution: using only climate and calendar data.
- With DT attribution: using climate and calendar data, along with the attributed DT from Section 3.3. In this case, the DT information is passed to the prediction models, acknowledging the potential impact of misclassification errors.
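The two ways of using calendar information, and the two input configurations, can be sketched as follows. The record layout, the field names, and the helpers `build_inputs` and `split_by_segment` are hypothetical; they only illustrate the distinction drawn above between passing the DT as an input variable and using it to segment the data into sub-models.

```python
from collections import defaultdict

BASE_INPUTS = ("t_out", "g_t", "hour", "weekday", "month", "holiday")


def build_inputs(record, use_day_type):
    """Select the model inputs for one hourly record (DT added only on request)."""
    inputs = {key: record[key] for key in BASE_INPUTS}
    if use_day_type:
        inputs["day_type"] = record["day_type"]
    return inputs


def split_by_segment(records, use_day_type):
    """Group hourly records so that one sub-model can be fitted per segment.

    Segments are (weekday, hour) pairs without DT attribution and
    (day_type, hour) pairs with it.
    """
    segments = defaultdict(list)
    for record in records:
        first = record["day_type"] if use_day_type else record["weekday"]
        segments[(first, record["hour"])].append(record)
    return dict(segments)


# Three hypothetical hourly records, all at hour 8.
records = [
    {"t_out": -5.0, "g_t": 0.0, "hour": 8, "weekday": "Mon", "month": 1, "holiday": False, "day_type": 1, "load": 120.0},
    {"t_out": -3.0, "g_t": 50.0, "hour": 8, "weekday": "Tue", "month": 1, "holiday": False, "day_type": 1, "load": 115.0},
    {"t_out": -4.0, "g_t": 0.0, "hour": 8, "weekday": "Wed", "month": 1, "holiday": True, "day_type": 2, "load": 60.0},
]
by_weekday = split_by_segment(records, use_day_type=False)
by_day_type = split_by_segment(records, use_day_type=True)
print(len(by_weekday), len(by_day_type))
```

In this toy case, segmenting by DT pools the Monday and Tuesday records into one subset, which is the mechanism by which DT-based segmentation can reduce the number of sub-models.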
3.5. Model Validation and Error Metrics
4. Results & Discussion
4.1. DT Identification and Attribution
4.2. Heat Load Prediction
4.2.1. Performance of Linear Models
4.2.2. Performance of ML Models
5. Conclusions
- Unsupervised learning is employed to cluster HL profiles into a set of Day Types (DTs). Unlike conventional approaches based on the day of the week, this method assigns DTs based on actual performance, which better represents variations such as bank holidays and seasonal shifts in peak loads.
- Supervised learning is applied to attribute a DT to each day using only exogenous information (e.g., calendar and weather data).
- Supervised learning is then used to predict hourly HLs for each building.
- Prediction accuracy: All models achieve good predictive performance, with MAPE values below 1.5% across all cases. R² values range from moderate (~0.5–0.7 for Building B) to good (~0.7–0.85).
- Most accurate model: The best-performing model varies by case. For Building A, RF, SVR, and XGB perform comparably and substantially outperform MVLR. For Building C, XGB achieves the highest accuracy, with SVR and MVLR_2 outperforming RF. For Building B, all models except MVLR_1 achieve similar MAPE values; MVLR_1 shows the highest (but still relatively low) R².
- Computational efficiency: All models exhibit substantially lower computation times than the Q–T algorithm. Even with the additional DT attribution step, the total computation time remains lower than for Q–T, and the time required for DT classification can be offset by faster HL models.
- Impact of DT attribution on model accuracy: DT attribution improves predictive accuracy in most cases, with gains ranging from 2% to 50%, without significant computational overhead. The gains are most noticeable for MVLR in Buildings B and C and for RF in Building C.
- Impact of DT attribution on model size: DT-based segmentation allows for smaller models, reducing the number of independent hourly models required. For Buildings B and C, three DTs are sufficient, halving the model size relative to Q–T. Building A required six DTs.
- Impact of DT attribution on computational efficiency: Passing DT information to the models reduces the need to infer occupant behavior internally, which contributes to shorter computation times: MVLR is up to 90% faster than Q–T.
- ML model comparison: XGB consistently achieves the highest predictive performance across all three buildings. MVLR and XGB models maintain computation times below 20 s for all buildings, whereas RF and SVR show higher and more variable runtimes (25–60 s, up to 120 s for Building C).
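For readers who wish to reproduce the two error metrics quoted in these conclusions, a minimal sketch follows. It uses the standard textbook definitions of MAPE and the coefficient of determination; whether the study applies exactly these forms is an assumption, and the four load values are made up for illustration.

```python
def mape(known, predicted):
    """Mean Absolute Percentage Error in percent (undefined if a known load is 0)."""
    n = len(known)
    return 100.0 / n * sum(abs((k - p) / k) for k, p in zip(known, predicted))


def r_squared(known, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_load = sum(known) / len(known)
    ss_res = sum((k - p) ** 2 for k, p in zip(known, predicted))
    ss_tot = sum((k - mean_load) ** 2 for k in known)
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly heat loads in kWh.
known_load = [100.0, 200.0, 300.0, 400.0]
predicted_load = [110.0, 190.0, 310.0, 390.0]
example_mape = mape(known_load, predicted_load)
example_r2 = r_squared(known_load, predicted_load)
print(round(example_mape, 2), round(example_r2, 3))
```

Because MAPE divides by the known load, hours with zero or near-zero demand need separate handling before the metric is computed.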
- Our approach identified distinct usage patterns (clusters), yet energy performance still shows a clearly observable physical dependence associated with a cold climate. To what extent does this hold in milder climates? Can the approach also be applied to buildings with both heating and cooling loads?
- The records in our dataset correspond to actual observations of heat loads and meteorological data. In forecasting applications, deviations between forecast and actual weather are likely to cause deviations between predicted and actual loads. The impact of such deviations on the accuracy of our method is yet to be determined.
- The proposed method performs pattern identification and attribution based on the performance of a single year (2019). Are these patterns and their distribution throughout the year stable and predictable with the CART approach? This question calls for longitudinal research, even without considering sharp behavioral changes such as those arising from the COVID crisis in 2020.
- We attribute DTs based on calendar information in which bank holidays are encoded. However, our approach did not explicitly consider behavioral holidays (i.e., periods between a bank holiday and a weekend). Considering them could improve the accuracy of the CART process.
- What is the trade-off when defining the optimal number of clusters? We believe that clustering is only reasonable if it allows for a reduction in model size and/or isolates specific seasonal performance (e.g., spring break), while each cluster still describes a relatively large number of days. Considering this, the number of clusters should be somewhere between 3 and 10–15, but this requires further research.
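The behavioral holidays mentioned above could be flagged with a simple calendar rule. The sketch below is a hypothetical illustration, not part of the study: it marks a single working day that sits between a bank holiday and a weekend (or between two bank holidays), and the holiday date used in the example is invented.

```python
from datetime import date, timedelta


def is_bridge_day(day, holidays):
    """True for a working day squeezed between days off, at least one a bank holiday.

    Only single-day bridges are detected; longer gaps are out of scope here.
    """
    def is_day_off(d):
        return d.weekday() >= 5 or d in holidays  # Saturday=5, Sunday=6

    if is_day_off(day):
        return False
    previous_day = day - timedelta(days=1)
    next_day = day + timedelta(days=1)
    touches_holiday = previous_day in holidays or next_day in holidays
    return is_day_off(previous_day) and is_day_off(next_day) and touches_holiday


# Hypothetical bank holiday on Thursday, 30 May 2019.
example_holidays = {date(2019, 5, 30)}
friday_is_bridge = is_bridge_day(date(2019, 5, 31), example_holidays)
wednesday_is_bridge = is_bridge_day(date(2019, 5, 29), example_holidays)
print(friday_is_bridge, wednesday_is_bridge)
```

A flag of this kind could be added as one more Boolean input to the attribution step, alongside the existing holiday indicator.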
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Nomenclature
Acronym | Definition |
---|---|
ARX | Auto-Regressive models with eXogenous inputs |
ASHRAE | American Society of Heating, Refrigerating, and Air-Conditioning Engineers |
DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
DT | Day Type |
DHN | District-Heating Network |
DHW | Domestic Hot Water |
EC | European Commission |
ES | Energy Signature |
EU | European Union |
HDD | Heating Degree Day |
HL | Heat Load |
HVAC | Heating, Ventilation, and Air Conditioning |
kNN | k-Nearest Neighbors |
MAPE | Mean Absolute Percentage Error |
ML | Machine Learning |
MVLR | Multivariate Linear Regression |
NN | Neural Network |
PRISM | PRInceton Scorekeeping Method |
Q–T | So-called Q–T algorithm in [8] |
RES | Renewable Energy Sources |
RF | Random Forest |
SARIMA | Seasonal Autoregressive Integrated Moving Average |
SH | Space Heating |
SVR | Support Vector Regressor |
XGB | Extreme Gradient Boosting |
Symbol | Definition [units] |
---|---|
GT | Solar Radiation [W/m²] |
MAPE | Mean Absolute Percentage Error [%] |
n | Number of observations [-] |
R² | Coefficient of Determination [-] |
TOUT | Outdoor temperature [°C] |
Y | Predicted Heat Load [kWh] or heat load vector |
Known Heat Load [kWh] or heat load vector | |
Load Mean Value [kWh] |
References
1. International Energy Agency. Global Energy & CO2 Status Report 2019; IEA: Paris, France, 2019.
2. Pérez-Lombard, L.; Ortiz, J.; Pout, C. A review on buildings energy consumption information. Energy Build. 2008, 40, 394–398.
3. Somu, N.; Ramam, G.; Ramamritham, K. A hybrid model for building energy consumption forecasting using long short term memory networks. Appl. Energy 2020, 261, 114131.
4. Jang, J.; Han, J.; Leigh, S.B. Prediction of heating energy consumption with operation pattern variables for non-residential buildings using LSTM networks. Energy Build. 2022, 255, 111647.
5. European Commission. Directive 2012/27/EU of the European Parliament and of the Council of 25 October 2012 on Energy Efficiency, Amending Directives 2009/125/EC and 2010/30/EU and Repealing Directives 2004/8/EC and 2006/32/EC Text with EEA Relevance OJ L 315; European Commission: Brussels, Belgium, 2012.
6. European Commission. Directive (EU) 2018/844 of the European Parliament and of the Council of 30 May 2018 Amending Directive 2010/31/EU on the Energy Performance of Buildings and Directive 2012/27/EU on Energy Efficiency; European Commission: Brussels, Belgium, 2018.
7. Liu, D.; Wang, W.; Liu, J. Sensitivity Analysis of Meteorological Parameters on Building Energy Consumption. Energy Procedia 2017, 132, 634–639.
8. Lumbreras, M.; Garay-Martinez, R.; Arregi, B.; Martin-Escudero, K.; Diarce, G.; Raud, M.; Hagu, I. Data driven model for heat load prediction in buildings connected to District Heating by using smart heat meters. Energy 2022, 239, 122318.
9. Buffa, S.; Cozzini, M.; D’Antoni, M.; Baratieri, M.; Fedrizzi, R. 5th generation district heating and cooling systems: A review of existing cases in Europe. Renew. Sustain. Energy Rev. 2019, 104, 504–522.
10. Frederiksen, S.; Werner, S. District Heating and Cooling; Studentlitteratur: Lund, Sweden, 2013; ISBN 9789144085302.
11. Garay-Martinez, R.; Garrido-Marijuan, A. (Eds.) Handbook of Low Temperature District Heating; Green Energy and Technology; Springer: Cham, Switzerland, 2022; ISBN 978-3-031-10409-1.
12. Lumbreras, M.; Garay, R.; Marijuan, A.G. Energy meters in District-Heating Substations for Heat Consumption Characterization and Prediction Using Machine-Learning Techniques. IOP Conf. Ser. Earth Environ. Sci. 2020, 588, 032007.
13. Eguiarte, O.; Garrido-Marijuan, A.; Garay-Martinez, R.; Raud, M.; Hagu, I. Data-driven assessment for the supervision of District Heating Networks. Energy Rep. 2022, 8 (Suppl. 16), 34–40.
14. Sakkas, N.P.; Abang, R. Thermal load prediction of communal district heating systems by applying data-driven machine learning methods. Energy Rep. 2022, 8, 1883–1895.
15. do Carmo, C.M.R.; Christensen, T.H. Cluster analysis of residential heat load profiles and the role of technical and household characteristics. Energy Build. 2016, 125, 171–180.
16. Andersen, F.M.; Larsen, H.V.; Boomsma, T.K. Long-term forecasting of hourly electricity load: Identification of consumption profiles and segmentation of customers. Energy Convers. Manag. 2013, 68, 244–252.
17. Hu, Y.; Li, J.; Hong, M.; Ren, J.; Man, Y. Industrial artificial intelligence based energy management system: Integrated framework for electricity load forecasting and fault prediction. Energy 2022, 244, 123195.
18. Jang, Y.; Byon, E.; Jahani, E.; Cetin, K. On the long-term density prediction of peak electricity load with demand side management in buildings. Energy Build. 2020, 228, 110450.
19. Chen, S.; Ren, Y.; Friedrich, D.; Yu, Z.; Yu, J. Prediction of office building electricity demand using artificial neural network by splitting the time horizon for different occupancy rates. Energy AI 2021, 5, 100093.
20. Dagdougui, H.; Bagheri, F.; Le, H.; Dessaint, L. Neural network model for short-term and very-short-term load forecasting in district buildings. Energy Build. 2019, 203, 109408.
21. Sandberg, A.; Wallin, F.; Li, H.; Azaza, M. An Analyze of Long-term Hourly District Heat Demand Forecasting of a Commercial Building Using Neural Networks. Energy Procedia 2017, 105, 3784–3790.
22. Cholewa, T.; Siuta-Olcha, A.; Smolarz, A.; Muryjas, P.; Wolszczak, P.; Guz, Ł.; Balaras, C.A. On the short term forecasting of heat power for heating of building. J. Clean. Prod. 2021, 307, 127232.
23. el Bouchefry, K.; de Souza, R.S. Learning in Big Data: Introduction to Machine Learning. In Knowledge Discovery in Big Data from Astronomy and Earth Observation: Astrogeoinformatics; Elsevier: Amsterdam, The Netherlands, 2020; pp. 225–249.
24. Belyadi, H.; Haghighat, A. Supervised learning. In Machine Learning Guide for Oil and Gas Using Python; Gulf Professional Publishing: Cambridge, MA, USA, 2021; pp. 169–295.
25. Celebi, M.E.; Aydin, K. Unsupervised Learning Algorithms; Springer International Publishing: Cham, Switzerland, 2016.
26. Hammarsten, S. A critical appraisal of energy-signature models. Appl. Energy 1987, 26, 97–110.
27. Fels, M.F. PRISM: An introduction. Energy Build. 1986, 9, 5–18.
28. Kissock, J.K.; Haberl, J.S.; Claridge, D.E. Change-Point Linear and Multiple-Linear Inverse Building Energy Analysis Models; Energy Systems Laboratory, Texas A&M University: College Station, TX, USA, 2002.
29. Ferbar Tratar, L.; Strmčnik, E. The comparison of Holt–Winters method and Multiple regression method: A case study. Energy 2016, 109, 266–276.
30. Verbai, Z.; Lakatos, Á.; Kalmár, F. Prediction of energy demand for heating of residential buildings using variable degree day. Energy 2014, 76, 780–787.
31. Zhan, S.; Liu, Z.; Chong, A.; Yan, D. Building categorization revisited: A clustering-based approach to using smart meter data for building energy benchmarking. Appl. Energy 2020, 269, 114920.
32. Lumbreras, M.; Diarce, G.; Martin, K.; Garay-Martinez, R.; Arregi, B. Unsupervised recognition and prediction of daily patterns in heating loads in buildings. J. Build. Eng. 2023, 65, 105732.
33. Grosswindhager, S.; Voigt, A.; Kozek, M. Online Short-Term Forecast of System Heat Load in District Heating Networks. In Proceedings of the 31st International Symposium on Forecasting, Prague, Czech Republic, 27–29 June 2011.
34. Eguizabal, M.; Garay-Martinez, R.; Flores-Abascal, I. Simplified model for the short-term forecasting of heat loads in buildings. Energy Rep. 2022, 8 (Suppl. 16), 79–85.
35. De Eulate, I.G.; Garay-Martinez, R.; Goikolea, B.A.; Eguiarte, O.; Macarulla, A.M. Simplified geometric processing of solar radiation for improved data-driven modelling of short-term energy & comfort performance in buildings. In Proceedings of the 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech), Bol and Split, Croatia, 25–28 June 2024.
36. Bacher, P.; Madsen, H.; Nielsen, H.A.; Perers, B. Short-term heat load forecasting for single family houses. Energy Build. 2013, 65, 101–112.
37. Lei, L.; Chen, W.; Wu, B.; Chen, C.; Liu, W. A building energy consumption prediction model based on rough set theory and deep learning algorithms. Energy Build. 2021, 240, 110886.
38. Potočnik, P.; Škerl, P.; Govekar, E. Machine-learning-based multi-step heat demand forecasting in a district heating system. Energy Build. 2021, 233, 110673.
39. Dong, Z.; Liu, J.; Liu, B.; Li, K.; Li, X. Hourly energy consumption prediction of an office building based on ensemble learning and energy consumption pattern classification. Energy Build. 2021, 241, 110929.
40. MacQueen, J.B. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1965 and 27 December 1965–7 January 1966; Volume 1, pp. 281–297.
41. Park, J.Y.; Yang, X.; Miller, C.; Arjunan, P.; Nagy, Z. Apples or oranges? Identification of fundamental load shape profiles for benchmarking buildings using a large and diverse dataset. Appl. Energy 2019, 236, 1280–1295.
42. Wen, L.; Zhou, K.; Yang, S. A shape-based clustering method for pattern recognition of residential electricity consumption. J. Clean. Prod. 2019, 212, 475–488.
43. Gianniou, P.; Liu, X.; Heller, A.; Nielsen, P.S.; Rode, C. Clustering-based analysis for residential district heating data. Energy Convers. Manag. 2018, 165, 840–850.
44. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013.
45. University of Tartu, Institute of Physics, Laboratory of Environmental Physics. 2021. Available online: http://meteo.physic.ut.ee/?lang=en (accessed on 30 September 2022).
46. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the 2nd ACM SIGKDD, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
47. Hahsler, M.; Piekenbrock, M.; Arya, S.; Mount, D.R. Package ’dbscan’ 2020. 2021. Available online: https://cran.r-project.org/ (accessed on 30 September 2022).
48. Walsh, A.; Cóstola, D.; Labaki, L.C. Performance-based climatic zoning method for building energy efficiency applications using cluster analysis. Energy 2022, 255, 124477.
49. Fix, E.; Hodges, J.L. Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties. Int. Stat. Rev. 1989, 57, 238.
50. class: Functions for Classification 2022. Available online: https://cran.r-project.org/package=class (accessed on 30 September 2022).
51. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
52. Meyer, D.; Dimitriadou, E.; Weingessel, A.; Leisch, F. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien 2022. Available online: https://cran.r-project.org/ (accessed on 30 September 2022).
53. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
54. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2/3. Available online: https://journal.r-project.org/issues/2002-3/ (accessed on 6 October 2025).
55. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
56. Chen, T.; He, T.; Benesty, M. xgboost: Extreme Gradient Boosting. 2022. Available online: https://cran.r-project.org/ (accessed on 30 September 2022).
57. ASHRAE. ASHRAE Guideline 14-2014, Measurement of Energy, Demand, and Water Savings; ASHRAE: Atlanta, GA, USA, 2014.
Variable | Units | Type of Data |
---|---|---|
Daily Mean Temperature | °C | Numeric |
Daily Total Radiation | Wh/m² | Numeric |
Holiday | [-] | Boolean |
Day of the Week | [-] | Categorical |
Month of the Year | [-] | Categorical |
Day preceding a Holiday | [-] | Boolean |
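As an illustration of how the six inputs in the table above could be assembled for one day, a minimal sketch follows. The function name, the field names, the assumption that radiation arrives as 24 hourly mean irradiance values in W/m² (so that their sum over one-hour steps gives Wh/m²), and the example holiday are all hypothetical.

```python
from datetime import date, timedelta

WEEKDAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def daily_attribution_inputs(day, hourly_temps_c, hourly_irradiance_w_m2, holidays):
    """Build the exogenous inputs used to attribute a day type to one day."""
    return {
        "daily_mean_temperature_c": sum(hourly_temps_c) / len(hourly_temps_c),
        # Hourly mean W/m² summed over one-hour steps gives Wh/m².
        "daily_total_radiation_wh_m2": sum(hourly_irradiance_w_m2),
        "holiday": day in holidays,
        "day_of_week": WEEKDAY_NAMES[day.weekday()],
        "month": day.month,
        "day_preceding_holiday": (day + timedelta(days=1)) in holidays,
    }


# Hypothetical example: 24 December 2019 with 25 December marked as a holiday.
example_inputs = daily_attribution_inputs(
    day=date(2019, 12, 24),
    hourly_temps_c=[-2.0] * 12 + [2.0] * 12,
    hourly_irradiance_w_m2=[0.0] * 8 + [100.0] * 8 + [0.0] * 8,
    holidays={date(2019, 12, 25)},
)
print(example_inputs)
```

All six fields can be known or forecast ahead of time, which is what allows the DT to be attributed without any measured heat load.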
Algorithm | Model | TOUT | GT | Week Day | Month | Hour Day | Holiday | Cluster |
---|---|---|---|---|---|---|---|---|
Q–T | Q–T | X | X | X | | | | |
MVLR | MVLR_1 | X | X | X | | | | |
MVLR | MVLR_2 | X | X | X | X | | | |
SVR | SVR_1 | X | X | X | X | X | X | |
SVR | SVR_2 | X | X | X | X | X | X | X |
RF | RF_1 | X | X | X | X | X | X | |
RF | RF_2 | X | X | X | X | X | X | X |
XGB | XGB_1 | X | X | X | X | X | X | |
XGB | XGB_2 | X | X | X | X | X | X | X |
Building | Q–T R² [-] | MVLR_2 Optimal Number of Clusters | MVLR_2 R² Change vs. Q–T [%] | Q–T Time [s] | MVLR_2 Time Change vs. Q–T [%] |
---|---|---|---|---|---|
Building A | 0.867 | 6 | −1.98% | 123.81 | −90.79% |
Building B | 0.704 | 3 | +3.81% | 93.27 | −95.50% |
Building C | 0.811 | 3 | −2.53% | 82.97 | −86.60% |
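The percentage columns of the table above can be converted back to absolute computation times with simple arithmetic. The sketch below assumes that the MVLR_2 time percentages are relative changes with respect to the Q–T time in the same row; this reading is an assumption, and the resulting values are derived for illustration rather than reported by the authors.

```python
def apply_relative_change(reference, change_percent):
    """Return reference * (1 + change/100), e.g. 100 s with -90% gives 10 s."""
    return reference * (1.0 + change_percent / 100.0)


# (Q-T time [s], MVLR_2 time change [%]) copied from the table above.
table_rows = {
    "Building A": (123.81, -90.79),
    "Building B": (93.27, -95.50),
    "Building C": (82.97, -86.60),
}

implied_mvlr2_time_s = {
    building: apply_relative_change(qt_time, change)
    for building, (qt_time, change) in table_rows.items()
}
for building, seconds in implied_mvlr2_time_s.items():
    print(f"{building}: {seconds:.2f} s")
```

Under this reading, the implied MVLR_2 times are roughly 11 s, 4 s, and 11 s, which agrees with the statement in the conclusions that MVLR computation times stay below 20 s for all buildings.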
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lumbreras, M.; Garay-Martinez, R.; Diarce, G.; Martin-Escudero, K.; Arregi, B. Improving Building Heat Load Forecasting Models with Automated Identification and Attribution of Day Types. Buildings 2025, 15, 3604. https://doi.org/10.3390/buildings15193604