Comparing the Performance of Regression and Machine Learning Models in Predicting the Usable Area of Houses with Multi-Pitched Roofs
Abstract
1. Introduction
2. Materials and Methods
2.1. Polish Standards for Calculating Usable Floor Area
2.2. Data from Design Offices and Data on Existing Single-Family Houses in Koszalin
2.3. Linear and Non-Linear Modelling
2.4. Machine Learning Models
3. Results
3.1. Data Characteristics and Preprocessing
3.2. Linear Regression
3.3. Non-Linear Regression
3.4. Prediction of Usable Area in a Test Set for Linear, Non-Linear, and Regularised Models
3.5. Machine Learning for Buildings’ Designs
3.5.1. XGBoost and NN Results for the New Designs Dataset
3.5.2. Hybrid Models—Combining Regression with ML
- Values of the explanatory variable below 90 m² are very rare, so a threshold can be set below which the estimated values of the AU variable are replaced by the threshold value.
- The baseline linear (D) and non-linear (DN) models handle observations near the extremes of the domain better than the regularised and ML models. In particular:
  - Both D and DN predict usable area well in the low range noted above (values below 90 m²), so estimates of AU that fall below the threshold can be replaced with the values estimated by model D or DN.
  - Models D and DN also predict usable area quite well for buildings with a large covered area; thus, a threshold can be set above which the ML estimate is replaced by the D or DN estimate (two thresholds were tested: 180 and 200 m²).
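The replacement rules listed above can be sketched in code. This is a minimal illustration and not the authors' implementation: the function name `hybrid_estimate` and its parameters are hypothetical, and it assumes that the lower threshold (90 m²) is compared against the ML estimate of AU, while the upper threshold (180 or 200 m²) is compared against the covered area AC.

```python
def hybrid_estimate(ac, ml_estimate, regression_estimate,
                    low=90.0, high=180.0, use_regression_below_low=True):
    """Combine an ML estimate of usable area (AU) with a D/DN regression estimate.

    Assumptions (not confirmed by the source):
    - `high` is a threshold on the covered area AC,
    - `low` is a threshold on the ML estimate of AU.
    """
    # Large covered area: fall back to the regression (D or DN) estimate.
    if ac > high:
        return regression_estimate
    # ML estimate below the lower threshold: use the regression estimate,
    # or simply clamp to the threshold value.
    if ml_estimate < low:
        return regression_estimate if use_regression_below_low else low
    # Otherwise keep the ML estimate unchanged.
    return ml_estimate
```

Setting `use_regression_below_low=False` gives the simpler variant in which low estimates are replaced by the threshold value itself.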
3.5.3. XGBoost and NN Results for the Old Designs Dataset
3.6. Machine Learning for Existing Buildings
4. Discussion
5. Final Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
BDOT10k | Database of Topographic Objects (pol. Baza Danych Obiektów Topograficznych) |
LiDAR | Light Detection and Ranging |
LoD | Level of Detail |
REPR | Real Estate Price Register (pol. Rejestr Cen Nieruchomości) |
MAE | Mean Absolute Error |
RMSE | Root Mean Squared Error |
ML | Machine Learning |
Feature | Variable | Design Projects | Existing Buildings (Koszalin) |
---|---|---|---|
Usable area | AU | 48.12–302.42 m² | 80.71–407.51 m² |
Covered area | AC | 69.8–334.68 m² | 100.5–394.71 m² |
Number of storeys | SN | 1–2 | 1–2 |
Height | H | 5.7–9.57 m | 5.7–10.17 m |
Knee wall’s height | h | 0–1.8 m | No data |
Garage area | GA | 0–56.55 m² | 0–66.25 m² |
Number of rooms | R | No data | 4–12 |
Number of roof surfaces | RN | No data | 4–15 |
Presence of a boiler room | B | 0; 1 | No data |
Presence of an in-built garage (0.5–attached or partially in-built) | GP | No data | 0; 0.5; 1 |
Roof slope | RS | 23.8–40° | No data |
N = 219 | β | σ(β) | a | σ(a) | t(213) | p-Value |
---|---|---|---|---|---|---|
intercept | | | 12.080 | 8.301 | 1.455 | 0.147 |
AC | 1.086 | 0.040 | 0.532 | 0.019 | 27.363 | <0.001 |
H | 0.091 | 0.025 | 3.432 | 0.955 | 3.596 | <0.001 |
GA | −0.260 | 0.039 | −0.403 | 0.061 | −6.660 | <0.001 |
SN | −0.158 | 0.079 | −9.111 | 4.580 | −1.989 | 0.048 |
RS | 0.236 | 0.078 | 0.453 | 0.150 | 3.022 | 0.003 |
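As an illustration of how the tabulated coefficients translate into a prediction, the sketch below evaluates the fitted linear equation for one building. It assumes that column a holds the unstandardised coefficients (with β the standardised ones) and that the model is additive; the function name and the sample building are hypothetical and are not taken from the study data.

```python
# Unstandardised coefficients (column "a") copied from the table above.
COEFFICIENTS = {"AC": 0.532, "H": 3.432, "GA": -0.403, "SN": -9.111, "RS": 0.453}
INTERCEPT = 12.080


def predict_usable_area(features):
    """Return the estimated usable area AU (m2) for a dict of feature values."""
    return INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)


# Hypothetical building, invented for illustration only.
example = {"AC": 120.0, "H": 8.0, "GA": 20.0, "SN": 2, "RS": 35.0}
estimate = predict_usable_area(example)
print(round(estimate, 2))
```

For this invented building the estimate is roughly 93 m², most of which comes from the covered-area term (0.532 × 120).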
Independent Variable | Function Type |
---|---|
Covered area—AC | linear function |
Building’s height—H | linear function |
Garage area—GA | exponential function polynomial 2° |
Boiler room—B | linear function polynomial 3° |
Knee–wall’s height—h | polynomial 3° |
Number of storeys—SN | linear function |
Roof slope —RS | linear function |
N = 217 | β | σ(β) | a | σ(a) | t(210) | p-Value |
---|---|---|---|---|---|---|
intercept | | | 11.625 | 8.108 | 1.434 | 0.153 |
AC | 1.026 | 0.041 | 0.505 | 0.020 | 24.732 | <0.001 |
H | 0.102 | 0.024 | 3.950 | 0.939 | 4.208 | <0.001 |
GA | −0.521 | 0.083 | −0.818 | 0.131 | −6.252 | <0.001 |
GA² | 0.324 | 0.089 | 0.013 | 0.004 | 3.638 | <0.001 |
SN | −0.130 | 0.076 | −7.665 | 4.480 | −1.711 | 0.089 |
RS | 0.214 | 0.075 | 0.419 | 0.146 | 2.863 | 0.005 |
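The effect of the second-degree garage term in the table above can be made concrete with a short sketch. It assumes that the GA2 row is the squared garage area and that column a holds the unstandardised coefficients; the function names and input values are hypothetical, and the rounded coefficient 0.013 limits the precision of any result.

```python
def garage_contribution(ga):
    """Combined contribution of the GA and GA^2 terms to the estimate (m2)."""
    return -0.818 * ga + 0.013 * ga ** 2


def predict_usable_area_nonlinear(ac, h, ga, sn, rs):
    """Evaluate the tabulated equation with a quadratic garage-area term."""
    return (11.625 + 0.505 * ac + 3.950 * h
            + garage_contribution(ga) - 7.665 * sn + 0.419 * rs)


# Hypothetical building, invented for illustration only.
estimate = predict_usable_area_nonlinear(ac=120.0, h=8.0, ga=20.0, sn=2, rs=35.0)
print(round(estimate, 2), round(garage_contribution(20.0), 2))
```

Under these assumptions, a 20 m² garage lowers the estimate by about 11 m² rather than the 16 m² that the linear GA term alone would imply.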
Technique | Model Type | Design Data | Design Data (Old Dataset) | Existing Buildings |
---|---|---|---|---|
Regression | Linear | D | A+ 1 | C 1 |
Regression | Non-linear | DN | — | — |
Regularised regression | LASSO | LASSO λ = 0.41; LASSO+ λ = 0.72 | LASSO+ 1 | LASSO 1 |
Regularised regression | Ridge | Ridge λ = 0.37 | — | — |
Regularised regression | Elastic net | Elastic net+ | — | — |
Machine learning | XGBoost | XGBoost | XGBoost ods | XGBoost eb |
Machine learning | Neural networks | NN6-(10-6); NN5-(10-5) | NN 6-(10-6)ods | NN 6-(10-6)eb |
Hybrid models | | LASSO A180DN; XGB A180D; XGB A180DN; NN5 90; NN5 90DN A180DN; NN6 90 A180D; NN6 90 A180DN; XGB + NN6 | — | C+ LASSO; LASSO A180C; NN6 A200C |
| | Linear Models | | | | | Non-Linear Model |
---|---|---|---|---|---|---|
| | Model D | LASSO λ = 0.41+ | Ridge λ = 0.37 | Elastic net+ | LASSO+ λ = 0.72 | Model DN |
MAE | 12.82 | 13.33 | 13.52 | 13.39 | 13.04 | 12.12 |
RMSE | 15.09 | 16.15 | 16.23 | 16.20 | 15.67 | 14.09 |
| | 27.73 | 35.08 | 35.05 | 35.19 | 34.12 | 23.83 |
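For reference, the MAE and RMSE rows in the table above follow the standard definitions of Mean Absolute Error and Root Mean Squared Error. The sketch below shows those definitions in plain Python; the function names are hypothetical, and the sample values are invented and do not come from the study.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Invented usable areas (m2) for illustration only.
actual = [100.0, 120.0, 140.0]
predicted = [110.0, 115.0, 140.0]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, round(rmse, 3))
```

Because RMSE squares each error before averaging, it is never smaller than MAE and reacts more strongly to a few large misses, which is why the two measures can rank models differently.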
| | ML Model | Neural Networks | |
---|---|---|---|
| | XGBoost | NN6-(10-6) | NN5-(10-5) |
MAE | 12.61 | 11.89 | 11.15 |
RMSE | 15.64 | 15.19 | 15.15 |
| | 34.09 | 33.83 | 37.67 |
| | Hybrid Models | | | | | | | |
---|---|---|---|---|---|---|---|---|
| | LASSO A180DN | XGB A180D | XGB A180DN | NN5 90 | NN5 90DN A180DN | NN6 90 A180D | NN6 90 A180DN | XGB + NN6 |
MAE | 11.69 | 10.11 | 9.58 | 10.88 | 10.60 | 9.26 | 8.73 | 8.87 |
RMSE | 13.58 | 12.39 | 11.60 | 14.71 | 12.83 | 12.07 | 11.25 | 11.15 |
| | 23.06 | 27.73 | 23.06 | 34.95 | 23.06 | 27.73 | 23.06 | 23.06 |
| | Linear Models * | | ML and Neural Networks ** | |
---|---|---|---|---|
| | Model A+ | LASSO+ | XGBoost ods | NN 6-(10-6)ods |
MAE | 23.19 | 27.42 | 27.58 | 19.51 |
RMSE | 30.22 | 33.57 | 32.05 | 23.31 |
| | 59.36 | 67.22 | 53.13 | 41.60 |
| | Linear Models 1 | | ML and Neural Nets 2 | | Hybrid Models 2 | | |
---|---|---|---|---|---|---|---|
| | Model C | LASSO | XGBoost eb | NN6(10-6)eb | C+ LASSO | LASSO A180C | NN6 A200C |
MAE | 20.95 | 20.43 | 15.94 | 10.81 | 16.65 | 16.95 | 9.99 |
RMSE | 27.92 | 24.16 | 18.81 | 12.77 | 23.12 | 21.14 | 11.91 |
| | 55.34 | 39.55 | 30.73 | 20.73 | 47.45 | 39.55 | 20.73 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dawid, L.; Barańska, A.M.; Baran, P. Comparing the Performance of Regression and Machine Learning Models in Predicting the Usable Area of Houses with Multi-Pitched Roofs. Appl. Sci. 2025, 15, 6297. https://doi.org/10.3390/app15116297