Explainable AI for Urban Real-Estate Prediction: A Machine-Learning Framework for Urban Decision Support
Abstract
1. Introduction
- RQ1: To what extent do advanced non-linear machine learning models (e.g., MLP, Random Forest, LightGBM) provide competitive predictive performance relative to simpler linear baselines, and what do they add in explanatory structure and decision-support value?
- RQ2: Can the integration of Explainable AI (XAI) techniques, specifically SHAP values, effectively bridge the gap between ‘black-box’ predictive models and the interpretability requirements of international professional valuation standards (e.g., IVS, RICS)?
- RQ3: Does treating missing data as an informative signal (Missing Not At Random, MNAR) rather than as mere noise enhance the model’s capacity to capture context-dependent market behaviours?
2. Related Work
3. Materials and Methods
3.1. RE-VAL Framework Architecture
- Data Preprocessing and MNAR Encoding: Cleaning, harmonization, and treatment of missing information through the preservation of informative proxy signals.
- Feature Transformation: Differentiated preprocessing of heterogeneous tabular inputs, with geographically informed encoding for location-related information, conventional encoding for the remaining categorical variables, and standard preprocessing for numerical features.
- Multi-model Predictive Benchmarking: Systematic evaluation of alternative predictive models within a common experimental setting.
- Post hoc Interpretability Analysis: Examination of model behaviour through SHAP-based explainability to identify feature contributions, non-linear effects, and context-sensitive valuation patterns.
- Operational Monetary Synthesis: Translation of model outputs into currency-based Bonus/Malus adjustment tables to support appraisal reasoning, scenario analysis, and auditable urban decision-making.
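The last two steps of the workflow can be illustrated with a minimal sketch. It assumes the model predicts unit price in €/m² and that per-listing SHAP values have already been computed by an explainer. The records, the feature name `renovation`, the helper `bonus_malus_table`, and all numbers are hypothetical, and averaging SHAP values per category is only one plausible aggregation rule, not necessarily the one used in RE-VAL.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-listing records: the category observed for one feature
# and the SHAP value (in EUR/m2) that the model attributed to that feature.
records = [
    {"renovation": "new", "shap_eur_m2": 310.0},
    {"renovation": "new", "shap_eur_m2": 290.0},
    {"renovation": "to_be_renovated", "shap_eur_m2": -420.0},
    {"renovation": "to_be_renovated", "shap_eur_m2": -380.0},
    {"renovation": "Unspecified", "shap_eur_m2": -50.0},
]


def bonus_malus_table(rows, feature, area_m2):
    """Average SHAP contribution per category, scaled to a whole dwelling."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[feature]].append(row["shap_eur_m2"])
    table = {}
    for category, values in grouped.items():
        unit_adjustment = mean(values)  # EUR/m2; positive means bonus
        table[category] = {
            "eur_per_m2": unit_adjustment,
            "eur_total": unit_adjustment * area_m2,
        }
    return table


table = bonus_malus_table(records, "renovation", area_m2=80.0)
for category, adj in table.items():
    label = "Bonus" if adj["eur_per_m2"] >= 0 else "Malus"
    print(f"{category:>16}: {label} {adj['eur_per_m2']:+.0f} EUR/m2 "
          f"({adj['eur_total']:+.0f} EUR for an 80 m2 unit)")
```

Keeping the adjustment in €/m² and multiplying by the area only at the end leaves the table reusable across dwellings of different sizes.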
3.2. Data Harvesting and Multidimensional Harmonization
- Property Class: Grouped into four market segments (economic, middle-range, upscale, and luxury), reflecting increasing levels of prestige and construction quality.
- Renovation Status: Coded into four segments (new/under construction, excellent/renovated, good/habitable, and to be renovated).
- Exposure: Classified according to the prevailing orientation or openness of the unit (e.g., north, south, east, west, double exposure, internal, external).
- Window: Encoded according to insulation quality (single, double, or triple glazing).
- Heating: Represented through both heating system type (mainly autonomous vs. centralized) and heating source/delivery system (e.g., air-based, radiator-based, underfloor heating, stove-based), while the corresponding energy supply was retained when specified.
- Energy Class: Mapped according to the Italian certification scale (from A to G).
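As an illustration of the harmonization step, the sketch below maps free-text listing labels to the four renovation segments named above. The raw synonyms in `RENOVATION_MAP`, the `harmonize` helper, and the choice to send unrecognized labels to `Unspecified` are assumptions made for this example, not details taken from the study.

```python
# Canonical renovation segments come from the list above; the raw labels
# used as keys are hypothetical examples of free-text listing values.
RENOVATION_MAP = {
    "new": "new/under construction",
    "under construction": "new/under construction",
    "excellent": "excellent/renovated",
    "renovated": "excellent/renovated",
    "good": "good/habitable",
    "habitable": "good/habitable",
    "to be renovated": "to be renovated",
}

UNSPECIFIED = "Unspecified"  # kept as its own category, never imputed


def harmonize(raw_value, mapping):
    """Map a raw listing label to its canonical segment.

    Missing or unrecognized labels become 'Unspecified' so that the
    absence of information stays visible to the downstream model.
    """
    if raw_value is None:
        return UNSPECIFIED
    key = raw_value.strip().lower()
    return mapping.get(key, UNSPECIFIED)


listings = ["Renovated", "  good ", None, "", "To be renovated"]
print([harmonize(value, RENOVATION_MAP) for value in listings])
```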
3.3. Informative Proxy Signal Encoding (MNAR Management)
3.4. Comparative Predictive Benchmarking
3.5. XAI-Driven Monetary Synthesis
4. Case Study: Residential Market in Cagliari, Italy
4.1. Dataset Composition and Feature Engineering
- Structural Attributes: Net area, floor level, and presence of balconies/terraces.
- Qualitative Features: Renovation status (ranging from “to be refurbished” to “new construction”), heating systems, and window frame types.
- Energy and Sustainability: Energy class (A to G), included to capture the possible valuation effect of energy performance in the local market.
- Location Proxies: Instead of explicit GPS coordinates, the model utilizes 11 distinct urban zones (categorical proxies) to capture spatial heterogeneity and local prestige effects (e.g., the historic centre Zone B4).
4.2. Missing Data as Informative Signals
4.3. Experimental Setup
4.4. Interpretative Results and Comparative Explainability (SHAP Analysis)
4.5. Substantive Insights for the RE-VAL Framework
5. Discussion: Beyond Predictive Accuracy
5.1. Explanatory Balance as a Criterion of Model Adequacy
5.2. Robustness to Incomplete Information Through Informative Proxy Signals
6. Practical Implications: Decision-Support Potential of the RE-VAL Framework
6.1. Supporting Context-Sensitive Appraisal Reasoning
6.2. Operational Use Under Imperfect Listing Conditions
6.3. From Point-Estimates to Decision Support: Monetary Bonus/Malus Synthesis
7. Conclusions and Policy Implications
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest





| Category | Original Data | Variable Type | Transformation for the Model |
|---|---|---|---|
| Target | Sale price (€) | Continuous (target) | Converted to unit price (€/m²) as target y |
| Location | Address/Zone | Categorical | OMI Urban Zones |
| Size & layout | Net floor area (m²) | Continuous | Standardized; used to derive unit price |
| Dwelling Traits | Floor level | Ordinal | 0 = ground, 1 = intermediate, 2 = top |
| | Balcony | Binary | 0 = absent, 1 = present |
| | Garden | Binary | 0 = absent, 1 = present |
| | Parking availability | Binary | 0 = absent, 1 = present |
| Quality and Condition | Property class | Categorical | One-hot (Economy to Premium) |
| | Renovation status | Categorical | One-hot (including Unspecified as proxy) |
| | Exposure/Windows | Categorical | One-hot (including Unspecified as proxy) |
| | Heating source | Categorical | One-hot (including Unspecified as proxy) |
| | Heating type | Categorical | One-hot (including Unspecified as proxy) |
| | Window frame type | Categorical | One-hot (including Unspecified as proxy) |
| | Energy class | Ordinal | Mapped scale (A = 4 … G = 1; Unspecified = 0) |
| Accessibility | Elevator | Binary | 0 = absent, 1 = present |
| | Property availability | Categorical | One-hot (Available, Occupied) |
| | Wheelchair access | Binary | 0 = absent, 1 = present |
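
Two of the transformations in the table can be sketched in a few lines: the conversion of sale price to a unit-price target, and one-hot encoding in which `Unspecified` is an explicit level rather than a value to be imputed. The helper names `one_hot` and `unit_price` and the example listing are hypothetical; the level names follow the renovation segments described in Section 3.2.

```python
# Category levels follow the renovation-status description, plus the
# "Unspecified" proxy level listed in the table.
RENOVATION_LEVELS = [
    "new/under construction",
    "excellent/renovated",
    "good/habitable",
    "to be renovated",
    "Unspecified",
]


def one_hot(value, levels):
    """Return a 0/1 vector with exactly one active position."""
    if value not in levels:
        value = "Unspecified"  # missing information stays an explicit signal
    return [1 if value == level else 0 for level in levels]


def unit_price(sale_price_eur, net_area_m2):
    """Target y: sale price converted to EUR per square metre."""
    if net_area_m2 <= 0:
        raise ValueError("net area must be positive")
    return sale_price_eur / net_area_m2


# Hypothetical listing that reports no renovation status.
listing = {"price": 240_000.0, "area": 80.0, "renovation": None}
x_renovation = one_hot(listing["renovation"], RENOVATION_LEVELS)
y = unit_price(listing["price"], listing["area"])
print(x_renovation, y)
```

Because the missing status activates its own column, a model can learn a separate effect for listings that omit the information, which is the behaviour the MNAR treatment relies on.
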

| Model | Learning Type | Description |
|---|---|---|
| Linear Regression | Linear baseline | Simple and interpretable but limited to additive linear relationships. |
| SVR | Kernel-based | Captures non-linear patterns through a Radial Basis Function (RBF) kernel. |
| Random Forest | Ensemble | Reduces variance through bagging and feature sampling. |
| MLP Regressor | Deep learning | Multi-Layer Perceptron; learns non-linear mappings via backpropagation. |
| LightGBM | Gradient boosting | Captures non-linear patterns and feature interactions through tree-based boosting. |

| Model | Key Parameters | Type |
|---|---|---|
| SVR (RBF kernel) | C = 100, ε = 1.0, γ = auto | Kernel-based |
| Linear Regression | Default | Linear baseline |
| Random Forest | 100 trees, max depth = 10, leaf size = 2 | Ensemble |
| MLP | Hidden layers: 128 and 64 neurons; α = 0.0001; learning rate = 0.005 | Deep learning |
| LightGBM | Leaves = 31, learning rate = 0.05, estimators = 2000 | Gradient boosting |
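
For readers who want to reproduce a comparable setup, the hyperparameters above can be written as keyword arguments using the parameter names of the scikit-learn and LightGBM estimators. This is a sketch, not the authors' code: the dictionary `MODEL_PARAMS` is hypothetical, and reading "leaf size = 2" as `min_samples_leaf` is an assumption, since the table does not name the underlying parameter.

```python
# Hyperparameters from the table, keyed by estimator class name.
# Keyword names follow scikit-learn (SVR, LinearRegression,
# RandomForestRegressor, MLPRegressor) and LightGBM (LGBMRegressor).
MODEL_PARAMS = {
    "SVR": {"kernel": "rbf", "C": 100, "epsilon": 1.0, "gamma": "auto"},
    "LinearRegression": {},  # library defaults
    "RandomForestRegressor": {
        "n_estimators": 100,
        "max_depth": 10,
        "min_samples_leaf": 2,  # assumed meaning of "leaf size = 2"
    },
    "MLPRegressor": {
        "hidden_layer_sizes": (128, 64),
        "alpha": 0.0001,
        "learning_rate_init": 0.005,
    },
    "LGBMRegressor": {
        "num_leaves": 31,
        "learning_rate": 0.05,
        "n_estimators": 2000,
    },
}

for name, params in MODEL_PARAMS.items():
    settings = ", ".join(f"{key}={value!r}" for key, value in params.items())
    print(f"{name}({settings})")
```

With the libraries installed, each entry can be unpacked into its estimator, for example `SVR(**MODEL_PARAMS["SVR"])` after `from sklearn.svm import SVR`.
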

| Model | MAE (€/m²) | RMSE (€/m²) | Median AE (€/m²) | R² |
|---|---|---|---|---|
| SVR | 706.32 ± 67.39 | 948.77 ± 80.27 | 529.57 ± 65.24 | 0.10 |
| Linear Regression | 483.50 ± 41.69 | 642.22 ± 57.70 | 384.16 ± 49.40 | 0.58 |
| Random Forest | 475.39 ± 48.03 | 647.10 ± 80.85 | 357.01 ± 51.60 | 0.57 |
| MLP | 474.47 ± 29.00 | 632.38 ± 59.44 | 374.59 ± 32.70 | 0.59 |
| LightGBM | 508.55 ± 49.60 | 688.16 ± 88.46 | 385.69 ± 53.94 | 0.51 |
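
The four metrics in the table can be computed as in the following sketch. It assumes that the ± values are standard deviations across repeated evaluation folds, which the table itself does not state; the function `regression_metrics` and the fold data are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, median absolute error, and R2 for one evaluation fold."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    y_bar = mean(y_true)
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": mean(abs_errors),
        "RMSE": sqrt(ss_res / len(errors)),
        "MedianAE": median(abs_errors),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical unit prices (EUR/m2) for two folds: (observed, predicted).
folds = [
    ([2000.0, 2500.0, 3000.0], [2100.0, 2400.0, 3300.0]),
    ([1800.0, 2600.0, 3400.0], [2000.0, 2600.0, 3000.0]),
]
per_fold = [regression_metrics(t, p) for t, p in folds]
mae_values = [m["MAE"] for m in per_fold]
print(f"MAE = {mean(mae_values):.2f} ± {stdev(mae_values):.2f} EUR/m2")
```

Reporting the median absolute error next to MAE and RMSE is useful here because it is less sensitive to a few badly mispriced listings than the other two.
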

| Analytical Dimension | Traditional Appraisal (Linear/Static) | RE-VAL Framework (Explainable Workflow) | Operational Impact |
|---|---|---|---|
| Valuation Logic | Additive and rigid: assumes constant marginal contributions (e.g., fixed €/m²). | Non-linear and interactive: supports the modelling of variable marginality and cross-feature synergies. | Helps identify heterogeneous valuation effects across property characteristics and urban contexts. |
| Transparency | Expert-driven and partly implicit: relies on professional judgement, with limited formal decomposition of feature-level contributions. | XAI-supported interpretation: SHAP-based decomposition of feature contributions. | Supports alignment with IVS/RICS standards through transparent monetary Bonus/Malus tables. |
| Resilience | Fragile: incomplete technical information may reduce consistency and interpretability. | Missingness-aware: preserves missingness-related patterns and treats them as potentially informative signals when appropriate. | May improve robustness, or help retain useful information, when working with fragmented and imperfect real-estate listing data. |
| Final Purpose | Static point-estimate: a one-time snapshot of property value. | Decision-support-oriented workflow: supports scenario interpretation and what-if reasoning. | Facilitates legal and administrative admissibility (e.g., for taxation or courts) by providing auditable evidence. |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Saiu, V.; Mocci, M. Explainable AI for Urban Real-Estate Prediction: A Machine-Learning Framework for Urban Decision Support. Urban Sci. 2026, 10, 315. https://doi.org/10.3390/urbansci10060315

