Development of a Predictive Tool for Real Estate Analysis Using Machine Learning Techniques
Abstract
1. Introduction
2. Literature Review
2.1. The Importance of the Housing Market in the Economy
2.2. Cycles and Bubbles in the Real Estate Market: Causes and Consequences
2.3. Technological Applications in the Real Estate Market: Advantages, Disadvantages, and the Role of Artificial Intelligence
3. Materials and Methods
- Comparative Property Valuation Method: This method estimates the value of the subject property by comparing it with similar properties recently sold in the same geographic area and adjusting for differences in location, size, features, and condition.
- Haversine Formula: The Haversine formula plays a crucial role in this study because it computes the great-circle distance between two points on the Earth's surface from their latitude and longitude coordinates. It treats the Earth as a sphere, which closely approximates the true geodesic distance over the short ranges involved in urban analysis. It is especially relevant in real estate valuation and predictive modeling, because location directly influences access to services, transportation, and nearby amenities, all of which can raise or lower a property's value. Factors such as neighborhood safety, crime rates, and the quality of local schools also depend heavily on location. Finally, geographic coordinates enable spatial analysis, such as comparing nearby properties to estimate a more accurate market value.
- Use of Tools and Resources for Integrating Custom Maps: As detailed later, this study uses specific tools such as the Mapbox API to obtain the geographic coordinates (latitude and longitude) of a location from only its city, street, and house number. Using Python 3.13 and version 6 of Mapbox's Geocoding API (the latest stable version at the time of the study), HTTP requests are sent to convert each address into coordinates. This step is essential for mapping the data, performing spatial analyses, integrating geographic data, and improving the predictive models.
- Data Collection: Data were collected from publicly accessible online real estate platforms specialising in property listings and transactions in Madrid. These include major digital marketplaces in the Spanish real estate sector, such as Idealista and Fotocasa, which provide detailed, structured information on residential properties. The dataset was built through automated web scraping with the Robotic Process Automation (RPA) tool UiPath, which enabled the systematic extraction of large volumes of data directly from property listings. The collected variables include the listing price, the built area (m²), the number of rooms, the presence of amenities (e.g., elevator, garage, balcony, swimming pool), textual descriptions, and the geographic location (address-level information). These variables were selected for their relevance in the real estate valuation literature and their availability across platforms. Data were collected over a defined time period to ensure consistent market conditions and avoid temporal distortions. Duplicate listings and entries repeated across platforms were identified and removed through automated matching on address, price, and structural characteristics. In short, the study combines automated data collection, geocoding, and predictive modelling.
- Data Preprocessing: This includes data cleaning and transformation, the encoding of categorical variables, and the removal of duplicates. Data governance is essential to ensure the quality, consistency, and coherence of the data used.
- Predictive Models: Training and validation of machine learning models, specifically Gradient Boosting with HistGradientBoostingRegressor, to predict prices and trends in the real estate market. Cross-validation and model evaluation are used to detect overfitting and to obtain reliable estimates of out-of-sample accuracy.
- Geocoding: Obtaining the geographic coordinates (latitude and longitude) of property addresses, enabling detailed spatial analysis that considers factors such as the urban environment and proximity to services and amenities.
- Application Development: Creation of an interactive application using Streamlit, which facilitates data analysis and predictive modeling. The application allows users to adjust property features and visualize their location on an interactive map of Madrid.
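The geocoding, distance, and comparable-valuation steps described above can be sketched with the Python standard library alone. This is a minimal, hypothetical sketch, not the authors' code: it assumes the Mapbox Geocoding API v6 forward endpoint (`/search/geocode/v6/forward`) queried with a free-text `q` parameter, `country=es`, and a placeholder access token; it implements the Haversine great-circle distance on a sphere of mean radius 6,371 km; and it reduces the comparative method to "subject area × median price per m² of listings within a radius", which is an illustrative simplification. All function and field names are hypothetical.

```python
import json
import math
import statistics
import urllib.parse
import urllib.request

# Assumed Mapbox Geocoding API v6 forward endpoint (free-text query via `q`)
MAPBOX_FORWARD_URL = "https://api.mapbox.com/search/geocode/v6/forward"
EARTH_RADIUS_KM = 6371.0  # mean Earth radius; Haversine treats the Earth as a sphere


def build_geocode_url(street, house_number, city, token, country="es"):
    """Build a forward-geocoding URL from street, house number and city (hypothetical helper)."""
    params = {
        "q": f"{street} {house_number}, {city}",
        "country": country,  # restrict matches to Spain
        "limit": 1,          # only the best match is needed
        "access_token": token,  # placeholder: supply your own Mapbox token
    }
    return f"{MAPBOX_FORWARD_URL}?{urllib.parse.urlencode(params)}"


def parse_coordinates(response):
    """Return (latitude, longitude) of the first GeoJSON feature, or None if no match."""
    features = response.get("features", [])
    if not features:
        return None
    # GeoJSON stores coordinates as [longitude, latitude]
    lon, lat = features[0]["geometry"]["coordinates"]
    return lat, lon


def geocode(street, house_number, city, token):
    """Send the HTTP request and parse the JSON response (requires network access)."""
    url = build_geocode_url(street, house_number, city, token)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_coordinates(json.load(resp))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def comparable_estimate(subject, listings, radius_km=1.0):
    """Estimate value as subject area times the median price/m2 of listings within radius_km."""
    prices_per_m2 = [
        item["price"] / item["area_m2"]
        for item in listings
        if haversine_km(subject["lat"], subject["lon"], item["lat"], item["lon"]) <= radius_km
    ]
    if not prices_per_m2:
        return None  # no comparables close enough
    return subject["area_m2"] * statistics.median(prices_per_m2)
```

The median is used instead of the mean so that a single mispriced listing does not dominate the estimate; a full comparative appraisal would also adjust each comparable for size, features, and condition, as described in the method above.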
Selection of the Predictive Algorithm
- XGBRegressor: This algorithm, from the XGBoost library, belongs to the Gradient Boosting family and is well known for producing accurate predictive models. It builds an ensemble of decision trees sequentially, with each new tree fitted to reduce the loss left by the trees before it.
- LGBMRegressor: LightGBM is another Gradient Boosting implementation, distinguished by its speed and efficiency. It grows trees leaf-wise, always splitting the leaf that yields the largest loss reduction rather than expanding a whole level at a time; combined with histogram-based split finding, this shortens training time.
- HistGradientBoostingRegressor: HistGradientBoostingRegressor is scikit-learn's histogram-based Gradient Boosting implementation. It bins continuous features into a small number of discrete bins (at most 255 by default) and evaluates candidate splits on these histograms rather than on every distinct feature value, which makes training much faster than standard Gradient Boosting on large datasets.
4. Results
- Learning Rate: 0.05; Max Depth: 7;
- Mean Squared Error (MSE): 353,605,890,528.1876;
- Root Mean Squared Error (RMSE): 592,031.1993;
- Mean Absolute Error (MAE): 253,406.4904;
- Mean Absolute Percentage Error (MAPE): 0.3424 (i.e., 34.24%);
- Average Coefficient of Determination (R²): 0.6877.
| Parameter/Metric | Value |
|---|---|
| Model | HistGradientBoostingRegressor |
| Learning Rate | 0.05 |
| Max Depth | 7 |
| Mean Squared Error (MSE) | 353,605,890,528.1876 |
| Root Mean Squared Error (RMSE) | 592,031.1993 |
| Mean Absolute Error (MAE) | 253,406.4904 |
| Mean Absolute Percentage Error (MAPE) | 0.3424 |
| Average Coefficient of Determination (R²) | 0.6877 |
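The five metrics above can be computed for one set of predictions with the short function below. It is a plain-Python sketch for illustration (the study presumably used library implementations, which the text does not name) and the function name is hypothetical. It reports MAPE as a fraction, so the 0.3424 above corresponds to an average absolute error of 34.24% of the listing price.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (as a fraction) and R^2 for paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Fraction, not percent: 0.34 means errors average 34% of the true price
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot; undefined when all true values are equal
    r2 = 1 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}
```

Within a single evaluation, RMSE is exactly the square root of MSE. The reported RMSE (592,031.1993, whose square is about 3.505 × 10¹¹) is slightly below the square root of the reported MSE (about 3.536 × 10¹¹), which is consistent with each metric being averaged separately over cross-validation folds.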
- Performance in Evaluation Metrics: Compared with the other configurations, this one yields the lowest Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) and the highest Coefficient of Determination (R²), indicating the best overall fit to the data. It also has the lowest Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) among the HistGradientBoostingRegressor configurations, although LGBMRegressor with the same hyperparameters attains a slightly lower MAE (247,418.93) and MAPE (0.3247, i.e., 32.47%).
- Balance Between Bias and Variance: Combining a moderately low learning rate (0.05) with a maximum depth of 7 gives the best balance between bias and variance among the configurations tested. A moderate learning rate limits the contribution of each new tree and so helps prevent overfitting, while a greater maximum depth lets the model capture more complex relationships between the predictor variables and the target variable.
- Capacity to Capture Complex Relationships: A maximum depth of 7 allows the HistGradientBoostingRegressor model to capture intricate, non-linear relationships and feature interactions in the data. This capability is important in Madrid's real estate market, where the relationships between property features and prices tend to be highly complex.
Results Testing
- Property to be appraised and located through comparisons: Address: Avenida de Burgos No. 22, Madrid; Area: 100 m²; Bedrooms: 3.
- Query parameters: Surface Area, Gated Community, Number of Bedrooms, Has Garage, No Elevator, Has Balcony, Green Area, Swimming Pool.
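The query parameters above must be turned into a numeric feature row before the model can score the property. The sketch below shows one way to do this; the column order, field names, and the helper `build_query_row` are hypothetical, since the text does not give the model's exact input schema. It assumes binary amenity flags, encodes the "No Elevator" parameter as the negation of an elevator flag, and appends latitude and longitude, which the Discussion lists among the model's features.

```python
# Hypothetical column order; the model's actual feature order is not stated in the text
FEATURE_ORDER = (
    "surface_m2", "gated_community", "bedrooms", "has_garage", "no_elevator",
    "has_balcony", "green_area", "swimming_pool", "latitude", "longitude",
)


def build_query_row(surface_m2, bedrooms, latitude, longitude, *, gated_community=False,
                    has_garage=False, has_elevator=True, has_balcony=False,
                    green_area=False, swimming_pool=False):
    """Encode one property as a numeric feature row in FEATURE_ORDER."""
    values = {
        "surface_m2": float(surface_m2),
        "gated_community": int(gated_community),
        "bedrooms": int(bedrooms),
        "has_garage": int(has_garage),
        # The query parameter is "No Elevator", so the flag is the negation of has_elevator
        "no_elevator": int(not has_elevator),
        "has_balcony": int(has_balcony),
        "green_area": int(green_area),
        "swimming_pool": int(swimming_pool),
        "latitude": float(latitude),
        "longitude": float(longitude),
    }
    return [values[name] for name in FEATURE_ORDER]
```

For the test property, the row would be built from 100 m², 3 bedrooms, and the coordinates returned by geocoding Avenida de Burgos No. 22; with a fitted regressor `model`, the estimate would then be `model.predict([row])[0]`.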
5. Discussion
- Property size (44.15%).
- Proximity to green areas (10.76%).
- Absence of an elevator (6.71%).
- Geographical coordinates (latitude 6.34%; longitude 5.49%).
- Swimming pool (5.77%) and gated community (5.70%).
- A larger usable floor area and proximity to green spaces are associated with the largest gains in estimated market value.
- Features such as an elevator, swimming pool, balcony, garage, or gated community also add value, though to a lesser extent.
- The model's Mean Absolute Percentage Error of 0.3424 (about 34%) supports its integration into real-time decision-support systems as a first estimate of value, which is particularly beneficial in volatile markets like Madrid.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| XGB_Regressor | LGBMRegressor | HistGradientBoostingRegressor |
|---|---|---|
| Parameters Learning Rate: 0.02, Max Depth: 3 | ||
| Mean square error (MSE): 437,327,781,957.5555 Root Mean Square Error (RMSE): 657,943.1985676086 Mean Absolute Average Error (MAE): 330,502.5669765516 Mean Absolute Percentage Error (MAPE): 64.05195464396995 Average Coefficient of Determination (R2): 0.6148389918004791 | Mean square error (MSE): 436,040,579,264.73083 Root Mean Square Error (RMSE): 657,216.7087550496 Mean Absolute Average Error (MAE): 330,176.1469023515 Mean Absolute Percentage Error (MAPE): 0.6393866816953104 Average Coefficient of Determination (R2): 0.615494164449667 | Mean square error (MSE): 434,481,166,208.1042 Root Mean Square Error (RMSE): 655,851.2252418908 Mean Absolute Average Error (MAE): 329,782.9046161583 Mean Absolute Percentage Error (MAPE): 0.6386068479129388 Average Coefficient of Determination (R2): 0.6173249683089861 |
| Parameters Learning Rate: 0.02, Max Depth: 5 | ||
| Mean square error (MSE): 401,089,882,502.91315 Root Mean Square Error (RMSE): 630,329.2117607286 Mean Absolute Average Error (MAE): 299,521.6134590299 Mean Absolute Percentage Error (MAPE): 0.5576029051127928 Average Coefficient of Determination (R2): 0.6464536666590949 | Mean square error (MSE): 411,009,894,538.5702 Root Mean Square Error (RMSE): 637,939.4689227754 Mean Absolute Average Error (MAE): 296,062.1300118515 Mean Absolute Percentage Error (MAPE): 55.04306959932567 Average Coefficient of Determination (R2): 0.6381901433346157 | Mean square error (MSE): 395,742,359,814.4373 Root Mean Square Error (RMSE): 625,531.7961160316 Mean Absolute Average Error (MAE): 295,231.9551222862 Mean Absolute Percentage Error (MAPE): 0.5479353863650582 Average Coefficient of Determination (R2): 0.6518736046237075 |
| Parameters Learning Rate: 0.02, Max Depth: 7 | ||
| Mean square error (MSE): 386,467,787,228.2837 Root Mean Square Error (RMSE): 618,377.4603887532 Mean Absolute Average Error (MAE): 291,691.43414570054 Mean Absolute Percentage Error (MAPE): 0.5580107059067501 Average Coefficient of Determination (R2): 0.6598476881642226 | Mean square error (MSE): 429,692,823,430.15405 Root Mean Square Error (RMSE): 649,075.5502877138 Mean Absolute Average Error (MAE): 279,651.53318349994 Mean Absolute Percentage Error (MAPE): 51.324829172456916 Average Coefficient of Determination (R2): 0.6251781222516023 | Mean square error (MSE): 381,206,610,299.6483 Root Mean Square Error (RMSE): 613,366.4628542269 Mean Absolute Average Error (MAE): 284,739.20626096963 Mean Absolute Percentage Error (MAPE): 0.5406088148067931 Average Coefficient of Determination (R2): 0.6654844513266426 |
| Parameters Learning Rate: 0.05, Max Depth: 3 | ||
| Mean square error (MSE): 379,089,610,435.84766 Root Mean Square Error (RMSE): 613,898.8096839334 Mean Absolute Average Error (MAE): 290,407.8659147612 Mean Absolute Percentage Error (MAPE): 0.43717833841561493 Average Coefficient of Determination (R2): 0.6632505464519336 | (MSE): 386,569,853,182.9364 Root Mean Square Error (RMSE): 618,783.1823176598 Mean Absolute Average Error (MAE): 288,657.4749560142 Mean Absolute Percentage Error (MAPE): 43.235672273711536 Average Coefficient of Determination (R2): 0.6589985562945958 | (MSE): 378,063,160,354.20746 Root Mean Square Error (RMSE): 612,300.4309475098 Mean Absolute Average Error (MAE): 286,419.53654474515 Mean Absolute Percentage Error (MAPE): 0.42998827917198384 Average Coefficient of Determination (R2): 0.6658259277181333 |
| Parameters: Learning Rate 0.05, Max Depth 5 | | |
| Mean Squared Error (MSE): 362,099,564,116.8421; Root Mean Squared Error (RMSE): 600,056.5283480572; Mean Absolute Error (MAE): 268,593.9456009671; Mean Absolute Percentage Error (MAPE, fraction): 0.3719642250380258; Average Coefficient of Determination (R²): 0.678384293946424 | Mean Squared Error (MSE): 364,541,137,363.85236; Root Mean Squared Error (RMSE): 600,958.0625704266; Mean Absolute Error (MAE): 259,604.7203724448; Mean Absolute Percentage Error (MAPE, %): 35.78030649892446; Average Coefficient of Determination (R²): 0.6782013827272406 | Mean Squared Error (MSE): 360,506,999,108.92737; Root Mean Squared Error (RMSE): 598,231.5474839646; Mean Absolute Error (MAE): 261,384.55247444194; Mean Absolute Percentage Error (MAPE, fraction): 0.35856381831104694; Average Coefficient of Determination (R²): 0.680556840402599 |
| Parameters: Learning Rate 0.05, Max Depth 7 | | |
| Mean Squared Error (MSE): 358,102,418,553.0781; Root Mean Squared Error (RMSE): 596,443.6125319558; Mean Absolute Error (MAE): 263,334.98314521444; Mean Absolute Percentage Error (MAPE, fraction): 0.36457479738276216; Average Coefficient of Determination (R²): 0.6826867866950237 | Mean Squared Error (MSE): 385,397,112,391.8459; Root Mean Squared Error (RMSE): 614,459.1865695618; Mean Absolute Error (MAE): 247,418.9341862863; Mean Absolute Percentage Error (MAPE, %): 32.47192964250027; Average Coefficient of Determination (R²): 0.662916339162665 | Mean Squared Error (MSE): 353,605,890,528.1876; Root Mean Squared Error (RMSE): 592,031.1993021306; Mean Absolute Error (MAE): 253,406.49038236085; Mean Absolute Percentage Error (MAPE, fraction): 0.34244135004850396; Average Coefficient of Determination (R²): 0.6876900719421352 |
| Parameters: Learning Rate 0.1, Max Depth 3 | | |
| Mean Squared Error (MSE): 365,151,684,517.3063; Root Mean Squared Error (RMSE): 603,009.6636422386; Mean Absolute Error (MAE): 281,780.2585758836; Mean Absolute Percentage Error (MAPE, fraction): 0.4150602981853167; Average Coefficient of Determination (R²): 0.6742126680784819 | Mean Squared Error (MSE): 363,131,908,682.92114; Root Mean Squared Error (RMSE): 600,397.3996485502; Mean Absolute Error (MAE): 274,373.7720657455; Mean Absolute Percentage Error (MAPE, %): 40.36714866794354; Average Coefficient of Determination (R²): 0.6785935757077579 | Mean Squared Error (MSE): 358,330,160,249.951; Root Mean Squared Error (RMSE): 596,877.1117985161; Mean Absolute Error (MAE): 274,639.79255875346; Mean Absolute Percentage Error (MAPE, fraction): 0.4060731553645008; Average Coefficient of Determination (R²): 0.6817419681214746 |
| Parameters: Learning Rate 0.1, Max Depth 5 | | |
| Mean Squared Error (MSE): 363,802,850,744.1417; Root Mean Squared Error (RMSE): 601,976.346174561; Mean Absolute Error (MAE): 267,227.5604717435; Mean Absolute Percentage Error (MAPE, fraction): 0.3655855115372483; Average Coefficient of Determination (R²): 0.6754722953291219 | Mean Squared Error (MSE): 353,574,484,563.452; Root Mean Squared Error (RMSE): 590,792.706251847; Mean Absolute Error (MAE): 251,931.48744144678; Mean Absolute Percentage Error (MAPE, %): 34.046403366658495; Average Coefficient of Determination (R²): 0.688873336142018 | Mean Squared Error (MSE): 357,737,186,286.3001; Root Mean Squared Error (RMSE): 596,296.2949192963; Mean Absolute Error (MAE): 258,676.79656639433; Mean Absolute Percentage Error (MAPE, fraction): 0.3505309187565855; Average Coefficient of Determination (R²): 0.6825035289538606 |
| Parameters: Learning Rate 0.1, Max Depth 7 | | |
| Mean Squared Error (MSE): 363,001,803,164.3329; Root Mean Squared Error (RMSE): 600,780.3468661088; Mean Absolute Error (MAE): 263,242.5591691387; Mean Absolute Percentage Error (MAPE, fraction): 0.35945234940377385; Average Coefficient of Determination (R²): 0.6776334579508448 | Mean Squared Error (MSE): 385,180,751,270.6857; Root Mean Squared Error (RMSE): 615,143.8190166078; Mean Absolute Error (MAE): 244,609.0183077359; Mean Absolute Percentage Error (MAPE, %): 31.516165758512994; Average Coefficient of Determination (R²): 0.6623796503978164 | Mean Squared Error (MSE): 351,564,624,791.25244; Root Mean Squared Error (RMSE): 590,726.7643884304; Mean Absolute Error (MAE): 251,454.25676581875; Mean Absolute Percentage Error (MAPE, fraction): 0.33535032411705346; Average Coefficient of Determination (R²): 0.6888671067626047 |
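The metrics in each cell can be reproduced with a few lines of code. The following is a minimal standard-library Python sketch, not the authors' implementation. The function names `fold_metrics` and `average_over_folds` and the toy prices are hypothetical.

The sketch assumes that each reported value is an average over cross-validation folds. That assumption fits the table: the reported RMSE sits slightly below the square root of the reported MSE (for instance, √401,089,882,502.91 ≈ 633,317, whereas the reported RMSE is 630,329.21). The mean of per-fold square roots can never exceed the square root of the mean MSE, so fold averaging would produce exactly this pattern.

The sketch also returns MAPE as a fraction. It must be multiplied by 100 to match the percentage values in the middle column.

```python
import math
from statistics import fmean


def fold_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE (as a fraction) and R^2 for one fold."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        # Fraction; multiply by 100 to express MAPE as a percentage.
        "MAPE": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }


def average_over_folds(folds):
    """Average every metric over a list of (y_true, y_pred) pairs, one per fold."""
    per_fold = [fold_metrics(t, p) for t, p in folds]
    return {name: fmean(m[name] for m in per_fold) for name in per_fold[0]}


# Hypothetical toy prices for two folds (not data from the study).
folds = [
    ([100.0, 200.0, 300.0], [110.0, 190.0, 330.0]),
    ([150.0, 250.0, 350.0], [160.0, 240.0, 345.0]),
]
avg = average_over_folds(folds)
print(avg)
# The averaged RMSE never exceeds the square root of the averaged MSE.
print(avg["RMSE"] <= math.sqrt(avg["MSE"]))
print("MAPE as a percentage:", 100 * avg["MAPE"])
```

Averaging per-fold values mirrors what cross-validation utilities commonly report. Pooling all residuals before computing the metrics would instead give an RMSE exactly equal to the square root of the MSE.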

| Model | R² (Training) | R² (Test) | R² Difference (Training minus Test) | RMSE on Test Set (Price Units) |
|---|---|---|---|---|
| XGBoost | 0.8577 | 0.7015 | 0.156 | 554,062.04 |
| LightGBM | 0.7784 | 0.7047 | 0.074 | 551,026.33 |
| HistGradientBoosting | 0.7720 | 0.7036 | 0.068 | 552,090.34 |
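The R² Difference column matches the training R² minus the test R² (for example, 0.8577 - 0.7015 ≈ 0.156 for XGBoost). This gap is a common indicator of overfitting.

The short Python sketch below recomputes that column from the tabulated values. It also identifies the model with the lowest test RMSE and the one with the smallest gap. The dictionary layout and variable names are illustrative assumptions.

```python
# R² on the training set, R² on the test set and test RMSE, copied from the table above.
summary = {
    "XGBoost": (0.8577, 0.7015, 554_062.04),
    "LightGBM": (0.7784, 0.7047, 551_026.33),
    "HistGradientBoosting": (0.7720, 0.7036, 552_090.34),
}

# Training-minus-test R² gap, rounded to three decimals as in the table.
gaps = {model: round(train - test, 3) for model, (train, test, _) in summary.items()}

# Model with the lowest test error and model with the smallest overfitting gap.
best_rmse = min(summary, key=lambda model: summary[model][2])
smallest_gap = min(gaps, key=gaps.get)

for model, gap in gaps.items():
    print(f"{model}: R² gap = {gap:.3f}, test RMSE = {summary[model][2]:,.2f}")
print("Lowest test RMSE:", best_rmse)
print("Smallest train/test gap:", smallest_gap)
```

With these figures, the two criteria select different models: LightGBM has the lowest test RMSE, and HistGradientBoosting has the smallest gap. The differences between the two are small.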
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Forradellas, R.F.R.; Acedo Benítez, G. Development of a Predictive Tool for Real Estate Analysis Using Machine Learning Techniques. Int. J. Financial Stud. 2026, 14, 130. https://doi.org/10.3390/ijfs14050130
Forradellas RFR, Acedo Benítez G. Development of a Predictive Tool for Real Estate Analysis Using Machine Learning Techniques. International Journal of Financial Studies. 2026; 14(5):130. https://doi.org/10.3390/ijfs14050130
Chicago/Turabian Style: Forradellas, Ricardo Francisco Reier, and Gregorio Acedo Benítez. 2026. "Development of a Predictive Tool for Real Estate Analysis Using Machine Learning Techniques." International Journal of Financial Studies 14, no. 5: 130. https://doi.org/10.3390/ijfs14050130
APA Style: Forradellas, R. F. R., & Acedo Benítez, G. (2026). Development of a Predictive Tool for Real Estate Analysis Using Machine Learning Techniques. International Journal of Financial Studies, 14(5), 130. https://doi.org/10.3390/ijfs14050130

