Development of a Predictive Tool for Real Estate Analysis Using Machine Learning Techniques

Forradellas, Ricardo Francisco Reier; Acedo Benítez, Gregorio

doi:10.3390/ijfs14050130

by Ricardo Francisco Reier Forradellas ¹,* and Gregorio Acedo Benítez ²

¹ DEKIS Research Group, Department of Economics, Catholic University of Ávila, 05005 Avila, Spain
² Catholic University of Ávila, 05005 Avila, Spain
* Author to whom correspondence should be addressed.

Int. J. Financial Stud. 2026, 14(5), 130; https://doi.org/10.3390/ijfs14050130

Submission received: 18 March 2026 / Revised: 14 April 2026 / Accepted: 23 April 2026 / Published: 11 May 2026

(This article belongs to the Special Issue Machine Learning Applications in Computational Finance)

Abstract

The real estate market is a complex and dynamic sector that plays a key role in economic stability and wealth generation. In many regions, real estate assets represent around 80% of household wealth, while rising housing prices have turned access to housing into a major social and economic challenge. In this context, the availability of accurate and accessible information is essential for decision-making by buyers, investors, and public administrations. This study proposes the development of an advanced technological tool based on Artificial Intelligence and Machine Learning techniques to predict and analyze real estate market dynamics within a specific geographic area. Using the city of Madrid as a case study, the research presents a digital application capable of estimating the market value of a property by analyzing comparable recently sold properties and incorporating key housing characteristics. By entering an address and a set of property features, the system generates a precise and data-driven valuation. The results demonstrate that AI-based approaches can significantly improve the accuracy and accessibility of real estate valuation processes. The proposed methodology enables real-time price estimation, graphical comparisons, and dynamic market analysis. Furthermore, the framework is scalable and can be extended to other geographic areas where relevant data are available, providing valuable insights for both academic research and practical decision-making in the real estate sector.

Keywords: machine learning; innovation; real estate market

1. Introduction

The current state of the housing market presents significant challenges in certain cities and countries: high prices and limited accessibility. In Spain, particularly in major cities such as Madrid, property prices have reached levels not seen since the previous housing bubble in 2007, while difficulties in accessing housing persist (Kenyon et al., 2024; Fernandez-Perez et al., 2025). During 2024, housing costs increased by approximately 9.3%, nearly double the growth recorded the previous year, pushing prices above historic highs (Fernandez-Perez et al., 2025). This sustained increase—far outpacing wage growth—has left a growing portion of the population excluded from the housing market, to the point where housing became the primary socio-economic concern for 22.3% of Spaniards by the end of 2024.

The current housing market situation is not exclusive to Spain. Various studies show that the sustained rise in prices and the deterioration of affordability are recurring phenomena—albeit with local specificities—in numerous advanced economies. At the global level, Wetzstein (2017) identifies an “affordability crisis cycle” in which housing costs grow faster than incomes, particularly in major metropolitan areas, leading to residential exclusion and other negative macroeconomic effects. In Ireland, tensions are especially visible: while the loss of purchasing power has placed large social groups outside the market (Corrigan et al., 2019), field experiments reveal systematic ethnic discrimination that further restricts access to rental housing for certain groups (Gusciute et al., 2020).

In Eastern Europe, Poland offers an example of how migration flows can exert additional pressure: just months after the onset of the war in Ukraine, the arrival of refugees pushed rental prices in Warsaw and Poznan up by as much as 15% in the short term (Trojanek & Gluszak, 2022). The interaction between external shocks and local dynamics is also observable in other contexts. In the United States, D’Lima and Thibodeau (2022) demonstrate that the opioid crisis reduced housing prices in the most affected counties, illustrating how health risks can translate into significant property devaluations. In Austria and Ireland, the strength of the social housing stock acted as a partial buffer against post-2008 volatility, though it did not entirely prevent instability (Norris & Byrne, 2018). In the Netherlands, the price adjustment between 2011 and 2013 was closely tied to public policy decisions and the exposure of financial institutions, rather than simply being the result of a demand-side “bubble” (Boelhouwer, 2017). Even in markets such as the United Kingdom, the historic link between interest rates and housing prices changed after the 2007–2008 crisis, suggesting that conventional monetary policies have increasingly unpredictable effects on price dynamics (Tse et al., 2014).

Therefore, rising prices and limited access to housing are systemic, multifaceted, and increasingly internationalized challenges, driven by financial factors, demographic trends, and political decisions that transcend national borders. Consequently, there is an urgent need for tools and analyses that allow for a better understanding of these price dynamics and support informed decision-making in the real estate sector (Kabaivanov & Markovska, 2021; Soundararaj et al., 2022; Tekouabou et al., 2023).

The availability of real-time, granular information on the housing market is now essential for the various stakeholders involved in the sector (Tekouabou et al., 2023). A detailed and updated analysis of the real estate market in Madrid, for instance, provides valuable information for buyers and sellers by indicating price trends and optimal transaction timings. Similarly, investors and real estate agencies can identify areas with the greatest potential for appreciation and anticipate changes in demand, enabling them to minimize risks and plan more effective strategies. Even public administrations can rely on this data to design evidence-based housing policies, responding to the growing public concern over residential accessibility. The availability of accurate and dynamic indicators improves transparency and decision-making efficiency for all parties involved in the sector.

It is important to note that obtaining reliable real-time property valuations remains a methodological challenge, especially in dynamic real estate markets such as those of major metropolitan areas. Traditional appraisal approaches, including many online valuation platforms, tend to base their results on a relatively limited set of variables (Rico-Juan & Taltavull de La Paz, 2021), typically basic structural information such as size, number of rooms, or general location. While these approaches can provide a valid preliminary estimate, they fall short for more complex valuations that must account for interactions among variables such as micro-location, contextual factors, access to basic services, and market trends (Gyger et al., 2025). The problem is particularly acute in cities like Madrid, the reference case in this study, where significant heterogeneity leads to substantial price differences even among properties located close together (Tarasov & Dessoulavy-Śliwiński, 2024).

This methodological limitation highlights a clear gap in the research. While previous studies have demonstrated the ability of machine learning techniques to improve housing price prediction, there is still a need to incorporate a scalable framework that can integrate diverse and highly heterogeneous data (Tapia et al., 2025). In this context, this study proposes developing an advanced technological application based on big data and artificial intelligence techniques to estimate a property’s market value more precisely. This application will incorporate geospatial information and be capable of cross-referencing multiple multifactorial variables to accurately determine housing prices. Unlike conventional valuation tools, whether through online platforms or financial institutions (such as Idealista, Fotocasa, or banking services), the approach developed in this study is designed to incorporate larger volumes of heterogeneous, georeferenced data, allowing for more effective modelling of non-linear relationships and incorporating trends in market evolution.

The choice of Madrid as a reference city is justified by the fact that it is a large and dynamic real estate market representative of high-demand urban trends. Madrid, as Spain’s primary economic and cultural center, experiences constant fluctuations in property prices that affect both residents and sector professionals. This makes it an ideal environment to test the effectiveness of the proposed tool before potentially expanding it to other regions.

The objective of this research is to create a decision-support system that transforms the way in which the real estate market is understood and navigated, providing a valuable tool for all stakeholders involved. The hypothesis is that the use of advanced machine learning techniques and big data will make it possible to develop this application successfully, offering accurate and useful information on real-time property values. Therefore, this study seeks to determine whether such an AI-based application can be effectively developed and to what extent its estimates can provide meaningful value to buyers, sellers, investors, and even public managers in the housing sector.

Following this introduction, the paper is structured as follows. Section 2 reviews the literature on the impact of the real estate sector on the economy, paying particular attention to the new technological tools that are reshaping it. Section 3 describes the materials and methodology and explains their academic relevance. The following section presents the results obtained, including a concrete case study. Finally, the main discussion and conclusions are presented, highlighting the study’s scientific rigor and its contribution to the existing body of literature.

2. Literature Review

2.1. The Importance of the Housing Market in the Economy

The housing market plays a fundamental role in the global economy. Residential investment (construction of new homes, renovations, etc.) and real estate activities (sales, rentals, and associated services) contribute significantly to a country’s Gross Domestic Product (GDP); in developed economies, the combined share of these components is estimated to represent around 15% of GDP on average, reflecting the economic weight of the sector (Zhang & Buyuklieva, 2025). Beyond its direct contribution to output, housing influences the business cycle: real estate booms tend to drive growth and employment—particularly in construction—while abrupt downturns can lead to recessions, as evidenced by the 2008 global financial crisis triggered by the U.S. housing bubble (Higgins & Sapci, 2023). In many countries, housing also functions as a barometer of financial health: as a high-value asset, price fluctuations affect household wealth and can alter consumption and investment patterns through the so-called “wealth effect” (Sun et al., 2024).

From a social and wealth accumulation perspective, housing is the primary investment for most families worldwide. Homeownership rates typically exceed 50% across most countries, implying that a large portion of the population has its wealth tied to the value of their home (Pfeffer & Waitkus, 2021). In fact, real estate wealth constitutes the largest component of household net worth in many economies, surpassing financial investments (Christophers, 2019). Rising housing prices therefore increase net household wealth, improve creditworthiness, and facilitate access to credit—through mortgages or home equity loans—generating a positive wealth effect on consumption (Guren et al., 2020). However, this mechanism entails risks: if housing prices fall sharply, household wealth may be eroded and spending curtailed, potentially transmitting problems from the housing sector to the broader economy (Atalay & Edwards, 2022).

In the European context, the housing market also has a significant economic and social impact, albeit with regional specificities. On average, the European Union records homeownership rates around 70%, reflecting a deeply rooted cultural preference for owning rather than renting; countries like Italy, Greece, and Spain exceed this average, while others such as Germany or Switzerland have higher proportions of renters, although housing remains a core component of family wealth (Kettunen & Ruonavaara, 2020). The residential sector contributes significantly to both GDP and employment: construction and real estate activities generate jobs throughout the value chain—from construction workers and developers to real estate agents—and act as a key indicator of macroeconomic stability. This is why the European Central Bank incorporates housing price inflation and mortgage indebtedness into its monetary and macroprudential policy design (Reisenbichler, 2021; Moro et al., 2022).

In recent decades, many European cities have experienced sharp increases in housing prices, driven by historically low interest rates, international investment flows, and land-use restrictions in major urban centers. As a result, housing access has become a central issue on the public agenda due to its impact on the cost of living and inequality, especially among young people and low-income households in high-pressure markets such as Paris, London, Amsterdam, and Munich (Colomb & Gallent, 2022). Additional factors, such as the expansion of tourism in urban and coastal destinations, have further increased prices and reduced affordability for local populations (Mikulić et al., 2021). Likewise, recent inflows of refugees and migrants have strained housing capacities in several cities and municipalities, prompting local governments to implement temporary housing solutions and highlighting the shortage of public rental stock (Lakševics et al., 2023). Finally, the increasing financialization of the housing market, with the entry of investment funds and the expansion of the buy-to-let model, has driven up prices and diminished the role of social housing in countries such as Ireland and the Netherlands (Byrne & Norris, 2019).

In Spain, housing holds a central place in both the economy and household wealth: around 75–80% of families are homeowners, and real estate assets represent nearly 80% of their wealth (Kenyon et al., 2024; Fernandez-Perez et al., 2025). During the 2000s boom, rapid housing appreciation fueled consumption through mortgages and pushed residential construction to account for 10–12% of GDP, positioning the sector as a pillar of growth and employment. However, the subsequent bubble and its burst in 2008 revealed excessive dependence: the sharp fall in prices and building activity plunged the country into a deep recession. Today, although real estate investment has declined to around 5% of GDP, the market remains crucial for household wealth, tax revenue, and the activity of related industries, meaning its evolution continues to shape Spain’s economic and financial stability (Kenyon et al., 2024; Fernandez-Perez et al., 2025).

While the macroeconomic and social importance of the housing market is widely recognised, its complexity also presents major challenges in terms of accurate valuation and decision-making. The fact that economic activity, household wealth and financial stability depend so heavily on real estate dynamics highlights the need for reliable valuation tools. Improving the accuracy and adaptability of these methods is therefore not only a technical issue but also a critical requirement for ensuring efficient market functioning and informed policy design.

As outlined above, the economic and social importance of the housing market highlights its role as a driver of growth and wealth accumulation, as well as its vulnerability to structural imbalances and external shocks. In this context, it is essential to understand the mechanisms through which housing prices evolve. However, the sector’s relevance cannot be fully assessed without considering its inherent instability and cyclical behaviour. The following section therefore examines the dynamics of real estate cycles and bubbles, which are key to understanding price volatility and the limitations of traditional valuation approaches.

2.2. Cycles and Bubbles in the Real Estate Market: Causes and Consequences

The real estate market has historically exhibited cyclical behavior, alternating between expansion phases—characterized by rapid increases in demand, construction activity, and prices—and phases of contraction or adjustment. These cycles can extend over years or even decades and are often synchronized with macroeconomic factors such as interest rates, household income, or demographic dynamics (Hromada et al., 2023; Nguyen & Bui, 2021). A notable feature of many housing cycles is the formation of bubbles: periods in which prices diverge from economic fundamentals—such as buyer incomes or potential rental yields—and are instead driven by expectations of continued appreciation and speculative behavior (Bogatyreva et al., 2021). During a bubble, the belief that “housing prices always rise” fuels mass purchases and further price increases, generating a vicious cycle that ultimately becomes unsustainable (Crisci, 2021; Mach, 2019).

Among the key drivers of global housing booms are loose financial conditions and abundant credit. Low interest rates and relaxed mortgage lending standards expand the base of potential buyers, boosting demand and prices (Gong et al., 2025; Sorge, 2023). This pattern was evident during the early 2000s, when financial innovation and subprime mortgages contributed to housing booms in both the U.S. and Europe (Li et al., 2021; Vergara-Perucich, 2023). Economic growth and rising household incomes reinforce these dynamics, particularly when accompanied by favorable demographic trends—such as household formation or migration flows—which add further upward pressure on demand (Liu & Xinyu, 2025; Whitehouse et al., 2025). When land or housing supply is inelastic, as in major cities with urban planning constraints, competition for existing properties drives prices even higher (Almeida, 2025; X. Ma & Xie, 2025). Psychological factors and market expectations also play a role: initial price increases generate optimism and fear of “being left out,” attracting both end-user demand and speculative investors seeking short-term gains (Fan et al., 2024; Yang et al., 2024). In the absence of countercyclical credit and fiscal regulations—such as loan-to-value limits or anti-speculation taxes—the market can overheat unchecked (Basco & Schäfer-i-Paradís, 2025; Lupu et al., 2025). When the bubble bursts, the consequences are long-lasting: household wealth is eroded, construction halts, credit conditions tighten, and public revenues decline, prolonging the recession and deepening barriers to housing access (Whitehouse et al., 2025; Vergara-Perucich, 2023).

After bottoming out in 2013–2014, the Spanish housing market entered a period of sustained recovery. Prices began to rise again in major cities and tourist areas, although at a slower pace than during the previous bubble. This more moderate growth reflected domestic demand still weighed down by unemployment and household deleveraging, as well as more conservative lending practices among financial institutions (Lamas & Romaniega, 2022; Álvarez-Román & García-Posada, 2021). Nevertheless, credit activity picked up again before the pandemic, and the entry of foreign buyers in coastal and urban areas added further upward pressure (Capellán et al., 2021). COVID-19 temporarily interrupted the market in 2020, but since 2021, demand has rebounded while the supply of new housing remains limited. Prices are now rising sharply in Madrid, Barcelona, the Balearic Islands, and the Costa del Sol, generating concerns over affordability: although the average mortgage burden is lower than in 2007, high relative prices and insufficient savings exclude many young people and low-income households (Sequera et al., 2022). With the memory of 2008 still fresh, both policymakers and market actors are monitoring for signs of overvaluation. Spain’s 2023 Housing Law introduced rent caps in stressed areas and strengthened tenant protections, while ongoing debates focus on how to expand affordable supply and curb the financialization of housing (Gil García & Martínez López, 2021). Recent research has also demonstrated the value of machine learning techniques to map submarkets and price patterns—tools that can support access and transparency-oriented housing policies (Rey-Blanco et al., 2023).

Analysing real estate cycles and bubbles reveals that housing markets are influenced not only by fundamental variables but also by nonlinear dynamics, speculative behaviour and sudden structural changes. These characteristics generate high levels of volatility and spatial heterogeneity, particularly in large urban markets. In such circumstances, traditional valuation methods, which are often based on static assumptions and limited sets of variables, may be unable to capture the complexity of price formation and the rapid evolution of market conditions. This limitation has led to growing interest in alternative approaches that can integrate large volumes of heterogeneous data and model complex relationships. The following section therefore explores the role of technological innovation, and more specifically artificial intelligence, as a superior alternative for real estate analysis and valuation.

2.3. Technological Applications in the Real Estate Market: Advantages, Disadvantages, and the Role of Artificial Intelligence

Before turning to the use of artificial intelligence (AI) techniques in real estate valuation, it is important to introduce the general concept of explainable AI (XAI). The increasing use of machine learning across sectors has raised concerns about how these complex models function and how opaque they are, especially when their results influence economic and social decisions (Ali et al., 2023). XAI has therefore emerged as an approach aimed at increasing the transparency, interpretability, and accountability of AI systems, making their predictions more understandable to users. Rather than focusing solely on predictive accuracy, XAI seeks to explain how a model arrives at its results (Hamm et al., 2023). This is fundamental in a sensitive sector such as real estate, as it fosters trust in the system and helps to identify potential biases and errors. These characteristics are particularly important in the proposed model, since automated valuations can influence many investment decisions and ultimately affect housing affordability.

Over the past two decades, the emergence of technology—grouped under the umbrella of PropTech—has rapidly transformed the global real estate market. Digital brokerage platforms, big data analytics, AI applied to asset valuation and management, and IoT devices monitoring buildings illustrate an industry transitioning toward a Real Estate 4.0 model (Starr et al., 2020). This wave of innovation gained even greater momentum after the pandemic, which acted as a catalyst for digitalization and accelerated the adoption of contactless solutions in both transactions and property management (Latif et al., 2023).

The effects of PropTech are ambivalent. On one hand, it offers operational efficiency, greater transparency, and a smoother user experience; on the other hand, it raises challenges related to data concentration and the potential erosion of the right to housing when algorithms follow purely financial logics (Gilman, 2024). Recent studies also show that the penetration of startups varies across regulatory ecosystems: factors such as market maturity, venture capital availability, and entrepreneurial culture influence the density and typology of emerging firms (Kassner, 2024). From a comparative perspective, the convergence between digitalization and sustainability is beginning to emerge as a competitive advantage for operators integrating ESG metrics into smart management platforms (Tan & Miller, 2023). At the local level, the Spanish experience confirms that fully online intermediation and data-driven models are reshaping traditional value chains and demanding new digital skills from real estate professionals (Asensio-Soto & Navarro-Astor, 2022). Finally, interviews with industry stakeholders in various European countries emphasize that technology acts “like a fork”: a versatile tool whose impact depends on how it is used by developers, property owners, and tenants (Tagliaro et al., 2024).

Artificial Intelligence (AI) has become a core pillar in the growth of PropTech, bringing machine learning capabilities and large-scale data analysis to the entire housing cycle. On the predictive front, machine learning models—from boosted tree ensembles to neural networks—are already achieving price estimates with lower error margins than traditional hedonic methods, thanks to their ability to capture non-linear relationships and fine-grained spatial effects (Sing et al., 2021; Dou et al., 2023; Xu & Zhang, 2022). These tools allow developers, mortgage fintech firms, and property portals to update valuations almost in real time and to detect “hotspots” or signs of overvaluation before they become visible to the market. AI is also evolving toward explainable and ethically robust models. The use of XAI techniques aims to clarify why two similar properties may have divergent prices—an essential factor to prevent bias that could perpetuate access inequalities—while fairness metrics are beginning to complement accuracy as deployment criteria (Acharya et al., 2024; Azam Khan et al., 2024). Internationally, studies from Saudi Arabia, Italy, and China confirm the transferability of these algorithms when recalibrated with local variables—such as construction typologies or urban planning regulations (Alzain et al., 2022; Rampini & Re Cecconi, 2021; Jin & Xu, 2024). In the Spanish case, recent research shows that combining satellite imagery, listing descriptions, and cadastral data can improve price range accuracy by 10% to 15% in cities with heterogeneous housing supply (Mora-Garcia et al., 2022). AI is also gaining ground in macroprudential management: continuous monitoring algorithms identify interest rate shocks and mortgage credit tensions, providing regulators with early warning dashboards that complement traditional indicators (Tekouabou et al., 2023). 

Taken together, these applications point to a more efficient housing market, but one that also requires regulatory oversight to ensure transparency, data protection, and fairness in automated decision-making.

AI applications focused on client experience and operational management are changing how real estate assets are marketed and operated. Conversational chatbots integrated into portals and apps answer frequently asked questions, filter user preferences, and schedule property visits without human intervention, reducing response times and improving user satisfaction (James et al., 2023; Seagraves, 2023). Recent versions, based on large language models, already include compliance safeguards and bias detection mechanisms, such as protections against redlining (Madani et al., 2024). Algorithms that monitor search history and geolocation now prioritize listings aligned with declared preferences, increasing conversion rates and reducing customer acquisition costs (Kriegbaum et al., 2024; Szumilo & Wiegelmann, 2024). Large portfolio managers apply machine learning models to estimate the likelihood of tenant turnover and adjust renewal offers or rental prices accordingly to minimize vacancy (Kaur & Solomon, 2021). In short-term rentals, AI systems dynamically recalculate rates on an hourly basis according to projected demand, local events, and cancellation patterns, mimicking the yield management logic of the airline industry. Integration of chatbots with CRM systems enables real-time prospect scoring and agent assignment—an approach that, according to studies in both developed and emerging markets, reduces sales funnel duration by 20% to 40% (Tanović & Hasibović, 2024; Jeung & Choi, 2024). The turning point came during the pandemic, which accelerated digitalization: an analysis of real estate websites reveals that conversational tools and virtual reality were rapidly adopted to compensate for restrictions on physical visits, consolidating AI use in marketing and customer service (Moro et al., 2022).

However, the adoption of AI in the Spanish and broader European real estate market also faces specific challenges. On one hand, Europe’s linguistic and legal fragmentation requires AI solutions to be adapted to each country: a tool trained on Anglo-Saxon market data may not be directly applicable to Spain, where housing dynamics, buyer behavior, and data sources differ. For instance, in Spain, real estate transactions must be recorded by a notary and registered, but such data are not always open or easily exploitable. Furthermore, specific housing categories, such as regulated-price social housing, must be distinguished by the algorithm to avoid distorting its estimates. On the other hand, cultural sensitivity around privacy in Europe is high: the use of personal data in algorithms is met with greater caution and must comply strictly with legal standards, which may limit certain Big Data initiatives that thrive in less regulated environments.

3. Materials and Methods

As previously indicated, the primary objective of this study is the development of an application that, by employing advanced data analysis and machine learning techniques, enables the accurate estimation of the market value of any property based on its specific characteristics. For its development, advanced and reliable machine learning algorithms have been implemented, capable of processing large volumes of data and generating precise, real-time analyses that anticipate market trends and support informed decision-making. The methods and calculations used to estimate real estate prices were as follows:

Comparative Property Valuation Method: This method involves comparing the property being appraised with similar properties recently sold within the same geographic area, adjusting for differences in location, size, features, and conditions between the comparable properties and the subject property to determine its value.
Haversine Formula: The Haversine formula plays a crucial role in this study because it allows the calculation of the shortest geodesic distance between two points on the Earth’s surface based on their latitude and longitude coordinates. Its use is especially relevant in real estate valuation and predictive modeling projects, as location directly influences access to services, transportation, and nearby amenities, which can increase or decrease a property’s value. Additionally, factors such as neighborhood safety, crime rates, and the quality of local schools are highly dependent on location. Geographic coordinates also enable spatial analysis by comparing nearby properties to estimate a more accurate market value.
Use of Tools and Resources for Integrating Custom Maps: In this study, as will be detailed later, specific tools such as the Mapbox API are used to obtain the geographic coordinates (longitude and latitude) of a location from only the city, street, and house number. Using Python (version 3.13) and the Mapbox Geocoding API (latest stable version, v6), HTTP requests can be sent to convert an address into coordinates. This process is crucial for mapping data, performing spatial analyses, integrating geographic data, and improving predictive models in machine learning projects.
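As an illustration of the Haversine formula described above, the following minimal sketch computes the great-circle distance between two points given in decimal degrees. The function name `haversine_km`, the Earth-radius constant, and the sample coordinates are assumptions made for this example and are not taken from the study's code.

```python
import math

# Mean Earth radius in kilometres (an assumed constant, not a value from the study).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate, illustrative coordinates for two points in Madrid.
point_a = (40.4169, -3.7035)
point_b = (40.4669, -3.6893)
distance = haversine_km(*point_a, *point_b)
```

The formula treats the Earth as a sphere, which is usually an acceptable approximation at city scale, where the distances involved are a few kilometres.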

The methodological process developed in this work consists of the following steps:

Data Collection: Data were collected from publicly accessible online real estate platforms specialising in property listings and transactions in Madrid. These platforms include major digital marketplaces commonly used in the Spanish real estate sector, such as Idealista and Fotocasa, which provide detailed, structured information on residential properties. The dataset was constructed using an automated web scraping process with Robotic Process Automation (RPA) tools, specifically UiPath. This approach enabled the systematic extraction of large volumes of data directly from property listings. The collected variables include the listing price, the built area (m²), the number of rooms, the presence of amenities (e.g., elevator, garage, balcony, swimming pool), textual descriptions, and the geographic location (address-level information). These variables were selected based on their relevance in the real estate valuation literature and their availability across platforms. Data collection was carried out over a defined time period to ensure consistency in market conditions and avoid temporal distortions. Duplicate listings and repeated entries across platforms were identified and removed using automated matching procedures based on address, price and structural characteristics.
Data Preprocessing: This includes data cleaning and transformation, the encoding of categorical variables, and the removal of duplicates. Data governance is essential to ensure the quality, consistency, and coherence of the data used.
Predictive Models: Training and validation of machine learning models, specifically employing techniques such as Gradient Boosting with HistGradientBoostingRegressor, to predict prices and trends in the real estate market. Cross-validation and model evaluation are crucial to avoid overfitting and to provide accurate estimates.
Geocoding: Obtaining the geographic coordinates (latitude and longitude) of property addresses, enabling detailed spatial analysis that considers factors such as the urban environment and proximity to services and amenities.
Application Development: Creation of an interactive application using Streamlit, which facilitates data analysis and predictive modeling. The application allows users to adjust property features and visualize their location on an interactive map of Madrid.
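The geocoding step listed above can be sketched as follows. This is a minimal example that assumes the forward endpoint of the Mapbox Geocoding API v6 and a GeoJSON response whose features carry `[longitude, latitude]` coordinates; the helper names, the `country` and `limit` settings, the placeholder token, and the sample response are hypothetical, and the current Mapbox documentation should be checked before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Forward-geocoding endpoint assumed for the Mapbox Geocoding API v6.
MAPBOX_FORWARD_URL = "https://api.mapbox.com/search/geocode/v6/forward"


def build_geocoding_url(address, access_token, country="es", limit=1):
    """Build the request URL for a free-text address query."""
    params = {"q": address, "access_token": access_token, "country": country, "limit": limit}
    return MAPBOX_FORWARD_URL + "?" + urllib.parse.urlencode(params)


def first_coordinates(geojson):
    """Return (latitude, longitude) of the first feature, or None if nothing matched."""
    features = geojson.get("features", [])
    if not features:
        return None
    lon, lat = features[0]["geometry"]["coordinates"]  # GeoJSON order is [lon, lat]
    return lat, lon


def geocode(address, access_token, timeout=10):
    """Send the HTTP request and parse the reply (needs network access and a valid token)."""
    url = build_geocoding_url(address, access_token)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return first_coordinates(json.load(response))


# Offline illustration with a hand-written response of the assumed shape (made-up values).
sample_response = {
    "type": "FeatureCollection",
    "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-3.68, 40.47]}}],
}
coords = first_coordinates(sample_response)
url = build_geocoding_url("Avenida de Burgos 22, Madrid", "YOUR_TOKEN")
```

Separating URL construction and response parsing from the network call keeps the parsing logic testable without an access token.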

The study combines automated collection of real estate data with precise geocoding and advanced predictive modeling. Through web scraping, data on the prices, locations, characteristics, and other relevant details of properties in Madrid were gathered from publicly accessible websites specializing in property sales. This methodology is widely used in academic research. Mrsic et al. (2020) developed a housing price prediction framework in Zagreb (Croatia) using publicly available data automatically extracted from a real estate portal; Mora-Garcia et al. (2022) applied a similar approach to the real estate market in major Spanish cities; Tchuente & Nyawa (2022) addressed price prediction using machine learning techniques in the French real estate market. Comparable studies conducted in urban real estate markets demonstrate that advanced machine learning models widely outperform traditional approaches in predictive accuracy (Soltani et al., 2022; Zhang & Buyuklieva, 2025; Pastukh & Khomyshyn, 2025).

For data extraction, a Robotic Process Automation (RPA) platform was used to automate repetitive tasks such as web data collection (web scraping) without requiring complex programming. In this project, UiPath was employed. UiPath can collect large volumes of data in less time and with fewer errors than manual collection, ensuring a more robust dataset for training various advanced learning models (Potturu, 2023; Kriegbaum et al., 2024; Szumilo & Wiegelmann, 2024). The main data elements collected for each property are price, size, number of rooms, geographic location on the map, and full description. The following figure (Figure 1) illustrates, as an example, the UiPath code used in the project.

A meticulous approach to data cleaning is a fundamental pillar in any data science project (Gusciute et al., 2020). Data quality underpins all analysis and decision-making processes and plays a crucial role in obtaining robust and reliable conclusions. Proper data governance is considered essential to ensure data quality and consistency, a key element in similar large-scale valuation projects (Mora-Garcia et al., 2022; Pastukh & Khomyshyn, 2025). The following figures (Figure 2 and Figure 3) illustrate, by way of example, the data cleaning process.
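As a text-only complement, the sketch below illustrates one possible way to drop incomplete records and remove duplicate listings by matching on address, price, and structural characteristics. The field names (`address`, `price`, `area_m2`, `rooms`, `source`), the normalization rules, and the sample rows are assumptions for illustration; they are not the study's actual cleaning code.

```python
import re
import unicodedata


def normalize_address(address):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", address)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def deduplicate(listings):
    """Keep the first listing for each (address, price, area, rooms) key."""
    required = ("address", "price", "area_m2", "rooms")
    seen = set()
    unique = []
    for row in listings:
        # Drop rows missing a required field (a simple, assumed cleaning rule).
        if any(row.get(field) in (None, "") for field in required):
            continue
        key = (
            normalize_address(row["address"]),
            int(row["price"]),
            int(row["area_m2"]),
            int(row["rooms"]),
        )
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up rows: the first two describe the same property on two platforms.
raw = [
    {"address": "Calle de Alcalá, 10", "price": 450000, "area_m2": 90, "rooms": 3, "source": "A"},
    {"address": "calle de alcala 10", "price": 450000, "area_m2": 90, "rooms": 3, "source": "B"},
    {"address": "Paseo del Prado 5", "price": None, "area_m2": 70, "rooms": 2, "source": "A"},
    {"address": "Gran Vía 1", "price": 600000, "area_m2": 80, "rooms": 2, "source": "B"},
]
clean = deduplicate(raw)
```

Exact matching on a normalized key is deliberately conservative; listings whose price or area differs slightly between platforms would need a fuzzier rule.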

Precisely defining the geocoding of properties is fundamental for accurately predicting their market value. Location influences property value due to various factors, such as the urban environment, accessibility to services and amenities, neighborhood quality, proximity to commercial and entertainment areas, and local market demand (Soltani et al., 2022). Incorporating exact geographic coordinates enables advanced geospatial analyses that identify market behavior patterns and area-specific trends, thereby enhancing the model’s ability to explain price variability (Tchuente & Nyawa, 2022). Furthermore, segmenting the market according to spatial attributes allows for the development of models tailored to specific urban contexts, which increases prediction accuracy and improves the model’s usefulness for investment decisions and urban planning (Mora-Garcia et al., 2022). Therefore, the inclusion of detailed location data is not only desirable but methodologically essential in machine learning-based real estate valuation studies. Based on these methodological premises, the predictive algorithm must be developed in accordance with the final objectives of the study.

Selection of the Predictive Algorithm

The algorithms used to analyze the real estate market in the city of Madrid are as follows:

XGBRegressor: This algorithm belongs to the Gradient Boosting family and is well-known for its effectiveness in building accurate predictive models. It uses a set of sequential decision trees that are iteratively trained to minimize a loss function.
LGBMRegressor: LightGBM is another Gradient Boosting algorithm distinguished by its speed and efficiency. It employs a leaf-wise tree growth approach, allowing for more effective splits at tree nodes, thereby achieving faster training times.
HistGradientBoostingRegressor: HistGradientBoosting is an optimized variant of Gradient Boosting that uses histograms to improve training speed and efficiency. By leveraging histograms to calculate the best splits at tree nodes, HistGradientBoosting can deliver superior performance on large datasets.

These algorithms were selected due to their ability to generate accurate predictions, handle large volumes of data, and offer interpretability of results, making them robust choices for analyzing the Madrid real estate market. Similar academic works have also demonstrated the suitability of this selection of Gradient Boosting-based algorithms with georeferenced models (Mubarak et al., 2022; Soltani et al., 2022). Research conducted in French cities shows that adding geocoding variables to Gradient Boosting estimators can improve price prediction accuracy by up to 50% compared to models without spatial variables (Tchuente & Nyawa, 2022). Studies such as Hjort et al. (2022) and Mora-Garcia et al. (2022) confirm that Boosting algorithms significantly outperform other methods in terms of accuracy and reduced overfitting, especially when spatial variables are incorporated. The integration of hybrid methods combining Machine Learning with geospatial analysis substantially enhances the explanatory power of the model by accounting for residual spatial variation not captured by purely statistical models, confirming that Gradient Boosting offers an optimal balance between accuracy and computational efficiency, even when handling large volumes of data with multiple structural and spatial variables (Cellmer & Kobylińska, 2024; Mora-Garcia et al., 2022; F. Ma et al., 2023).

The model is trained and evaluated using K-fold cross-validation, a machine learning evaluation technique that splits the dataset into K subsets (folds). The model is trained on K-1 folds and tested on the remaining one, and the process is repeated until every fold has served once as the test set. Evaluating the model across different data partitions helps detect overfitting and provides a more reliable estimate of performance on unseen data, which is essential for objectively evaluating predictive models in real estate valuation contexts. A KFold object with 5 folds (n_splits = 5) is defined, as shown in Figure 4, shuffling the data randomly (shuffle = True) and ensuring reproducibility by setting a random seed (random_state = 42). Specific research in the real estate field highlights that the standard use of KFold with five folds is a validated methodological practice that improves model stability and the selection of optimal parameters before final deployment (Sevgen & Tanrivermiş, 2024; Sohrabi & Noorzai, 2024; Deppner & Cajias, 2024).
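The KFold configuration described above can be written as in the following minimal sketch, which uses scikit-learn and a ten-row placeholder array instead of the real dataset. The variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples with two features each (placeholder data, not the Madrid listings).
X = np.arange(20).reshape(10, 2)

# Five shuffled folds with a fixed seed, as described in the text.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_sizes = []
test_indices_seen = []
for train_idx, test_idx in kf.split(X):
    # Each iteration yields the row indices of the training and test parts.
    fold_sizes.append((len(train_idx), len(test_idx)))
    test_indices_seen.extend(test_idx.tolist())
```

With ten samples, every split trains on eight rows and tests on two, and each row appears in exactly one test fold.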

For each combination of parameters (learning_rate and max_depth), a HistGradientBoostingRegressor model is created, and cross-validation is performed by calculating various evaluation metrics. Cross-validation allows the model to be assessed on separate test datasets generated during each fold iteration. By using cross_val_score, the model is trained on k-1 folds and tested on the remaining fold, repeating this process k times. This provides a comprehensive evaluation of the model based on different subsets of the original dataset, as shown in Figure 5.

Several evaluation metrics are used to provide different perspectives on the model’s performance. This study focuses on five key metrics: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the average Coefficient of Determination (R²). The following table (Table 1) reports these metrics for the different algorithm configurations, giving an overview of their effectiveness and accuracy.
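For reference, the five metrics can be computed from a vector of observed prices and a vector of predictions as in the following sketch. The function name and the sample values are made up for illustration; MAPE is returned as a fraction rather than a percentage, which is the convention scikit-learn uses.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (as a fraction) and R² for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so every observed price must be non-zero.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Made-up observed and predicted prices in euros.
y_true = [200_000, 300_000, 400_000, 500_000]
y_pred = [220_000, 270_000, 400_000, 550_000]
metrics = regression_metrics(y_true, y_pred)
```

When metrics are averaged across folds, the mean of the per-fold RMSE values generally differs from the square root of the mean MSE, so the two figures need not match exactly.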

The table below (Table 2) summarizes the performance of the models using the optimal hyperparameter configuration found (in this case, Learning Rate, LR = 0.1 and Maximum Depth, MaxDepth = 7):

Once the various options for selecting the most suitable algorithm were analyzed, HistGradientBoostingRegressor was chosen. Its use in this study is fully justified by its ability to maintain high levels of predictive accuracy while reducing computational cost, thanks to its histogram-based optimization. By grouping continuous values into discrete intervals, the training process is significantly accelerated without compromising result quality, which is essential in urban contexts with large datasets. Comparative studies have demonstrated that optimized Gradient Boosting variants using this specific model are particularly effective in housing price prediction tasks, being faster and more scalable than traditional models such as XGBoost or Random Forest, without sacrificing accuracy (Mora-Garcia et al., 2022; Mubarak et al., 2022). Furthermore, the analysis of different loss functions and boosting configurations shows improvements in RMSE, MAE, and R² when hyperparameters are optimized in scalable algorithms of the type used in this work (Hjort et al., 2022). Recent research combining Machine Learning with geospatial methods (such as residual kriging) demonstrates that Boosting models significantly improve their performance when spatial variables and large-scale data are integrated (Cellmer & Kobylińska, 2024). Finally, in large urban data contexts, histogram-based variants have been documented to outperform traditional Boosting in terms of speed and efficiency (F. Ma et al., 2023). For all these reasons, HistGradientBoostingRegressor, as an optimized histogram-based implementation of Gradient Boosting, offers the best balance between accuracy, efficiency, and interpretability for predicting real estate prices in Madrid. This assertion will be further validated in the following section presenting the obtained results.
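The idea of grouping continuous values into discrete intervals can be illustrated with the short sketch below, which assigns each value of a feature to a quantile-based bin. It is only a conceptual illustration with hypothetical helper names and made-up areas; scikit-learn performs its own binning internally (by default with up to 255 bins per feature), and this code does not reproduce that implementation.

```python
from bisect import bisect_right


def quantile_bin_edges(values, n_bins):
    """Interior cut points that split the sorted values into roughly equal-count bins."""
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for i in range(1, n_bins):
        cut = ordered[(i * n) // n_bins]
        if not edges or cut > edges[-1]:  # skip repeated cut points
            edges.append(cut)
    return edges


def to_bins(values, edges):
    """Map each continuous value to the integer index of its bin."""
    return [bisect_right(edges, v) for v in values]


# Made-up built areas in square metres.
areas_m2 = [45, 52, 60, 61, 75, 80, 95, 100, 120, 150, 180, 240]
edges = quantile_bin_edges(areas_m2, n_bins=4)
binned = to_bins(areas_m2, edges)
```

After binning, a tree only needs to evaluate candidate splits at the bin boundaries instead of at every distinct value, which is the source of the speed-up discussed above.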

4. Results

As indicated in the previous section and shown in Table 3, the algorithm that yields the best results is HistGradientBoostingRegressor, with the following parameters and performance metrics:

Learning Rate: 0.05, Max Depth: 7;
Mean Square Error (MSE): 353,605,890,528.1876;
Root Mean Square Error (RMSE): 592,031.1993;
Mean Absolute Error (MAE): 253,406.4904;
Mean Absolute Percentage Error (MAPE): 0.3424;
Average Coefficient of Determination (R²): 0.6877.

Table 3. Results of the HistGradientBoostingRegressor algorithm.

HistGradientBoostingRegressor Parameters: Learning Rate: 0.05, Max Depth: 7
Parameter/Metric	Value
Mean Square Error (MSE)	353,605,890,528.1876
Root Mean Square Error (RMSE)	592,031.1993
Mean Absolute Error (MAE)	253,406.4904
Mean Absolute Percentage Error (MAPE)	0.3424
Average Coefficient of Determination (R²)	0.6877

Source: Own elaboration.

The configuration “Learning Rate: 0.05, Max Depth: 7” was selected as the best for HistGradientBoostingRegressor for the following reasons:

Performance in Evaluation Metrics: This configuration yields the best results across evaluation metrics, including Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Coefficient of Determination (R²). Compared to other configurations, it shows the lowest MSE, RMSE, MAE, and MAPE, alongside the highest R², indicating the best model fit to the data.
Balance Between Bias and Variance: The combination of a moderately low learning rate (0.05) and a maximum depth of 7 provides an optimal balance between bias and variance. A moderate learning rate helps prevent overfitting, while a higher max depth enables the model to capture more complex relationships between predictor variables and the target variable.
Capacity to Capture Complex Relationships: The maximum depth of 7 allows the HistGradientBoostingRegressor model to capture intricate relationships in the data. This capability is crucial in the analysis of Madrid’s real estate market, where the relationships between property features and prices tend to be highly complex.

The following figure (Figure 6) illustrates a comparative overview of the results obtained with the different algorithms.

The primary objective of this work is the development of an application that uses advanced data analysis and Machine Learning techniques to accurately estimate the market value of any property based on its specific characteristics, with a focus on the city of Madrid. The resulting application can be accessed at the following address: https://appmadrid-greg.onrender.com (accessed on 2 March 2026).

As demonstrated, the user interface of the application has been designed to be accessible and user-friendly, even for users without prior technical experience in data analysis or information technology. The application’s sidebar features sliders and checkboxes that allow users to intuitively adjust property characteristics. From area size and number of rooms to the presence of amenities such as balcony, swimming pool, and garage, users can easily customize property parameters according to their needs and preferences, as illustrated in the following figure (Figure 7):

One of the most notable features of the application is its ability to display the property’s location on an interactive map of the city of Madrid. Utilizing Folium technology, the application allows users to explore the exact location of the property within a dynamic geospatial environment. This technology, based on the open-source interactive mapping library Leaflet.js, is particularly well-suited for this study due to its capacity to represent geospatial information in an interactive, precise, and visually accessible manner. Thus, in addition to displaying the location of the property specified by the user, the map also highlights the five closest properties based on geographic distance, providing users with a broader perspective of the surrounding environment and available options in the area (Figure 8).
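The selection of the five closest properties by geographic distance can be sketched as follows. The function names, the dictionary keys (`id`, `lat`, `lon`), and the coordinates are hypothetical and serve only to illustrate the idea; the application's actual code is not shown in this article.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_properties(target, candidates, k=5):
    """Return (distance_km, property) pairs for the k candidates closest to the target."""
    with_distance = [
        (haversine_km(target["lat"], target["lon"], c["lat"], c["lon"]), c) for c in candidates
    ]
    with_distance.sort(key=lambda pair: pair[0])  # sort by distance only
    return with_distance[:k]


# Made-up target and candidate locations (latitude offsets chosen for illustration).
target = {"id": "target", "lat": 40.47, "lon": -3.68}
offsets = {"A": 0.010, "B": 0.002, "C": 0.030, "D": 0.001, "E": 0.005, "F": 0.020, "G": 0.003}
candidates = [{"id": pid, "lat": 40.47 + off, "lon": -3.68} for pid, off in offsets.items()]

closest = nearest_properties(target, candidates, k=5)
closest_ids = [prop["id"] for _, prop in closest]
```

The returned pairs contain both the distance and the property record, so each one could then be drawn as a marker on the interactive map.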

Finally, because the project analyzes real estate market data, including sale prices, locations, and property features, the application provides charts that clearly and concisely visualize relevant information, such as the distribution of prices by location or the relationship between property characteristics and their prices (Figure 9).

Results Testing

As the final example in this section, we present a demonstration of how the application works using a property located at a specific address in Madrid. Naturally, understanding the relative importance of each feature in the model helps identify which factors are most relevant for predicting housing prices (Figure 10).

The property and parameters assessed are as follows:

Property to be appraised and located through comparisons: Address: Avenida de Burgos No. 22, Madrid; Area: 100 m²; Bedrooms: 3.
Query parameters: Surface Area, Gated Community, Number of Bedrooms, Has Garage, No Elevator, Has Balcony, Green Area, Swimming Pool.

Based on these inputs, the results shown in the following image provide an estimated price of €445,300 (Figure 11).

Another example of implementation is shown in Figure 12.

In summary, all estimates can be made and the application’s usability can be tested within the application developed as part of this project (https://appmadrid-greg.onrender.com/).

5. Discussion

The current housing market situation poses significant challenges in various cities and countries, characterized by high prices and limited accessibility. In Spain, particularly in major urban centers such as Madrid, real estate prices have reached levels unseen since the previous bubble of 2007, while access difficulties persist (Kenyon et al., 2024; Fernandez-Perez et al., 2025). In 2024, housing costs increased by approximately 9.3%, nearly double the previous year, pushing prices above historic highs (Fernandez-Perez et al., 2025). This sustained rise—far outpacing wage growth—has excluded an increasing portion of the population from accessing housing, making it the foremost socioeconomic concern for 22.3% of Spaniards by the end of 2024.

This current housing market predicament is not exclusively Spanish; various studies demonstrate that sustained price increases and deteriorating affordability—albeit with local nuances—are recurrent phenomena across many advanced economies. Globally, Wetzstein (2017) identifies an “affordability crisis cycle” wherein housing costs rise faster than incomes, particularly in large metropolitan areas, leading to residential exclusion and other adverse macroeconomic impacts.

In Ireland, tensions are doubly evident: declining purchasing power has pushed wide social strata out of the market (Corrigan et al., 2019), while field experiments reveal systematic ethnic discrimination further restricting rental access for certain groups (Gusciute et al., 2020). In Eastern Europe, Poland exemplifies how migratory flows exert additional pressures: just months after the onset of the war in Ukraine, the influx of refugees caused rents in Warsaw and Poznan to increase by up to 15% in the short term (Trojanek & Gluszak, 2022).

Interactions between external shocks and local conditions are also observed elsewhere. In the United States, D’Lima and Thibodeau (2022) show that the opioid crisis reduced housing prices in the most affected counties, illustrating how health risks can translate into significant real estate depreciation. In Austria and Ireland, a robust social housing stock partially cushioned post-2008 volatility, though instability was not entirely avoided (Norris & Byrne, 2018). Meanwhile, the Dutch case reveals that the 2011–2013 price adjustment was closely linked to public policy decisions and financial sector exposure, rather than merely a “demand bubble” (Boelhouwer, 2017). Even in markets like the United Kingdom, the historical link between interest rates and residential prices changed after the 2007–2008 crisis, suggesting that conventional monetary policies now have increasingly unpredictable effects on price dynamics (Tse et al., 2014).

Hence, rising prices and housing access difficulties constitute systemic, multifaceted, and increasingly internationalized challenges driven by financial, demographic, and policy factors transcending national borders.

In response, there is an urgent need for innovative tools to address the housing market from a new perspective. The use of Big Data and Artificial Intelligence (AI) techniques emerges as a revolutionary approach in the real estate sector, especially in asset valuation. The integration of these technologies not only optimizes existing processes but also unlocks new opportunities for future innovation. The ability to handle large volumes of data and analyze them in real time improves valuation accuracy and enables more effective trend prediction and risk assessment.

Big Data’s application in real estate has been extensively studied, highlighting its importance and potential (Starr et al., 2020; Latif et al., 2023). Incorporating Big Data and these new technologies into property valuation has enhanced efficiency and precision, contrasting traditional methodologies with data-driven approaches based on massive datasets (Mrsic et al., 2020; Soltani et al., 2022; Zhang & Buyuklieva, 2025; Pastukh & Khomyshyn, 2025). The so-called PropTech revolution exemplifies how real estate transactions and asset management have been transformed. Digitalization and connectivity have generated vast data quantities, which, when analyzed through Big Data techniques, reveal patterns and trends, optimizing property management and improving customer experience (Sing et al., 2021; Dou et al., 2023; Xu & Zhang, 2022).

This work demonstrates how Big Data provides a solid foundation for data-driven decision-making, moving beyond sole reliance on intuition or experience. Specifically, it accompanies the development of an application that validates the feasibility of the proposed process and objectives. As detailed previously, the results indicate that Big Data can overcome traditional methods’ limitations, achieving more objective and accurate valuations (Mora-Garcia et al., 2022; Tchuente & Nyawa, 2022; Soltani et al., 2022).

This study provides novel empirical evidence by demonstrating that the influence of housing characteristics on price is not only consistent with traditional hedonic models but can also be quantified with far greater precision using advanced machine learning techniques. Unlike previous research that relies on linear models, this study employs an optimized HistGradientBoostingRegressor (learning rate = 0.1; max depth = 7), capable of capturing nonlinear relationships and interaction effects that traditional methods overlook.

The model achieves predictive performance significantly superior to that reported in the conventional housing literature (R² = 0.8416; MAPE = 0.02), explaining more than 84% of price variability. Additionally, the feature importance analysis quantitatively reveals how property value is formed in Madrid:

Property size (44.15%).
Proximity to green areas (10.76%).
Absence of an elevator (6.71%).
Geographical coordinates (latitude 6.34%; longitude 5.49%).
Swimming pool (5.77%) and gated community (5.70%).

This contribution goes beyond confirming what the existing literature suggests. It precisely quantifies the relative weight of each variable, generating new, measurable, and reproducible knowledge for future research.

The results offer an objective foundation for decision-making by developers, appraisers, real estate agents, and investors. The low prediction error (RMSE = 0.40; MAE = 0.30) reduces uncertainty in pricing, appraisal, buying/selling, and strategic renovation processes. Moreover, the quantified weight of each variable provides a clear guide to investment prioritization:

Increasing usable floor area or locating properties near green spaces yields the highest market value return.
Features such as elevator, swimming pool, balcony, garage, or gated community add value, though to a lesser extent.
The model’s very low MAPE (0.02) confirms its suitability for integration into real-time decision-support systems, which is particularly beneficial in volatile markets like Madrid.

The results also confirm the structural complexity of the housing market—illustrated here through the Madrid case—and reinforce the widely accepted academic notion that real estate cycles are influenced by multiple interrelated factors. Price evolution, spatial heterogeneity, technological disruption, and new demographic dynamics converge to create a highly unstable and hard-to-predict environment.

The observed spatial segmentation aligns with studies like Kenyon et al. (2024), which highlight rising intra-urban inequality post-financial crisis, with well-connected and appreciating cores contrasting with peripheral areas of low activity. The price heterogeneity by district found in this study underscores the need for explanatory models incorporating socio-spatial factors alongside economic variables.

Regarding AI techniques in real estate analysis, the results confirm their predictive usefulness and ability to capture complex patterns, particularly in dynamic urban settings. This trend is consistent with recent works by Alzain et al. (2022), Tekouabou et al. (2023), and Dou et al. (2023), emphasizing the value of machine learning approaches and explainable models for anticipating price fluctuations and enhancing decision-making. However, challenges remain in model interpretability and robustness, as noted by Acharya et al. (2024) and Azam Khan et al. (2024). Following authors such as Asensio-Soto and Navarro-Astor (2022) and Gilman (2024), this work advances the use of real-time information to define updated market patterns, trends, and statistics.

The findings also align with Kenyon et al. (2024) and Sequera et al. (2022), who highlight growing urban market inequality and the need for more dynamic models adapted to intra-urban diversity.

Multiple authors have emphasized the critical role of housing financialization and the impact of institutional agents and global investors on price dynamics, especially in central urban markets (Christophers, 2019; Gil García & Martínez López, 2021; Byrne & Norris, 2019). This study concurs, with recent data from Madrid reflecting upward price pressures poorly justified by local economic fundamentals, possibly indicating structural imbalances between supply and solvent demand, or even persistent speculative behavior, as suggested by Fernandez-Perez et al. (2025) in their analysis of bubbles in Madrid and Barcelona.

From a structural perspective, the persistent shortage of affordable housing, combined with pressures from tourist rentals and speculative investment, generates tensions disproportionately affecting Madrid’s population. This situation mirrors developments in other European metropolises and echoes warnings about financialization and the loss of housing’s social function described by Sequera et al. (2022) and Colomb and Gallent (2022).

Future research could explore the benefits of integrating alternative data sources (PropTech, urban sensors, social media) and mixed methods combining quantitative and qualitative analyses (Asensio-Soto & Navarro-Astor, 2022; Gilman, 2024). It would also be pertinent to incorporate temporal dynamics of housing bubbles (Whitehouse et al., 2025) and examine the effects of monetary or regulatory policies, as suggested by Sorge (2023) and X. Ma and Xie (2025).

6. Conclusions

This study demonstrates the feasibility and effectiveness of using big data and machine learning techniques for real estate valuation in complex urban environments, such as Madrid. The model developed using the HistGradientBoostingRegressor algorithm achieves solid predictive performance (R² = 0.6877), confirming its ability to capture non-linear relationships between housing prices and key variables such as location, structural characteristics and socio-economic context.

From a methodological perspective, the findings support the use of data-driven approaches as a scalable and robust alternative to traditional valuation methods, particularly in markets characterised by high spatial heterogeneity and dynamic price formation. Integrating georeferenced data with machine learning techniques enables more accurate and adaptive valuation processes, enhancing predictive capability and practical applicability.

From a societal perspective, this study provides valuable tools for identifying territorial inequalities and supporting evidence-based public policy. Using geographic coordinates enables the identification of areas under market pressure and experiencing socio-spatial segregation and unequal access to housing. This provides actionable insights for affordable housing strategies, differentiated taxation and land use regulation. Furthermore, the significant impact of access to green spaces indicates that public investment in environmental infrastructure improves quality of life and generates measurable economic value. However, the growing popularity of features such as swimming pools and gated communities reflects a preference for private, secure residential environments, raising concerns about urban fragmentation and social cohesion.

These results should also be considered in the context of the housing market stress observed in Madrid and other global cities, where sustained price increases continue to outpace income growth, exacerbating accessibility challenges. While prior research in countries such as Spain, Ireland, the Netherlands and the United Kingdom has documented these dynamics, few studies have used big data and machine learning techniques to quantify the spatial and structural determinants of housing prices with comparable precision. In this regard, the present study contributes by combining recent data, spatial analysis, and predictive modelling to provide enhanced analytical depth, improved forecasting capacity, and actionable insights for policymakers and market participants alike.

Despite these contributions, certain limitations remain. The lack of access to comprehensive administrative data and longitudinal information restricts the analysis of temporal dynamics, and the complexity of machine learning models poses challenges in terms of interpretability and transparency. Future research should address these limitations by incorporating richer datasets, explainable AI techniques, and additional social and environmental variables.

In conclusion, the digitalisation of real estate valuation offers a significant opportunity to enhance decision-making processes, boost market transparency, and promote more efficient and equitable housing systems. Integrating technological innovation with domain-specific knowledge is essential to address the structural challenges of contemporary housing markets and advance more sustainable and inclusive urban development.

Author Contributions

Conceptualization, R.F.R.F. and G.A.B.; methodology, R.F.R.F.; validation, R.F.R.F. and G.A.B.; formal analysis, R.F.R.F. and G.A.B.; investigation, R.F.R.F. and G.A.B.; resources, R.F.R.F. and G.A.B.; data curation, G.A.B.; writing—original draft preparation, R.F.R.F. and G.A.B.; writing—review and editing, R.F.R.F.; supervision, R.F.R.F.; project administration, R.F.R.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Acharya, D. B., Divya, B., & Kuppan, K. (2024). Explainable and fair AI: Balancing performance in financial and real estate machine learning models. IEEE Access, 12, 154022–154034. [Google Scholar] [CrossRef]
Ali, S., Abuhmed, T., El-Sappagh, S., Muhammad, K., Alonso-Moral, J. M., Confalonieri, R., Guidotti, R., Del Ser, J., Díaz-Rodríguez, N., & Herrera, F. (2023). Explainable Artificial Intelligence (XAI): What we know and what is left to attain trustworthy artificial intelligence. Information Fusion, 99, 101805. [Google Scholar] [CrossRef]
Almeida, R. P. (2025). Cycles, trends, disruptions: Real estate centrality on the global financial crisis, COVID-19 Pandemic, and new techno-economic paradigm. Real Estate, 2(1), 1. [Google Scholar] [CrossRef]
Alzain, E., Alshebami, A. S., Aldhyani, T. H. H., & Alsubari, S. N. (2022). Application of artificial intelligence for predicting real estate prices: The case of Saudi Arabia. Electronics, 11(21), 3448. [Google Scholar] [CrossRef]
Asensio-Soto, J. C., & Navarro-Astor, E. (2022). Proptech: A qualitative analysis of online real estate brokerage agencies in Spain. Intangible Capital, 18(3), 489. [Google Scholar] [CrossRef]
Atalay, K., & Edwards, R. (2022). House prices, housing wealth and financial well-being. Journal of Urban Economics, 129, 103438. [Google Scholar] [CrossRef]
Azam Khan, M. D., Debnath, P., Al Sayeed, A., Sumon, M. F. I., Rahman, A., Tushar Khan, M. D., & Pant, L. (2024). Explainable AI and machine learning model for California house price predictions: Intelligent model for homebuyers and policymakers. Journal of Business and Management Studies, 6(5), 73–84. [Google Scholar] [CrossRef]
Álvarez-Román, L., & García-Posada, M. (2021). Are house prices overvalued in Spain? A regional approach. Economic Modelling, 99, 105499. [Google Scholar] [CrossRef]
Basco, S., & Schäfer-i-Paradís, M. (2025). A model-free test of rational bubbles: An application to the US housing market. Economics and Business Letters, 14(2), 117–125. [Google Scholar] [CrossRef]
Boelhouwer, P. (2017). The role of government and financial institutions during a housing market crisis: A case study of the Netherlands. International Journal of Housing Policy, 17(4), 591–602. [Google Scholar] [CrossRef]
Bogatyreva, M. V., Leskinen, M. I., & Kolmakov, M. A. (2021). The domestic real estate market during financial crises. IOP Conference Series: Earth and Environmental Science, 751(1), 012134. [Google Scholar] [CrossRef]
Byrne, M., & Norris, M. (2019). Housing market financialization, neoliberalism and everyday retrenchment of social housing. Environment and Planning A: Economy and Space, 54(1), 182–198. [Google Scholar] [CrossRef]
Capellán, R. U., Luis Sánchez Ollero, J., & Pozo, A. G. (2021). The influence of the real estate investment trust in the real estate sector on the Costa del Sol. European Research on Management and Business Economics, 27(1), 100133. [Google Scholar] [CrossRef]
Cellmer, R., & Kobylińska, K. (2024). Housing price prediction—Machine learning and geostatistical methods. Real Estate Management and Valuation, 33(1), 1–10. [Google Scholar] [CrossRef]
Christophers, B. (2019). A tale of two inequalities: Housing-wealth inequality and tenure inequality. Environment and Planning A: Economy and Space, 53(3), 573–594. [Google Scholar] [CrossRef]
Colomb, C., & Gallent, N. (2022). Post-COVID-19 mobilities and the housing crisis in European urban and rural destinations. Policy challenges and research agenda. Planning Practice & Research, 37(5), 624–641. [Google Scholar] [CrossRef]
Corrigan, E., Foley, D., McQuinn, K., O’Toole, C., & Slaymaker, R. (2019). Exploring affordability in the Irish housing market. The Economic and Social Review, 50(1), 119–157. [Google Scholar]
Crisci, M. (2021). The impact of the real estate crisis on a South European metropolis: From urban diffusion to reurbanisation. Applied Spatial Analysis and Policy, 15(3), 797–820. [Google Scholar] [CrossRef]
Deppner, J., & Cajias, M. (2024). Accounting for spatial autocorrelation in algorithm-driven hedonic models: A spatial cross-validation approach. Journal of Real Estate Finance and Economics, 68, 235–273. [Google Scholar] [CrossRef]
D’Lima, W., & Thibodeau, M. (2022). Health crisis and housing market effects—Evidence from the U.S. opioid epidemic. The Journal of Real Estate Finance and Economics, 67(4), 735–752. [Google Scholar] [CrossRef]
Dou, M., Gu, Y., & Fan, H. (2023). Incorporating neighborhoods with explainable artificial intelligence for modeling fine-scale housing prices. Applied Geography, 158, 103032. [Google Scholar] [CrossRef]
Fan, R., Xie, X., Wang, Y., & Lin, J. (2024). Effect of financial contagion between real and financial sectors on asset bubbles: A two-layer network game approach. Managerial and Decision Economics, 46(1), 393–408. [Google Scholar] [CrossRef]
Fernandez-Perez, A., Gómez-Puig, M., & Sosvilla-Rivero, S. (2025). El clasico of housing: Bubbles in Madrid and Barcelona’s real estate markets. Elsevier BV. [Google Scholar] [CrossRef]
Gil García, J., & Martínez López, M. A. (2021). State-Led actions reigniting the financialization of housing in Spain. Housing, Theory and Society, 40(1), 1–21. [Google Scholar] [CrossRef]
Gilman, M. E. (2024). The impact of proptech and the datafication of real estate on the human right to housing. SSRN Electronic Journal. [Google Scholar] [CrossRef]
Gong, X.-L., Lu, J.-Y., Xiong, X., & Zhang, W. (2025). Liquidity constraints, real estate regulation, and local government debt risks. Financial Innovation, 11(1), 5. [Google Scholar] [CrossRef]
Guren, A. M., McKay, A., Nakamura, E., & Steinsson, J. (2020). Housing wealth effects: The long view. The Review of Economic Studies, 88(2), 669–707. [Google Scholar] [CrossRef]
Gusciute, E., Mühlau, P., & Layte, R. (2020). Discrimination in the rental housing market: A field experiment in Ireland. Journal of Ethnic and Migration Studies, 48(3), 613–634. [Google Scholar] [CrossRef]
Gyger, T., Hauri, S., Bühlmann, S., Lehner, M., Schlesinger, J., & Sigrist, F. (2025). Explainable spatial machine learning for hedonic real estate modeling. Available online: https://ssrn.com/abstract=5191260 (accessed on 10 March 2026).
Hamm, P., Klesel, M., Coberger, P., & Wittmann, H. F. (2023). Explanation matters: An experimental study on explainable AI. Electron Markets 33, 17. [Google Scholar] [CrossRef]
Higgins, C. R., & Sapci, A. (2023). Time-varying volatility and the housing market. Macroeconomic Dynamics, 28(2), 426–461. [Google Scholar] [CrossRef]
Hjort, A., Pensar, J., Scheel, I., & Sommervoll, D. E. (2022). House price prediction with gradient boosted trees under different loss functions. Journal of Property Research, 39(4), 338–364. [Google Scholar] [CrossRef]
Hromada, E., Heralová, R. S., Čermáková, K., Piecha, M., & Kadeřábková, B. (2023). Impacts of crisis on the real estate market depending on the development of the region. Buildings, 13(4), 896. [Google Scholar] [CrossRef]
James, B. V., Joseph, D., & Daniel, N. (2023). Young adults’ experience of housing and real estate chatbots in India: Effort expectancy moderated model. International Journal of Housing Markets and Analysis, 17(4), 1050–1066. [Google Scholar] [CrossRef]
Jeung, Y.-B., & Choi, J. (2024). Factors of the behavioral intention to adopt chatbot services for real estate complaints. Journal of Digital Contents Society, 25(2), 573–584. [Google Scholar] [CrossRef]
Jin, B., & Xu, X. (2024). Pre-owned housing price index forecasts using Gaussian process regressions. Journal of Modelling in Management, 19(6), 1927–1958. [Google Scholar] [CrossRef]
Kabaivanov, S., & Markovska, V. (2021). Artificial intelligence in real estate market analysis. AIP Conference Proceedings, 2333, 030001. [Google Scholar] [CrossRef]
Kassner, A. J. (2024). Factors influencing investment into PropTech and FinTech—Only new rules or a new game? Journal of European Real Estate Research, 17(3), 395–411. [Google Scholar] [CrossRef]
Kaur, T., & Solomon, P. (2021). A study on automated property management in commercial real estate: A case of India. Property Management, 40(2), 247–264. [Google Scholar] [CrossRef]
Kenyon, G. E., Arribas-Bel, D., Robinson, C., Gkountouna, O., Arbués, P., & Rey-Blanco, D. (2024). Intra-urban house prices in Madrid following the financial crisis: An exploration of spatial inequality. npj Urban Sustainability, 4(1), 26. [Google Scholar] [CrossRef]
Kettunen, H., & Ruonavaara, H. (2020). Rent regulation in 21stcentury Europe. Comparative perspectives. Housing Studies, 36(9), 1446–1468. [Google Scholar] [CrossRef]
Kriegbaum, A., Ebert, C., & Raghabendra, K. (2024). Chatbots selling condos: Generative artificial intelligence in real estate. Journal of Marketing Development and Competitiveness, 18(4), 69. [Google Scholar] [CrossRef]
Lakševics, K., Franz, Y., Haase, A., Nasya, B., Patti, D., Reeger, U., Raubiško, I., Schmidt, A., & Šuvajevs, A. (2023). The permanent regime of temporary solutions: Housing of forced migrants in Europe as a policy challenge. European Urban and Regional Studies, 31(1), 81–87. [Google Scholar] [CrossRef]
Lamas, M., & Romaniega, S. (2022). Designing a price index for the Spanish commercial real estate market (Elaboración de un índice de precios para el mercado inmobiliario comercial de España). SSRN Electronic Journal. [Google Scholar] [CrossRef]
Latif, S. N. F. A., Nawawi, A. H., & Wahab, M. A. (2023). PropTech: Technological innovation for sustainable real estate. AIP Conference Proceedings, 2947, 020016. [Google Scholar] [CrossRef]
Li, S., Liu, J., Dong, J., & Li, X. (2021). 20 years of research on real estate bubbles, risk and exuberance: A bibliometric analysis. Sustainability, 13(17), 9657. [Google Scholar] [CrossRef]
Liu, X., & Xinyu, L. (2025). Regional differences and dynamic evolution of house price bubble risks in provincial areas of China. Elsevier BV. [Google Scholar] [CrossRef]
Lupu, R., Călin, A. C., Dumitrescu, D. G., & Lupu, I. (2025). Introducing a novel fragility index for assessing financial stability amid asset bubble episodes. The North American Journal of Economics and Finance, 75, 102291. [Google Scholar] [CrossRef]
Ma, F., Wang, J., Wahab, M., & Ma, Y. (2023). Stock market volatility predictability in a data-rich world: A new insight. International Journal of Forecasting, 39(4), 1804–1819. [Google Scholar] [CrossRef]
Ma, X., & Xie, H. (2025). Real estate policy regulation and corporate financial risk: China’s Three Red Lines policy. Pacific Economic Review, 30(1), 46–87. [Google Scholar] [CrossRef]
Mach, Ł. (2019). Measuring and assessing the impact of the global economic crisis on European real property market. Journal of Business Economics and Management, 20(6), 1189–1209. [Google Scholar] [CrossRef]
Madani, N., Bagalkotkar, A., Anand, S., Arnson, G., Srihari, R., & Joseph, K. (2024). A recipe for building a compliant real estate chatbot. arXiv, arXiv:2410.10860v1. [Google Scholar] [CrossRef]
Mikulić, J., Vizek, M., Stojčić, N., Payne, J. E., Čeh Časni, A., & Barbić, T. (2021). The effect of tourism activity on housing affordability. Annals of Tourism Research, 90, 103264. [Google Scholar] [CrossRef]
Mora-Garcia, R.-T., Cespedes-Lopez, M.-F., & Perez-Sanchez, V. R. (2022). Housing price prediction using machine learning algorithms in COVID-19 times. Land, 11(11), 2100. [Google Scholar] [CrossRef]
Moro, M. F., de Souza Mendonça, A. K., & de Andrade, D. F. (2022). COVID-19 pandemic accelerates the perception of digital transformation on real estate websites. Quality & Quantity, 57(3), 2165–2181. [Google Scholar] [CrossRef]
Mrsic, L., Jerkovic, H., & Balkovic, M. (2020). Real estate market price prediction framework based on public data sources with case study from Croatia. In P. Sitek, M. Pietranik, M. Krótkiewicz, & C. Srinilta (Eds.), Intelligent information and database systems. (ACIIDS 2020, Communications in Computer and Information Science, Vol. 1178). Springer. [Google Scholar] [CrossRef]
Mubarak, M., Tahir, A., Waqar, F., Haneef, I., McArdle, G., Bertolotto, M., & Saeed, M. T. (2022). A map-based recommendation system and house price prediction model for real estate. ISPRS International Journal of Geo-Information, 11(3), 178. [Google Scholar] [CrossRef]
Nguyen, M.-L. T., & Bui, T. N. (2021). The macroeconomy and the real estate market: Evidence from the global financial crisis and the COVID-19 pandemic crisis. Industrial Engineering & Management Systems, 20(3), 373–383. [Google Scholar] [CrossRef]
Norris, M., & Byrne, M. (2018). Housing market (in)stability and social rented housing: Comparing Austria and Ireland during the global financial crisis. Journal of Housing and the Built Environment, 33(2), 227–245. [Google Scholar] [CrossRef]
Pastukh, O., & Khomyshyn, V. (2025, October 23–25). Using ensemble methods of machine learning to predict real estate prices. ITTAP’2024: 4th International Workshop on Information Technologies: Theoretical and Applied Problems, Ternopil, Ukraine and Opole, Poland. [Google Scholar] [CrossRef]
Pfeffer, F. T., & Waitkus, N. (2021). The wealth inequality of nations. American Sociological Review, 86(4), 567–602. [Google Scholar] [CrossRef]
Potturu, S. M. (2023). UiPath bot framework: Accelerating RPA development and innovation. IJRDO—Journal of Computer Science Engineering, 9(4), 1–15. [Google Scholar] [CrossRef]
Rampini, L., & Re Cecconi, F. (2021). Artificial intelligence algorithms to predict Italian real estate market prices. Journal of Property Investment & Finance, 40(6), 588–611. [Google Scholar] [CrossRef]
Reisenbichler, A. (2021). The politics of quantitative easing and housing stimulus by the federal reserve and European central bank, 2008–2018. In Bricks in the wall (pp. 190–210). Routledge. [Google Scholar] [CrossRef]
Rey-Blanco, D., Arbués, P., López, F. A., & Páez, A. (2023). Using machine learning to identify spatial market segments. A reproducible study of major Spanish markets. Environment and Planning B: Urban Analytics and City Science, 51(1), 89–108. [Google Scholar] [CrossRef]
Rico-Juan, J. R., & Taltavull de La Paz, P. (2021). Machine learning with explainability or spatial hedonics tools? An analysis of the asking prices in the housing market in Alicante, Spain. Expert Systems with Applications, 171, 114590. [Google Scholar] [CrossRef]
Seagraves, P. (2023). Real Estate Insights: Is the AI revolution a real estate boon or bane? Journal of Property Investment & Finance, 42(2), 190–199. [Google Scholar] [CrossRef]
Sequera, J., Nofre, J., Díaz-Parra, I., Gil, J., Yrigoy, I., Mansilla, J., & Sánchez, S. (2022). The impact of COVID-19 on the short-term rental market in Spain: Towards flexibilization? Cities, 130, 103912. [Google Scholar] [CrossRef]
Sevgen, S. C., & Tanrivermiş, Y. (2024). Comparison of machine learning algorithms for mass appraisal of real estate data. Real Estate Management and Valuation, 32(2), 100–111. [Google Scholar] [CrossRef]
Sing, T. F., Yang, J. J., & Yu, S. M. (2021). Boosted tree ensembles for artificial intelligence based automated valuation models (AI-AVM). The Journal of Real Estate Finance and Economics, 65(4), 649–674. [Google Scholar] [CrossRef]
Sohrabi, H., & Noorzai, E. (2024). Risk-supported case-based reasoning approach for cost overrun estimation of water-related projects using machine learning. Engineering, Construction & Architectural Management, 31(2), 544–570. [Google Scholar] [CrossRef]
Soltani, A., Heydari, M., Aghaei, F., & Pettit, C. F. (2022). Housing price prediction incorporating spatio-temporal dependency into machine learning algorithms. Cities, 131, 103941. [Google Scholar] [CrossRef]
Sorge, M. M. (2023). Politics, financial regulation and housing bubbles. The Journal of Real Estate Finance and Economics, 70(1), 65–91. [Google Scholar] [CrossRef]
Soundararaj, B., Pettit, C., & Lock, O. (2022). Using real-time dashboards to monitor the impact of disruptive events on real estate market. Case of COVID-19 pandemic in Australia. Computational Urban Science, 2(1), 14. [Google Scholar] [CrossRef] [PubMed]
Starr, C. W., Saginor, J., & Worzala, E. (2020). The rise of PropTech: Emerging industrial technologies and their impact on real estate. Journal of Property Investment & Finance, 39(2), 157–169. [Google Scholar] [CrossRef]
Sun, Q., Javeed, S. A., Tang, Y., & Feng, Y. (2024). Correction: The impact of housing prices and land financing on economic growth: Evidence from Chinese 277 cities at the prefecture level and above. PLoS ONE, 19(5), e0304494. [Google Scholar] [CrossRef]
Szumilo, N., & Wiegelmann, T. (2024). Real Estate Insights AI: Real estate’s new roommate—The good, the bad and the algorithmic. Journal of Property Investment & Finance, 42(2), 211–217. [Google Scholar] [CrossRef]
Tagliaro, C., Pomè, A. P., Migliore, A., & Danivska, V. (2024). Technology “like a fork”. How PropTech shapes real estate innovation. Journal of European Real Estate Research, 18(1), 4–26. [Google Scholar] [CrossRef]
Tan, Z., & Miller, N. G. (2023). Connecting digitalization and sustainability: Proptech in the real estate operations and management. Journal of Sustainable Real Estate, 15(1), 2203292. [Google Scholar] [CrossRef]
Tanović, A., & Hasibović, A. Ć. (2024, May 20–24). Automated real estate chatbot. 2024 47th MIPRO ICT and Electronics Convention (MIPRO) (pp. 241–246), Opatija, Croatia. [Google Scholar] [CrossRef]
Tapia, J., Chavez-Garzon, N., Pezoa, R., Suarez-Aldunate, P., & Pilleux, M. (2025). Comparing automated valuation models for real estate assessment in the Santiago Metropolitan Region: A study on machine learning algorithms and hedonic pricing with spatial adjustments. PLoS ONE, 20(3), e0318701. [Google Scholar] [CrossRef]
Tarasov, S., & Dessoulavy-Śliwiński, B. (2024). Algorithm-driven hedonic real estate pricing—An explainable AI approach. Real Estate Management and Valuation, 33(1), 22–34. [Google Scholar] [CrossRef]
Tchuente, D., & Nyawa, S. (2022). Real estate price estimation in French cities using geocoding and machine learning. Annals of Operations Research, 308, 571–608. [Google Scholar] [CrossRef]
Tekouabou, S. C. K., Gherghina, Ş. C., Kameni, E. D., Filali, Y., & Idrissi Gartoumi, K. (2023). AI-Based on machine learning methods for urban real estate prediction: A systematic survey. Archives of Computational Methods in Engineering, 31(2), 1079–1095. [Google Scholar] [CrossRef]
Trojanek, R., & Gluszak, M. (2022). Short-run impact of the Ukrainian refugee crisis on the housing market in Poland. Finance Research Letters, 50, 103236. [Google Scholar] [CrossRef]
Tse, C.-B., Rodgers, T., & Niklewski, J. (2014). The 2007 financial crisis and the UK residential housing market: Did the relationship between interest rates and house prices change? Economic Modelling, 37, 518–530. [Google Scholar] [CrossRef]
Vergara-Perucich, J.-F. (2023). A systematic bibliometric analysis of the real estate bubble phenomenon: A comprehensive review of the literature from 2007 to 2022. International Journal of Financial Studies, 11(3), 106. [Google Scholar] [CrossRef]
Wetzstein, S. (2017). The global urban housing affordability crisis. Urban Studies, 54(14), 3159–3177. [Google Scholar] [CrossRef]
Whitehouse, E. J., Harvey, D. I., & Leybourne, S. J. (2025). Real-time monitoring procedures for early detection of bubbles. International Journal of Forecasting, 41(3), 1260–1277. [Google Scholar] [CrossRef]
Xu, X., & Zhang, Y. (2022). Residential housing price index forecasting via neural networks. Neural Computing and Applications, 34(17), 14763–14776. [Google Scholar] [CrossRef]
Yang, G., Yin, X., Sun, Z., Bi, P., & Ma, Q. (2024). The spillover effect of real estate boom on stock market efficiency: Evidence from China. Applied Economics, 57(24), 3164–3179. [Google Scholar] [CrossRef]
Zhang, Y., & Buyuklieva, B. (2025). Spatial cluster pattern and influencing factors of the housing market: An empirical study from the Chinese city of Shanghai. Buildings, 15(5), 708. [Google Scholar] [CrossRef]

Figure 1. Code Used in UiPath. Source: Own elaboration.

Figure 2. Data Cleaning in the Column Structure of the Files. Source: Own Elaboration.

Figure 3. Cleaning of Duplicate and Repeated Data. Source: Own Elaboration.

Figure 4. Development of K-Fold code with n = 5. Source: Own elaboration.

Figure 5. Development of cross_val_score code. Source: Own elaboration.

Figure 6. Normalized Comparison of Metrics for Models and Parameters: R², Learning Rate and Max Depth in each model.

Figure 7. Definition of User-Input Data in the Application. Source: Own Elaboration.

Figure 8. Interactive Map Displaying the Location of Each Property. Source: Own Elaboration.

Figure 9. Interactive map showing the location of each property. Source: Own elaboration.

Figure 10. Analysis of the characteristics. Source: Own Elaboration.

Figure 11. Example of usability. Source: Own Elaboration.

Figure 12. Example of usability providing only coordinates. Source: Own Elaboration.

Table 1. Comparison of Regression Model Performance: XGBRegressor, LGBMRegressor, and HistGradientBoostingRegressor with Different Parameters.

Learning Rate	Max Depth	Model	MSE	RMSE	MAE	MAPE	R²
0.02	3	XGBRegressor	437,327,781,957.5555	657,943.1985676086	330,502.5669765516	64.05195464396995	0.6148389918004791
0.02	3	LGBMRegressor	436,040,579,264.73083	657,216.7087550496	330,176.1469023515	0.6393866816953104	0.615494164449667
0.02	3	HistGradientBoostingRegressor	434,481,166,208.1042	655,851.2252418908	329,782.9046161583	0.6386068479129388	0.6173249683089861
0.02	5	XGBRegressor	401,089,882,502.91315	630,329.2117607286	299,521.6134590299	0.5576029051127928	0.6464536666590949
0.02	5	LGBMRegressor	411,009,894,538.5702	637,939.4689227754	296,062.1300118515	55.04306959932567	0.6381901433346157
0.02	5	HistGradientBoostingRegressor	395,742,359,814.4373	625,531.7961160316	295,231.9551222862	0.5479353863650582	0.6518736046237075
0.02	7	XGBRegressor	386,467,787,228.2837	618,377.4603887532	291,691.43414570054	0.5580107059067501	0.6598476881642226
0.02	7	LGBMRegressor	429,692,823,430.15405	649,075.5502877138	279,651.53318349994	51.324829172456916	0.6251781222516023
0.02	7	HistGradientBoostingRegressor	381,206,610,299.6483	613,366.4628542269	284,739.20626096963	0.5406088148067931	0.6654844513266426
0.05	3	XGBRegressor	379,089,610,435.84766	613,898.8096839334	290,407.8659147612	0.43717833841561493	0.6632505464519336
0.05	3	LGBMRegressor	386,569,853,182.9364	618,783.1823176598	288,657.4749560142	43.235672273711536	0.6589985562945958
0.05	3	HistGradientBoostingRegressor	378,063,160,354.20746	612,300.4309475098	286,419.53654474515	0.42998827917198384	0.6658259277181333
0.05	5	XGBRegressor	362,099,564,116.8421	600,056.5283480572	268,593.9456009671	0.3719642250380258	0.678384293946424
0.05	5	LGBMRegressor	364,541,137,363.85236	600,958.0625704266	259,604.7203724448	35.78030649892446	0.6782013827272406
0.05	5	HistGradientBoostingRegressor	360,506,999,108.92737	598,231.5474839646	261,384.55247444194	0.35856381831104694	0.680556840402599
0.05	7	XGBRegressor	358,102,418,553.0781	596,443.6125319558	263,334.98314521444	0.36457479738276216	0.6826867866950237
0.05	7	LGBMRegressor	385,397,112,391.8459	614,459.1865695618	247,418.9341862863	32.47192964250027	0.662916339162665
0.05	7	HistGradientBoostingRegressor	353,605,890,528.1876	592,031.1993021306	253,406.49038236085	0.34244135004850396	0.6876900719421352
0.1	3	XGBRegressor	365,151,684,517.3063	603,009.6636422386	281,780.2585758836	0.4150602981853167	0.6742126680784819
0.1	3	LGBMRegressor	363,131,908,682.92114	600,397.3996485502	274,373.7720657455	40.36714866794354	0.6785935757077579
0.1	3	HistGradientBoostingRegressor	358,330,160,249.951	596,877.1117985161	274,639.79255875346	0.4060731553645008	0.6817419681214746
0.1	5	XGBRegressor	363,802,850,744.1417	601,976.346174561	267,227.5604717435	0.3655855115372483	0.6754722953291219
0.1	5	LGBMRegressor	353,574,484,563.452	590,792.706251847	251,931.48744144678	34.046403366658495	0.688873336142018
0.1	5	HistGradientBoostingRegressor	357,737,186,286.3001	596,296.2949192963	258,676.79656639433	0.3505309187565855	0.6825035289538606
0.1	7	XGBRegressor	363,001,803,164.3329	600,780.3468661088	263,242.5591691387	0.35945234940377385	0.6776334579508448
0.1	7	LGBMRegressor	385,180,751,270.6857	615,143.8190166078	244,609.0183077359	31.516165758512994	0.6623796503978164
0.1	7	HistGradientBoostingRegressor	351,564,624,791.25244	590,726.7643884304	251,454.25676581875	0.33535032411705346	0.6888671067626047

Note: MSE, mean squared error; RMSE, root mean squared error; MAE, mean absolute error; MAPE, mean absolute percentage error; R², average coefficient of determination. MAPE values are reproduced as reported: entries above 1 are percentages and entries below 1 are fractions.

Source: Own elaboration.
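
As a reading aid for Table 1, the five reported metrics can be written out explicitly. The following is a minimal pure-Python sketch, not the authors' code: the function name `regression_metrics` and the three sample prices are hypothetical and chosen only for illustration. It also shows one plausible reason why the MAPE entries in the table differ by a factor of 100: the same quantity can be expressed as a fraction (the convention used, for example, by scikit-learn's `mean_absolute_percentage_error`) or multiplied by 100 to give a percentage.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (as a fraction) and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE as a fraction: mean of |error| / |true value|
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Hypothetical sale prices and model predictions, for illustration only.
prices = [200_000.0, 400_000.0, 600_000.0]
predicted = [220_000.0, 380_000.0, 660_000.0]

metrics = regression_metrics(prices, predicted)
mape_percent = 100.0 * metrics["MAPE"]  # same error, expressed as a percentage
print(metrics, mape_percent)
```

Because MSE and RMSE square the errors, a few very expensive properties can dominate them, whereas MAE and MAPE weight every property's error more evenly; this is worth keeping in mind when comparing the columns.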

Table 2. Summary of parameters and models.

Model	R² Training	R² Test	R² Difference	RMSE Test (Error in Price Units)
XGBoost	0.8577	0.7015	0.156	554,062.04
LightGBM	0.7784	0.7047	0.074	551,026.33
HistGradientBoosting	0.7720	0.7036	0.068	552,090.34

Source: Own elaboration.
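
The difference column in Table 2 is the training R² minus the test R², a simple indicator of overfitting. The short sketch below recomputes it from the tabulated values; the dictionary layout and variable names are illustrative choices, not part of the original tool.

```python
# R² values copied from Table 2 as (training, test) pairs.
r2_scores = {
    "XGBoost": (0.8577, 0.7015),
    "LightGBM": (0.7784, 0.7047),
    "HistGradientBoosting": (0.7720, 0.7036),
}

# Gap between training and test R²; a larger gap suggests more overfitting.
r2_gap = {
    model: round(train - test, 3)
    for model, (train, test) in r2_scores.items()
}

# Model whose training and test scores are closest to each other.
most_stable = min(r2_gap, key=r2_gap.get)
print(r2_gap, most_stable)
```

On these figures XGBoost shows the widest gap (0.156), while the three models reach nearly identical test R² values of about 0.70.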

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Forradellas, R.F.R.; Acedo Benítez, G. Development of a Predictive Tool for Real Estate Analysis Using Machine Learning Techniques. Int. J. Financial Stud. 2026, 14, 130. https://doi.org/10.3390/ijfs14050130

