Learning the Value of Place: Machine Learning Models for Real Estate Appraisal in Istanbul’s Diverse Urban Landscape

Erciyes, Ahmet Hilmi; Atasoy, Toygun; Tursun, Abdurrahman; Canaz Sevgen, Sibel

doi:10.3390/buildings15152773



Department of Real Estate Development and Management, Ankara University, Münzeviler st. no. 87, 06590 Ankara, Türkiye

* Author to whom correspondence should be addressed.

Buildings 2025, 15(15), 2773; https://doi.org/10.3390/buildings15152773

Submission received: 1 July 2025 / Revised: 2 August 2025 / Accepted: 4 August 2025 / Published: 6 August 2025

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

Abstract

The prediction of real estate values is vital for taxation, transactions, mortgages, and urban policy development. When datasets are very large, values can be predicted more accurately by combining statistical and advanced methods. In metropolitan cities such as İstanbul, where real estate data are vast and complex, mass appraisal methods supported by Machine Learning (ML) offer a scalable and consistent alternative. This study employs six algorithms, namely Artificial Neural Network (ANN), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Support Vector Regression (SVR), Random Forest (RF), and Semi-Log Regression (SLR), to estimate the values of residential properties on both the Asian and European sides of İstanbul. In total, 168,099 residential properties and 30 of their features from both sides of the Bosphorus were used. The results show that RF yielded the best performance in Beşiktaş, while XGBoost performed best in Üsküdar. ANN also produced competitive results, although slightly less accurate than those of XGBoost and RF. In contrast, the SVR and traditional SLR models underperformed, especially in terms of R² and RMSE. With its large-scale dataset, its focus on Istanbul, one of the largest metropolitan areas, and its use of multiple ML algorithms, this study makes a comprehensive and practical contribution to the field of automated real estate valuation.

Keywords:

mass appraisal; artificial neural network; random forest; support vector regression; XGBoost; Istanbul

1. Introduction

Housing prices are an essential indicator of a country’s economy [1], and accurately predicting real estate prices is vital for various stakeholders, from individuals deciding whether to buy or sell property, to real estate agents optimizing their investment strategies, to financial institutions evaluating mortgage risks [2]. Accurate price estimates can improve decision-making, reduce financial risks, and increase efficiency in valuation, management, and investment activities [3].

Because housing markets are important to national economies and housing prices depend on diverse factors, the literature includes price forecasting studies based on different methods [4]. Mass valuation is one such method: it systematically determines the values of many properties across a large area at the same time [5,6]. Unlike individual property valuation, mass appraisal involves developing a valuation model that can incorporate the effects of supply and demand over large areas [7].

Mass appraisal provides a reference and pricing basis for many actors in the real estate market, particularly real estate developers and financial institutions. It also provides information that supports mortgage and financing transactions and sound investment decisions, and it is the basis on which local governments conduct quality urban planning [8]. Mass appraisal is more complex and multidimensional than individual valuation because it requires large datasets and many different parameters [9]. However, it can provide more consistent and accurate results. Mass valuation studies require accurate and reliable large datasets together with statistical, econometric, or machine learning methods. Although mass appraisal does not eliminate the need for individual appraisals, it allows real estate values to be determined more quickly and reliably.

Researchers and practitioners have used machine learning algorithms to improve traditional hedonic pricing models, making significant progress in real estate market analysis [10]. Machine learning improves system performance through computational methods that learn from experience [11]. Machine learning algorithms build predictive models by estimating parameters from training data, and separate test datasets are used to assess the accuracy and generalizability of these models. The algorithms used in housing price studies can be grouped into categories such as regression, classification, and neural networks [12].

The use of advanced machine learning techniques in mass appraisal can contribute to more accurate estimates of real estate value [13,14,15]. Such techniques can produce accurate forecasts by learning from large datasets. Especially in cities with a dynamic and complex market structure, such as Istanbul, they can help complete valuation processes quickly and reliably while reducing risk. Advanced machine learning and data analytics techniques can thus make real estate valuation practices more effective and efficient [16,17,18].

Machine learning algorithms have been used frequently in the real estate literature. House price prediction involves various factors such as location, neighborhood, and market conditions, and ML-based studies have found that factors such as location, plot size, and the number of rooms affect the value of a house [19]. The methods most commonly used in the literature are Random Forest (RF), Artificial Neural Networks (ANNs), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), Support Vector Regression (SVR), Linear Regression, and Semi-Log Regression [20].

Some authors have estimated house prices using some of these methods or compared them with classical ones [21]. Linear regression is one of the most widely used methods for estimating the impact of housing characteristics on house prices; its simplicity and interpretability are the main reasons for its widespread use [22]. Machine learning approaches have been found to outperform traditional models because they adapt better to the nonlinearities of housing market data [23]. The Random Forest algorithm has been used in many studies and has been found to perform strongly; studies have shown that it outperforms traditional methods such as Multiple Linear Regression [24]. The most crucial disadvantage of Artificial Neural Networks is over-parameterization: an excessive number of neurons may reduce predictive power. Nevertheless, marginal prices calculated using ANNs are more realistic than those obtained with traditional methods [25]. ANNs have also performed very well in terms of predictive power and thus valuation accuracy, surpassing traditional Multiple Regression Analysis (MRA) and approaching the performance of spatially weighted regression approaches [26]. However, Hoxha [27] reported that, contrary to the widespread belief in their superior predictive power, ANNs achieve only average performance.

The XGBoost algorithm has often been identified as a high-performing method in recent research on real estate price prediction. Studies have found that XGBoost achieves higher prediction accuracy than other machine learning algorithms, including Random Forest, KNN, and SVR [28,29]. Its success stems from its scalability, which results from system and algorithmic optimizations; it scales well in all scenarios and runs faster than other existing methods [30]. However, like RF, XGBoost may suffer from overfitting [31]. KNN, a non-parametric approach, captures local patterns in the data and predicts prices from comparable houses. Two disadvantages of KNN are its low efficiency and its dependence on the choice of the optimal value of k [3,32]. SVR is also widely used in house price forecasting [33] and has been found to produce accurate forecasts, especially in time-constrained scenarios [14]. However, comparisons with other algorithms have found SVR to be less successful [34] and to require significant computational resources [35].

The strengths of this study include the comparison of six powerful ML algorithms in the context of mass valuation, the use of a large and comprehensive dataset, and the use of data from a unique urban environment. The size of the dataset and the distinctive characteristics of the study area make this study original. In contrast to previous studies, proximity to the Bosphorus, an inimitable feature and a potential value-adding factor for the local area, was included as a variable. This study is therefore expected to make a valuable and comprehensive contribution to the literature. The mass appraisal models compared in this research may be used by municipalities and real estate financing institutions as a decision support system for determining property tax values. In Türkiye, these values are currently determined individually for each property, which is a significant issue, especially for public revenues.

2. Study Area and Data Description

Istanbul, home to over 15 million people and a large housing stock, was chosen as the study area. Istanbul is a transcontinental megapolis that straddles Europe and Asia, which are separated by the Bosphorus Strait. It is the cultural, historical, and economic heart of Türkiye and has a rich history as the capital of the Eastern Roman and Ottoman Empires. Although the official capital is Ankara, Istanbul is the leading province economically, in addition to its cultural and historical importance; its income level is 59% higher than the national average. Its significance for the housing sector is undeniable: in 2024, Istanbul accounted for 16% of all primary and resale residential property sales nationwide. Beşiktaş and Üsküdar, the districts selected as the study area, accounted for 3.18% of total housing sales in Istanbul [36]. Both districts are major tourist destinations within Istanbul. Furthermore, their housing prices consistently exceed the Istanbul average, and Beşiktaş contains some of the city’s most expensive residential properties [37]. These two districts were therefore chosen as the study area because they can represent İstanbul’s real estate market. Although they are divided by the Istanbul Strait, the two districts form a spatially integrated area. Figure 1 illustrates the study area.

To estimate residential property values in Beşiktaş and Üsküdar, data were obtained from a dominant online listing platform in Türkiye [38]. The total study area, comprising Beşiktaş (17.90 km²) and Üsküdar (37.68 km²), is around 56 km². The core dataset includes a comprehensive range of attributes for each property: geographic coordinates (latitude and longitude), price date (between 1 January 2019 and 15 July 2024), building typology, presence of a terrace, gross floor area, sale price, room count, living room count, bathroom count, total building floors, floor level, property age, facade orientation, and the availability of amenities such as elevators, generators, open and covered parking, and outdoor and indoor swimming pools. Pre-processing involved identifying and removing outliers and anomalies: 28,782 erroneous rows were eliminated because of incomplete data, unrealistic entries, or anomalous features. All property prices were adjusted to their 15 July 2024 value using the Central Bank of Türkiye’s house price index, the latest available at the time. After adjustment and pre-processing, a dataset of 168,099 residential properties in İstanbul was assembled: 69,232 properties on the European side (Beşiktaş District) and 98,867 on the Asian side (Üsküdar District). The large number of properties on each side allowed separate and comparative analyses. Figure 2 illustrates the spatial distribution of properties in the cleaned dataset.
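The index-based price adjustment described above can be illustrated with a minimal sketch. It assumes a simple proportional rescaling, adjusted price = price × HPI(target) / HPI(sale date); the function name `adjust_to_target_date` and the index values are hypothetical and are not the actual Central Bank of Türkiye series.

```python
def adjust_to_target_date(price, index_at_sale, index_at_target):
    """Rescale a nominal price to the target date using a house price index.

    Assumes a proportional adjustment: price * HPI(target) / HPI(sale).
    """
    if index_at_sale <= 0:
        raise ValueError("index_at_sale must be positive")
    return price * index_at_target / index_at_sale


# Hypothetical index values (not the actual CBRT series).
hpi = {"2019-01": 100.0, "2024-07": 400.0}
listing_price = 1_000_000  # nominal price recorded in January 2019
adjusted = adjust_to_target_date(listing_price, hpi["2019-01"], hpi["2024-07"])
print(adjusted)  # 4000000.0
```

Each listing would be looked up by its own price month, so observations from different dates become comparable at the 15 July 2024 reference point.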

The primary challenge in predicting house prices is that many properties share similar structural features; however, the locational and geographical characteristics of dwellings make them unique, even when they are structurally identical. Because numerous studies on Istanbul’s housing market have confirmed the correlation between structural characteristics and property values, these characteristics were incorporated into the model [37,39,40,41,42,43,44]. Locational features are more numerous than structural ones [45], which makes selecting locational variables challenging. While both structural and environmental characteristics are correlated with property price, the choice of environmental variables in non-linear models can lead to varying outcomes [46], which underlines the importance of locational variable selection. Both the structural and locational characteristics of properties and their prices can vary by district or region, so understanding the characteristics of the study area is crucial.

Considering the research area and the large dataset, we included only widely accepted, regional-level variables, namely education [47,48,49,50], transportation [51,52,53,54,55], health services [50,55,56,57,58], and large green spaces [45,59,60,61], to mitigate potential autocorrelation issues that might arise within sub-units [40]. To incorporate locational variables into the predictive model, a spatial analysis was conducted. Points of interest, including primary and secondary schools, universities, metro stations, hospitals, and major roads, were geocoded using Google Maps. Major green spaces and the Bosphorus Strait, considered to influence property values, were also integrated into the spatial analysis in ArcGIS. The Near tool in ArcGIS was used to calculate, for each property, the Euclidean distance to the nearest feature in each point-of-interest layer. To avoid errors arising from distance calculations between a property in one district and a point of interest in the other, the analyses were conducted separately for Beşiktaş and Üsküdar.
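The proximity computation can be sketched as a brute-force nearest-neighbor search. This is a minimal illustration, not the ArcGIS Near tool itself: it assumes projected coordinates in metres (so planar Euclidean distance is meaningful) and point targets only; roads, green spaces, and the Bosphorus shoreline would need point-to-line or point-to-polygon distances. The function name `nearest_distance` and the coordinates are hypothetical.

```python
import math


def nearest_distance(source_points, target_points):
    """For each source point, return the Euclidean distance to the closest target.

    Coordinates are assumed to be projected (e.g. metres), so planar
    Euclidean distance is meaningful; raw latitude/longitude would need
    a geodesic formula instead.
    """
    if not target_points:
        raise ValueError("target_points must not be empty")
    distances = []
    for sx, sy in source_points:
        # Brute-force search over all targets: O(len(source) * len(target)).
        distances.append(min(math.hypot(sx - tx, sy - ty) for tx, ty in target_points))
    return distances


# Hypothetical projected coordinates in metres.
properties = [(0.0, 0.0), (3000.0, 4000.0)]
metro_stations = [(0.0, 100.0), (3000.0, 4500.0)]
print(nearest_distance(properties, metro_stations))  # [100.0, 500.0]
```

Running the search separately per district, as described above, simply means calling the function with each district's properties and points of interest.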

Table 1 presents the property features used in this study. In total, 30 locational and structural features were used. As shown in the table, there were five types of variables: categorical, binary, dummy, integer, and double. The dependent variable was the unit price of the properties. The gross area of the properties ranged from 50 m² to 500 m², building age from 0 to 60 years, the total number of rooms from 1 to 12, and the total number of floors from 1 to 48. Eight features measured the distance to important places, which were expected to strongly affect real estate values in İstanbul. The effects of the features are also examined in the Results section of this study.

Finally, to further investigate the features of the properties, a correlation matrix summarizing the pairwise relationships between features was generated (Figure 3). According to the correlation matrix, real estate values are strongly related to gross area, number of bathrooms, absence of a parking area, and number of rooms, which indicates that these features are more closely associated with value. In contrast, the distances to some places, such as schools and hospitals, were less strongly related to value.
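A correlation matrix of this kind can be sketched in pure Python. The study does not state which coefficient was used; this sketch assumes the Pearson coefficient, and the function names `pearson` and `correlation_matrix` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant (zero variance).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """columns: dict mapping feature name -> list of values.

    Returns a dict keyed by (feature_a, feature_b) pairs.
    """
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

Pearson correlation captures only linear association between two variables, so a weak coefficient (for example, for distance to schools) does not rule out a non-linear effect that tree-based models may still pick up.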

3. Machine Learning Algorithms

In recent years, ML algorithms have attracted significant attention from researchers in many fields because of their reliable predictive performance. Machine Learning is an advanced technique based on training computers to conduct more accurate analyses [62,63,64]. In this study, widely adopted algorithms were used to predict value from real estate data: Extreme Gradient Boosting (XGBoost), Random Forest (RF), k-Nearest Neighbors (k-NN), Support Vector Regression (SVR), Artificial Neural Network (ANN), and Semi-Log Regression were chosen to test and compare their predictive power in real estate research.

3.1. Artificial Neural Network

ANN is an algorithm inspired by the structure and functioning of the human brain; it is a mathematical model of biological neurons. ANNs have three main types of layers: the input layer, one or more hidden layers, and the output layer. The input layer receives the data, the hidden layer(s) carry out the computations and feature transformations, and the output layer produces the results.

Each artificial neuron mirrors parts of a biological neuron: synapses are represented by weights, which determine the importance of input features; dendrites are modeled by the summation function, which aggregates the inputs; and the nucleus corresponds to the activation function, which introduces non-linearity into the model and enables it to learn complex relationships [65]. The computation of each neuron in an ANN is as follows:

z = W^{\top} x + b

(1)

where,

W is the weight vector for the neuron;
x is the input vector;
b is the bias term;
z is the result of the linear transformation.
Finally, the neuron’s output is obtained by applying an activation function σ(z), where σ is a non-linear function such as the Rectified Linear Unit (ReLU), Sigmoid, or Tanh; this non-linearity is what allows ANNs to model complex relationships. In this study, the ReLU function, σ(z) = max(0, z), was used.
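The neuron computation in Eq. (1) followed by ReLU can be sketched as a forward pass through a tiny network. The weights, the layer sizes, and the helper names `neuron` and `dense_layer` are hypothetical; the sketch only illustrates the arithmetic, not the study's trained model or training procedure.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def neuron(weights, x, bias, activation=relu):
    """Single neuron: z = W^T x + b (Eq. 1), then output sigma(z)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return activation(z)


def dense_layer(weight_rows, biases, x, activation=relu):
    """Fully connected layer: one neuron per (weight row, bias) pair."""
    return [neuron(w, x, b, activation) for w, b in zip(weight_rows, biases)]


# Hypothetical 2-input network with one hidden layer of 2 ReLU units
# and a linear (identity) output unit, as is usual for regression.
x = [1.0, 2.0]
hidden = dense_layer([[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], x)
output = neuron([2.0, 1.0], hidden, 0.5, activation=identity)
print(hidden, output)  # [0.0, 2.0] 2.5
```

The first hidden unit receives z = -1.5, which ReLU sets to zero; only the second unit contributes to the output.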

3.2. Random Forest

The RF algorithm is one of the most widely used ensemble learning methods based on decision trees. The RF algorithm [66] was originally developed as an extension of the bagging approach [67]. Its main idea is to grow a large number of decision trees, hence the name forest. Each decision tree is trained on a bootstrap sample of the training data and considers a random subset of input features at each node split. This randomness increases diversity among the trees, reduces overfitting, and improves generalization performance. RF has two critical parameters: M, the number of decision trees, and m, the number of variables considered at each node split. The model prediction is obtained by averaging the outputs of all decision trees in the ensemble:

\bar{y} = \frac{1}{M} \sum_{j = 1}^{M} h_{j} (x)

(2)

where,

$\bar{y}$ is the final predicted value;
M is the number of decision trees;
$h_{j}(x)$ is the prediction of the j-th decision tree for input x.
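To make Eq. (2) concrete, here is a toy random forest in pure Python. It is a simplified sketch, not the study's implementation: each tree is a depth-1 regression tree (a stump) rather than a fully grown tree, and the names `fit_stump`, `fit_forest`, and `predict_forest` are hypothetical. It keeps the two ingredients described above: a bootstrap sample per tree and a random subset of m features at the split.

```python
import random


def fit_stump(X, y, feature_ids):
    """Best single split (depth-1 regression tree) over the given features,
    chosen by minimum sum of squared errors."""
    best = None
    for j in feature_ids:
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] <= threshold]
            right = [t for row, t in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, ml, mr)
    if best is None:  # no valid split in this sample: predict the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, threshold, ml, mr = best
    return lambda row: ml if row[j] <= threshold else mr


def fit_forest(X, y, n_trees=100, m_features=1, seed=0):
    """Grow M = n_trees stumps, each on a bootstrap sample with m random features."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feats = rng.sample(range(p), m_features)     # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees


def predict_forest(trees, row):
    # Eq. (2): average the outputs of the M trees.
    return sum(h(row) for h in trees) / len(trees)


# Hypothetical one-feature data with a step in the target.
X = [[v] for v in range(10)]
y = [0.0] * 5 + [10.0] * 5
trees = fit_forest(X, y, n_trees=50, m_features=1, seed=0)
```

Averaging over many bootstrapped trees smooths the individual trees' errors, which is the variance-reduction effect the text attributes to RF.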

3.3. Extreme Gradient Boosting (XGBoost)

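Extreme Gradient Boosting (XGBoost) is a highly regarded ML algorithm for supervised learning tasks. It is a gradient boosting method that combines weak learners, typically decision trees, into a powerful model [16]; trees are added sequentially, each one correcting the errors of the current ensemble. XGBoost is often preferred over other ML algorithms because of its speed and accuracy. At iteration t, the new tree f_t is fitted by minimizing a second-order approximation of the regularized objective: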

L^{(t)} = \sum_{i = 1}^{n} \left[ g_{i} f_{t} (x_{i}) + \frac{1}{2} h_{i} f_{t}^{2} (x_{i}) \right] + Ω (f_{t})

(3)

where,

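$g_{i}$ is the first derivative (gradient) of the loss function with respect to the prediction from the previous iteration, $\hat{y}_{i}^{(t-1)}$;
$h_{i}$ is the second derivative (Hessian) of the loss function with respect to the same prediction;
$f_{t}(x_{i})$ is the output of the new tree for observation i;
Ω($f_{t}$) is the regularization term, which penalizes the complexity of the new tree.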
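Eq. (3) can be turned into concrete numbers. This sketch assumes squared-error loss, for which g_i = ŷ_i − y_i and h_i = 1, and the standard XGBoost regularizer Ω(f) = γT + ½λΣw_j² (T leaves, leaf weights w_j). Under these assumptions, minimizing Eq. (3) for a fixed tree structure gives the optimal leaf weight w* = −G/(H + λ) and the usual split-gain formula, where G and H are the sums of g_i and h_i in a leaf. The helper names are hypothetical.

```python
def grad_hess_squared_error(y, y_pred):
    """For L = 1/2 (y - y_pred)^2: g = y_pred - y and h = 1."""
    g = [p - t for t, p in zip(y, y_pred)]
    h = [1.0] * len(y)
    return g, h


def leaf_weight(g, h, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Reduction in the objective from splitting one leaf into two."""
    def score(G, H):
        return G * G / (H + lam)

    GL, HL = sum(g_left), sum(h_left)
    GR, HR = sum(g_right), sum(h_right)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Hypothetical example: four targets, all currently predicted by their mean.
y = [3.0, 5.0, 10.0, 12.0]
y_pred = [7.5] * 4
g, h = grad_hess_squared_error(y, y_pred)
w_left = leaf_weight(g[:2], h[:2], lam=1.0)    # -7/3: moves low prices down
w_right = leaf_weight(g[2:], h[2:], lam=1.0)   # +7/3: moves high prices up
gain = split_gain(g[:2], h[:2], g[2:], h[2:], lam=1.0, gamma=0.0)
```

The λ term shrinks each leaf weight toward zero (here 7/3 instead of the unregularized 2.5 or 3.5), which is one way the regularization term in Eq. (3) curbs overfitting.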

3.4. K-Nearest Neighbors (KNN)

The KNN algorithm is widely used for classification problems, but it can also be applied to regression tasks [68]. The algorithm is based on the principle of feature similarity: it predicts the value of a new data point from the known values of its nearest neighbors in the feature space. The KNN algorithm has three main steps: computing the distance between the new data point and all training points, determining the K closest points (neighbors), and averaging the target values of the K neighbors to obtain the final prediction. Different distance metrics can be used to measure the similarity between points; the most common is the Euclidean distance (4). Two common alternatives are the Manhattan and Minkowski distances.

d (p, q) = \sqrt{\sum_{i = 1}^{n} {(p_{i} - q_{i})}^{2}}

(4)

where p and q are two points in the n-dimensional feature space, and $p_{i}$ and $q_{i}$ are their i-th coordinates.
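The three KNN steps map directly onto a few lines of Python. This is a minimal sketch using the Euclidean distance of Eq. (4); the function name `knn_predict` and the data are hypothetical.

```python
import math


def euclidean(p, q):
    # Eq. (4): square root of the summed squared coordinate differences.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(X_train, y_train, x_new, k=5):
    """KNN regression: average the targets of the k nearest training points."""
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training points")
    # Step 1: distance from the new point to every training point.
    dists = [(euclidean(x, x_new), y) for x, y in zip(X_train, y_train)]
    # Step 2: keep the k closest points.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Step 3: average their target values.
    return sum(y for _, y in nearest) / k


# Hypothetical data: gross area (m²) -> price in arbitrary units.
X_train = [[50], [60], [70], [200]]
y_train = [100, 110, 120, 400]
print(knn_predict(X_train, y_train, [62], k=3))  # 110.0
```

With several features on different scales (square metres, room counts, distances in metres), the inputs are usually standardized first; otherwise the largest-scale feature dominates the distance.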

3.5. Support Vector Regression (SVR)

SVR is the regression form of the Support Vector Machine (SVM), in which the objective and loss functions are modified to fit continuous target variables [69]. The main idea of SVR is to find a function that closely approximates the target variable while keeping errors within an acceptable margin ε; deviations inside this margin are not penalized. For a linear function, the SVR prediction is:

y = w^{\top} x + b

(5)

where,

w is the weight vector;
x is the input feature vector;
b is the bias term.
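The ε-margin idea can be made concrete by evaluating the primal objective of linear SVR. This sketch assumes the standard formulation ½‖w‖² + C Σ max(0, |y_i − f(x_i)| − ε), with C penalizing training errors outside the margin; it only evaluates the objective for a given (w, b) and does not solve the optimization, which in practice is done with a dedicated solver. The function names and data are hypothetical.

```python
def epsilon_insensitive(y_true, y_pred, epsilon):
    """Zero loss inside the epsilon tube, linear loss outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def linear_svr_objective(w, b, X, y, C, epsilon):
    """0.5 * ||w||^2 + C * sum of epsilon-insensitive losses,
    with predictions from Eq. (5): y = w^T x + b."""
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for x, t in zip(X, y):
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        loss += epsilon_insensitive(t, pred, epsilon)
    return reg + C * loss


# Hypothetical data: only the second point lies outside the 0.1 tube.
obj = linear_svr_objective([2.0], 1.0, [[1.0], [2.0], [3.0]], [3.05, 5.5, 7.0],
                           C=10.0, epsilon=0.1)
```

A larger C makes the fit trade flatness (small ‖w‖) for fewer margin violations, while a larger ε widens the tube in which errors are ignored.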

3.6. Semi-Logarithmic Regression

Semi-Logarithmic Regression (SLR) is a form of multiple linear regression, a statistical method that models the linear relationship between a dependent variable and multiple independent variables. It aims to estimate the value of the dependent variable from the information provided by the independent variables. In the semi-log form, the natural logarithm of the dependent variable is expressed as the weighted sum of the independent variables plus an error term:

\ln Y_{i} = β_{0} + β_{1} X_{i 1} + β_{2} X_{i 2} + \dots + β_{k} X_{i k} + ε_{i}

(6)

where $Y_{i}$ is the dependent variable value of observation i ($i = 1, \dots, n$, with n the sample size), $β_{0}$ is the intercept, $β_{1}$ to $β_{k}$ are the partial slope coefficients of the k independent variables, and $ε_{i}$ is the stochastic disturbance term. Because the dependent variable is logged, each $β_{j}$ approximately measures the proportional change in Y associated with a one-unit change in $X_{j}$.

In multiple linear regression analysis, the coefficients are usually estimated with the ordinary least squares (OLS) method, which minimizes the sum of the squared differences between the observed and model-predicted values. The analysis assumes that the relationship between the dependent and independent variables is linear, that the error terms are normally distributed with constant variance, and that the error terms are independent of each other.
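A minimal sketch of OLS on a semi-log model, using a single regressor for brevity: it fits ln(price) = b0 + b1 · area with the closed-form least-squares solution. With several regressors, as in Eq. (6), the same criterion is solved through the normal equations. The function name `fit_semilog` and the data are hypothetical.

```python
import math


def fit_semilog(x, price):
    """OLS fit of ln(price) = b0 + b1 * x for a single regressor."""
    if len(x) != len(price) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    y = [math.log(p) for p in price]  # semi-log: log-transform the dependent variable
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx       # slope that minimizes the sum of squared residuals
    b0 = my - b1 * mx    # intercept: the fitted line passes through the means
    return b0, b1


# Hypothetical data generated exactly from ln(price) = 8 + 0.01 * area.
area = [60, 80, 100, 120, 150]
price = [math.exp(8 + 0.01 * a) for a in area]
b0, b1 = fit_semilog(area, price)
# b1 ≈ 0.01: each extra m² raises price by about 100 * (exp(b1) - 1) ≈ 1.005 %.
```

The percentage reading of b1 is what makes the semi-log form popular in hedonic pricing: coefficients translate directly into relative price effects.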

3.7. Quality Metrics

The Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), Root-Mean-Square Error (RMSE), and coefficient of determination (R²), the main quality metrics for regression problems, were used to evaluate the results of the ML algorithms. MAPE expresses accuracy as a percentage and is calculated as the average of the absolute percentage errors (7); Eq. (7) gives it as a fraction, which is multiplied by 100 to obtain a percentage. MAE is the average absolute difference between predicted and observed values (8). R² measures how well the model’s predictions match the actual data (9). R² is at most 1, and values close to 1 indicate that the model explains most of the variance in the data; on test data it can even be negative if the model performs worse than simply predicting the mean. RMSE is the square root of the mean squared residual (10); because errors are squared, it penalizes large errors more heavily than MAE.

\mathrm{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right|

(7)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} \left| y_{i} - \hat{y}_{i} \right|

(8)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(9)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(10)

where n indicates the number of test observations, $y_{i}$ is the actual value, $\hat{y}_{i}$ is the predicted value, and $\bar{y}$ is the mean of all actual values.
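Eqs. (7) to (10) can be computed together in a short function. This is a minimal sketch; the function name `regression_metrics` and the sample values are hypothetical, and MAPE is returned as a fraction, exactly as in Eq. (7).

```python
import math


def regression_metrics(y_true, y_pred):
    """MAPE, MAE, RMSE and R^2 following Eqs. (7)-(10). MAPE is a fraction."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mape = sum(abs(r / t) for r, t in zip(residuals, y_true)) / n  # needs y_true != 0
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_y = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAPE": mape, "MAE": mae, "RMSE": rmse, "R2": r2}


m = regression_metrics([100, 200, 300], [110, 190, 330])
# MAPE ≈ 0.0833, MAE ≈ 16.67, RMSE ≈ 19.15, R2 ≈ 0.945
```

Here the single large error (30 on the third observation) pushes RMSE noticeably above MAE, illustrating how squaring weights large errors more heavily. Predictions that are worse than the mean give a negative R², for example true values [1, 2, 3] predicted as [3, 2, 1] give R² = −3.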

4. Results

The ML algorithms were implemented in Python version 3.13.5 using Spyder version 5.4.3. The analyses were run on a computer with 64 GB of RAM and an Intel(R) Core(TM) i7-9700K CPU at 3.60 GHz. The data from the two sides of the Bosphorus (the Beşiktaş and Üsküdar districts) were analyzed with the six ML algorithms. Five-fold cross-validation was employed to evaluate the performance of all algorithms. In each fold, 20% of the data were used for testing and 80% for training. Because the test portion changed in each of the five folds, every observation was included in testing exactly once, which made the evaluation of the algorithms more robust.
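The five-fold scheme can be sketched as an index generator. The study does not state whether rows were shuffled before splitting or which random seed was used; both are assumptions here, and the function name `k_fold_indices` is hypothetical. Fold-level values of Eqs. (7) to (10) can then be aggregated, for example by averaging over the five folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Every index appears in exactly one test fold; with k = 5 each test fold
    holds about 20 % of the data and the training set the remaining 80 %.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffle before splitting
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(100, k=5))
```

For 100 observations this yields five folds with 80 training and 20 test indices each, and the test folds together cover every index once.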

The parameters used for the ML models were as follows:

ANN: The model was tested with three hidden layer configurations: (50, 25), (100, 50), and (150, 100) neurons. The maximum number of iterations was set to 1000, the regularization parameter (alpha) to 0.0001, the activation function to ReLU, and the optimization algorithm to Adam, with the default mean squared error as the loss function;
RF: The number of trees was varied among 100, 200, and 300;
XGBoost: The number of trees was set to 100, 200, and 300, with the L1 regularization parameter (alpha) set to 10;
KNN: The number of neighbors was tested with values of 3, 5, and 10;
SVR: Three combinations of the training-error penalty (C) and the tolerance margin (ε) were tested: (1, 0.1), (10, 0.5), and (100, 0.1).

Each set of parameters was tested individually, and the results are presented in Table 2. Furthermore, highly correlated features, such as indoor and outdoor swimming pools, distance to the pier, and distances to other important locations, were removed and tested separately to observe their impact. Excluding these correlated features did not improve the accuracy of the results.

The comparison of the six ML algorithms for the Beşiktaş and Üsküdar parts of İstanbul revealed marked differences in the quality metrics. According to the results, the RF algorithm outperformed the other five ML algorithms. The performance of the models under the different parameter configurations is shown in Table 2. Overall, RF and XGBoost demonstrated the best predictive performance for both Beşiktaş and Üsküdar, achieving higher R² values and lower MAE, MAPE, and RMSE scores than the other models. Specifically, RF consistently yielded R² values of around 0.80 for Beşiktaş and 0.78 for Üsküdar, while XGBoost reached up to 0.77 and 0.75, respectively. ANN showed competitive results, especially with the larger hidden layer configuration (150, 100), but slightly underperformed compared with RF and XGBoost. KNN and SVR generally produced lower R² values and higher errors, with SVR performing particularly poorly under some parameter settings. SLR had the weakest performance of all models, with the lowest R² and highest RMSE values, indicating that it cannot capture the complexity of the data. These findings suggest that ensemble methods such as RF and XGBoost are more suitable for predicting real estate values in this context.

Figure 4 shows the actual and estimated values obtained from the analysis of the Beşiktaş and Üsküdar data. For Beşiktaş, the XGBoost and RF models produced results that were quite close to the actual values. XGBoost exhibited strong generalization capability, especially on highly variable data points, while RF successfully learned the data patterns and provided highly accurate predictions; thanks to its ensemble structure, RF produced balanced predictions and avoided overfitting, at some points reaching an accuracy level similar to that of XGBoost. The KNN algorithm showed irregular predictions and was particularly inadequate at capturing extreme values. The SVR model showed relatively large deviations in Beşiktaş, suggesting that it had difficulty reflecting the complex dynamics of the urban structure. The semi-log regression model performed poorly because of its parametric nature.

The prediction performance of the machine learning algorithms in the Üsküdar district is generally similar to that in the Beşiktaş district. The Random Forest and XGBoost models provided higher prediction accuracy in Üsküdar than the other models, reflecting their superior ability to identify essential variables in complex, multidimensional data.

In particular, the XGBoost algorithm stands out, with low deviation and high stability in the graphs. The ANN and SVR models performed moderately well in this district, with some deviations in their predictions. As in Beşiktaş, the KNN model produced poor forecasts, especially where data points were sparse. The semi-log regression model was again the lowest-performing method in Üsküdar.

The analysis results revealed that machine-learning-based models (especially XGBoost, ANN, and Random Forest) provided higher accuracy than classical regression methods in house value estimation. Ensemble algorithms such as Random Forest and XGBoost stand out in modeling non-linear relationships in the data and offer strong generalization capacities across different spatial structures. This supports the preference for using such models for complex problems such as housing valuation in urban areas.

The importance of the variables affecting house prices in the Üsküdar and Beşiktaş districts is shown in Figure 5. For Üsküdar, the XGBoost results indicate that being part of a housing estate (residential complex) is the independent variable that affects house prices the most, which is attributed to the access such estates provide to services such as security, cleaning, and maintenance. The other influential variables, in order of importance, are the presence of an indoor swimming pool, the number of bathrooms, the age of the house, the presence of an outdoor swimming pool, and the distance to the Bosphorus. The presence of a swimming pool in the complex or building, especially an indoor pool that can be used in all seasons, affects value. Notably, houses close to the Bosphorus in Üsküdar are also priced significantly higher. In contrast, variables such as frontage, distance to schools, and number of living rooms have little effect on house prices. For Beşiktaş, the XGBoost results show that the presence of an indoor swimming pool affects house prices the most, followed by being part of a housing complex, the presence of an outdoor swimming pool, and the number of bathrooms, while variables such as frontage and distance to green areas have little effect. A significant finding for both study areas is that the facade of a property is the feature least correlated with price.

Figure 6 presents a value map of the two sides of the Bosphorus, showing the spatial distribution of housing values. Houses close to the Bosphorus are priced higher than those inland. In both Üsküdar and Beşiktaş, the corridor along the TEM (Trans-European North–South Motorway) route running inland from the Bosphorus also stands out as a high-value area. Proximity to the Bosphorus has a greater impact in Üsküdar than in Beşiktaş, and in the inner parts of Beşiktaş, the area containing major business centers also has high housing values.

5. Discussion

This study focused on estimating residential real estate prices on both sides of the Bosphorus, in the Beşiktaş and Üsküdar districts of İstanbul. As a transcontinental metropolis bridging Europe and Asia, İstanbul has an exceptionally diverse real estate market, and its property dynamics offer a living laboratory for understanding how geography, history, and culture converge to shape real estate value. Although the two districts share common characteristics, such as dense population, high house prices, and views of the Bosphorus, Beşiktaş and Üsküdar differ in continental location and socio-economic profile.

A total of 168,099 properties were used in the analysis. The ANN, KNN, RF, XGBoost, SVR, and SLR algorithms were implemented separately for each district, using 21 structural and 9 locational features, including proximity to the Bosphorus. The correlation matrix (Figure 3) showed that residential prices were most strongly correlated with gross area, number of bathrooms, presence of swimming pools, and parking facilities. By contrast, variables such as building facade orientation, distance to the pier and to hospitals, and number of living rooms were not correlated with unit price. Notably, the variables most strongly correlated with price are amenities commonly found in large residential complexes, which is supported by the correlations among outdoor and indoor swimming pools, parking lots, and being part of a residential complex.
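
The rows marked ** in Table 2 exclude strongly correlated variables, but the correlation threshold is not stated here. The sketch below therefore assumes Pearson correlation and a hypothetical cut-off of |r| ≥ 0.8, greedily dropping the later feature of each strongly correlated pair. The helper names are illustrative, not the authors' code. A pair such as `Room` and `TotalRooms` is a natural candidate, since Table 1 defines `TotalRooms` as `Room` plus `LivingRoom`.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        return 0.0  # treat a constant column as uncorrelated
    return s_xy / math.sqrt(s_xx * s_yy)


def drop_correlated(columns, threshold=0.8):
    """Drop the later feature of any pair with |r| >= threshold, in column order.

    `columns` maps feature name -> list of values; the 0.8 default is a
    hypothetical cut-off, not one reported in the paper.
    """
    names = list(columns)
    dropped = set()
    for a, b in combinations(names, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) >= threshold:
            dropped.add(b)
    return [name for name in names if name not in dropped]


# Toy data: TotalRooms = Room + LivingRoom, as defined in Table 1.
toy = {
    "Room": [2, 3, 3, 4, 5],
    "LivingRoom": [1, 1, 2, 1, 2],
    "TotalRooms": [3, 4, 5, 5, 7],
}
print(drop_correlated(toy))  # ['Room', 'LivingRoom']
```

Because the filter is greedy, which member of a correlated pair survives depends on column order; ordering columns by domain priority (for example, keeping raw counts over derived totals) makes the choice explicit.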

Comparing the ML algorithms, RF achieved slightly higher accuracy than XGBoost, whereas XGBoost performed better in terms of processing speed and handling of outliers. Both algorithms are suitable for mass appraisal; however, because mass appraisal requires large amounts of data, processing speed becomes an important consideration. As large cities continuously generate vast and varied data, the ability to work with big data is increasingly important. With the growing transparency and accessibility of urban data sources, artificial intelligence (AI)-based real estate valuation models are expected to find wider application and yield effective results.

The findings of this study contribute significantly to the understanding of how geographical factors, particularly unique ones like the Bosphorus, influence real estate prices in urban areas with distinct socio-economic profiles, despite their geographical proximity. By analyzing the real estate dynamics on both sides of the Bosphorus, this research offers valuable insights for future studies and real estate professionals alike. It highlights the importance of incorporating both structural and locational features into pricing models, underlining that, even within highly similar districts, real estate values can diverge due to the varying impact of local geographic elements.

Finally, this study indicates that effective real estate value predictions can be made with AI-based algorithms even in highly dense urban areas, despite the complex and volatile nature of housing markets. Consistent with previous research, both structural and locational characteristics were found to influence house prices. Furthermore, this study provides valuable findings on how a globally unique local geographic feature, the Bosphorus, influences housing prices.

Our research findings confirm that, consistent with previous studies, ML algorithms provide more accurate results than linear models for real estate valuation. Although ML algorithms produced varying results across two distinct, adjacent areas, XGBoost and RF were observed to yield more accurate outcomes, which is also in line with the existing literature. Another significant finding of our study is that, to further increase the accuracy of predictive models, it is crucial not only to increase the amount of data, but also to effectively classify it.
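
For reference, the SLR baseline belongs to the semi-log hedonic family, which regresses the logarithm of price on property characteristics. The specification below is the standard textbook form and is given as an assumption, since the exact regressors are those listed in Table 1; the second expression is the usual conversion of a dummy coefficient into a percentage price effect.

```latex
\ln P_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i,
\qquad
\%\Delta P = 100\left(e^{\beta_D} - 1\right)
```

Here $P_i$ is the price of property $i$, $x_{ik}$ its $k$-th characteristic, and $\beta_D$ the coefficient of a dummy such as `EstateHousing`; for example, $\beta_D = 0.10$ implies a premium of about $10.5\%$. Because the model is additive in the regressors, interactions (for instance, a pool inside a residential complex) enter only if they are specified explicitly, which is one plausible reason tree ensembles such as RF and XGBoost fit these data better.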

Consequently, the finding that the impact of variables influencing housing prices varies even between administratively and geographically similar regions highlights that two sides of the same coin can be quite different. While real estate market studies typically distinguish themselves through their study area and variable selection, this study adds a further layer of originality by comparing two highly significant locations, the two sides of the Bosphorus, which lie on different continents, and by using very large datasets. This study is expected to serve as a valuable resource for future research assessing the influence of unique geographical factors on real estate prices and for comparative analyses of geographically linked areas.

Author Contributions

Conceptualization, A.H.E.; Methodology, S.C.S.; Software, S.C.S.; Validation, T.A. and A.T.; Formal analysis, T.A.; Data curation, A.H.E. and A.T.; Writing—original draft, A.H.E. and T.A.; Writing—review & editing, A.T. and S.C.S.; Visualization, A.H.E. and S.C.S.; Supervision, A.T. and S.C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data processed for this research cannot be shared for legal reasons. However, the spatial data produced by the authors will be made available.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AdaBoost	Adaptive Boosting
ANN	Artificial Neural Networks
KNN	K-Nearest Neighbors
ML	Machine Learning
RF	Random Forest
ReLU	Rectified Linear Unit
SLR	Semi-Log Regression
SVR	Support Vector Regression
TEM	Trans-European North–South Motorway
XGBoost	Extreme Gradient Boosting

References

El Mouna, L.; Silkan, H.; Haynf, Y.; Nann, M.F.; Tekouabou, S.C. A comparative study of urban house price prediction using machine learning algorithms. Proc. E3S Web Conf. 2023, 418, 03001.
Park, B.; Bae, J.K. Using machine learning algorithms for housing price prediction: The case of Fairfax County, Virginia housing data. Expert Syst. Appl. 2015, 42, 2928–2934.
Choy, L.H.; Ho, W.K. The use of machine learning in real estate research. Land 2023, 12, 740.
Burhan, H.A. Konut fiyatları tahmininde makine öğrenmesi sınıflandırma algoritmalarının kullanılması: Kütahya kent merkezi örneği [The use of machine learning classification algorithms in house price prediction: The case of Kütahya city center]. Dumlupınar Üniv. Sos. Bilim. Derg. 2023, 76, 221–237.
Erdem, N. Toplu (küme) değerleme uygulama örnekleri ve ülkemiz için öneriler [Mass (cluster) valuation application examples and recommendations for our country]. In Proceedings of the 16th Turkish Surveying Scientific and Technical Congress (TMMOB Harita ve Kadastro Mühendisleri Odası), Ankara, Turkey, 3–6 May 2017; pp. 3–6.
Standard on Mass Appraisal of Real Property; International Association of Assessing Officers: Kansas City, MO, USA, 2011. Available online: https://www.pinal.gov/DocumentCenter/View/6894/IAAO-Standard-on-Mass-Appraisal-of-Real-Property-PDF (accessed on 10 December 2024).
Sladić, D.; Radulović, A.; Govedarica, M. Mass Property Valuation in Serbia. In Proceedings of the 11th International Workshop on the Land Administration Domain Model and 3D Land Administration, Gävle, Sweden, 11–13 October 2023.
Zhao, Y.; Shen, X.; Ma, J.; Yu, M. Path selection of spatial econometric model for mass appraisal of real estate: Evidence from Yinchuan, China. Int. J. Strateg. Prop. Manag. 2023, 27, 304–316.
Bourassa, S.C.; Hoesli, M.; Merlin, L.; Renne, J. Big data, accessibility and urban house prices. Urban Stud. 2021, 58, 3176–3195.
Potrawa, T.; Tetereva, A. How much is the view from the window worth? Machine learning-driven hedonic pricing model of the real estate market. J. Bus. Res. 2022, 144, 50–65.
Zhou, Z.-H. Machine Learning; Springer Nature: Berlin/Heidelberg, Germany, 2021.
Jui, J.J.; Imran Molla, M.; Bari, B.S.; Rashid, M.; Hasan, M.J. Flat price prediction using linear and random forest regression based on machine learning techniques. In Embracing Industry 4.0: Selected Articles from MUCET 2019; Springer: Berlin/Heidelberg, Germany, 2020; pp. 205–217.
Aydınoğlu, A.Ç.; Bovkır, R.; Çölkesen, İ. Toplu taşınmaz değerlemede makine öğrenme algoritmalarının kullanımı ve konumsal/konumsal olmayan özniteliklerin tahmin doğruluğuna etkilerinin karşılaştırılması [Use of machine learning algorithms in mass real estate valuation and comparison of the effects of spatial/non-spatial attributes on prediction accuracy]. Jeodezi Jeoinformasyon Derg. 2023, 10, 63–83.
Ho, W.K.; Tang, B.-S.; Wong, S.W. Predicting property prices with machine learning algorithms. J. Prop. Res. 2021, 38, 48–70.
Zilli, C.A.; Bastos, L.C.; da Silva, L.R. Machine learning models in mass appraisal for property tax purposes: A systematic mapping study. Aestimum 2024, 84, 31–52.
Almaslukh, B. A gradient boosting method for effective prediction of housing prices in complex real estate systems. In Proceedings of the 2020 International Conference on Technologies and Applications of Artificial Intelligence (TAAI), Taipei, Taiwan, 3–5 December 2020; pp. 217–222.
Barns, S. Out of the loop? On the radical and the routine in urban big data. Urban Stud. 2021, 58, 3203–3210.
Iwai, K.; Hamagami, T. A New XGBoost Inference with Boundary Conditions in Real Estate Price Prediction. IEEJ Trans. Electr. Electron. Eng. 2022, 17, 1613–1619.
Ja’afar, N.S.; Mohamad, J. Application of machine learning in analysing historical and non-historical characteristics of heritage pre-war shophouses. Plan. Malays. 2021, 19, 72–84.
Sevgen, S.C.; Aliefendioğlu, Y. Mass apprasial with a machine learning algorithm: Random forest regression. Bilişim Teknol. Derg. 2020, 13, 301–311.
Abidoye, R.B.; Chan, A.P. Improving property valuation accuracy: A comparison of hedonic pricing model and artificial neural network. Pac. Rim Prop. Res. J. 2018, 24, 71–83.
Schernthanner, H.; Asche, H.; Gonschorek, J.; Scheele, L. Spatial modeling and geovisualization of rental prices for real estate portals. In Cognitive Analytics: Concepts, Methodologies, Tools, and Applications; IGI Global: Hershey, PA, USA, 2020; pp. 962–977.
Mora-Garcia, R.-T.; Cespedes-Lopez, M.-F.; Perez-Sanchez, V.R. Housing price prediction using machine learning algorithms in COVID-19 times. Land 2022, 11, 2100.
Dureh, N.; Ueranantasan, A.; Eso, M. A comparison of multiple linear regression and random forest for community concern of youth and young adults survey. Methods 2018, 44, 481–487.
Rampini, L.; Re Cecconi, F. Artificial intelligence algorithms to predict Italian real estate market prices. J. Prop. Invest. Financ. 2022, 40, 588–611.
McCluskey, W.J.; McCord, M.; Davis, P.T.; Haran, M.; McIlhatton, D. Prediction accuracy in mass appraisal: A comparison of modern approaches. J. Prop. Res. 2013, 30, 239–265.
Hoxha, V. Exploring the predictive power of ANN and traditional regression models in real estate pricing: Evidence from Prishtina. J. Prop. Invest. Financ. 2024, 42, 134–150.
Grybauskas, A.; Pilinkienė, V.; Stundžienė, A. Predictive analytics using Big Data for the real estate market during the COVID-19 pandemic. J. Big Data 2021, 8, 105.
Karamanou, A.; Kalampokis, E.; Tarabanis, K. Linked open government data to predict and explain house prices: The case of Scottish statistics portal. Big Data Res. 2022, 30, 100355.
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Wang, Y.; Ni, X.S. A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization. arXiv 2019, arXiv:1901.08433.
Babu, A.; Chandran, A.S. Literature review on real estate value prediction using machine learning. Int. J. Comput. Sci. Mob. Appl. 2019, 7, 8–15.
Dha, T. A Literature Review on Using Machine Learning Algorithm to Predict House Prices. Int. Res. J. Adv. Sci. Hub 2023, 5, 132–137.
Pow, N.; Janulewicz, E.; Liu, L. Applied Machine Learning Project 4 Prediction of Real Estate Property Prices in Montréal; Course Project, COMP-598, Fall/2014; McGill University: Montreal, QC, Canada, 2014.
Wang, H.; Hu, D. Comparison of SVM and LS-SVM for regression. In Proceedings of the 2005 International Conference on Neural Networks and Brain, Beijing, China, 13–15 October 2005; pp. 279–283.
Turkstat. Construction and Housing Statistics. Yearly 2024. Available online: https://data.tuik.gov.tr/Kategori/GetKategori?p=Insaat-ve-Konut-116 (accessed on 3 January 2025).
Taşabat, S.E.; Ersen, M. House Price Prediction: A Case Study for Istanbul. In Industry 4.0 and the Digital Transformation of International Business; Springer: Berlin/Heidelberg, Germany, 2023; pp. 233–250.
Endeksa. Available online: https://www.endeksa.com/en/ (accessed on 12 January 2025).
Acar, T. Determining housing prices using the semiparametric estimation within the hedonic price model framework: Case study of Istanbul housing market example. Ekon. Polit. Finans. Araştırmaları Derg. 2020, 5, 561–575.
Arslanlı, K.Y. Analysis of house prices: A hedonic model proposal for Istanbul metropolitan area. J. Des. Resil. Archit. Plan. 2020, 1, 57–68.
Bekar, E.; Çağlayan Akay, E. Modelling housing prices in Istanbul applying the spatial quantile regression. Empir. Econ. Lett. 2014, 8, 863–869.
Keskin, B. Hedonic analysis of price in the Istanbul housing market. Int. J. Strateg. Prop. Manag. 2008, 12, 125–138.
Sisman, S.; Aydinoglu, A.C. A modelling approach with geographically weighted regression methods for determining geographic variation and influencing factors in housing price: A case in Istanbul. Land Use Policy 2022, 119, 106183.
Tekin, M.; Sari, I.U. Real estate market price prediction model of Istanbul. Real Estate Manag. Valuat. 2022, 30, 1–16.
Lorenz, F.; Willwersch, J.; Cajias, M.; Fuerst, F. Interpretable machine learning for real estate market analysis. Real Estate Econ. 2023, 51, 1178–1208.
Din, A.; Hoesli, M.; Bender, A. Environmental variables and real estate prices. Urban Stud. 2001, 38, 1989–2000.
Kang, Y.; Zhang, F.; Peng, W.; Gao, S.; Rao, J.; Duarte, F.; Ratti, C. Understanding house price appreciation using multi-source big geo-data and machine learning. Land Use Policy 2021, 111, 104919.
Peng, Y.; Tian, C.; Wen, H. How does school district adjustment affect housing prices: An empirical investigation from Hangzhou, China. China Econ. Rev. 2021, 69, 101683.
Zulkifley, N.H.; Rahman, S.A.; Ubaidullah, N.H.; Ibrahim, I. House price prediction using a machine learning model: A survey of literature. Int. J. Mod. Educ. Comput. Sci. 2020, 12, 46–54.
Toprak, M.F.; Güngör, O. Kayseri’de çoklu regresyon ve coğrafi ağırlıklı regresyon yöntemleri ile konutların toplu değerlemesi [Mass valuation of residences in Kayseri using multiple regression and geographically weighted regression methods]. Türk. Uzak. Algılama CBS Derg. 2023, 4, 114–124.
Hyun, D. Still prefer, but not that much: The premium of subway access on house prices during the COVID-19 pandemic period. Int. J. Urban Sci. 2024, 28, 675–700.
Lieske, S.N.; van den Nouwelant, R.; Han, J.H.; Pettit, C. A novel hedonic price modelling approach for estimating the impact of transportation infrastructure on property prices. Urban Stud. 2021, 58, 182–202.
Liu, X.; Jiang, C.; Wang, F.; Yao, S. The impact of high-speed railway on urban housing prices in China: A network accessibility perspective. Transp. Res. Part A Policy Pract. 2021, 152, 84–99.
Yang, L.; Chu, X.; Gou, Z.; Yang, H.; Lu, Y.; Huang, W. Accessibility and proximity effects of bus rapid transit on housing prices: Heterogeneity across price quantiles and space. J. Transp. Geogr. 2020, 88, 102850.
Zhang, X.; Zheng, Y.; Sun, L.; Dai, Q. Urban Structure, Subway System and Housing Price: Evidence from Beijing and Hangzhou, China. Sustainability 2019, 11, 669.
Nyanda, F. The effect of proximity and spatial dependence on the house price index for Dar es Salaam. Int. J. Hous. Mark. Anal. 2024, 17, 945–963.
Peng, T.-C.; Chiang, Y.-H. The non-linearity of hospitals’ proximity on property prices: Experiences from Taipei, Taiwan. J. Prop. Res. 2015, 32, 341–361.
Zheng, Z.; Yu, S.; Li, M.; Zhang, K.; Zhu, M.; He, Y.; Peng, Q. Mass Appraisal of Urban Housing Based on GIS and Deep Learning; SSRN 4510568; Elsevier: Amsterdam, The Netherlands, 2023.
Conway, D.; Li, C.Q.; Wolch, J.; Kahle, C.; Jerrett, M. A spatial autocorrelation approach for examining the effects of urban greenspace on residential property values. J. Real Estate Financ. Econ. 2010, 41, 150–169.
Daams, M.N.; Sijtsma, F.J.; Veneri, P. Mixed monetary and non-monetary valuation of attractive urban green space: A case study using Amsterdam house prices. Ecol. Econ. 2019, 166, 106430.
McCord, M.; McCord, J.; Lo, D.; Brown, L.; MacIntyre, S.; Squires, G. The value of green and blue space: Walkability and house prices. Cities 2024, 154, 105377.
Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Mark. 2021, 31, 685–695.
Mahesh, B. Machine learning algorithms-a review. Int. J. Sci. Res. (IJSR) 2020, 9, 381–386.
Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160.
Goodarzi, M.; Deshpande, S.; Murugesan, V.; Katti, S.B.; Prabhakar, Y.S. Is feature selection essential for ANN modeling? QSAR Comb. Sci. 2009, 28, 1487–1499.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Cheng, D. Learning k for kNN classification. ACM Trans. Intell. Syst. Technol. 2017, 8, 1–19.
Gunn, S.R. Support Vector Machines for Classification and Regression. Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton. 1997. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=91f6251257ab1642bf5742244a93da3a57a64b63 (accessed on 13 December 2024).

Figure 1. Study area, two sides of İstanbul.

Figure 2. Locations of points of interest and residential properties.

Figure 3. Correlation matrix for residential properties’ features.

Figure 4. Actual and predicted values for Beşiktaş and Üsküdar sides.

Figure 5. Feature importance for both sides of İstanbul.

Figure 6. Value map of both sides of the Bosphorus.

Table 1. Description of variables.

Variable	Unit	Data Type	Data Class	Definition
Property_type	Dummy	Categorical	Structural	Detached House (1) or Flat (2)
ExtraArea	Dummy	Categorical	Structural	Does the property have a terrace? Yes (1) or No (0)
GrossArea	m²	Continuous	Structural	Gross area of the property
Room	Count	Discrete	Structural	Number of room(s)
LivingRoom	Count	Discrete	Structural	Number of living room(s)
Bathroom	Count	Discrete	Structural	Number of bathroom(s)
TotalRooms	Count	Discrete	Structural	Total number of rooms (Room + LivingRoom)
FloorCount	Count	Discrete	Structural	Number of floors of the apartment or house
FloorNumbe	Number	Discrete	Structural	Floor level of property
Age	Years	Continuous	Structural	Age of the building
FrontageNo	Dummy	Categorical	Structural	Facing north? Yes (1) or No (0)
FrontageSo	Dummy	Categorical	Structural	Facing south? Yes (1) or No (0)
FrontageEa	Dummy	Categorical	Structural	Facing east? Yes (1) or No (0)
FrontageWe	Dummy	Categorical	Structural	Facing west? Yes (1) or No (0)
EstateHousing	Dummy	Categorical	Structural	In a building complex? Yes (1) or No (0)
Elevator	Dummy	Categorical	Structural	Has an elevator? Yes (1) or No (0)
ParkingAreaOutdoor	Dummy	Categorical	Structural	Has outdoor parking? Yes (1) or No (0)
ParkingAreaIndoor	Dummy	Categorical	Structural	Has indoor parking? Yes (1) or No (0)
SwimmingPoolOutdoor	Dummy	Categorical	Structural	Has an outdoor swimming pool? Yes (1) or No (0)
SwimmingPoolIndoor	Dummy	Categorical	Structural	Has an indoor swimming pool? Yes (1) or No (0)
Heating	Code	Categorical	Structural	Heating type of the property. None (1), Coil Stove (2), Natural Gas Stove (3), Floor Heater (4), Combi Boiler (5), Central Heating System (6), Heat Cost Allocator (7), Electric Heating (8), Air Conditioning (9), Geothermal Energy (10), Solar Energy (11), Other (12)
DistrictId	Code	Categorical	Locational	Administrative neighborhood of the property
school_dist	Meter	Continuous	Locational	Distance to the nearest school (elementary and secondary)
hosp_dist	Meter	Continuous	Locational	Distance to the nearest hospital
pier_dis	Meter	Continuous	Locational	Distance to the nearest public sea transportation pier
mainroad_dist	Meter	Continuous	Locational	Distance to the nearest main road
bosp_dist	Meter	Continuous	Locational	Distance to the Bosphorus
green_dist	Meter	Continuous	Locational	Distance to the nearest green area
metro_dist	Meter	Continuous	Locational	Distance to the nearest metro station
uni_dist	Meter	Continuous	Locational	Distance to the nearest university
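
Table 1 stores `Heating` as integer codes 1 to 12 (and `DistrictId` as a neighborhood code). Distance- and margin-based learners such as KNN and SVR, as well as SLR, would read such codes as ordered magnitudes unless they are expanded into indicator columns; tree ensembles are generally less sensitive to this. The encoding actually used is not stated here, so the sketch below simply shows one-hot encoding of the heating codes, with an optional reference level dropped for regression models. The helper `one_hot` is hypothetical.

```python
# Heating codes as listed in Table 1.
HEATING_CODES = {
    1: "None",
    2: "Coil Stove",
    3: "Natural Gas Stove",
    4: "Floor Heater",
    5: "Combi Boiler",
    6: "Central Heating System",
    7: "Heat Cost Allocator",
    8: "Electric Heating",
    9: "Air Conditioning",
    10: "Geothermal Energy",
    11: "Solar Energy",
    12: "Other",
}


def one_hot(code, categories, drop_first=False):
    """Return a 0/1 indicator vector for `code` over the ordered `categories`.

    With drop_first=True the first category becomes the implicit reference
    level, which avoids perfect collinearity with a regression intercept.
    """
    if code not in categories:
        raise ValueError(f"unknown category code: {code}")
    levels = categories[1:] if drop_first else categories
    return [1 if code == level else 0 for level in levels]


categories = sorted(HEATING_CODES)  # [1, 2, ..., 12]
print(one_hot(5, categories))  # Combi Boiler -> 1 in the fifth position
print(one_hot(1, categories, drop_first=True))  # reference level "None" -> all zeros
```

Dropping a reference level matters only for models with an intercept such as SLR; the other learners can take the full 12-column encoding.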

Table 2. Results of the ML algorithms for Beşiktaş and Üsküdar.

Algorithm	District	Parameters	MAE	MAPE	RMSE	R²
ANN	Beşiktaş	Hidden layers # of neurons: 50, 25	42,711.97	29.35%	64,962.35	0.72
		Hidden layers # of neurons: 100, 50	41,364.28	28.58%	62,956.46	0.74
		Hidden layers # of neurons: 150, 100	40,810.38	28.09%	62,394.98	0.74
		Hidden layers # of neurons: 150, 100 *	44,380.95	30.89%	67,352.54	0.70
		Hidden layers # of neurons: 150, 100 **	41,426.92	28.55%	63,145.35	0.73
	Üsküdar	Hidden layers # of neurons: 50, 25	17,499.49	25.19%	30,413.27	0.69
		Hidden layers # of neurons: 100, 50	16,919.88	24.39%	29,343.15	0.71
		Hidden layers # of neurons: 150, 100	17,278.90	24.76%	29,180.05	0.72
		Hidden layers # of neurons: 150, 100 *	16,965.53	24.24%	29,731.69	0.71
		Hidden layers # of neurons: 150, 100 **	17,267.53	24.93%	29,527.43	0.71
RF	Beşiktaş	Number of trees:100	3351.93	22.84%	54,566.22	0.80
		Number of trees:200	33,452.32	22.78%	54,403.61	0.80
		Number of trees:300	33,408.08	22.75%	54,341.15	0.80
		Number of trees:300 *	37,345.59	25.69%	59,815.44	0.76
		Number of trees:300 **	33,339.37	22.70%	54,219.45	0.80
	Üsküdar	Number of trees:100	13,670.24	19.20%	25,733.62	0.78
		Number of trees:200	13,615.34	19.12%	25,627.88	0.78
		Number of trees:300	13,599.35	19.10%	25,601.47	0.78
		Number of trees:300 *	14,092.31	19.81%	26,407.75	0.77
		Number of trees:300 **	13,680.01	19.24%	25,767.98	0.78
XGBoost	Beşiktaş	Number of trees:100	41,141.41	28.33%	62,452.78	0.74
		Number of trees:200	39,590.15	27.23%	60,440.00	0.76
		Number of trees:300	38,782.90	26.66%	59,351.94	0.77
		Number of trees:100 *	43,146.35	30.18%	65,218.29	0.72
		Number of trees:100 **	38,923.88	26.78%	59,489.30	0.76
	Üsküdar	Number of trees:100	6704.12	24.03%	29,183.04	0.72
		Number of trees:200	15,950.74	22.78%	28,036.05	0.74
		Number of trees:300	15,608.99	22.25%	27,481.90	0.75
		Number of trees:300 *	16,031.30	22.87%	28,161.56	0.74
		Number of trees:300 **	15,844.88	22.61%	27,807.46	0.74
KNN	Beşiktaş	# of nearest neighbors:3	40,536.86	26.92%	66,045.35	0.71
		# of nearest neighbors:5	40,229.05	26.78%	64,633.47	0.72
		# of nearest neighbors:10	40,794.10	27.25%	64,694.79	0.72
		# of nearest neighbors:10 *	42,441.96	28.59%	67,154.52	0.70
		# of nearest neighbors:10 **	41,140.30	27.45%	65,246.79	0.72
	Üsküdar	# of nearest neighbors:3	18,324.96	25.65%	33,528.64	0.63
		# of nearest neighbors:5	18,199.75	25.52%	33,045.64	0.64
		# of nearest neighbors:10	18,309.47	25.74%	33,080.80	0.64
		# of nearest neighbors:10 *	17,143.19	23.32%	32,188.65	0.66
		# of nearest neighbors:10 **	18,761.38	26.46%	33,714.08	0.62
SVR	Beşiktaş	training error and tolerance: 1, 0.1	86,484.50	55.21%	129,376.40	−0.11
		training error and tolerance: 100, 0.1	59,428.14	36.00%	94,623.36	0.40
		training error and tolerance: 10, 0.5	76,656.99	46.26%	119,264.70	0.05
		training error and tolerance: 100, 0.1 *	57,649.46	35.03%	91,846.83	0.44
		training error and tolerance: 100, 0.1 **	59,458.12	36.12%	94,552.14	0.41
	Üsküdar	training error and tolerance: 1, 0.1	31,421.64	38.00%	56,665.68	−0.07
		training error and tolerance: 100, 0.1	23,043.50	28.65%	44,559.86	0.34
		training error and tolerance: 10, 0.5	27,272.26	32.89%	51,451.84	0.12
		training error and tolerance: 100, 0.1 *	22,810.92	28.24%	44,171.73	0.35
		training error and tolerance: 100, 0.1 **	23,215.81	28.89%	44,699.73	0.34
SLR	Beşiktaş		54,097.47	33.89%	85,542.16	0.51
		*	58,231.77	37.32%	176,142.40	0.45
		**	55,135.33	34.63%	181,093.75	0.50
	Üsküdar		22,710.96	29.95%	41,466.70	0.43
		*	23,440.78	30.63%	74,454.11	0.39
		**	23,001.77	30.21%	75,308.68	0.42

* Results calculated by excluding variables with feature importance rate lower than 0.02. ** Results calculated by excluding variables with strong correlation.
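
The four metrics in Table 2 are assumed here to follow their standard definitions (MAPE in percent; R² as one minus the ratio of the residual to the total sum of squares), since Section 3.7 is not reproduced in this excerpt. The sketch below computes them for paired actual and predicted prices. It also shows why several SVR rows report a negative R²: R² falls below zero whenever a model's squared errors exceed those of simply predicting the mean price. The function name is illustrative.

```python
import math


def regression_metrics(actual, predicted):
    """MAE, MAPE (%), RMSE and R² for paired sequences of actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
    }


# Toy example: MAE ≈ 16.67, MAPE ≈ 8.33 %, RMSE ≈ 19.15, R² ≈ 0.945
print(regression_metrics([100, 200, 300], [110, 190, 330]))

# Predictions far from the mean give a negative R², as in some SVR rows of Table 2.
print(regression_metrics([1, 2, 3], [10, 10, 10])["R2"])  # -96.0
```

Because RMSE squares the errors, it weights large misses more heavily than MAE; a gap between the two, as in several SLR rows, signals a few large prediction errors.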

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Erciyes, A.H.; Atasoy, T.; Tursun, A.; Canaz Sevgen, S. Learning the Value of Place: Machine Learning Models for Real Estate Appraisal in Istanbul’s Diverse Urban Landscape. Buildings 2025, 15, 2773. https://doi.org/10.3390/buildings15152773

AMA Style

Erciyes AH, Atasoy T, Tursun A, Canaz Sevgen S. Learning the Value of Place: Machine Learning Models for Real Estate Appraisal in Istanbul’s Diverse Urban Landscape. Buildings. 2025; 15(15):2773. https://doi.org/10.3390/buildings15152773

Chicago/Turabian Style

Erciyes, Ahmet Hilmi, Toygun Atasoy, Abdurrahman Tursun, and Sibel Canaz Sevgen. 2025. "Learning the Value of Place: Machine Learning Models for Real Estate Appraisal in Istanbul’s Diverse Urban Landscape" Buildings 15, no. 15: 2773. https://doi.org/10.3390/buildings15152773

APA Style

Erciyes, A. H., Atasoy, T., Tursun, A., & Canaz Sevgen, S. (2025). Learning the Value of Place: Machine Learning Models for Real Estate Appraisal in Istanbul’s Diverse Urban Landscape. Buildings, 15(15), 2773. https://doi.org/10.3390/buildings15152773

