1. Introduction
Conducting property valuation on a large scale (often called mass valuation or mass appraisal) using inferential statistics or AI approaches presents several challenges, especially when location effects are considered in hedonic pricing models. A common barrier is measuring location effects for a large sample. This becomes problematic when measurements rely on personal judgment without a defined scheme, which makes collective evaluations and updates difficult. Valuers should therefore seek more objective variables that reduce personal interference and yield robust measurements. By doing so, they can improve the quality of their valuations and provide clear justifications for the results obtained, which is of paramount importance in property tax appraisals and legal cases.
The importance of location in the real estate market is a well-known factor. In simple terms, the quality of location can be divided into two parts: accessibility (as a “macro”-location, at the city level) and neighborhood (a “micro”-level, regarding the quality of the surroundings of each property) [1, 2, 3, 4, 5, 6, 7, 8].
However, in valuation practice, measuring location effects poses difficulties. Since accessibility and neighborhood quality lack standardized, ready-to-use measures, proxy variables are often used.
In the first case, accessibility is often explored by considering distances to key points in the city, such as the commercial-historical center, public transport hubs, schools, shopping centers, and parks, among other elements [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]. Traditional models for analyzing urban areas typically consider a central point of attraction, known as the central business district (CBD). These models assume that the CBD concentrates essential urban functions and most employment opportunities [5, 6]. While this approach is usually appropriate for small cities, it can be overly simplistic for larger cities, where urban growth tends to create more complex structures with multiple centers of attraction [11, 14, 15, 23, 26].
In fact, some empirical studies using distance to the CBD as a measure of accessibility have found this variable to have limited statistical significance, leading to the adoption of alternative measures or the consideration of multiple centers. This process also involves the use of public survey data (such as censuses) and the calculation of distances to points of significant interest, such as public transport stops, shopping centers, schools, and parks. Other aspects, such as distances to major avenues or highways, are also considered [9, 10, 12, 15, 24, 27].
A similar challenge exists regarding neighborhood quality. Neighborhood effects are equally important and difficult to measure. In traditional practice, professionals often assess neighborhood quality based on personal judgment and local market knowledge, generating an aggregated measure by district, for example. This approach can be useful in some cases; however, it suffers from limitations, such as a lack of spatial detail and a highly individualized assessment process, which may lead to redundant efforts during re-evaluation.
More specifically, some studies have demonstrated the influence of various factors, such as the quality of neighboring properties (built environment), land use intensity, the education and income levels of nearby residents, air quality, noise levels, availability of schools and public transport, access to bicycle lanes and walkability, as well as negative externalities, such as proximity to factories, waste disposal sites, or nuclear power plants [10, 13, 16, 17, 19, 21, 22, 25, 27, 28, 29, 30]. In particular, Gallimore et al. [20] and Grum [31] highlight that the perceived level of safety affects neighborhood value. Other authors have examined sustainability-related aspects, including ecosystem value, the presence of green areas, and views of water bodies from residential properties [13, 32, 33].
Although there is no consensus on the most appropriate measures for assessing accessibility and neighborhood quality, it is evident that properties with similar characteristics, located close to each other, tend to share similar market values. This “location value” (partly resulting from the immobility of housing) decreases as the distance between properties increases. Therefore, it is reasonable to assume that a property’s price is influenced by the quality of its neighborhood, which varies within urban areas. These price variations tend to form nearly continuous patterns rather than random fluctuations. Using appropriate tools, such as mathematical surfaces or geoprocessing techniques, these patterns can be mapped from market data using one or a set of location variables [10, 11, 14, 20, 26, 28].
For commercial or singular appraisals, assessing neighborhood quality is not particularly challenging, as information on nearby and comparable properties is collected, and differences are generally not significant. In contrast, mass appraisal is a far more complex task, given the substantial variations in building types and spatial price differences. Mass appraisals can be conducted using automated valuation models (AVMs), which apply statistical or artificial intelligence techniques combined with big data to generate price models for each property type. The greater the spatial heterogeneity, the more important it becomes to estimate neighborhood values at a microscale [23, 34, 35].
Recent studies have adopted a technologically advanced approach by assessing neighborhood quality using detailed scores derived from large sets of street view images (SVI) through AI systems, employing machine learning algorithms and computer vision techniques. Among various SVI options, Google Street View has been widely used. Although not a new tool, it has undergone significant improvements since its public launch [36]. A detailed description of data processing using the SVI/AI approach is provided in [37]. Systematic reviews have identified nearly 200 [38], 400 [39], and more than 600 studies in this field [40], highlighting the widespread adoption of this methodology. Among these, several studies focus on walkability [41, 42] and greenery [37, 40, 43, 44], but only a few explore connections with housing prices [43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62].
The SVI/AI approach is relatively recent and has gained popularity; however, it is not exempt from criticism. First, it requires significant technological and human resources and does not fully capture subjective perception. The International Association of Assessing Officers (IAAO) has studied the use of various tools in mass appraisal. Their report points out opportunities, but also challenges, particularly the need for resources to develop and implement such applications. There are substantial differences in the available technical infrastructure among different cities [63]. For example, in the Latin American context, studies on mass appraisal emphasize that the existence of a qualified technical team is a necessary condition, as these capabilities are difficult to develop in the short term [64].
While the focus of studies such as [63, 64] is mass appraisal for property tax purposes, similar considerations can extend to other AI applications. For instance, computer vision may require more resources than price modeling using random forest techniques. Consequently, in locations with adequate technical and financial resources, automated image evaluation could be feasible. However, these conditions are not commonly found in most small- and medium-sized Latin American cities.
A second criticism concerns the lack of truly personal perception of street appearance (scene perceptions). SVI/AI systems must be trained to predict subjective perceptions from images, recognizing visual elements such as buildings, pavement, trees, and perceived safety levels. This training relies on human input to define evaluation criteria. It is a computer-mediated vision, a proxy for human perception, representing an indirect measurement. Thus, it can be described as a “semi-subjective” method for measuring neighborhood quality. Some authors argue that fully automated recognition fails to capture human perceptions of street scenes and propose studies comparing subjective and objective measures [56, 57, 58, 59, 60].
Nevertheless, subjective measures derived from direct human perception can explain behavior more comprehensively. There appears to be a collective image of the city or its parts, formed by the composition of numerous individual impressions, resulting in a shared vision shaped by interactions with others, both directly and indirectly [64]. While each individual image is unique, it is influenced by this collective perception. For example, feelings of safety may be affected by the dissemination of crime statistics. Similarly, satisfaction with living in areas free from visual pollution or with natural elements is subject to cultural influences that are difficult to generalize. In this sense, the perceived quality or value of an urban area derives from such shared perceptions [65, 66].
Despite the importance of human perception, traditional methods generate overly aggregated measurements, are time-consuming, and suffer from individual rating differences, resulting in a low-throughput process. This situation suggests an opportunity to develop an intermediate mechanism that is more objective yet remains simple and feasible for small- and medium-sized cities. This research gap is identified as a motivation for this study.
The proposed approach aims to represent micro-neighborhood values through a human-centered perspective, using a set of quasi-objective criteria. It is neither a traditional, fully aggregated approach nor one that is fully automated through AI. To the best of the authors’ knowledge, no previous studies have proposed a similar scheme.
Using this type of score, data collection and evaluation can be distributed among a team, allowing for verification, refinement, or renewal more easily than traditional individual-based procedures. This reduces subjectivity without requiring complex AI applications.
Based on this perspective, the objective of this paper is to propose a mechanism for measuring neighborhood quality through a scoring system derived from street view images, and to evaluate the relevance and contribution of the resulting variable in hedonic models. The method was implemented in three Latin American cities, and the resulting variable was applied in hedonic price models.
2. Materials and Methods
The evaluation of micro-neighborhood quality (at the street level) involved a variable derived from the analysis of Google Street View images. The method can be summarized in three steps: (a) proposing a framework to weigh key neighborhood elements; (b) assessing neighborhood quality based on street-level images (point estimates) and interpolating these into a spatially continuous variable; and (c) collecting market data to develop hedonic price models for testing the new variable.
The process is detailed in the following sections. This study employs widely accessible tools and free software, including MS Office 365, LibreOffice 7.2, and QGIS 3.38.0-Grenoble. Notably, complex techniques such as web scraping for data acquisition or artificial intelligence were intentionally excluded from the model development. This decision ensures broader applicability, making the approach feasible for cities of varying sizes and resource availability—a key factor in extending its relevance to a significant portion of the global context.
2.1. Scoring Scheme
A proposal to determine neighborhood quality is crafted based on a synthesis of references cited in the literature, combined with insights derived from the authors’ experience and knowledge of the studied cities. This approach uses a specific set of conditions designed to comprehensively assess and measure neighborhood quality. It was developed through careful consideration of various factors that may include the quality of surrounding buildings, perceived safety, availability of amenities, infrastructure, and the overall living environment.
The categories and proposed scores are presented in Table 1. The categories are derived from aspects widely discussed in the literature. Considerations related to the built environment (1a, 1b), through the perception of neighboring properties, align with general findings in studies such as [10, 13, 16, 17, 19, 21, 22, 25, 27, 28, 29, 30]. Factors related to street access (2a, 2b, 3) are also addressed in various works [9, 10, 12, 15, 24, 27, 59]. The positive influence of natural elements (4) is frequently highlighted in neighborhood evaluations, in both traditional and AI-based approaches [13, 32, 33, 37, 43, 44, 48, 53, 62]. Physical conditions that impact pedestrians, such as the presence of garbage, graffiti, lack of sidewalks, or damaged sidewalks (5), are generally reported as negative influences [39, 41, 42, 56]. Similarly, the sense of security (6), as perceived by residents, affects the use of public space (ease and safety for walking through the neighborhood) [13, 19, 20, 21, 22, 25, 28, 29, 30, 32, 38, 41, 59].
It is assumed that individuals’ appreciation of street conditions influences their purchasing decisions, thereby affecting land and property prices in a given region. Furthermore, it is assumed that this appreciation is not limited to a specific point but is shaped by the overall route or path taken; individuals form an impression of an area considering the streets they pass through to reach it.
To enhance the reliability of this assessment, narrow scoring ranges were employed for each aspect, minimizing classification errors and promoting consistency in collective evaluations. In this study, both authors rated each image and the two values were averaged; when the ratings differed by more than 20%, the evaluation was repeated. The score for an image is calculated as presented in Equation (1):
2.2. Analysis of Street Images
After selecting the study area, the neighborhood quality variable was constructed in two steps. In the first step, a grid of image sampling points was defined to organize the data collection. Google Street View images were then accessed, and a neighborhood quality score was assigned to each point. The street scene was evaluated from the perspective of the block, considering the elements visible in a view from the center of the street, looking toward both sides. An example of an image is shown in Figure 1. In this process, each street view image was rated using the scheme presented in Table 1, resulting in a specific score calculated according to Equation (1).
In the second step, a map representing the variable was created. After evaluating the images at the defined points, these ratings were interpolated to produce a continuous surface using Ordinary Kriging, considering the nearest neighboring points. The underlying assumption, widely accepted in the literature, is that nearby locations tend to exhibit similar quality levels due to comparable general conditions of occupation and similar patterns of urban land use [14, 17, 18, 23, 25, 28, 29].
Ordinary Kriging (OK) was chosen for the interpolation. The best interpolation method depends on the characteristics of the data and on the specific problem; OK was selected because it is a geostatistical method that explicitly accounts for the spatial autocorrelation of the data and provides unbiased estimates. It is a linear estimator whose weights are constrained to sum to one, so the estimates do not systematically overestimate or underestimate the true values, and the weights are chosen to minimize the variance of the estimation error. The spatial structure is described by a variogram, which models how the dissimilarity between observations grows with the distance separating them; this captures spatial dependence and improves the accuracy of the estimates. Finally, OK provides a standard error for each estimate, which allows its accuracy to be evaluated. Other methods could have been selected: in this experiment the data do not show clear trends, their spatial variability is not very high, and the sampling density is high, conditions under which simpler methods such as nearest neighbor or linear interpolation may also be appropriate.
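As a rough illustration of how Ordinary Kriging turns scored points into an estimate at an unsampled location, the following Python sketch solves the kriging system for the k nearest samples. The exponential variogram and its parameters (sill, range, nugget), the function names, and the sample coordinates and scores are assumptions made for illustration only; they are not the variogram or the data of this study, which reports using tools such as QGIS rather than custom code.

```python
import math

def exp_variogram(h, sill=1.0, rng=500.0, nugget=0.0):
    """Exponential semivariogram (assumed model); gamma(0) is 0 by definition."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / rng))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(points, values, target, k=5, variogram=exp_variogram):
    """Estimate the value at `target` from the k nearest samples.

    Returns (estimate, weights). The last row and column of the system hold
    the Lagrange multiplier that forces the weights to sum to one.
    """
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], target))[:k]
    pts = [points[i] for i in order]
    n = len(pts)
    a = [[variogram(math.dist(pts[i], pts[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in pts] + [1.0]
    weights = solve(a, b)[:n]
    estimate = sum(w * values[i] for w, i in zip(weights, order))
    return estimate, weights

# Hypothetical scored points on a 250 m grid (coordinates in meters).
points = [(0, 0), (250, 0), (0, 250), (250, 250), (500, 0), (500, 250)]
scores = [6.0, 7.0, 5.0, 8.0, 7.5, 6.5]
estimate, weights = ordinary_kriging(points, scores, (125, 125))
print(round(estimate, 2), round(sum(weights), 6))
```

With a zero nugget the estimator reproduces the observed score exactly at a sampled point, and the weights always sum to one, which is the unbiasedness condition mentioned above.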
2.3. Hedonic Modeling
The resulting variable can be evaluated by its statistical performance, which serves to confirm the adequacy of the proposed method. In other words, it is possible to verify whether the expert-based evaluation approximates people’s behavior as reflected in their willingness to pay, as expressed in property prices.
In the real estate domain, several attributes must be considered simultaneously, each assuming different weights in explaining price variations. In this context, a property is understood as a “composite good” described by a set of attributes.
Hedonic pricing models establish relationships between the main attributes of properties and their prices [67, 68, 69, 70]. Considering the functioning of real estate markets, certain conditions need to be met for price modeling to be statistically valid. These models are constructed using local datasets to generate equations suitable for property valuation.
To develop these models, data must be collected within the relevant sector, and the models are estimated using regression analysis. Regression analysis is a technique that relates independent variables to a dependent variable (in this case, the market price), resulting in an estimated equation [71, 72]. The ultimate objective is to create a numerical model capable of capturing the contribution of each attribute to price formation. A hedonic price function can be proposed in the general form presented in Equation (2):
$$\text{Price} = \alpha_0 + \alpha_1 x_1 + \dots + \alpha_k x_k + \varepsilon \quad (2)$$
where Price is the variable under study (response or dependent variable); $x_1, \dots, x_k$ represent the explanatory variables (independent attributes); $\alpha_1, \dots, \alpha_k$ are the coefficients of the equation, which express the relative importance of each attribute in explaining the dependent variable (also referred to as “implicit prices”); $\alpha_0$ is the intercept of the equation; and $\varepsilon$ is the error term [67, 68, 69, 70, 71, 72].
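The linear form of Equation (2) can be estimated by ordinary least squares. The sketch below is a minimal illustration using the normal equations in plain Python; the function names and the synthetic lots (area in m² and a neighborhood score) are hypothetical and were generated from known coefficients so that the fit can be checked. It is not the software or the data used in this study.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients [a0, a1, ..., ak] of y = a0 + sum(ai * xi)."""
    design = [[1.0] + list(r) for r in rows]          # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)                             # normal equations (X'X) a = X'y

# Synthetic, noise-free data: price = 10000 + 300 * area + 5000 * score.
lots = [(300, 6.0), (450, 7.0), (500, 5.0), (360, 8.0), (600, 6.5)]
prices = [10000 + 300 * area + 5000 * score for area, score in lots]
coefficients = ols(lots, prices)
print([round(c, 3) for c in coefficients])
```

Because the synthetic prices contain no error term, the fitted coefficients recover the generating values; with market data the residual ε absorbs whatever the attributes do not explain.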
Equation (2) presents a generic model, commonly referred to as the “classical linear model” [71, 72]. The final specification of the variables in the equation depends on the analysis of each dataset and may adopt different functional forms, such as linear or semi-logarithmic, the latter being frequently found in models in the literature.
The coefficients of a hedonic model can be interpreted as the monetary contribution to the price resulting from a one-unit increase in a given variable, when the equation is linear. In terms of hedonic theory, $\alpha_i$ ($i > 0$) represents the implicit weight or hedonic price of the characteristic $x_i$, measured in the same currency as the property price. In semi-logarithmic models, the coefficients, multiplied by 100, can be interpreted as the approximate percentage change in price resulting from a one-unit change in $x_i$ [67, 68, 70]. In general, regardless of the functional form of the equation, the effects correspond to the partial derivative of price with respect to each specific variable.
The assessment of regression models involves fundamental statistical parameters, including the adjusted coefficient of determination ($R^2_a$), which accounts for the degrees of freedom, and the overall significance of the model assessed through a variance analysis using the Fisher–Snedecor F distribution. The significance of individual variables is determined through hypothesis tests based on the Student’s t distribution. This analysis identifies the variables that should remain in the model under a specific significance level (α) and indicates their relative importance in explaining price variation [71, 72].
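For reference, the adjusted coefficient of determination and the F statistic follow directly from the residual and total sums of squares. The sketch below uses the standard textbook formulas; the function name and the toy numbers are illustrative assumptions, and the p-values of the F and t tests (which require the corresponding distributions) are left to statistical software.

```python
def fit_statistics(observed, fitted, k):
    """Return (R2, adjusted R2, F statistic) for a model with k explanatory variables."""
    n = len(observed)
    mean_y = sum(observed) / n
    ss_tot = sum((y - mean_y) ** 2 for y in observed)                # total variation
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))     # unexplained variation
    r2 = 1.0 - ss_res / ss_tot
    dof = n - k - 1                                                  # residual degrees of freedom
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof
    f_stat = (r2 / k) / ((1.0 - r2) / dof)
    return r2, adj_r2, f_stat

# Toy example with one explanatory variable (k = 1).
r2, adj_r2, f_stat = fit_statistics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 4.9], k=1)
print(round(r2, 4), round(adj_r2, 4), round(f_stat, 1))
```

The adjustment penalizes each additional variable through the residual degrees of freedom, so the adjusted value is never larger than the unadjusted one.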
The residuals of the models can be evaluated using the mean absolute percentage error (MAPE), a metric widely used to assess the accuracy and predictive capacity of models, particularly in the context of mass appraisal [34, 35, 63, 64].
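A minimal sketch of the MAPE calculation is given below; the function name and the two example prices are illustrative, and the metric is expressed in percent.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent, relative to the observed prices."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)

# Two hypothetical properties: errors of 10% and 5% average to 7.5%.
print(round(mape([100.0, 200.0], [110.0, 190.0]), 2))
```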
Several assumptions must be verified to ensure the quality of the generated model. Among these, homoscedasticity, linearity of the relationships in Equation (2), and normality of residuals should be analyzed, in addition to verifying that residuals do not exhibit outliers or spatial correlation [18, 71, 72].
In this study, homoscedasticity was evaluated using scatterplots of standardized residuals versus fitted values. In the same plots, the presence of outliers was examined, using a ±2σ range as a reference limit. Linearity was initially analyzed through scatterplots of price against each tested attribute. When linear behavior was not evident, alternative functional forms were tested; in general, the resulting specification adopted a log-linear format [71, 72].
Normality was assessed using the Kolmogorov–Smirnov test. Evaluating normality is essential, as various statistical tests assume that residuals follow a normal distribution. The Kolmogorov–Smirnov test compares the empirical distribution of the residuals with a reference (normal) distribution and calculates a p-value. If the p-value is greater than 0.05, the null hypothesis of normality is not rejected, meaning that the data show no significant deviation from normality. If the p-value is less than 0.05, the null hypothesis is rejected, indicating a significant deviation from normality [71, 72].
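The test statistic D is the largest gap between the empirical distribution of the residuals and the reference normal distribution. The sketch below computes D against a normal distribution fitted to the sample and an approximate p-value from the asymptotic Kolmogorov series with a common small-sample correction. The function name and the test samples are assumptions for illustration; because the mean and standard deviation are estimated from the same data, this p-value tends to be conservative, and statistical packages may report different values.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_normal(residuals):
    """Return (D, approximate p-value) of a KS test against a fitted normal."""
    data = sorted(residuals)
    n = len(data)
    reference = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data):
        cdf = reference.cdf(x)
        # Compare the reference CDF with the empirical step just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:                      # the series is numerically 1 for very small lambda
        return d, 1.0
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * (j * lam) ** 2) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical samples: normal quantiles versus a right-skewed transformation of them.
n = 50
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [math.exp(x) for x in normal_like]
print(ks_normal(normal_like)[0] < ks_normal(skewed)[0])
```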
Moreover, in the real estate context, it is crucial to address spatial correlation. The presence of spatial correlation may indicate systematic patterns in the residuals and reduce the precision of the estimated values. Spatial correlation was assessed by analyzing the spatial distribution of standardized residuals. It is also commonly evaluated using Moran’s I index. In spatial analysis, a Moran’s I value close to zero suggests a random spatial pattern, values significantly greater than zero indicate positive spatial autocorrelation (clustering of similar values), and values significantly less than zero indicate negative spatial autocorrelation (dispersion) [11, 14, 18, 70, 71, 72, 73, 74].
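To make the index concrete, the following sketch computes Moran’s I with a simple binary weight matrix, in which two observations are neighbors when they lie within a distance threshold. The weighting scheme, the threshold, and the 4 × 4 test grid are assumptions for illustration; the significance of the index (its p-value) is usually obtained by permutation or from a normal approximation, which is not shown here.

```python
import math

def morans_i(coords, values, threshold):
    """Moran's I with binary weights: w_ij = 1 when 0 < distance(i, j) <= threshold."""
    n = len(values)
    mean_v = sum(values) / n
    z = [v - mean_v for v in values]           # deviations from the mean
    cross = 0.0                                 # sum of w_ij * z_i * z_j
    s0 = 0.0                                    # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= threshold:
                cross += z[i] * z[j]
                s0 += 1.0
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical 4 x 4 grid with unit spacing; neighbors share an edge (threshold = 1).
grid = [(x, y) for y in range(4) for x in range(4)]
clustered = [1.0 if x < 2 else 0.0 for x, y in grid]       # two homogeneous halves
checkerboard = [float((x + y) % 2) for x, y in grid]       # alternating values
print(round(morans_i(grid, clustered, 1.0), 3), round(morans_i(grid, checkerboard, 1.0), 3))
```

The clustered pattern gives a clearly positive index and the checkerboard a negative one, matching the interpretation given above.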
3. Case Studies
This study encompassed three cities: Novo Hamburgo in southern Brazil, Manizales in Colombia, and Cancun (Benito Juárez) in Mexico (Figure 2). Land prices and significant reference points were collected in each city. These cities were selected because the authors had prior experience with them in earlier studies, particularly related to mass appraisal, which is relevant when evaluating subjective aspects such as the perception of insecurity (item 6 in Table 1). Additionally, these cities exemplify the type of context to which this approach seeks to contribute: small- and medium-sized cities that generally lack resources for automated applications [75, 76, 77].
At the outset of the project, the process for determining image verification locations was defined. This began with the selection of the cities and, within each, the region under study, for which a centroid was established to represent the area. Subsequently, a regular sampling grid was created with a standardized spacing, typically 250 m. The grid coordinates were established in both the east-west (EW) and north-south (NS) directions. Adjustments to the initial study area dimensions are often necessary to account for local conditions and potential obstacles or barriers, such as large blocks, major roads, parks, rivers, or lakes.
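The initial grid can be generated in a few lines. The sketch below assumes projected coordinates in meters (such as UTM) and a rectangular study area centered on the centroid; the function name and the example dimensions are hypothetical, and the manual relocation of points that fall inside blocks or inaccessible areas is not covered.

```python
def sampling_grid(center_x, center_y, width, height, spacing=250.0):
    """Regular grid nodes covering width x height (meters), centered on the centroid."""
    nx = int(width // spacing) + 1             # nodes along the east-west direction
    ny = int(height // spacing) + 1            # nodes along the north-south direction
    x0 = center_x - spacing * (nx - 1) / 2.0
    y0 = center_y - spacing * (ny - 1) / 2.0
    return [(x0 + i * spacing, y0 + j * spacing) for j in range(ny) for i in range(nx)]

# Hypothetical 1000 m x 500 m area centered on the local origin.
grid_points = sampling_grid(0.0, 0.0, 1000.0, 500.0)
print(len(grid_points), grid_points[0], grid_points[-1])
```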
The sampling may also present variations due to the actual geometry and accessibility of the streets, as well as the availability of images and other practical constraints. Often, proposed points were initially located in public areas (parks, military facilities, or public buildings) or within blocks. In such cases, points had to be moved to the nearest accessible street segment. When heterogeneity in street characteristics was observed, the verification point was moved to a nearby location to avoid distortions or outliers in the data modeling.
After image assessment and the development of Kriging-based maps, the subsequent step involved acquiring market data and benchmark information to evaluate accessibility at an urban scale. Market data were sourced from local real estate brokers’ websites, focusing on listings that included clear price information and precise geographic coordinates. Neighborhood quality was then estimated for each sampled property.
For the hedonic analysis, models prioritizing simplicity to facilitate interpretation were adopted, following the principle of Ockham’s Razor. In line with this approach, only essential variables were included. It is noteworthy that tax appraisal is based on municipal cadastres, which often lack certain attributes considered in commercial appraisals, such as plot shape.
The models were developed using a backward elimination approach. In this procedure, the initial model included all attributes and was progressively simplified by removing, one at a time, the attributes whose p-values exceeded the adopted significance level (α = 0.05).
Data were collected in these Latin American cities with the objective of testing the application in contexts from different countries and under varied conditions. Regions within the urban area of each city were selected to verify the evaluation of the Neighborhood variable under different local characteristics.
The first study was conducted in Novo Hamburgo, a city located in southern Brazil along the federal highway BR-116. With 94 years of history, its urban area covers 223.6 km². The population was approximately 247,000 inhabitants (1105 inhabitants/km²), and the GDP per capita was US$ 6500 (based on 2018 official figures). The local economy is based on services and production related to the footwear industry. The search for market data began in the central region of the city, focusing on areas predominantly occupied by middle-income populations, which included parts of the Vila Rosa, Ouro Branco, Guarani, and city center districts (Figure 3).
Data collection took place from March to June 2021, resulting in the acquisition of 53 land price records. To assess accessibility in this region, reference points were established, and distances were measured to key landmarks. Specifically, measurements were taken to the city’s main shopping center and to the central train station, both located close to each other and adjacent to the traditional central business district (CBD). Notably, this part of the city did not present other significant points of interest.
The second case study is Manizales, a Colombian city located in the department of Caldas, approximately 250 km from the country’s capital, Bogotá. The city is 173 years old, has an area of 570 km², and a population of about 460,000 inhabitants (as of 2023). There are seven universities in the city and the local economy is also related to coffee production. The analyzed region is situated in the historic center of the city, encompassing the Centro, Cumanday, Los Augustinos, and San José neighborhoods. This area is characterized by older developments and constructions in poor condition, which leads to lower land price levels. The region is predominantly occupied by low- to middle-income populations (Figure 4).
In Manizales, 37 land price records were collected. Various points of interest were identified, and distances were measured to the central business district (CBD), represented by the commercial-historical center using the Public Market as a reference point. Additional measurements were taken to the Sancancio shopping center, the Alcázares Arenillo and Los Yarumos EcoParks, Morro Sancancio, the City Hall building, and a popular shopping area. All distances were calculated in meters based on UTM coordinates, with each distance evaluated individually and as an average.
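With projected coordinates in meters, each distance reduces to a Euclidean calculation. The sketch below returns the individual distances and their average for one property; the function name, the reference labels, and all coordinates are hypothetical placeholders rather than the actual Manizales data.

```python
import math

def distances_to_references(property_xy, references):
    """Return ({name: distance in meters}, mean distance) for one property."""
    dists = {name: math.dist(property_xy, xy) for name, xy in references.items()}
    return dists, sum(dists.values()) / len(dists)

# Hypothetical UTM-like coordinates (easting, northing) in meters.
references = {"cbd": (1300.0, 2400.0), "shopping_center": (1000.0, 2600.0)}
individual, average = distances_to_references((1000.0, 2000.0), references)
print(individual, average)
```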
The third city studied was Cancun, the seat of the municipality of Benito Juárez, founded 53 years ago and covering an area of 142.7 km². The city had nearly 900,000 inhabitants according to 2020 figures. There is a close relationship with tourism. A specific region was selected to analyze the Neighborhood variable under different local conditions, as illustrated in Figure 5. An initial analysis provided a general evaluation of this region, which is predominantly occupied by low-income populations.
A total of 29 land price records were obtained, forming a local price observatory within the selected area. Distances from each case to key urban elements were measured, including schools, parks, transportation centers (such as three bus terminals), and the major avenues.
Although the sample sizes may appear small, they were sufficient to achieve good results in all the cities analyzed.
4. Presentation and Discussion of Results
4.1. Novo Hamburgo, Brazil
To incorporate the quality of the neighborhood, street images were selected and evaluated. In Novo Hamburgo, the dimensions of the programmed sampling space were 2 × 2.5 km, with 30 reference points defined (Figure 6). Street view images were assessed at 81 points, resulting in approximately 16 images per km² (Table 2).
The data collected from the images were evaluated according to the characteristics defined in Table 1 and the average results are presented in Table 2. Based on that information, the Neighborhood variable was established.
Not all points were directly verified, as some were located in the middle of a block or in areas without access. Additionally, in some cases, greater variations in street quality were observed, making it necessary to include more images to reduce the presence of possible exceptions.
To evaluate the neighborhood quality for points not directly verified through images, it was necessary to estimate values using Kriging. This technique estimates each value as a weighted average of nearby points, which also dampens the effect of incorrect valuations or differences in perception among evaluators working as a team.
The calculation of the Neighborhood variable was performed using Ordinary Kriging, considering the five nearest data points (neighbors) and weighting by the inverse of the distance. Based on these results, a map showing the spatial distribution of the Neighborhood variable within the analysis area was generated, as presented in Figure 7.
In sequence, the statistical relevance of this variable was verified by modeling the observatory data (53 cases). For each case, the neighborhood value was estimated based on the Kriging map.
Table 3 presents the attributes tested in the models, including both statistically significant and non-significant variables. The variable District represents the traditional subjective neighborhood quality measurement, while the other variables retain their conventional meanings.
The position of the land plot within the block, the distance to the commercial-historic center, and the District variable did not show statistical significance. The Neighborhood variable represents the measurement of the quality of the surroundings of the properties, following the procedure described in the previous section. The elements of the final model are presented in Table 4, including coefficients, t-statistics, and p-values. All variables included are significant at the 5% level.
The model returned an adjusted $R^2$ of 0.873, and the F-test for overall significance yielded a significance level of $1.236 \times 10^{-22}$. The MAPE was 2.40%. The significance levels for the variables Area and Distance were far below the conventional threshold. The Neighborhood variable exhibited a p-value of 0.0405, which is higher than the others but still below the 5% significance level (Table 4). The pricing model can be presented as Equation (3).
The model identifies three key price drivers (Equation (3)). Because the model is not linear, its coefficients cannot be read directly as price changes and must be interpreted as percentage effects. A 1% increase in lot size raises property values by approximately 1.104% (the coefficient of Area), slightly more than a one-to-one effect on price. Each additional meter farther from the city mall reduces prices by about 0.0516%. A one-point improvement in neighborhood quality implies a 7.6% premium. A property valued at US$ 250,000 (close to the average price; see Table 3) whose neighborhood improved by one level (for example, from level 7 to 8) would gain around US$ 19,000.
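The arithmetic behind this interpretation can be sketched as follows. The sketch assumes that Price and Area enter the model in logarithms, so that the Area coefficient acts as an elasticity, and takes the 7.6% neighborhood premium directly from the text. The function names are illustrative.

```python
def pct_change_from_elasticity(elasticity, pct_change_x):
    """Exact % change in price for a % change in a log-transformed regressor."""
    return ((1 + pct_change_x / 100) ** elasticity - 1) * 100

def price_gain(price, premium_pct, steps=1):
    """Price gain from `steps` one-point improvements, compounding the premium."""
    return price * ((1 + premium_pct / 100) ** steps - 1)

# Area coefficient (1.104) and neighborhood premium (7.6%) as reported in the text.
area_effect = pct_change_from_elasticity(1.104, 1.0)  # about 1.10 %
gain_one_level = price_gain(250_000, 7.6)             # about US$ 19,000
```

For small changes the elasticity can be read directly as a percentage effect, which is why the 1% increase in area maps to roughly 1.104%.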
The normality of residuals was investigated using the Kolmogorov–Smirnov test, which yielded a D statistic of 0.112 and a p-value of 0.678. Since the p-value is greater than 0.05, the null hypothesis of normality is not rejected. Thus, the residuals can be considered normally distributed.
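As an illustration of the statistic involved, the sketch below computes the Kolmogorov–Smirnov D statistic of a sample against a normal distribution whose mean and standard deviation are estimated from that sample. The residuals are hypothetical, and the software and p-value method used in the study are not stated in the text. When the parameters are estimated from the same data, standard Kolmogorov–Smirnov p-values tend to be conservative, which is what the Lilliefors correction addresses.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(residuals):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    dist = NormalDist(mean(residuals), stdev(residuals))
    xs = sorted(residuals)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # Compare the fitted CDF with the empirical step just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical standardized residuals.
sample = [-1.2, -0.5, -0.1, 0.0, 0.3, 0.4, 1.1]
d_stat = ks_statistic_normal(sample)
```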
Figure 8 shows a scatterplot of standardized residuals against the fitted prices. The pattern indicates a random distribution, with no evidence of outliers or heteroscedasticity, thus meeting the assumptions of the regression model.
In addition, Figure 9 shows the spatial distribution of standardized residuals across the study area. Visual inspection of this map suggests that the residuals are randomly dispersed throughout the urban space, with no evident spatial clusters or systematic patterns that might indicate local bias or spatial dependence.
To formally assess the presence of spatial autocorrelation among residuals, Moran’s I index was computed. The result, I = 0.182 with a p-value of 0.200, indicates low and statistically non-significant spatial autocorrelation. The test therefore provides no evidence that the residuals exhibit spatial dependence or are influenced by the geographical distribution of observations. This finding strengthens the reliability of the model by suggesting that spatial effects have been adequately captured by the included explanatory variables, particularly the Neighborhood variable, which was specifically designed to incorporate local environmental quality into price estimation.
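A minimal sketch of this check is given below. The spatial weights used in the study are not stated, so the sketch assumes inverse-distance weights between all pairs of observations (with distinct coordinates) and a permutation-based pseudo p-value. The data and function names are hypothetical.

```python
import math
import random

def morans_i(values, coords):
    """Moran's I with inverse-distance weights (no self-weights)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = 1.0 / math.dist(coords[i], coords[j])
            num += w * dev[i] * dev[j]
            w_sum += w
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

def permutation_p_value(values, coords, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of shuffles with I >= the observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, coords)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, coords) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

# Five hypothetical locations on a line, one unit apart.
line = [(float(i), 0.0) for i in range(5)]
trend = morans_i([1, 2, 3, 4, 5], line)          # positive: neighbors are alike
alternating = morans_i([1, -1, 1, -1, 1], line)  # negative: neighbors differ
p_value = permutation_p_value([1, 2, 3, 4, 5], line, n_perm=199)
```

Applied to regression residuals, a value of I near zero with a large p-value is the pattern reported here: no detectable tendency for similar residuals to cluster in space.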
Furthermore, the absence of spatial autocorrelation in the residuals suggests that no significant omitted spatial variables or locational biases are present in the model. Consequently, the model can be considered robust from a spatial econometrics perspective, indicating that predictions are not systematically over- or underestimated in specific areas.
These findings support the conclusion that the model is not only statistically sound but also spatially consistent, fulfilling a key assumption required for spatially explicit valuation studies. This consistency enhances the credibility of the results for urban policy and land value assessment applications.
4.2. Manizales, Colombia
Next, images were collected to assess the neighborhood conditions in Manizales. A total of 25 sampling points were defined within a 1 km² urban area. However, at some locations, images were not available, likely due to restricted access for Google Street View vehicles in pedestrian-only or otherwise inaccessible areas. In such cases, the closest accessible street location was selected as a substitute.
Figure 10 shows both the originally planned and the final sampling points, along with the locations of the land data used in the analysis.
In this case, 41 images were collected within the study region, corresponding to a density of 41 images per km². The information obtained from these images was classified based on the characteristics described in Table 1, and the results are presented in Table 5.
Using Ordinary Kriging, the values for the Neighborhood variable were estimated for each land parcel in the data sample from the Manizales observatory. The interpolation considered the five nearest points, weighted by the inverse of their distances. The resulting map is presented in
Figure 11.
Table 6 summarizes the attributes and basic statistics. In addition to Price and Neighborhood, other variables were identified and developed, including land size and the distance to key urban elements. The Sancacio Mall is adjacent to the city’s commercial-historic center (CBD), so they were represented by a single variable. Other relevant points, such as City Hall, Cerro Palogrande, and the EcoParks (Alcázares Arenillo and Los Yarumos), were also included.
Social strata are the categories by which households are classified in this city. The classification does not depend on the income of an individual or family, but on the conditions of the dwelling in which the household lives and of the surrounding area. Social strata determine utility rates, some taxes, and eligibility for certain financial subsidies. Homes or properties are classified into six socioeconomic strata: strata 1, 2, and 3 correspond to the lowest-income users, who are eligible for utility subsidies; strata 5 and 6 correspond to the upper strata, with greater financial resources, who must pay additional charges (contributions) on the value of utility services. Properties in stratum 4 neither receive subsidies nor pay additional charges; they pay the amount that the company defines as the cost of providing the service.
Social strata and the distances to City Hall, Cerro Palogrande, and the EcoParks were not significant at the 5% level. The distance variables that showed statistical significance were the distances to the city center (CBD) and to the shopping center. Thus, the average of these two distances was used as a single variable: Distance (Table 7).
The statistical significance of these variables in the price models was evaluated using the regional data. The Neighborhood variable was verified using the observatory data model (37 cases). The price model for the study region is presented in Equation (4), where the variables have the same meaning as described in Equation (3), except for Distance.
The adjusted coefficient of determination (R²a) was 0.705. The significance level of the F-test was 1.711 × 10⁻⁹, and the MAPE was 4.17%. The three variables have significance levels below 5% (Table 7).
Using an analysis similar to that applied to the Novo Hamburgo model, the results indicate that a 1% increase in lot area leads to a 0.577% rise in property prices. Reducing the distance to the city center by 500 m increases property values by approximately 21%, as derived from the coefficient 2005.839 for Distance⁻¹. Most notably, a one-unit improvement in neighborhood quality corresponds to a 7.2% price increase, ceteris paribus. For a property valued at US$ 60,000, this implies an additional value of approximately US$ 4300.
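Because Distance enters Equation (4) in inverse form, the effect of moving 500 m closer depends on the starting distance. The sketch below assumes a log-price specification with the reported coefficient 2005.839 on the inverse of Distance and distances in meters. It evaluates the effect at a few hypothetical baselines, since the baseline behind the 21% figure is not stated in the text.

```python
import math

B_INV_DISTANCE = 2005.839  # coefficient on 1/Distance reported for Equation (4)

def pct_effect_of_moving_closer(baseline_m, reduction_m=500.0, coef=B_INV_DISTANCE):
    """% price change when distance falls from baseline_m to baseline_m - reduction_m,
    assuming ln(Price) = ... + coef * (1 / Distance)."""
    new_m = baseline_m - reduction_m
    if new_m <= 0:
        raise ValueError("reduction must be smaller than the baseline distance")
    delta_log_price = coef * (1.0 / new_m - 1.0 / baseline_m)
    return (math.exp(delta_log_price) - 1.0) * 100.0

# Hypothetical starting distances in meters.
effects = {d: pct_effect_of_moving_closer(d) for d in (2000, 2500, 3000)}
```

Under these assumptions, a baseline of about 2.5 km gives an effect of roughly 22%, close to the reported figure, while shorter baselines give larger effects and longer ones smaller effects.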
The normality of residuals was assessed through the Kolmogorov–Smirnov test, which resulted in a D statistic of 0.138 and a p-value of 0.445. Since this p-value exceeds 0.05, the null hypothesis of normal distribution is not rejected, allowing the residuals to be considered normally distributed.
Figure 12 shows a scatterplot of standardized residuals versus predicted values. The points appear randomly dispersed, suggesting no evidence of outliers or heteroscedasticity and thus meeting the assumptions of the regression model.
Additionally, Figure 13 depicts the spatial distribution of standardized residuals, which shows no apparent spatial dependence. Spatial autocorrelation of the residuals from Equation (4) was also tested with Moran’s I index. The result was I = 0.0553 with a p-value of 0.165, indicating no significant spatial autocorrelation. The tests therefore give no indication of problems related to spatial dependence and, overall, support the suitability of the model for value estimation.
4.3. Cancun
To incorporate neighborhood quality in this city, street images were collected and evaluated. The planned sampling area covered 2 × 2 km, with 25 predefined sampling points (Figure 14). In total, 79 images were obtained (approximately 20 images per km²), considering specific street conditions and local variability (Table 8).
Data collected from the images were evaluated according to the characteristics defined in Table 1, and the average results are presented in Table 8. As shown in Figure 14, no Street View images were available in some areas, probably due to limited car access.
To evaluate neighborhood quality across additional points, a value map of the Neighborhood variable’s spatial distribution was generated. This was calculated using Ordinary Kriging, incorporating the five nearest data points (neighbors) for each reference location and weighting them by inverse distance (
Figure 15).
Table 9 presents the descriptive statistics for Price, Area, and Neighborhood, which align with the definitions used in other cities. In this study, none of the distances to nearby parks, schools, avenues/boulevards, and bus terminals showed statistical significance individually at the 5% level. Consequently, an average of these distances was computed, creating a new composite variable labeled Distance.
Subsequently, the statistical significance of these variables was assessed by modeling the collected dataset (29 observations). Following the statistical analysis,
Table 10 summarizes the results, including regression coefficients,
t-statistics, and
p-values.
The proposed pricing model is presented in Equation (5). It demonstrates strong statistical performance, with an adjusted R² of 0.888, a highly significant F-test (p = 1.288 × 10⁻¹²), and a mean absolute percentage error (MAPE) of 4.25%. All predictor variables are statistically significant, with t-test p-values below 0.5%.
The Neighborhood coefficient implies a price effect of approximately 15.5% per point, indicating that this variable explains a significant portion of price variation (Table 10). The model estimates the following marginal effects. A 1% increase in lot area corresponds to a 0.849% rise in property prices. Being 1 m closer to key amenities increases values by 0.123%. For a property at the average price (US$ 720,000), a one-unit improvement in neighborhood quality (e.g., from 6 to 7 due to enhanced streetscape or new amenities) is associated with a US$ 112,000 price increase.
Regarding the assumptions of regression analysis, linearity was addressed by transforming Price and Area using natural logarithms, while Distance was converted to its inverse form. Normality of the residuals was assessed via the Kolmogorov–Smirnov test, yielding a D statistic of 0.111 (p-value = 0.826). Since the p-value exceeds the 0.05 significance threshold, one fails to reject the null hypothesis that the residuals follow a normal distribution. Visual inspection of Figure 16 shows randomly distributed residuals with no discernible patterns or large outliers, which suggests the absence of heteroscedasticity and influential outliers. Given that the model satisfies the fundamental assumptions of regression analysis, it can be considered statistically robust.
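The transformations described above can be illustrated with a small ordinary least squares sketch. It assumes the specification ln(Price) = b0 + b1·ln(Area) + b2/Distance + b3·Neighborhood, where the linear entry of Neighborhood is an assumption consistent with the per-point percentage interpretation. The data are synthetic and noise-free, generated from hypothetical coefficients, and the normal equations are solved in pure Python only to keep the example self-contained. In practice a statistics package would be used, which would also report t-statistics and p-values.

```python
import math

def solve_linear(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_log_model(rows):
    """OLS for ln(Price) = b0 + b1*ln(Area) + b2/Distance + b3*Neighborhood.

    rows: iterable of (price, area, distance, neighborhood) tuples.
    """
    X = [[1.0, math.log(a), 1.0 / d, nb] for _, a, d, nb in rows]
    y = [math.log(p) for p, _, _, _ in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)

# Hypothetical coefficients and parcels: (area m², distance m, neighborhood score).
true_coefs = [5.0, 0.85, 150.0, 0.14]
specs = [(300, 200, 5), (450, 350, 6), (600, 150, 7), (250, 500, 4),
         (800, 800, 8), (520, 120, 6), (380, 650, 5), (700, 300, 9)]
rows = [(math.exp(true_coefs[0] + true_coefs[1] * math.log(a)
                  + true_coefs[2] / d + true_coefs[3] * nb), a, d, nb)
        for a, d, nb in specs]
estimated = fit_log_model(rows)  # recovers true_coefs up to rounding error
```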
Furthermore, Figure 17 displays the spatial distribution of standardized residuals, showing no apparent spatial dependence. To quantitatively assess potential spatial autocorrelation in the residuals from Equation (5), we computed Moran’s I index, obtaining a value of I = 0.124 (p-value = 0.158). This statistically non-significant result (p > 0.05) provides no evidence of spatial correlation in the model residuals. Overall, these findings support a robust statistical performance of the model.
5. Discussion
The proposed scoring scheme, as outlined in
Table 1, underwent evaluation using actual data from the cities, and the outcomes were notably effective. The adaptability of this scheme is particularly promising. It has potential to accommodate specific or unique situations encountered in each city.
The findings, on a broader scale, signify the practicality of employing objective variables to assess neighborhood quality using street-level images, particularly within the regions analyzed across the three cities. This demonstrates the methodology’s capacity to provide a reliable and data-driven approach to understand the nuances of real estate valuation.
One of the significant takeaways from this study is the development of robust pricing models, which not only exhibit a high degree of adequacy from a statistical standpoint but also offer a user-friendly interface for professionals. The ease of development and the method’s adaptability provide a solid foundation for both seasoned practitioners and newcomers in the field, further contributing to the feasibility and practicality of the proposed approach.
Table 11 presents some results of the study.
All the models present a relatively high adjusted coefficient of determination (R²a). This metric quantifies the proportion of variation in the dependent variable explained by the independent variables, penalized for the number of predictors, and serves as a basic indicator of a model’s explanatory power. The models therefore account for a substantial portion of the variability in the data despite their simplicity, capturing the main relationships without the inclusion of intricate variables or parameters.
Similarly, the Neighborhood coefficients in the hedonic models for Novo Hamburgo, southern Brazil, and Manizales, Colombia, are similar, indicating a price effect of around 7.5% per point, while for Cancun, México, the effect is about twice as large. These are reasonable magnitudes. The variable is statistically significant in all three cases, supporting the hypothesis that it is related to price and therefore reasonably represents market behavior.
The literature does not provide elements for a direct comparison. Some authors working with AI have developed price models based on apartment data [57,58,59,60], but no studies using land price models were identified. Moreover, those studies adopted different scales of comparison and used disaggregated measurements. In them, “subjective” parameters are based on computer vision trained by humans, and “objective” measurements are based on direct identification (by typical colors, for instance). Despite these limitations, several factors show similar magnitudes, while others differ among the studies, probably reflecting local conditions (Table 12).
In economic terms, there is no clear explanation for the disparity in neighborhood coefficients. However, it could reflect differences in urban economic structures and housing market dynamics. Cancún’s tourism-driven economy intensifies the premium for neighborhood quality, as location significantly impacts rental yields and property demand. In contrast, Novo Hamburgo and Manizales (with their industrial activities) probably exhibit more stable, income-constrained markets, where price sensitivity to neighborhood attributes is moderated. Future work could explicitly test this hypothesis through city-type stratification.
In the cities studied, the low Mean Absolute Percentage Error (MAPE) is remarkable. It is within the limits set forth by the International Association of Assessing Officers (IAAO). This means that the models generate predictions that are close to the observed values, with relatively small discrepancies. A MAPE that conforms to these standards suggests that the models provide reliable and accurate predictions, which is critical for real estate professionals, appraisers, and policymakers.
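For reference, the MAPE can be computed as in the sketch below. The prices are hypothetical, and the text does not state whether the errors were measured on prices or on log-prices; the sketch uses prices.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)

# Hypothetical observed and predicted prices (US$).
observed = [240_000, 255_000, 310_000, 198_000]
predicted = [246_000, 250_000, 300_000, 204_000]
error_pct = mape(observed, predicted)  # about 2.68 %
```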
The assumptions of the regression model were verified through the Kolmogorov–Smirnov test and scatterplots of standardized residuals versus predicted values. The residuals were found to be normally distributed, with no presence of outliers or heteroscedasticity.
Additionally, evaluation based on Moran’s I index and spatial distribution maps of the residuals was carried out, showing no evidence of spatial autocorrelation. This is particularly important in models with broader spatial coverage. Above all, the absence of spatial autocorrelation indicates the robustness and effectiveness of the models in capturing and explaining variations across different geographical areas. In simpler terms, it suggests that the models adequately account for the unique characteristics and trends specific to each location without introducing bias or distortion.
The absence of spatial autocorrelation suggests that localized urban dynamics drive price variations in the study areas. This aligns with the observed heterogeneous economic conditions (industrial versus tourism economic focus). Such fragmentation implies that local factors (e.g., micro-neighborhood quality, infrastructure disparities) outweigh broader spatial trends. This finding highlights the importance of granular, location-specific variables (such as the proposed street-level metric) in hedonic models for cities, whether in monocentric or economically integrated contexts.
Achieving these results in such simple models is indicative of their robustness and efficacy in explaining the data. Furthermore, the fact that these models strike a balance between their explanatory power and their simplicity is a notable advantage. This balance implies that they offer a clear and accessible means of understanding and interpreting data. Such models are highly valuable in various applications, as they can be easily comprehended and utilized by practitioners, policymakers, and decision-makers.
Some practical guidelines for future applications can be offered. The market data were collected directly from brokers’ websites and, where listings are available, a good amount of data can be obtained in a few workdays. The analysis and classification of images proceeded at a rate of 10 to 12 images per working hour. Given the range of image densities (images/km²) across the three cities, this rate allows one person to cover around 25 km² in a week. Regions with greater diversity will need more images to support a consistent analysis. For the coverage of an entire city, it may be useful to apply grouping techniques such as clustering, decision trees, or random forest-based models.
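The throughput estimate can be reproduced with a short calculation. The sketch takes the classification rate and the image densities reported for Cancun (about 20 images per km²) and Manizales (41 images per km²) from the text. The 40-hour working week is an assumption.

```python
def weekly_coverage_km2(images_per_hour, images_per_km2, hours_per_week=40):
    """Area (km²) one analyst can classify per week at a given image density."""
    return images_per_hour * hours_per_week / images_per_km2

# Keys are (images per hour, images per km²); values are km² per week.
coverage = {(rate, density): round(weekly_coverage_km2(rate, density), 1)
            for rate in (10, 12) for density in (20, 41)}
```

Under these assumptions, weekly coverage ranges from about 10 km² at the higher image density to about 24 km² at the lower one, so the upper end of the estimate applies to areas that need relatively few images.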
6. Conclusions
Assessing neighborhood quality is a challenge. In the traditional approach, it relies on unstructured personal evaluation, which can be hard to reproduce and difficult to justify, an important concern for tax purposes. Existing web scraping and automatic detection methods are effective, but they often require substantial human and economic resources, which are not easy to find in small cities. The proposed mechanism offers an intermediate alternative that allows more objective measurements without special resources.
In this study, the methodology was applied in three diverse cities, each characterized by different social and economic landscapes and located in a different Latin American country. This selection aimed to assess the methodology’s adaptability across urban contexts. By including cities with marked differences, the study sought to test the limits of the proposed method in a real-world Latin American context.
The high degree of statistical fit obtained in these cities is an encouraging sign. It highlights the method’s potential as a versatile and dependable tool for real estate professionals, urban planners, and policymakers who navigate complex real estate markets.
Furthermore, statistical analysis revealed that the models based on the Neighborhood variable developed using the proposed scheme exhibited strong results in terms of statistical performance. This suggests that the methodology not only yields consistent results but also produces well-qualified models that can be valuable for real estate professionals and policymakers in diverse urban environments.
Despite the promising results, some limitations should be acknowledged. The availability and quality of image and market data can vary between cities, potentially affecting results. The manual classification process, while effective, requires some training and may benefit from cross-checking two independent evaluations to ensure consistency. Future research should also consider extending the methodology to other city types and incorporating more complex spatial econometric approaches to refine neighborhood classifications and capture non-linear effects. Further exploration of the relationship between urban economic structures and price sensitivity to neighborhood quality is another relevant avenue.
Overall, this study demonstrates that simple yet robust models can effectively incorporate granular urban data, providing accurate and practical tools for land valuation. The methodology offers a foundation for broader applications and future enhancements, contributing to the improvement of data-driven urban analysis and valuation practices.