Uncovering Bias in Objective Mapping and Subjective Perception of Urban Building Functionality: A Machine Learning Approach to Urban Spatial Perception

Jiaxin Zhang; Zhilin Yu; Yunqin Li; Xueqiang Wang

doi:10.3390/land12071322

¹ Architecture and Design College, Nanchang University, No. 999, Xuefu Avenue, Honggutan New District, Nanchang 330031, China

² School of Architecture, Southeast University, Nanjing 210096, China

* Author to whom correspondence should be addressed.

Land 2023, 12(7), 1322; https://doi.org/10.3390/land12071322

This article belongs to the Section Land Innovations – Data and Machine Learning

Abstract

Urban spatial perception critically influences human behavior and emotional reactions, emphasizing the necessity of aligning urban spaces with human needs for enhanced urban living. However, functionality-based categorization of urban architecture is prone to biases, stemming from disparities between objective mapping and subjective perception. These biases can result in urban planning and designs that fail to cater adequately to the needs and preferences of city residents, negatively impacting their quality of life and the city’s overall functionality. This research scrutinizes the perceptual biases and disparities in architectural function distribution within urban spaces, with a particular focus on Shanghai’s central urban district. The study employs machine learning to clarify these biases within urban spatial perception research, utilizing a tripartite methodology: objective mapping, subjective perception analysis, and perception deviation assessment. The study revealed significant discrepancies in the distribution centroids between commercial buildings and residential or public buildings. This result illuminates the spatial organization characteristics of urban architectural functions, serving as a valuable reference for urban planning and development. Furthermore, it uncovers the advantages and disadvantages of different data sources and techniques in interpreting urban spatial perception, paving the way for a more comprehensive understanding of the subject. Our findings underscore the need for urban planning strategies that align with human perceptual needs, thereby enhancing the quality of the urban environment and fostering a more habitable and sustainable urban space. The study’s implications suggest that a deeper understanding of perceptual needs can optimize architectural function distribution, enhancing the urban environment’s quality.

Keywords:

urban spatial perception; building function classification; objective mapping; subjective perception; machine learning; point of interest (POI); street view images

1. Introduction

Urban spatial perception, an intricate interplay between individuals and their urban environment, significantly shapes human behavior and emotional reactions within the spheres of architecture and urban design [1,2,3]. This psychological process, rooted in the perception of forms [4], structures [4], colors [5], and aesthetics [6], underscores the importance of aligning urban spaces with the human biological structure and satisfying human needs, thereby enhancing the quality of urban life [7]. Within spatial perception research, the phenomenon of “bias” is commonplace, referring to a discrepancy between the output of a process and its anticipated outcome [8]. When urban architecture is classified by functionality, the statistical results derived from objective data may differ from those based on people’s subjective perceptions [9], an instance of bias between observed and true values within the process of urban perception. Studying bias allows us to examine data and phenomena more closely and to uncover the patterns and characteristics concealed therein. For instance, the semantic differences in the soundscapes of open urban spaces can be explored by understanding how sound perception varies among different demographics [10]. F. Zhang et al. examined the discrepancy between perceived safety and actual crime rates in urban areas. They found that high-traffic daytime neighborhoods appear safer than they are, while the reverse is true for nighttime areas, highlighting the need for balanced urban management strategies. Both of these studies scrutinized the rules underlying urban phenomena from the perspective of perceptual bias [11]. This research builds on the bias between objective mapping and subjective perception in architectural classification to explore the phenomena and characteristics of urban architectural classification in depth.
The value of deviation study lies in its power to reveal insightful disparities between anticipated and actual outcomes. Particularly in urban science, scrutinizing deviations can deepen our understanding of urban perception and behavior, thus guiding urban planning decisions. Applying this to architectural function distribution can highlight discrepancies between intended and actual use, offering valuable insights for more effective, responsive urban planning, thereby enhancing urban functionality and livability.

Objective mapping denotes a rigorous, data-driven approach to representing information spatially. Using Point of Interest (POI) data, this method provides an empirical and quantitative depiction of architectural functions within the urban milieu. In contrast to subjective perception, which may be colored by individual biases, objective mapping provides an unbiased, analytical, and visual representation of urban architectural distributions. For instance, multi-source data, such as Geographic Information System (GIS) data and POI, can be used to facilitate the classification and spatial distribution analysis of urban architecture [12]. Such data are usually classified and analyzed based on geographical location, architectural types, and other indices related to urban functionality. Subjective perception, by comparison, refers to the awareness of urban architecture, environment, and functionality shaped by individual experiences, knowledge, emotions, values, and other factors as people perceive and evaluate urban spaces [13]. This perception arises from actual experiences, observations, and feelings, as demonstrated in the subjective evaluation of thermal comfort in urban open spaces [14]. Within the realm of subjective perception studies, a machine learning model was constructed to assist in mapping the distribution of human perception of new urban regions throughout the city [15]. Moreover, street images encapsulate the overall landscape of urban regions, and this novel source of image data has advantages not only in the fine observation of the physical environment but also in social perception [16].

By summarizing the aforementioned studies, it becomes evident that POI data can accurately carry out objective mapping for architectural classification, while analysis based on street view images can reflect people’s subjective perception of architectural classification. Hence, this research probes into the roles and corresponding influencing factors of objective mapping and subjective perception of urban architectural functionality in urban perception by comparing the objective mapping architectural classification method based on POI data with the subjective classification method based on street view images. Unlike objective classification methods that rely on predefined categories or criteria, this method considers how people perceive and interpret the urban environment when they navigate through it [13].

Despite an abundance of research on the classification and spatial distribution of urban architectural functionality, considerable challenges persist. Some of the existing research in the field tends to rely on methodologies derived from a singular data source [17,18]. However, within the domain of urban perception, a singular data source often falls short in fully unveiling the bias between observed and true values, as well as the diverse characteristics and patterns within the process of urban perception. Multi-source data aids in deeply investigating the meanings and patterns underlying phenomena. For instance, multi-source data can be employed to examine the correlation between human perception and architectural environment indices and socio-economic data, encompassing visual elements, facility attributes, and socio-economic indices [19]. Sampling and spatial analysis methods can be utilized on social media data to probe into the missing elements of social media within smart cities [20]. Furthermore, F. Zhang et al. used information from scenes and objects within social media photos to identify similarities between urban street views and cities. These studies collectively demonstrate that the use of multi-source data, along with novel perspectives and methodologies, allows for a more comprehensive dissection of urban characteristics and patterns.

This study acknowledges potential disparities in urban architectural functionality classification and spatial distribution when employing objective mapping versus subjective perception methodologies. Objective mapping, the “true values” in urban perception, mirrors the actual circumstances of architectural structures or spaces. In contrast, subjective perception, the corresponding “observed values”, embodies the individual sentiment evoked by these structures or spaces. A comprehensive exploration of the deviation between these “true” and “observed” values in urban perception can uncover sources of bias and refine measurement techniques, thereby affirming the efficacy of extant urban planning and design models. Concurrently, the elucidation of underlying patterns and characteristics can spotlight novel issues and challenges, supplying more comprehensive data to optimize decision making and strategy formation. This research endeavors to probe into the disparities and the causative factors between the actual and observed values of objective mapping and subjective perception in the classification of urban architectural functionality.

The core methodologies of this study encompass: (1) Functionally classifying urban architecture via the application of frequency density ratio [21] and inverse distance-weighted frequency density [22] methods to POI data. (2) Leveraging a Deep Convolutional Neural Network (DCNN) model to carry out functional classification of urban architecture using street view imagery. (3) Employing spatial clustering analysis and grid-structure spatial pattern similarity analysis to discern discrepancies between objective mapping and subjective perception methodologies regarding urban architectural functionality categorization and spatial distribution. (4) Probing potential factors contributing to the variances between objective mapping and subjective perception, thereby enriching our understanding of urban perception.

The structure of this paper is as follows: Section 1, the introduction, delineates the research context, relevance, and objectives, and outlines the central methodologies. Section 2 offers a literature review, investigating the roles of subjective perception and objective mapping in urban perception, alongside the deployment of machine learning in urban image recognition studies. Section 3 delves into the research methodology, covering the study area, urban architectural functionality classification based on POI data, functional categorization of buildings via visual perception employing street view images, and the analysis of disparities between objective mapping and subjective perception of architectural functionality at the urban scale. Section 4 unveils the experimental results, including building categorization outcomes based on POI data and street view images, and the variances between objective mapping and subjective perception of architectural functionality. Section 5, the discussion, examines the experimental findings and the relevance of the discrepancies between objective mapping and subjective perception. Section 6 encapsulates the study’s conclusions.

2. Related Works

2.1. Overview of Subjective Perception in Urban Cognition

Subjective perception is an integral part of urban cognition, encompassing an individual’s subjective understanding, experiences, and sensations of the urban environment. The users of urban landscapes largely experience them through visual sensations [9].

Spatial subjective perception, based on objective environmental perception, forms an essential part of urban image-related research. This subjective perception can be investigated through questionnaires, revealing urban residents’ behaviors and intentions, thereby providing more accurate evidence for urban planning and decision making [23]. In the age of big data, numerous urban data sources, represented by street view images, are documenting the evolution of people’s lifestyles in various ways. Given the advancements in image processing technology, the utilization of street view images in urban research has become increasingly prevalent in urban perception studies [24].

A series of studies provides a theoretical and methodological basis for exploring urban imagery. A meticulous field investigation of Baidu Street View (BSV) images of the Macau Peninsula was conducted to assess key features of street space, such as openness, greenery, interface coverage, and road area ratio [25], to reveal the interaction between subjective perception and objective environment, and to further explore the relevance of these physical aspects to the physical and social well-being of the elderly population. In addition, the perspective of cognitive maps allows the assessment of urban walkability and deepens the understanding of spatial perception in a walkable environment [26]. An advanced deep learning framework was leveraged, together with an extensive collection of panoramas from BSV, to visualize and quantify three paramount indices of street-level scene perception: the Greenery, Sky, and Building Landscape Indices [27]. The BSV service covers street view images for most of China’s cities and is available for researchers to download [28].

However, existing subjective perception research has largely overlooked street view architecture [29]. The elements in the street view, such as buildings, streets, and parks, have a significant impact on people’s cognition [15]. Even though architecture is a critical component of the urban environment, current research on the subjective visual perception of architectural classification is relatively scarce. Therefore, this study, building upon previous research, delves into the subjective visual perception of space to explore the methodology of subjective architectural classification.

2.2. Overview of Objective Mapping in Urban Cognition

Objective mapping refers to the process of providing an objective description and analysis of the urban environment based on statistical methods, offering significant reference for urban planning and design. Objective mapping can encompass a multitude of facets, such as the statistical approaches to data resources used in [30,31]. Investigations into the urban-population-industry (UPI) system were conducted, where the coupling coordination degree was employed to assess the level of urbanization in city-industry integration, offering a novel perspective on urban development and planning [32].

In the field of transportation statistics, a comprehensive experiential dataset, consisting of billions of vehicle observations derived from static traffic detectors, was utilized to identify critical points. These observations were then juxtaposed with the OpenStreetMap network, elucidating the interplay between network topology and the emergence of these critical points. This approach has broadened the understanding of transport dynamics and network influences [33]. Geospatial data can be utilized to describe the usage of urban parks, analyzing three categories and nine variables affecting park utilization, identifying their relationship with park utilization through geospatial data [34].

Nonetheless, in research pertaining to objective urban mapping, there is a noticeable deficiency in the application of POI data. POI encompasses various geographic information elements within a city, such as locations, facilities, and attractions, among others. The lack of objective statistics regarding POI suggests a shortfall in the comprehensive representation of these urban elements. Furthermore, POI data serve as a pivotal source of information for analyzing urban issues. For example, during poverty assessments, spatial autocorrelation of poverty displayed significant positivity, which was more pronounced at the town level than at the county level. Among the chosen factors, the cost distance of POI was identified as the most significant determinant for poverty assessment [35].

2.3. Application of Machine Learning in Urban Perception

With the ongoing advancements in computer technology and data processing techniques, machine learning technologies are being employed across a myriad of domains, from healthcare and intelligent power grids to vehicular communications [36]. Machine learning is extensively applied in urban planning studies, with algorithms such as Random Forest (RF) [37], Convolutional Neural Networks (CNN) [7], and deep learning methodologies proving ideal for classification and pattern analysis of geo-observed data. Generative Adversarial Networks (GAN) have been utilized for simulating urban patterns [38,39]. Contributions from deep learning (DL) and machine learning (ML) methods to the evolution of models in various aspects of prediction, planning, and uncertainty analysis in smart cities and urban development have been notable [40,41], providing support for urban planning and decision making [42].

Soheil Fathi et al. have employed machine learning for predicting a building’s energy performance. Beyond physics-based energy modeling, machine learning techniques can offer faster and more accurate estimates based on historical energy consumption data of buildings [43]. Goldhammer et al. have proposed a machine learning-based motion model to classify current motion states and predict future trajectories of vulnerable road users [44]. Constantine E. Kontokosta et al. have introduced a novel analytical method that combines machine learning with small-area estimation techniques to predict weekly and daily waste generation at the building scale [45].

With the continuous breakthroughs in artificial intelligence technology, research into urban planning evaluation using street view images has become feasible. This provides a more precise and efficient data source and research approach for urban planning and management. For instance, in studies of the Urban Landscape Index, traditional methods of measuring urban greening are limited in their coverage of various forms of greening and cannot reflect pedestrians’ actual exposure to it. Google Street View (GSV) and deep learning can be utilized to calculate the Green View Index (GVI) through semantic segmentation, capturing greenery from a pedestrian’s visual perspective [46].

3. Methodology

The workflow of the study comprises three primary steps (Figure 1): objective mapping, subjective perception, and perception deviation. Firstly, the distribution of building functions at a city scale is determined by employing the POI mapping technique in conjunction with the Frequency Ratio method and Inverse Distance Weighting. These POIs, while not widely recognized, contribute to the overall character and functionality of the urban buildings. Different types of buildings are identified and classified based on their functions using the available POI data. Secondly, the subjective perception of building functions is assessed. Street view images from BSV serve as the tool for this assessment: domain experts classify the buildings based on these images, and pre-trained models are employed to enhance the perception of building functions within the urban context. The final step involves calculating the perception deviation, which quantifies the disparity between objective mapping and subjective perception. Spatial clustering analysis is employed to identify differences in the distribution of building functions between the objective and subjective datasets. Furthermore, the spatial pattern similarity of the grid structure is assessed to determine the degree of agreement between the results obtained from objective mapping and subjective perception.

Figure 1. Research framework for the present study.

3.1. Study Area

The study site is the core area within the middle ring of Shanghai (Figure 2), covering approximately 336.2 square km. As China’s economic nexus and a prominent global metropolis, Shanghai has experienced a rapid urbanization trajectory. However, in the course of urban progression, perceptual discrepancies concerning the urban environment and the irrationality of architectural function distribution have emerged as salient issues. Therefore, intertwining the studies of urban perceptual bias and architectural functionality distribution bears profound significance for the sustainable evolution of Shanghai’s central urban district. By comprehending the perceptual needs of the residents in depth, we can optimize the distribution of architectural functions, enhance the quality of the urban environment, and engineer a more habitable and sustainable urban space.

Figure 2. Overview of the study area for the present study: (a) location of Shanghai, (b) location of Shanghai’s central urban district, (c) the road network in Shanghai’s central urban district.

3.2. Functional Classification of Urban Buildings Based on POI Data

3.2.1. POI Data Acquisition and Pre-Processing

This study predominantly classifies architectural types based on POI data. Despite the wealth of spatial information contained within POI data, inconsistencies in data quality and positional shifts pose significant issues. Therefore, prior to initiating the architectural classification experiment, it is imperative to process the obtained POI data. Initial steps involve the selection of POI data from Shanghai, vectorizing it on an online map, and extracting the necessary POI data within the research ambit.

After data validation, the data are cleansed: the 12-category POI Excel files are converted into vector data, and POI data with low public recognition in the original data, such as newsstands and public toilets, are eliminated. The processed POI data are then reclassified according to the type and function of buildings. In compliance with the “China National Standard: Current Land Use Classification (GB/T 21010-2017)”, all building functions are divided into commercial, public, and residential categories (as shown in Figure 3). Given that the classification of the architectural type extends to the Area of Interest (AOI) of the building, it is also necessary to acquire the building area data within the research scope. Finally, the geographical coordinate system of all files is converted into a projected coordinate system, and the spatial coordinates of the collected data are uniformly converted into the WGS-84 coordinate system for subsequent overall spatial-structural analysis.

Figure 3. Building function and building categories.

3.2.2. Frequency Ratio Method

First, the frequency method is employed. Its fundamental principle is to count the POIs of each type on the architectural facade and to classify architectural functions based on the frequency density and the proportion of each POI type on the facade within a spatial scope.

F_{i} = \frac{n_{i}}{N_{i}}, \quad i = 1, 2, 3, \dots, n \qquad (1)

C_{i} = \frac{F_{i}}{\sum_{i = 1}^{n} F_{i}}, \quad i = 1, 2, 3, \dots, n \qquad (2)

In these equations, i denotes the architectural type, n signifies the count of architectural types, and n_{i} represents the quantity of the ith POI type within the architecture. Furthermore, N_{i} signifies the total count of the ith POI category within the POI data, F_{i} represents the frequency density of the ith POI type, and C_{i} signifies the proportion of the frequency density of the ith POI category to the frequency density of all types of POI data within the architectural object area.

Calculations for the three categories of POIs are performed via the frequency density ratio method, yielding the quantity of each category of POI on each architectural facade file. The computational outcomes reveal that a substantial number of architectural facade files yield no POI statistics, primarily because no POI data fall on those facade files. The strength of the frequency density ratio method lies in its ability to account for the quantity of POIs within each building facade. Its weakness is its relatively high demand on the quality of POI data, which is often hard to guarantee: typical problems include spatial position offsets and data loss. For facade files with POI values, computations can be made via the frequency density ratio method. For facades without any counted POIs, supplementary computations must be made via the kernel density method or the frequency density method weighted by inverse distance.
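The computation in Equations (1) and (2) can be illustrated with a minimal sketch. The category names, the example counts, and the rule of assigning the category with the largest C_{i} are assumptions made for illustration; the text does not state the exact decision rule.

```python
from collections import Counter

# The three building-function categories used in the text.
CATEGORIES = ("commercial", "public", "residential")


def frequency_density_ratio(building_pois, citywide_totals):
    """Return (F, C) for one building footprint.

    building_pois: list of category labels of the POIs inside the footprint.
    citywide_totals: dict mapping category -> N_i, the total number of POIs
        of that category in the whole dataset.
    """
    counts = Counter(building_pois)  # n_i for each category
    # Eq. (1): frequency density F_i = n_i / N_i
    F = {c: counts.get(c, 0) / citywide_totals[c] for c in CATEGORIES}
    total = sum(F.values())
    if total == 0:
        # No POI inside the footprint: Eq. (2) is undefined, so this
        # building has to be handled by a supplementary method.
        return F, None
    # Eq. (2): share of each category in the summed frequency density
    C = {c: F[c] / total for c in CATEGORIES}
    return F, C


def dominant_function(C):
    """Assumed decision rule: pick the category with the largest C_i."""
    return max(C, key=C.get)


# Hypothetical numbers, for illustration only.
totals = {"commercial": 1000, "public": 500, "residential": 250}
F, C = frequency_density_ratio(["commercial", "commercial", "residential"], totals)
label = dominant_function(C)
```

In this example the footprint holds two commercial POIs and one residential POI, yet the residential share is larger after normalization because residential POIs are rarer citywide. That normalization is the reason for dividing by N_{i} rather than comparing raw counts.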

3.2.3. Inverse Distance Weighting

Because of inherent quality problems in POI data, certain architectural facade files obtain no POI values. These gaps necessitate the inverse distance weighted frequency density method as a supplement, since the conventional frequency density computation alone proves insufficient for architectural typology studies. The frequency density method weighted by inverse distance incorporates a Gaussian function that decays rapidly with distance, thereby considerably mitigating the influence of distant POI data on outcomes. Architectural facade data that can be calculated using the frequency density method are first set aside, and the inverse distance weighted frequency density method is applied to the remaining facades.

The choice of a 100-meter distance aligns with human visual cognition of building function, reflecting the typical scale at which people perceive their urban environment. This ensures our analysis is ecologically valid, making our findings more relevant to real-world urban experiences. The key distinction between the inverse distance weighted frequency density method and the frequency density ratio method lies in the treatment of architectural facade data without counted POIs. For these data, the surrounding POIs are leveraged to infer the type classification. For architectural areas devoid of POI, POIs within a surrounding 100-meter buffer are tallied, and each POI’s weight is limited according to its distance. Hence, POIs nearer to the architectural facade carry greater weight.

f(x) = a e^{- \frac{(x - b)^{2}}{2 c^{2}}} \qquad (3)

In comparison to the commonly used inverse (reciprocal) distance function, the Gaussian function presents a smoother curve and is more applicable to the given POI data, which are characterized by high uncertainty. The one-dimensional form of the Gaussian function is a bell-shaped curve, where a represents the peak value, b indicates the value of the independent variable at the peak (x = b also serves as the bell’s axis of symmetry), and c signifies the standard deviation, depicting the bell’s breadth. The independent variable x is the distance from the POI to the geometric center of the building polygon. As the POI is nearer to the architectural facade, the weight approaches 1, and as the POI is farther from the architectural surface, the weight decreases, approaching 0 but never becoming negative, as shown in Figure 4. Therefore, we assign a = 1 and b = 0.

Figure 4. Inverse distance weighting.

For built-up areas devoid of POI within their scope, this study calculates the POI categories within a 100-meter built-up buffer zone and evaluates the type classifications of the built-up areas.
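The distance-weighted variant can be sketched as follows, using Equation (3) with a = 1 and b = 0 and the 100-meter buffer stated above. The value of c is not reported in the text, so `SIGMA_M = 50.0` is purely an assumed placeholder, and the function names are hypothetical.

```python
import math

BUFFER_M = 100.0  # buffer radius stated in the text
SIGMA_M = 50.0    # ASSUMED value of c; the text does not report it


def gaussian_weight(distance_m, c=SIGMA_M):
    """Eq. (3) with a = 1 and b = 0: the weight is 1 at distance 0 and decays toward 0."""
    return math.exp(-(distance_m ** 2) / (2.0 * c ** 2))


def weighted_frequency_density(nearby_pois, citywide_totals, buffer_m=BUFFER_M):
    """Category shares for a footprint that contains no POI of its own.

    nearby_pois: iterable of (category, distance_m), where distance_m is the
        distance from the POI to the geometric center of the building polygon.
    citywide_totals: dict mapping category -> total POI count N_i.
    Returns a dict of shares, or None if no POI lies inside the buffer.
    """
    weighted = {cat: 0.0 for cat in citywide_totals}
    for cat, dist in nearby_pois:
        if dist <= buffer_m:  # ignore POIs outside the buffer
            weighted[cat] += gaussian_weight(dist)
    # Replace the raw count n_i of Eq. (1) with the weighted count.
    F = {cat: weighted[cat] / citywide_totals[cat] for cat in weighted}
    total = sum(F.values())
    if total == 0:
        return None
    return {cat: F[cat] / total for cat in F}


# Hypothetical example: one close residential POI, one distant commercial POI.
totals = {"commercial": 100, "public": 100, "residential": 100}
shares = weighted_frequency_density(
    [("residential", 10.0), ("commercial", 90.0)], totals
)
```

With equal citywide totals, the residential POI at 10 m dominates the commercial POI at 90 m, which is the intended effect of the Gaussian decay.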

3.3. Functional Classification of Buildings Based on Visual Perception of Street View Images

3.3.1. Street View Image Acquisition and Pre-Processing

In this study, we extract images from street view platforms to obtain broad coverage of city street photographs. First, urban road networks with geographical coordinate information are obtained from OpenStreetMap (OSM). We then simplify the road network into linear forms with an average distance of 25 m between adjacent points, based on the urban street design method proposed by J. Gehl [47]. This yields sampling points with geographical coordinate information, distributed along the road network. It is noteworthy that not all sampling points have corresponding street view images in street view services. Finally, to capture building façades, we download two images perpendicular to the road from street view services for each sampling point (one on the left and one on the right, with a viewpoint of 90 degrees, a horizontal angle of 0 degrees, and an image size of 760 × 480 pixels), as illustrated in Figure 5. In total, we obtained 102,046 street view images of the central urban streets of Shanghai.

Figure 5. Street view images acquisition at the sampling point.
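The sampling step can be illustrated with a short sketch that places points at a fixed 25 m spacing along a road polyline and derives the two camera headings perpendicular to the road. It assumes coordinates in a projected, metre-based system and a fixed rather than average spacing. The function names are hypothetical, and no street view API call is shown because the text does not specify one.

```python
import math


def sample_points(polyline, spacing=25.0):
    """Yield (x, y, bearing_deg) every `spacing` metres along a polyline.

    polyline: list of (x, y) vertices in a projected, metre-based CRS (assumed).
    bearing_deg: direction of travel, measured clockwise from north (+y axis).
    """
    next_at = 0.0  # distance along the line at which the next sample falls
    walked = 0.0   # length of the segments already traversed
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len == 0:
            continue  # skip duplicate vertices
        bearing = math.degrees(math.atan2(x1 - x0, y1 - y0)) % 360.0
        while next_at <= walked + seg_len:
            t = (next_at - walked) / seg_len  # position within this segment
            yield (x0 + t * (x1 - x0), y0 + t * (y1 - y0), bearing)
            next_at += spacing
        walked += seg_len


def facade_headings(bearing_deg):
    """Headings for the left and right views, perpendicular to the road."""
    return ((bearing_deg - 90.0) % 360.0, (bearing_deg + 90.0) % 360.0)


# A 100 m road running due north gives samples at 0, 25, 50, 75 and 100 m.
straight = list(sample_points([(0.0, 0.0), (0.0, 100.0)]))
# A road with a bend: 30 m north, then 40 m east.
bend = list(sample_points([(0.0, 0.0), (0.0, 30.0), (40.0, 30.0)]))
```

Each sampled point then yields two image requests, one per heading, which matches the left and right views described above.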

This study employs a methodology that segments building facades from street view images, enhancing the accuracy of facade color recognition and building function classification. In recent years, high-precision semantic segmentation models based on convolutional neural networks, such as U-Net [48], PSPNet [49], and DeepLabv3 [50], have been extensively developed. We employ DeepLabv3 to segment building facades from street view images, owing to its high accuracy and ease of implementation. DeepLabv3 achieved an IoU accuracy of 93.5% for buildings on the Cityscapes test set. Figure 6 demonstrates the building segmentation results of street view images by the trained DeepLabv3. However, in some street view images, the proportion of buildings is relatively low, and these images are unable to reflect the color features of the buildings. To enhance experimental accuracy, we need to exclude images with a small proportion of buildings, as the computer cannot identify the features of the buildings through these images. By inputting street view images into a pre-trained semantic segmentation model, we can measure the area ratio of building facades. After calculating the proportion of buildings in each sampled image, we exclude images with a building proportion below 15%, as shown in Figure 6c.

Figure 6. Street view image segmentation through DeepLabv3. (a) Commercial buildings, (b) public buildings, (c) a low building proportion.
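The 15% filter can be sketched as a pixel count over the label mask produced by the segmentation model. The mask is represented here as a plain list of rows of integer labels, and the building label id of 2 is an assumption (it is the Cityscapes train id for "building"). The actual id depends on the model and label map used.

```python
BUILDING_CLASS = 2         # ASSUMED label id for "building"
MIN_BUILDING_RATIO = 0.15  # threshold stated in the text


def building_ratio(mask, building_class=BUILDING_CLASS):
    """Fraction of pixels labelled as building in a 2-D label mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    hits = sum(row.count(building_class) for row in mask)
    return hits / total


def keep_image(mask, threshold=MIN_BUILDING_RATIO):
    """Keep an image only if its building proportion is not below the threshold."""
    return building_ratio(mask) >= threshold


# Two toy 4 x 5 masks (20 pixels each); 0 stands for any non-building class.
mostly_building = [[2, 2, 2, 0, 0]] * 2 + [[2, 2, 0, 0, 0]] * 2  # 10 of 20 pixels
mostly_sky = [[2, 0, 0, 0, 0]] * 2 + [[0, 0, 0, 0, 0]] * 2       # 2 of 20 pixels
```

Here `mostly_building` has a ratio of 0.5 and is kept, while `mostly_sky` has a ratio of 0.1 and is excluded.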

3.3.2. Building Semantic Segmentation and Image Classification

Figure 7 shows the four types of building functions. The first row shows the single-label classes, from left to right: residence, commerce, public, and other facilities. The second row shows the multi-label classes, from left to right: residence and commerce, commerce and public, residence and public, and no data.

Figure 7. Four types of building functions are included. The first row is the single-label class, from left to right: residence, commerce, public, and other facilities. The second row is the multi-label class, from left to right: residence and commerce, commerce and public, residence and public, no data.

Considering urban architecture from the perspective of street facades, the built environment primarily exhibits four types of architectural functions, namely residential (R), commercial services (C), public services (P), and other facilities (O), as shown in Figure 7. For the purpose of effectively classifying these architectural types, we employed deep learning techniques to automatically recognize architectural functions in the street view images of our study area. In previous research, a single-label approach was commonly utilized to classify architectural categories, where each photograph corresponded to one label. However, the single-label method was incapable of accurately segregating street view images reflecting multiple architectural functions, resulting in imprecise experimental outcomes. To circumvent this limitation, we adopted a multi-label image classification approach to identify multiple architectural categories within street view images.
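To make the difference between single-label and multi-label targets concrete, the sketch below encodes the four functions as a multi-hot vector, so that one image can carry several labels at once. The encoding, the 0.5 decision threshold, and the helper names `encode_labels` and `decode_scores` are illustrative assumptions, not details given in the paper.

```python
# Order of the four base functions named in the text: R, C, P, O.
CLASSES = ("R", "C", "P", "O")


def encode_labels(functions):
    """Turn a set of function codes, e.g. {"R", "C"}, into a multi-hot vector."""
    unknown = set(functions) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown function codes: {sorted(unknown)}")
    return [1 if c in functions else 0 for c in CLASSES]


def decode_scores(scores, threshold=0.5):
    """Return every class whose per-class score reaches the (assumed) threshold."""
    return [c for c, s in zip(CLASSES, scores) if s >= threshold]


print(encode_labels({"R"}))                     # [1, 0, 0, 0]
print(encode_labels({"R", "C"}))                # [1, 1, 0, 0]
print(decode_scores([0.91, 0.64, 0.08, 0.02]))  # ['R', 'C']
```

A single-label image is then simply the special case in which exactly one entry of the vector is 1.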

To train the multi-label architecture classifier, we first established a street view benchmark dataset from the semantically segmented architectural images. This dataset comprised 6000 images covering four basic categories: residential, commercial services, public services, and other facilities. Of these images, approximately 4000 were single-label and 2000 were multi-label. We partitioned them into a training set (80%), a testing set (10%), and a validation set (10%), so that no test image was used during training. Subsequently, we trained a state-of-the-art deep learning model, EfficientNetV2 [51]. We trained the model for 100 epochs, with a learning rate decay of 0.1 every 25 epochs to aid convergence. Each training batch consisted of 64 images. All other hyperparameters were left at their default values. The experiment was implemented in PyTorch and executed on an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory.
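The data split and the learning rate schedule can be illustrated with a short, framework-free sketch. It assumes that "a decay of 0.1 every 25 epochs" means multiplying the learning rate by 0.1 at each 25-epoch boundary (the behavior of a step scheduler such as PyTorch's `StepLR`). The base learning rate of 0.01, the random seed, and the function names are hypothetical, since the paper does not report them.

```python
import random


def split_indices(n_items, train_pct=80, test_pct=10, seed=0):
    """Shuffle item indices and split them into train / test / validation parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = n_items * train_pct // 100
    n_test = n_items * test_pct // 100
    return (indices[:n_train],
            indices[n_train:n_train + n_test],
            indices[n_train + n_test:])


def step_decay_lr(epoch, base_lr=0.01, gamma=0.1, step_size=25):
    """Learning rate after multiplying by `gamma` once every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


train_idx, test_idx, val_idx = split_indices(6000)
print(len(train_idx), len(test_idx), len(val_idx))  # 4800 600 600
for epoch in (0, 24, 25, 50, 75, 99):
    print(epoch, step_decay_lr(epoch))
```

Under these assumptions the learning rate takes four distinct values over the 100 epochs, dropping by a factor of ten at epochs 25, 50, and 75.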

3.4. The Deviation between Objective Mapping and Subjective Perception of Building Functions at City Scale

3.4.1. Spatial Distribution Variance Analysis Based on K-Means Clustering

This step conducts a cluster analysis on the pixels of an image that match a target color and then visualizes the results. First, we used the NumPy library [52] to identify the coordinates of all pixels matching the given target color. Next, the MiniBatchKMeans algorithm from the scikit-learn library [53] was employed to cluster the target color pixels. MiniBatchKMeans, a variant of the K-means algorithm [54], uses a subset of the data samples (a mini-batch) in each iteration, thereby accelerating computation. The algorithm seeks the cluster centers that minimize the following objective function:

J(C) = \sum_{j} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2 \qquad (4)

In this context, J(C) signifies the clustering error, C_j represents the jth cluster, x_i denotes a data point, and \mu_j is the center of C_j. By minimizing J(C), we can achieve more compact and representative clusters.
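The pixel-matching step and the clustering error of Equation (4) can be sketched with NumPy alone. The toy image, the fixed cluster assignment, and the helper names `target_pixel_coords` and `clustering_error` are illustrative assumptions; in practice the labels and centers would come from the fitted clustering model, which in scikit-learn reports this quantity as the `inertia_` attribute.

```python
import numpy as np


def target_pixel_coords(image: np.ndarray, target_rgb) -> np.ndarray:
    """Return (row, col) coordinates of pixels whose RGB value equals target_rgb."""
    mask = np.all(image == np.asarray(target_rgb), axis=-1)
    return np.argwhere(mask)


def clustering_error(points: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """Equation (4): sum of squared distances from each point to its cluster center."""
    diffs = points - centers[labels]
    return float(np.sum(diffs ** 2))


# Toy 4 x 4 image with four red pixels forming two groups.
RED = (255, 0, 0)
image = np.zeros((4, 4, 3), dtype=np.uint8)
for r, c in [(0, 0), (0, 1), (3, 2), (3, 3)]:
    image[r, c] = RED

coords = target_pixel_coords(image, RED).astype(float)
labels = np.array([0, 0, 1, 1])  # assumed assignment for the toy data
centers = np.array([coords[labels == j].mean(axis=0) for j in range(2)])
print(centers)                                    # centers (0, 0.5) and (3, 2.5)
print(clustering_error(coords, labels, centers))  # 1.0
```

Each of the four pixels lies 0.5 from its cluster center, so the error is 4 × 0.25 = 1.0.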

Finally, we utilize the seaborn library [55] to generate a scatter plot to visualize the clustering outcomes. For each cluster, a dashed circular frame is drawn, the radius of which equals the distance from the center to the farthest point within the cluster. Additionally, the center point of each cluster is also illustrated. The calculation of distance employs the Euclidean distance formula:

d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2} \qquad (5)

Through the implementation of these methods, we can match and conduct cluster analysis of architectural functions in both subjective perceptions and objective statistics.
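The radius of the dashed circle drawn around each cluster follows directly from the Euclidean distance in Equation (5). The sketch below uses made-up points and an illustrative center; the function name `cluster_radius` is hypothetical.

```python
import numpy as np


def cluster_radius(points: np.ndarray, center: np.ndarray) -> float:
    """Distance from the center to the farthest point of the cluster."""
    distances = np.sqrt(np.sum((points - center) ** 2, axis=1))
    return float(distances.max())


# Illustrative cluster: three points and an assumed center.
points = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
center = np.array([2.0, 0.0])
print(cluster_radius(points, center))  # 3.0
```

The three distances are 2, 2, and 3, so the enclosing circle has radius 3.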

3.4.2. Analysis of the Similarity of the Spatial Pattern of the Grid Structure

In this study, we proposed a methodology predicated on the spatial overlap metric, Jaccard Similarity [56], for the analysis and visualization of the differences between two city-scale spatial distribution RGB images. Below, we elaborate on the detailed steps employed in implementing this approach:

Initially, we adjusted the two input RGB images to an appropriate size to facilitate their subdivision into a specified quantity of grids. Each image was segmented into 100 × 100 grids [57]. To accomplish this, we first split the images along the vertical axis (axis = 0), subsequently concatenating the divided image blocks and further partitioning them along the horizontal axis (axis = 1).
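A possible way to perform this subdivision with NumPy is sketched below. It assumes that "adjusting the size" means cropping the image so that its height and width are divisible by the grid counts (resizing would be an equally plausible reading), and it returns a nested list of cells rather than concatenated blocks. The function name `split_into_grid` is hypothetical.

```python
import numpy as np


def split_into_grid(image: np.ndarray, n_rows: int, n_cols: int):
    """Crop the image to a divisible size and return a nested list of grid cells."""
    height, width = image.shape[:2]
    cropped = image[: height - height % n_rows, : width - width % n_cols]
    row_bands = np.split(cropped, n_rows, axis=0)  # split along the vertical axis
    # Split every horizontal band along the horizontal axis.
    return [np.split(band, n_cols, axis=1) for band in row_bands]


# Toy 6 x 8 RGB image divided into a 3 x 4 grid of 2 x 2 cells.
image = np.arange(6 * 8 * 3).reshape(6, 8, 3)
cells = split_into_grid(image, n_rows=3, n_cols=4)
print(len(cells), len(cells[0]), cells[0][0].shape)  # 3 4 (2, 2, 3)
```

With `n_rows=100` and `n_cols=100`, the same call would yield the 100 × 100 grid described in the text.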

For each pair of corresponding grids, we calculated the Jaccard similarity as follows. First, count the unique colors within each grid and their occurrence frequencies. Second, identify the colors appearing in both grids; for each common color, take the smaller of its two occurrence counts, and sum these minima to obtain the size of the intersection. Third, add the total color occurrences of both grids and subtract the size of the intersection to obtain the size of the union. Finally, compute the Jaccard Similarity according to the formula:

J(A, B) = \frac{|A \cap B|}{|A \cup B|} \qquad (6)

where A and B denote the two sets, |A \cap B| indicates the size of the intersection of sets A and B, and |A \cup B| represents the size of the union of sets A and B. The Jaccard Similarity serves as an index of the similarity between two sets, with values ranging between 0 and 1. The closer the similarity value is to 1, the more similar the two sets; conversely, the closer the similarity value is to 0, the greater the disparity between the two. The computed Jaccard Similarities were stored in a matrix, the row and column numbers of which coincide with the number of grids. Finally, the Jaccard Similarity matrix was visualized as a heatmap using the heatmap function of the Seaborn library.
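The count-based intersection and union described in the steps above can be sketched with the standard library. Each grid cell is represented here as a list of RGB tuples; the function name `color_jaccard` and the choice to return 1.0 for two empty cells are assumptions made for illustration.

```python
from collections import Counter


def color_jaccard(cell_a, cell_b) -> float:
    """Jaccard similarity of two grid cells, each a sequence of RGB tuples."""
    counts_a, counts_b = Counter(cell_a), Counter(cell_b)
    # Counter "&" keeps, for each shared color, the smaller of the two counts.
    intersection = sum((counts_a & counts_b).values())
    union = sum(counts_a.values()) + sum(counts_b.values()) - intersection
    return intersection / union if union else 1.0  # assumed value for empty cells


RED, BLUE, GREY = (255, 0, 0), (0, 0, 255), (128, 128, 128)
cell_a = [RED, RED, RED, BLUE]
cell_b = [RED, RED, BLUE, GREY]
print(color_jaccard(cell_a, cell_b))  # 0.6
```

In this example the intersection is 2 (red) + 1 (blue) = 3 and the union is 4 + 4 − 3 = 5, giving 3/5 = 0.6. A NumPy cell of shape (h, w, 3) could be passed in as `map(tuple, cell.reshape(-1, 3))`.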

Through these steps, we can generate a heatmap illustrating the similarities between the two input city-scale spatial distribution RGB images. This method enables us to compare and analyze the similarities and differences in urban spatial structures in a quantitative manner.

4. Experiments and Results

4.1. POI-Based Building Classification Results

Using the Frequency Density Ratio and Inverse Distance Weighted Frequency Density methods described above, we classified the buildings in the central area of Shanghai, producing an urban building classification map based on AOI data (Figure 8).

Figure 8. Urban building classification map based on AOI data.

To facilitate the subsequent comparison with subjective visual perception, we referred to established standards for architectural façade observation. Ordinarily, individuals can discern the overall form and intricate features of a building at close range (approximately 25 m). Conversely, at a greater distance (approximately 250 m), attention shifts mainly to the building’s prominence and its spatial relations within the urban milieu. Recognizing that people observe architecture from varied distances in practice, we adopted an intermediate distance of 50 m (with a field of view extending 25 m on either side) as the measuring standard. This distance accounts for both the overall shape and some of the intricate characteristics of a building while reflecting its spatial relations within the urban environment. Accordingly, a 50 m buffer analysis was conducted on the buildings, and the architectural classification results based on POI data were visualized spatially (Figure 9).

Figure 9. Results of building classification based on POI data.

Commercial buildings within the range are typically situated in areas with optimal transportation access, often at the heart of district clusters. The majority of residential structures exhibit group distribution within the range, which is consistent with the aggregated distribution characteristics of living spaces in most Chinese cities, taking the form of residential communities. Apart from large public service institutions such as hospitals and schools concentrated in clusters, public buildings in the central research area of Shanghai are rather dispersed, primarily constituting administrative offices and cultural and sports facilities, often situated within standalone structures. These cover extensive land areas and are plentiful in number.

4.2. Building Classification Results Based on Street View Images

4.2.1. Classification Accuracy of Deep Learning Models

Figure 10 illustrates the normalized confusion matrices of the seven architectural function classifications, evaluated via our testing data with the trained EfficientNetV2 model. The F1 score (F1), a robust criterion for gauging classification accuracy, is a harmonic mean that encapsulates both precision (p) and recall (r). It is formulated as:

F_1 = 2 \cdot \frac{p \cdot r}{p + r} \qquad (7)

Figure 10. Normalized confusion matrices of the seven architectural function classifications.

The trained EfficientNetV2 model achieves an F1 score of 0.82.
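As a worked illustration of Equation (7), the sketch below computes the F1 score from a precision and a recall value. The input values are arbitrary examples chosen for easy arithmetic and are not results from this study; the zero-division guard is an added convention.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Arbitrary example values, not taken from the paper.
print(round(f1_score(0.75, 0.60), 4))  # 0.6667
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier cannot reach a high F1 score by maximizing precision or recall alone.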

4.2.2. Building Function Classification Results and Spatial Distribution

The research area under scrutiny encompasses 102,046 inferred street view images. Figure 11 exhibits the distribution of building functionalities within these street view images of the investigated area.

Figure 11. Results of visual perception of urban building function classification of streetscape pictures at street scale.

From a subjective human perspective, the primary functionalities of buildings in Shanghai’s core region are predominantly residential and commercial. These two functions occupy significant positions within the visual landscape. Residential buildings likely represent the dominant function, reflecting the high population density and urban residential zones in the core area. Commercial structures, such as shopping centers and retail stores, are also prevalent, reflecting the vibrant economic activity and commercial hubs within the city.

Contrasting with the previous figure, it is noteworthy that office buildings may be mistakenly identified as commercial buildings. This misperception could arise due to various factors such as architectural design, signage, and visual saliency, all of which influence the perceived distribution of different functionalities within the urban environment.

4.3. The Result of Deviation between Objective Statistics and Subjective Perception of Building Functions

The comparison between the centroids of different building functionality color blocks in two diagrams illustrates the spatial distribution differences between objective statistics and subjective perception. We computed the disparities in the distribution of architectural functionalities between spatial clustering analysis of subjective perception and objective statistics, as shown in Figure 12. Particular attention was devoted to the centroid differences in the distribution of residential, commercial, and public buildings.

Figure 12. Differences in building function distribution based on spatial clustering analysis.

The results reveal the greatest centroid difference in the distribution of commercial buildings, suggesting substantial variability in the spatial dispersion of commercial functionality within the core area of Shanghai. Further analysis indicates that this centroid difference in the distribution of commercial buildings could be closely associated with factors such as urban planning, land utilization, and market demand. This divergence may reflect the agglomeration tendencies of commercial activities and alterations in urban development.

In addition, the centroid differences in the distribution of residential and public buildings are relatively minimal, implying a stable spatial distribution of these two functions within the core area. The exploration of these discrepancies facilitates a deeper understanding of the spatial organization characteristics of urban architectural functionalities, providing a reference for urban planning and development. Further research could delve into underlying causes and influencing factors, including the economic dynamics of commercial activities, changes in citizens’ needs, and the orientation of government planning, to promote more rational and sustainable urban development.
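The centroid comparison can be illustrated in a simplified form. The sketch below computes a single centroid per function color in each map and measures the Euclidean distance between the two, whereas the study compares cluster centers obtained from MiniBatchKMeans. The color code `COMMERCIAL`, the toy maps, and the function names are hypothetical.

```python
import numpy as np


def color_centroid(image: np.ndarray, target_rgb) -> np.ndarray:
    """Mean (row, col) position of all pixels equal to target_rgb."""
    mask = np.all(image == np.asarray(target_rgb), axis=-1)
    coords = np.argwhere(mask)
    if len(coords) == 0:
        raise ValueError("target color not present in image")
    return coords.mean(axis=0)


def centroid_shift(map_a: np.ndarray, map_b: np.ndarray, target_rgb) -> float:
    """Euclidean distance between the centroids of one color in two maps."""
    return float(np.linalg.norm(color_centroid(map_a, target_rgb)
                                - color_centroid(map_b, target_rgb)))


COMMERCIAL = (255, 0, 0)  # hypothetical color code for commercial buildings
objective_map = np.zeros((8, 8, 3), dtype=np.uint8)
perceived_map = np.zeros((8, 8, 3), dtype=np.uint8)
objective_map[0, 0] = objective_map[0, 2] = COMMERCIAL  # centroid (0, 1)
perceived_map[3, 4] = perceived_map[3, 6] = COMMERCIAL  # centroid (3, 5)
print(centroid_shift(objective_map, perceived_map, COMMERCIAL))  # 5.0
```

A larger shift for one function than for the others would correspond to the kind of centroid difference reported above for commercial buildings.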

Results of Spatial Pattern Similarity Analysis of Grid Structure

The objective mapping results (Figure 9) and subjective perception results (Figure 11) of building functionalities were retrieved, and their grid-based spatial pattern similarities [58] were visualized as a new image (Figure 13). Grid areas with higher similarity (values close to 1) are depicted in white, while those with lower similarity (values close to 0) are shown in red.

Figure 13. Grid structure spatial pattern similarity.

In terms of overall similarity, subjective perception and objective statistics demonstrate a certain degree of resemblance in their spatial distribution on an urban scale. For instance, areas proximate to the Huangpu River generally exhibit higher similarity. Regarding local differences, despite an overarching similarity between the two maps, significant disparities persist in certain local areas.

Through the analysis of grid-based spatial pattern similarity, we can glean a deeper understanding of the characteristics of these two spatial perception distributions and the differences between them. This assists urban planners and relevant stakeholders in better comprehending the development patterns of the city, thereby enabling the formulation of top-down planning strategies and intervention measures.

5. Discussion

5.1. The Significance of Perceptual Deviations for Urban Science

The significance of perceptual deviations in urban science lies in the unique opportunity they provide to juxtapose the objective and subjective dimensions of urban experiences. In particular, perceiving human activities and the socio-economic environment of cities using traditional computer vision features remains challenging. Conventional image analysis techniques and automated algorithms often struggle to accurately capture the subjectivity of perception, multicultural backgrounds, and the intricate layout of urban functions. This limitation necessitates the development of superior computational tools and methods capable of holistically learning and interpreting the visual content of urban environments.

By understanding perceptual deviations within a clearly demarcated study area (the core area within the middle ring of Shanghai), researchers can aspire to more precise and quantitative measurements of architectural functional layout and socio-economic conditions. The extraction of high-level representations from street view imagery is pivotal in this pursuit. These representations are particularly beneficial for comparing objective data with subjective perceptions, and they provide insights into the functional distribution of urban architecture. By analyzing visual data and considering perceptual deviations, researchers can identify indicators that capture people’s perceptions of the functional distribution of architecture, such as commercial vibrancy and residential experience. This information is paramount for the optimization of transportation planning and infrastructure development. Understanding perceptual deviations propels the integration of urban science with data science, opening up new possibilities for innovative solutions in evidence-based decision making, efficient resource allocation, and sustainable urban development.

5.2. Potential Applications of Perceptual Deviations on Improving Urban Planning and Development

The assessment of perceptual deviations in urban environments has the potential to foster more livable cities that embody the unique characteristics of their communities, attracting new investments and stimulating economic growth. These deviations provide crucial insights into individuals’ interactions with and perceptions of their environment, informing urban planning decisions. For instance, should there be a perceived sense of insecurity in a particular urban area, regardless of low crime statistics, urban planners may address this perception through actions such as increased lighting or heightened police surveillance. Similarly, if residents feel a lack of green spaces within their vicinity, urban planners could prioritize the development of parks and green expanses. By integrating these perceptual deviations, urban planning and development can align more closely with the needs and desires of residents, thus fostering more sustainable and livable urban environments.

Further integrating advanced computational tools and methodologies to extract high-level data from street-level imagery can help illuminate mobility patterns within cities and provide detailed socio-economic data for diverse urban regions. This information serves as an invaluable resource for urban planners and social scientists involved in urban observation, research, and design. Notably, it can assist in identifying high pedestrian traffic areas or potential locales suitable for specific business establishments, such as grocery outlets. A deeper understanding of the relationship between the physical environment and human activities can inform decisions on land use, transportation, and infrastructure development. The ability to offer comprehensive and precise information about the urban landscape anchors the potential applications of perceptual deviations in enhancing urban planning and development.

The potential applications of perceptual deviations in urban planning are extensive. Detailed analyses of these deviations can reveal regions where architectural design may be misaligned with the needs and preferences of local communities. This valuable insight can guide urban planning decisions towards more effectively meeting residents’ needs and expectations, contributing to the development of more sustainable and livable cities. For instance, if a park sees little utilization due to safety concerns or a lack of amenities, planners can leverage this information to enact improvements, making the space more appealing. Moreover, understanding perceptual deviations can foster more effective communication between planners, designers, and community members, providing a nuanced understanding of how different groups perceive the same space. This can facilitate a more collaborative decision-making process, better reflecting the needs and expectations of all stakeholders.

5.3. Potential Limitations and Future Research

The limitations of the present study merit further examination in future research. First, it is imperative to address the constraints tied to the quality and volume of the underlying data. While the faithful categorization of building functionalities utilizing POI data and street view imagery is highly dependent on the trustworthiness of these data sources, the integration of subjective perception studies based on surveys can complement this objective analysis and provide a more rounded understanding of urban environments. A lack of comprehensive data coverage can jeopardize the accuracy of classification results. Furthermore, the applicability of these methodologies across diverse cities or regions may necessitate considerable adjustments due to variations in architectural styles, patterns of urban development, and spatial configurations.

Second, deploying deep learning models in this research introduces difficulties concerning interpretability and resource demands. The models in use are intrinsically opaque, which makes their decision-making processes laborious to explain and obstructs the validation and refinement of results. Additionally, the training of these models requires substantial computational resources and expertise, which makes their application impractical in some scenarios. Moreover, the objective and subjective measurements of urban attributes might oversimplify the complex and multifaceted nature of urban settings. For instance, quantifying building functionalities solely based on POI data and street view images overlooks temporal elements such as the evolution of building functionalities over time or the dynamic character of urban activities.

Lastly, while the spatial pattern similarity analysis and spatial distribution difference analysis employed in this study provide significant insights into disparities between objective and subjective perceptions of urban spaces, these methods could potentially neglect other influential factors, including socio-economic considerations, cultural contexts, and individual experiences. Furthermore, the validation of results heavily hinges on field validation, which could introduce biases due to logistical hurdles and possibly inadequate sample sizes in the validation data. The mathematical formulas utilized to quantify differences in spatial distribution and grid structure spatial pattern similarity might not sufficiently capture the nuances of urban spatial patterns, suggesting a need for more advanced statistical or geospatial techniques.

6. Conclusions

In summary, this study illuminates the impact of the discrepancy between objective mapping and subjective perception on the functional classification of urban architecture. By employing machine learning methodologies to unravel latent patterns and salient features within urban spatial perception studies, we have effectively addressed this challenge and achieved a more comprehensive understanding of urban design and planning.

Our findings have significant implications for urban design and planning. They highlight the necessity of integrating both objective mapping and subjective perception to better comprehend the diverse needs and preferences of varied communities within an urban context. This integrated approach can guide the development of more inclusive, sustainable, and livable urban strategies.

Moreover, our study underscores the importance of commercial buildings’ spatial dispersion and the dominant presence of residential buildings in shaping the visual landscape of urban spaces. These insights, derived from Shanghai’s central urban district, can inform urban planning strategies in similar urban contexts. The implications of these discoveries lie in their potential to guide the formulation of more effective and sustainable urban strategies. Specifically, the integration of objective mapping and subjective perception allows for a deeper understanding of the diverse needs and preferences of various communities within an urban context. This approach promotes the creation of more inclusive and livable urban plans.

Furthermore, our research emphasizes the critical role of interdisciplinary investigation in addressing intricate issues related to urban spatial perception and design. By bridging the gap between urban planning, cognitive science, and open-source data, we can foster a more comprehensive understanding of urban spaces and how they are perceived and experienced by their inhabitants.

Author Contributions

Conceptualization, J.Z. and Y.L.; methodology, J.Z., Y.L. and Z.Y.; data curation, J.Z. and X.W.; writing—original draft preparation, J.Z. and Z.Y.; writing—review and editing, J.Z., X.W. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank the editors and anonymous reviewers for their constructive suggestions and comments, which helped improve this paper’s quality.

Conflicts of Interest

The authors declare no conflict of interest.

References

Gehl, J.; Svarre, B. Public Space, Public Life: An Interaction. In How to Study Public Life; Springer: Berlin/Heidelberg, Germany, 2013; pp. 1–8. [Google Scholar]
Aletta, F.; Kang, J.; Axelsson, Ö. Soundscape Descriptors and a Conceptual Framework for Developing Predictive Soundscape Models. Landsc. Urban Plan. 2016, 149, 65–74. [Google Scholar] [CrossRef]
Kyriakidis, C.; Chatziioannou, I.; Iliadis, F.; Nikitas, A.; Bakogiannis, E. Evaluating the Public Acceptance of Sustainable Mobility Interventions Responding to Covid-19: The Case of the Great Walk of Athens and the Importance of Citizen Engagement. Cities 2023, 132, 103966. [Google Scholar] [CrossRef] [PubMed]
Lynch, K. The Image of the City; MIT Press: Cambridge, MA, USA, 1964. [Google Scholar]
O’connor, Z. Colour Psychology and Colour Therapy: Caveat Emptor. Color Res. Appl. 2011, 36, 229–234. [Google Scholar] [CrossRef]
Smardon, R.C. Perception and Aesthetics of the Urban Environment: Review of the Role of Vegetation. Landsc. Urban Plan. 1988, 15, 85–106. [Google Scholar] [CrossRef]
Porzi, L.; Rota Bulò, S.; Lepri, B.; Ricci, E. Predicting and Understanding Urban Perception with Convolutional Neural Networks. In Proceedings of the 23rd ACM International Conference on Multimedia, Brisbane, Australia, 26–30 October 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 139–148. [Google Scholar]
Zhang, F.; Ye, X. What Can We Learn from “Deviations” in Urban Science? In New Thinking in GIScience; Li, B., Shi, X., Zhu, A.-X., Wang, C., Lin, H., Eds.; Springer Nature: Singapore, 2022; pp. 301–308. ISBN 978-981-19381-6-0. [Google Scholar]
Rossetti, T.; Lobel, H.; Rocco, V.; Hurtubia, R. Explaining Subjective Perceptions of Public Spaces as a Function of the Built Environment: A Massive Data Approach. Landsc. Urban Plan. 2019, 181, 169–178. [Google Scholar] [CrossRef]
Kang, J.; Zhang, M. Semantic Differential Analysis of the Soundscape in Urban Open Public Spaces. Build. Environ. 2010, 45, 150–157. [Google Scholar] [CrossRef]
Zhang, F.; Fan, Z.; Kang, Y.; Hu, Y.; Ratti, C. “Perception Bias”: Deciphering a Mismatch between Urban Crime and Perception of Safety. Landsc. Urban Plan. 2021, 207, 104003. [Google Scholar] [CrossRef]
Deng, Y.; Chen, R.; Yang, J.; Li, Y.; Jiang, H.; Liao, W.; Sun, M. Identify Urban Building Functions with Multisource Data: A Case Study in Guangzhou, China. Int. J. Geogr. Inf. Sci. 2022, 36, 2060–2085. [Google Scholar] [CrossRef]
Qiu, W.; Zhang, Z.; Liu, X.; Li, W.; Li, X.; Xu, X.; Huang, X. Subjective or Objective Measures of Street Environment, Which Are More Effective in Explaining Housing Prices? Landsc. Urban Plan. 2022, 221, 104358. [Google Scholar] [CrossRef]
Ali, S.B.; Patnaik, S. Thermal Comfort in Urban Open Spaces: Objective Assessment and Subjective Perception Study in Tropical City of Bhopal, India. Urban Clim. 2018, 24, 954–967. [Google Scholar] [CrossRef]
Zhang, Y.; Dong, R. Impacts of Street-Visible Greenery on Housing Prices: Evidence from a Hedonic Price Model and a Massive Street View Image Dataset in Beijing. ISPRS Int. J. Geo-Inf. 2018, 7, 104. [Google Scholar] [CrossRef]
Zhang, F.; Wu, L.; Zhu, D.; Liu, Y. Social Sensing from Street-Level Imagery: A Case Study in Learning Spatio-Temporal Urban Mobility Patterns. ISPRS J. Photogramm. Remote Sens. 2019, 153, 48–58. [Google Scholar] [CrossRef]
Hu, Y.; Han, Y. Identification of Urban Functional Areas Based on POI Data: A Case Study of the Guangzhou Economic and Technological Development Zone. Sustainability 2019, 11, 1385. [Google Scholar] [CrossRef]
Miao, R.; Wang, Y.; Li, S. Analyzing Urban Spatial Patterns and Functional Zones Using Sina Weibo POI Data: A Case Study of Beijing. Sustainability 2021, 13, 647. [Google Scholar] [CrossRef]
Ji, H.; Qing, L.; Han, L.; Wang, Z.; Cheng, Y.; Peng, Y. A New Data-Enabled Intelligence Framework for Evaluating Urban Space Perception. ISPRS Int. J. Geo-Inf. 2021, 10, 400. [Google Scholar] [CrossRef]
Tang, Y.; Zhang, J.; Liu, R.; Li, Y. Exploring the Impact of Built Environment Attributes on Social Followings Using Social Media Data and Deep Learning. IJGI 2022, 11, 325. [Google Scholar] [CrossRef]
Fotheringham, A.S.; Brunsdon, C.; Charlton, M. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships; John Wiley & Sons: Hoboken, NJ, USA, 2003. [Google Scholar]
Setianto, A.; Triandini, T. Comparison of Kriging and Inverse Distance Weighted (IDW) Interpolation Methods in Lineament Extraction and Analysis. J. Appl. Geol. 2013, 5. [Google Scholar] [CrossRef]
Chen, T.; Lang, W.; Li, X. Exploring the Impact of Urban Green Space on Residents’ Health in Guangzhou, China. J. Urban Plan. Dev. 2020, 146, 05019022. [Google Scholar] [CrossRef]
He, N.; Li, G. Urban Neighbourhood Environment Assessment Based on Street View Image Processing: A Review of Research Trends. Environ. Chall. 2021, 4, 100090. [Google Scholar] [CrossRef]
Meng, L.; Wen, K.-H.; Zeng, Z.; Brewin, R.; Fan, X.; Wu, Q. The Impact of Street Space Perception Factors on Elderly Health in High-Density Cities in Macau—Analysis Based on Street View Images and Deep Learning Technology. Sustainability 2020, 12, 1799. [Google Scholar] [CrossRef]
Vallejo-Borda, J.A.; Cantillo, V.; Rodriguez-Valencia, A. A Perception-Based Cognitive Map of the Pedestrian Perceived Quality of Service on Urban Sidewalks. Transp. Res. Part F Traffic Psychol. Behav. 2020, 73, 107–118. [Google Scholar] [CrossRef]
Fu, X.; Jia, T.; Zhang, X.; Li, S.; Zhang, Y. Do Street-Level Scene Perceptions Affect Housing Prices in Chinese Megacities? An Analysis Using Open Access Datasets and Deep Learning. PLoS ONE 2019, 14, e0217505. [Google Scholar] [CrossRef] [PubMed]
Yu, X.; Her, Y.; Huo, W.; Chen, G.; Qi, W. Spatio-Temporal Monitoring of Urban Street-Side Vegetation Greenery Using Baidu Street View Images. Urban For. Urban Green. 2022, 73, 127617. [Google Scholar] [CrossRef]
Zhang, J.; Fukuda, T.; Yabuki, N. Development of a City-Scale Approach for Façade Color Measurement with Building Functional Classification Using Deep Learning and Street View Images. ISPRS Int. J. Geo-Inf. 2021, 10, 551. [Google Scholar] [CrossRef]
Hugh, S.; Fox, M.S. Homelessness and Open City Data: Addressing a Global Challenge. In Open Cities | Open Data: Collaborative Cities in the Information Era; Hawken, S., Han, H., Pettit, C., Eds.; Springer Nature: Singapore, 2020; pp. 29–55. ISBN 9789811366055. [Google Scholar]
Li, Y.; Yabuki, N.; Fukuda, T. Integrating GIS, Deep Learning, and Environmental Sensors for Multicriteria Evaluation of Urban Street Walkability. Landsc. Urban Plan. 2023, 230, 104603. [Google Scholar] [CrossRef]
Gan, L.; Shi, H.; Hu, Y.; Lev, B.; Lan, H. Coupling Coordination Degree for Urbanization City-Industry Integration Level: Sichuan Case. Sustain. Cities Soc. 2020, 58, 102136. [Google Scholar] [CrossRef]
Loder, A.; Ambühl, L.; Menendez, M.; Axhausen, K.W. Understanding Traffic Capacity of Urban Networks. Sci. Rep. 2019, 9, 16283. [Google Scholar] [CrossRef]
Lyu, F.; Zhang, L. Using Multi-Source Big Data to Understand the Factors Affecting Urban Park Use in Wuhan. Urban For. Urban Green. 2019, 43, 126367. [Google Scholar] [CrossRef]
Shi, K.; Chang, Z.; Chen, Z.; Wu, J.; Yu, B. Identifying and Evaluating Poverty Using Multisource Remote Sensing and Point of Interest (POI) Data: A Case Study of Chongqing, China. J. Clean. Prod. 2020, 255, 120245. [Google Scholar] [CrossRef]
Din, I.U.; Guizani, M.; Rodrigues, J.J.P.C.; Hassan, S.; Korotaev, V.V. Machine Learning in the Internet of Things: Designed Techniques for Smart Cities. Future Gener. Comput. Syst. 2019, 100, 826–843.
Kamusoko, C.; Gamba, J. Simulating Urban Growth Using a Random Forest-Cellular Automata (RF-CA) Model. ISPRS Int. J. Geo-Inf. 2015, 4, 447–470.
Chaturvedi, V.; de Vries, W.T. Machine Learning Algorithms for Urban Land Use Planning: A Review. Urban Sci. 2021, 5, 68.
Zhang, J.; Fukuda, T.; Yabuki, N. Automatic Object Removal with Obstructed Façades Completion Using Semantic Segmentation and Generative Adversarial Inpainting. IEEE Access 2021, 9, 117486–117495.
Nosratabadi, S.; Mosavi, A.; Keivani, R.; Ardabili, S.; Aram, F. State of the Art Survey of Deep Learning and Machine Learning Models for Smart Cities and Urban Sustainability. In Engineering for Sustainable Future; Várkonyi-Kóczy, A.R., Ed.; Springer International Publishing: Cham, Switzerland, 2020; pp. 228–238.
Zhang, J.; Fukuda, T.; Yabuki, N. Automatic Generation of Synthetic Datasets from a City Digital Twin for Use in the Instance Segmentation of Building Facades. J. Comput. Des. Eng. 2022, 9, 1737–1755.
Li, Y.; Yabuki, N.; Fukuda, T. Measuring Visual Walkability Perception Using Panoramic Street View Images, Virtual Reality, and Deep Learning. Sustain. Cities Soc. 2022, 86, 104140.
Fathi, S.; Srinivasan, R.; Fenner, A.; Fathi, S. Machine Learning Applications in Urban Building Energy Performance Forecasting: A Systematic Review. Renew. Sustain. Energy Rev. 2020, 133, 110287.
Goldhammer, M.; Köhler, S.; Zernetsch, S.; Doll, K.; Sick, B.; Dietmayer, K. Intentions of Vulnerable Road Users—Detection and Forecasting by Means of Machine Learning. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3035–3045.
Kontokosta, C.E.; Hong, B.; Johnson, N.E.; Starobin, D. Using Machine Learning and Small Area Estimation to Predict Building-Level Municipal Solid Waste Generation in Cities. Comput. Environ. Urban Syst. 2018, 70, 151–162.
Ki, D.; Lee, S. Analyzing the Effects of Green View Index of Neighborhood Streets on Walking Time Using Google Street View and Deep Learning. Landsc. Urban Plan. 2021, 205, 103920.
Gehl, J. Cities for People; Island Press: Washington, DC, USA, 2013.
Du, G.; Cao, X.; Liang, J.; Chen, X.; Zhan, Y. Medical Image Segmentation Based on U-Net: A Review. J. Imaging Sci. Technol. 2020, 64, art00009.
Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
Tan, M.; Le, Q. EfficientNetV2: Smaller Models and Faster Training. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR; pp. 10096–10106.
Oliphant, T.E. A Guide to NumPy; Trelgol Publishing: Provo, UT, USA, 2006; Volume 1.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Ahmed, M.; Seraj, R.; Islam, S.M.S. The K-Means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics 2020, 9, 1295.
Waskom, M.L. Seaborn: Statistical Data Visualization. J. Open Source Softw. 2021, 6, 3021.
Ji, J.; Li, J.; Yan, S.; Tian, Q.; Zhang, B. Min-Max Hash for Jaccard Similarity. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 301–309.
Pajo, J.F.; Kousiouris, G.; Kyriazis, D.; Bruschi, R.; Davoli, F. Evaluating Urban Network Activity Hotspots through Granular Cluster Analysis of Spatio-Temporal Data. In Proceedings of the 2021 17th International Conference on Network and Service Management (CNSM), Izmir, Turkey, 25–29 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 417–421.
Dadashpoor, H.; Azizi, P.; Moghadasi, M. Analyzing Spatial Patterns, Driving Forces and Predicting Future Growth Scenarios for Supporting Sustainable Urban Growth: Evidence from Tabriz Metropolitan Area, Iran. Sustain. Cities Soc. 2019, 47, 101502.

Figure 1. Research framework for the present study.

Figure 2. Overview of the study area for the present study: (a) location of Shanghai, (b) location of Shanghai’s central urban district, (c) the road network in Shanghai’s central urban district.

Figure 3. Building function and building categories.

Figure 4. Inverse distance weighting.

Figure 5. Street view image acquisition at the sampling point.

Figure 6. Street view image segmentation through DeepLabv3: (a) commercial buildings, (b) public buildings, (c) a low building proportion.

Figure 7. Four types of building functions are included. The first row shows the single-label classes, from left to right: residence, commerce, public, and other facilities. The second row shows the multi-label classes, from left to right: residence and commerce, commerce and public, residence and public, and no data.

Figure 8. Urban building classification map based on AOI data.

Figure 9. Results of building classification based on POI data.

Figure 10. Normalized confusion matrices of the seven architectural function classifications.

Figure 11. Results of visual perception of urban building function classification of streetscape pictures at street scale.

Figure 12. Differences in building function distribution based on spatial clustering analysis.

Figure 13. Grid structure spatial pattern similarity.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
