Street View Imagery (SVI) in the Built Environment: A Theoretical and Systematic Review

Abstract: Street view imagery (SVI) provides efficient access to data that can be used to research spatial quality at the human scale. Previous reviews have mainly focused on specific health findings and neighbourhood environments, and there has not been a comprehensive review of this topic. In this paper, we systematically review the literature on the application of SVI in the built environment, following a formal innovation-decision framework. The main findings are as follows: (I) SVI remains an effective tool for automated research assessments. This offers a new research avenue to expand built environment measurement methods to include perceptions in addition to physical features. (II) Currently, SVI is functional and valuable for quantifying the built environment, spatial sentiment perception, and spatial semantic speculation. (III) The significant dilemmas concerning the adoption of this technology are related to image acquisition, image quality, spatial and temporal distribution, and accuracy. (IV) This research provides a rapid assessment and gives researchers guidance for the adoption and implementation of SVI. Data integration and management, proper image service provider selection, and spatial metrics measurements are the critical success factors. A notable trend is the application of SVI towards the perceptions of the built environment, which provides a more refined and effective way to depict urban forms in terms of physical and social spaces.


Introduction
SVI is an innovative type of geographic data used for sensing the physical environment of cities [1]. SVI enables users to remotely explore realistic streetscapes by providing 360° panoramic spatial information and real-time observations of the real world from the perspective of pedestrians, encompassing natural settings and artificial landscapes [2]. Furthermore, the rapid development of deep learning and image analysis technologies has facilitated the processing of fine-scale streetscape data. The emergence of such vast data sources has provided an unprecedented opportunity for digitisation, enabling researchers to conduct large-scale studies on the urban environment and human activities.
SVI provides an emerging source of data for research on the urban built environment, allowing for a more accurate and comprehensive audit by sensing the elements and scenes captured in the imagery. The existing research in this area has primarily focused on architectural characteristics and health implications [3]. The evaluation of built-environment exposures is a well-established field of health research that may be applied to mental and physical health outcomes [3]. In addition, the domains defined by order, e.g., building tops and façade elements [4,5], are well-established properties of the built environment. They can be used to evaluate critical architectural attributes, including a building's type, condition, and function [6][7][8][9]. Conversely, signs of neighbourhood disorder, e.g., broken windows and graffiti, might imply poor socioeconomic conditions, such as high crime rates [10]. SVI provides an opportunity to virtually audit the study area and evaluate the built environment in numerous locations with little effort or financial cost [11]. Meanwhile, since SVI systems are now available in most countries and regions around the world, including many areas without existing footprints or 3D building data, AI-based algorithms can quickly and cost-effectively obtain the 3D urban morphology from SVI data without any existing building information [12]. In addition, SVI can be used at large scales to measure urban canyon impact mechanisms such as the radiation temperature [13], buoyancy effects [14], and shortwave irradiance [15]. At the mesoscale, community characteristics such as safety [16], housing prices [17], and demographic statistics [18] have been measured. At the microscopic scale, habitats, resident health [19,20], and greening ornamentation in buildings have been researched [21]. The findings of these studies suggest that using SVI and artificial intelligence technology to quantify and visually express built environment factors can help to extract additional geospatial information from the city, provide more complex or specific indicators, and enable large-scale, quantitative evaluations of the urban built environment. This approach improves urban resilience towards low-carbon cities [22,23] and contributes to life cycle assessments for buildings and building refurbishment [24].
Since the early days of services providing large-scale SVI, researchers have recognised that this approach is highly suitable for evaluating the characteristics of the built environment [25]. However, the few attempts to review the scope of this research area have focused on Google Street View (GSV) only and on narrowly defined public health areas and micro-neighbourhood environmental aspects, or have not been systematic. The previous literature has suggested a strong link between the physical urban environment and various health behaviours of citizens. Researchers in epidemiology, psychology, and geography have increasingly examined the effects of the built environment on various health outcomes. Still, few studies have examined the perceptions of the built environment at the geographic scales required for population-based studies [26][27][28][29]. Previous studies have examined the subjective perceptions of the urban environment and the role of sensations in mental health. Some studies have examined the composition of perception-related images in favour of safer, greener, or more beautiful environments [30]; such studies, while contributing to the study of SVI in health, focus on only one of the many applications in built environment research. Some recent articles have reviewed the use of SVI to quantify the features of built environments [31,32] and explored its feasibility [33]. Still, these approaches have focused on physical features, such as trees and sidewalks [26], or specific environmental exposures, such as air pollution [34]. Such research is primarily application-oriented and lacks a systematic formal framework. Additionally, the research on SVI adoption in the built environment lags far behind its actual state of development. A systematic review and assessment of the existing applications of SVI in the built environment has not yet been conducted. This review fills this gap and indicates areas for future research to capitalise on this new and expanding big data source.
Given the current pace of implementation, this paper systematically and comprehensively reviews the application of SVI in the built environment. This research follows the innovation-decision process framework [35]. The following overall research question guides this research: How should SVI be adopted and implemented in the built environment? To answer this question, this review aims to identify and summarise the relevant image platforms, data extraction and analysis methods, research applications, advantages, and limitations. The key findings are summarised, highlighting the potential value of SVI for a wide range of urban built environment research applications. This review not only addresses the gaps in recent assessments of SVI in the built environment, but also provides essential guidance for using SVI technology to improve the built environment. The remainder of this paper is organised as follows:

• Section 2 describes the main research methods. The adoption and implementation of SVI in the built environment are explored using the systematic review method and the innovation-decision process.
• Section 3 explores the general characteristics of SVI and the needs and main application areas of SVI in the knowledge phase, based on the innovation-decision process.
• Section 4 analyses the potential benefits of SVI as a new data source and identifies the dilemmas of its adoption in the built environment during the persuasion phase. Critical success factors (CSFs) are proposed for SVI implementation based on the reviewed publications, along with guidance for built environment practitioners.
• Sections 5 and 6 summarise the current trends and discuss the focus of future research on SVI-based urban environmental assessments.

Data Sources
In this study, the first step was to select reference journal articles from the Web of Science (WoS) and Scopus databases to create a unified analysis database. WoS and Scopus remain the primary sources of citation data in terms of authority and representativeness [36]. In addition, Scopus covers a broader range of journals, while WoS enables a more comprehensive citation analysis [37]; the two complement each other and together give a comprehensive view of the current state of international research and the research frontiers. The search was restricted using predefined search strings to make the study more rigorous [38].
The second step was to retrieve articles from the databases. Relevant topic papers were selected from journals using search terms in the two academic databases (Figure 1). The first screening phase was performed by searching titles, abstracts, authors, and keywords. Then, as a second check, we excluded articles that were not available in full-text form. Multiple keywords were used in a third eligibility check to capture the different trending items in built environment research, such as house prices, historical sites, and neighbourhood activities. Finally, the acquired literature was analysed based on the content, methodology, and year, among other factors (Figure 1). Since Google Maps first launched SVI in 2007, this paper captured the academic literature on the use of panoramic street images for urban built environment research published from 1 January 2007 to 7 April 2022, a span of fifteen years. Figure 1 shows the whole review process in detail; 263 related publications (n = 263) were ultimately obtained.
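The screening steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`doi`, `title`, `abstract`, `published`, `full_text`) are hypothetical, and de-duplicating by DOI is our assumption about how a unified database from two sources would be built, not a step reported in the review.

```python
from datetime import date

# Review window stated in the text: 1 January 2007 to 7 April 2022.
WINDOW_START = date(2007, 1, 1)
WINDOW_END = date(2022, 4, 7)


def screen(records, keywords):
    """Merge, de-duplicate and screen bibliographic records.

    Each record is a dict with hypothetical keys: 'doi', 'title',
    'abstract', 'published' (a date) and 'full_text' (a bool).
    """
    seen = set()
    kept = []
    for rec in records:
        key = rec["doi"].lower()
        if key in seen:  # same paper exported from both databases (assumed rule)
            continue
        seen.add(key)
        # Keep only papers published inside the review window.
        if not (WINDOW_START <= rec["published"] <= WINDOW_END):
            continue
        # Second check: full text must be available.
        if not rec["full_text"]:
            continue
        # Keyword eligibility check on title and abstract.
        text = (rec["title"] + " " + rec["abstract"]).lower()
        if any(k.lower() in text for k in keywords):
            kept.append(rec)
    return kept
```

Calling `screen(wos_records + scopus_records, ["street view"])` would return the records that pass every check, in their original order.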

Research Methods
Firstly, a systematic review approach was adopted as the primary evaluation framework. It consists of three main phases: literature collection, identification, and analysis. This method is flexible and accurate [39]. The Data Sources section above completes the literature collection phase.
In the second phase, the articles' contents are identified and categorised. The Rogers innovation-decision process [35] was used as the formal conceptual framework to categorise the articles (Figure 2), which helps to systematically grasp the application of SVI in the built environment, including its feasibility and research fields. The innovation-decision process covers the time frame from the first awareness of an innovation to its adoption or rejection by potential adopters. This model is based on many empirical studies and contains a set of research methods, data collection approaches, and analytical models; it can be applied to studies on the diffusion of innovation and it is predictive [40]. The process also integrates the strengths and hindrances that researchers encounter in specific use cases and supports an evaluation of the use of SVI in the built environment. The knowledge stage includes streetscape technology sources, requirements, and applications. At the persuasion stage, the benefits and dilemmas of panoramic images are analysed to obtain a better understanding and perspective. The decision stage concerns the adoption or rejection of the innovation brought about by SVI. At the implementation stage, SVI is deployed and used in the built environment.

Source of SVI
Currently, dozens of street view services serve as sources of SVI data, most of which are regional, covering one or a few countries. Google Street View (GSV) is a street view service from Google, an American company, that serves the entire world. However, some countries have their own local SVI services. For instance, GSV is not available in Morocco, but the local service Carte.ma Street View covers about ten major cities in Morocco. Additionally, Google services have been banned in some countries, such as China, which has two local data sources: Baidu Street View (BSV) and Tencent Street View (TSV). In recent years, Apple integrated the "Look Around" feature into the Apple Maps app on iOS-enabled devices. To some extent, the new feature is fairly similar to the long-standing Street View feature in Google Maps: it enables users to zoom in on a particular area. Apple Maps has enabled the "Look Around" feature in a growing number of cities. Similar to GSV, it offers a method for interacting with maps, enabling cities to be rendered in 3D [41]. Therefore, this section includes Apple Maps within the scope of the key service providers (Table 1). All three types of SVI data are saved as panoramas, preserving the 360° panoramic visual information of the shooting location. In practical acquisitions and applications, the visual environment of each location can be described via multiple SVIs facing distinct natural view angles. Compared with BSV and TSV, GSV is superior in its coverage and resolution. In addition, compared with GSV, crowdsourced (CS) imagery can provide images from sidewalks, bike lanes, and walkways at the micro-scale. At large scales, CS possesses a broader coverage and temporal resolution at locations not accessible by GSV [42]. The temporal resolution of CS images will be finer in some locations than that of GSV, for which the images are typically acquired every few years and there is limited access to older images. However, CS image resources come from user uploads; thus, the image quality and field of view are often limiting factors, and the positioning accuracy of CS images is also a cause for concern when compared to GSV. As a new supplier of street view data, Apple Maps' "Look Around" feature is more vibrant and fluid, and the photographs are of high quality. The data acquired by Apple and their high-resolution 3D photos have enabled users to obtain more accurate overall information and expansive views of highways, buildings, parks, airports, shopping malls, and other public locations [41]. In terms of privacy protection, Apple Maps has an edge over Google Maps. However, whether Apple's new "Look Around" program is as precise and accurate as GSV is yet to be proven; therefore, more testing is required. It is also noteworthy that Google Maps can be accessed on almost any device or computer, while Apple Maps is limited to Apple's own devices. Compared to Apple Maps, GSV is more stable, more dependable, and has greater coverage.
Overall, the spatial coverage rates of these current user-contributed services are far less comprehensive than those of GSV, which tends to have complete coverage of cities and relatively uniform sampling [43]. GSV is the most famous and extensive service to provide SVI worldwide. Table 2 compares the three types of services.
Table 2 (partially recovered; notes on CS relative to GSV): (2) At some locations, the temporal resolution of the images will be finer than with GSV, for which images are typically acquired every few years and there is limited access to older images; (3) CS usually provides a narrower field of view than GSV images, and the extracted elements are limited; (4) There are biases in the locational accuracy of CS, which may lead to problems in map applications; (5) The spatial coverage rates of these user-contributed services are much less comprehensive than for GSV.
In addition, social media photos are crowdsourced photos shared by users on social media platforms that capture urban indoor and outdoor landscapes. Streetscape photos are distributed precisely along the road network, whereas social media photos are concentrated in the city's primary locations for employment, recreation, and tourism. The former reflect the objective urban street landscape, while the latter, to some extent, express specific groups' subjective experiences of the city. Social media photos may therefore be used as a complement to streetscape photographs. Due to the particularity of the images, social media photos have certain advantages in urban image perception [44,45].

Computer Vision
Computer vision aims to replace human eyes with imaging equipment, to recognise and measure objects, and to extract information from pictures or high-dimensional data [46,47]. Traditional computer vision methods mostly use shallow, medium-level, and manually designed features to express images, such as colour spectrum, texture, shape, the scale-invariant feature transform (SIFT) [48], the histogram of oriented gradients (HOG) [49], and the GIST scene descriptor [50]. These features require a substantial amount of specialist knowledge for feature engineering, have limited picture representation efficiency, and do not transfer well across tasks. The introduction of AlexNet in 2012 addressed the difficulty of learning feature representations from high-dimensional data such as images, enabling the application of deep learning methods to image interpretation tasks, in which the network learns task-related visual characteristics autonomously.
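To make the notion of a manually designed feature concrete, the sketch below computes a magnitude-weighted histogram of gradient orientations over a small grey-level image. It is a deliberately reduced, single-cell illustration in the spirit of HOG, not the full descriptor of [49]: the cell and block structure and the normalisation are omitted, and the function name and the nine-bin default are our assumptions.

```python
import math


def orientation_histogram(img, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    img is a 2D list of grey levels. Border pixels are skipped because
    central differences need a neighbour on each side.
    """
    h, w = len(img), len(img[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal central difference
            gy = img[y + 1][x] - img[y - 1][x]  # vertical central difference
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region: no orientation to vote for
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width), n_bins - 1)] += mag
    return hist
```

Every choice here (difference operator, bin count, unsigned angles) is fixed by hand, which is the kind of feature engineering that learned representations avoid.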

Deep Learning
According to the different task types and model principles, deep learning can be divided into autoencoder, generative adversarial network (GAN), recurrent neural network (RNN), and deep convolutional neural network (DCNN) approaches. Among them, deep convolutional neural networks are mainly used for image data analysis, and the landmark model AlexNet is the most extensively used classic DCNN.
Among the deep learning models for computer vision tasks, the representative structures used for image object classification are AlexNet, VGG, GoogLeNet, ResNet, DenseNet, and others. In practice, such a structure is often first used to extract picture features, and a task-specific network structure is then used for the analysis. The training set, in turn, heavily influences the model's capacity to generalise and the number of categories it can detect. Current deep learning models can be trained end-to-end with a high accuracy using open-source datasets (Table 3). The trained model can be directly applied to various street scenes, social media photos, and other data, providing a research basis for the image-based quantitative analysis of the urban built environment.

Table 3 (only one row recovered): Places | Objects: 10 million nature images; Categories: hundreds of categories of scenes.
To date, deep learning has been the primary method for analysing streetscapes. The artificial-intelligence-based SVI analysis methods apply deep learning, computer vision, and other cutting-edge artificial intelligence fields to the processing and analysis of SVI and to city-focused application practices. By contrast, most methods based on digital image processing and traditional computer vision use shallow- and medium-level visual features and manually defined features, with which it is difficult to express the deep semantic information in picture scenes completely and efficiently; this has limited the large-scale use of SVI in urban research fields. The current computer vision technology supported by deep learning can identify semantic objects and scene contents in pictures more efficiently, providing powerful tools for extracting semantic information from street scenes alongside tools for understanding and quantitatively expressing the contents of the built environment.

Needs of SVI
The examination of built environments is not an emerging or unfamiliar field, as many previous studies have documented the characteristics of built environments, such as accessibility, physical barriers, access to public transportation and recreational spaces, and greenery [59,60]. By detecting and understanding the elements and scenarios of the built environment, researchers can quantitatively study the urban built environment. However, in this field, most studies rely on in-person assessments and field surveys to collect data on the relevant characteristics of built environments [27,61]. Traditional urban spatial studies usually utilise self-reports, questionnaires, and field surveys. Questionnaires and self-reports are the most prevalent data sources for evaluating various neighbourhood characteristics [53]. These data collection methods suffer from high labour intensity, lengthy update cycles, and geographic restrictions. With the acceleration of urbanisation, it is difficult for traditional methods to cope with rapid urban development and to describe the dynamic evolution of the urban built environment as a complex system with accurate quantitative data.
Currently, most data used for field observations are collected by walking or driving about the study region using predetermined questionnaires to record and characterise the surroundings [27], which is time-consuming and impractical for large-scale applications. Because the traditional approach relies on field research, it is difficult to carry out evaluations at a large, fine-grained scale [62]. Although remotely sensed images can provide a bird's eye view of cities at the macro scale and from high altitudes, they are expensive, and high-resolution imagery is susceptible to atmospheric influence, environmental interference, sensor jitter, and other factors, making the acquired data uncertain. Secondary sources include those based on the spatial analysis and modelling of predefined environmental measures, such as spatial accessibility measures [63,64]. However, this approach fails to characterise the built environment in detail and may be limited to particular environmental factors [64].
Without adequate support from appropriate technologies, these challenges are generally inevitable. SVI data provide a visual record of the features of the built environment and can support more effective and scalable alternatives to site-based approaches. SVI systems can collect images in multiple directions to create panoramic views, and image users can observe the features contained in the built environment using audit instruments by virtually "driving" through a community. With the use of SVI systems to provide object visibility and broad access to data, researchers can improve their workflow efficiency, review multiple cities simultaneously, and obtain micro-scale streetscape elements more effectively. Good agreement has been shown between observational field audits and SVI-based image interpretations [11]. Therefore, SVI offers a new paradigm for addressing these problems and guiding studies on sensing the urban built environment.

Main Application Areas
SVI has been extensively used in various environmental perception practices to allow for the quantitative representation and analysis of physical spaces and to extrapolate the semantic information related to socioeconomic and human activities embedded beyond the physical space. SVI is widely adopted in built environment quantification, spatial emotion perception, and spatial semantic speculation.

• Element identification
Visual object recognition, scene type, and attribute classification are the most prevalent applications when measuring a built environment. In terms of visual object recognition, vegetation is the most mature application area. SVI can capture the vegetation in the street at different height levels with an extremely high resolution. Furthermore, SVI provides high-resolution and multi-layered information about trees, shrubs, lawns, and other forms of vegetation in the street and allows for vegetation assessments [65]. The green view index (GVI), the sky view factor (SVF), the tree view factor (TVF), and street-tree visual audit methods are often used to quantify urban greening and analyse the visibility of urban forests. In addition to cross-sectional comparative analyses, SVI offers the possibility of longitudinal studies, facilitating the analysis of the temporal changes in GVI in cities [66].
On the other hand, SVI provides multidimensional information about the form, colour, material, and other aspects of a building, which can be extracted to present the building type [67], the building's condition [68], and the building's age [69], as well as the height and number of floors of the building [70]. Other studies focused on extracting building features have involved detecting building façade features (including the façade's colour and other features), graffiti artwork [56], and window-to-wall ratios [71]. This research field also seems to be currently focused to a large extent on smaller urban features and street facilities, or those that are often overlooked in spatial datasets, such as traffic signs and traffic signals [72,73], utility poles [74], and access holes [75].
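Indices such as the GVI and the window-to-wall ratio reduce to pixel shares once an image has been segmented. The sketch below assumes a semantic segmentation model has already produced a per-pixel label map; the label names, the averaging of several views per location, and the definition of the window-to-wall ratio as window area over gross façade area (wall plus windows) are illustrative assumptions, not definitions taken from the reviewed studies.

```python
def class_share(label_map, target_labels):
    """Fraction of pixels in a 2D label map whose label is in target_labels."""
    targets = set(target_labels)
    total = sum(len(row) for row in label_map)
    hits = sum(1 for row in label_map for lab in row if lab in targets)
    return hits / total if total else 0.0


def green_view_index(label_maps, vegetation=("tree", "shrub", "lawn")):
    """Mean vegetation share over the views captured at one location."""
    return sum(class_share(m, vegetation) for m in label_maps) / len(label_maps)


def window_to_wall_ratio(facade_map):
    """Window pixels divided by gross facade pixels (wall plus window)."""
    windows = class_share(facade_map, ["window"])
    gross_wall = class_share(facade_map, ["wall", "window"])
    return windows / gross_wall if gross_wall else 0.0
```

For example, a location photographed in four directions would pass four label maps to `green_view_index` and receive a single value between 0 and 1.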

• Physical environment assessment
SVI is applied for thermal environment simulations, the detection of sound and light environments, and air quality evaluations.
Firstly, SVI is mainly utilised for radiation and temperature simulations by combining meteorological data computations and numerical modelling calculations with information about the shooting position and geometric characteristics of the imagery [15]. With deep learning techniques, the SVF of the environment can be estimated from SVI to evaluate the urban heat island effect and thermal comfort [76]. In addition, the influence of vegetation on the thermal environment has become a research hotspot, and the utilisation of SVI to extract different types of roadside vegetation can help to analyse the spatial relationship between the vegetation layout and the thermal environment. Regarding the radiance, by projecting the solar trajectory onto a fisheye image of the streetscape, SVI can be used to calculate the sunshine duration [77] and to quantify the total street-level shortwave irradiance [15]. In contrast to the expensive and limited use of 3D building models to calculate solar radiation, SVI fisheye images are a highly desirable supplemental data source for simulating solar radiation within street canyons. However, the existing SVI-based radiation estimation models need to be combined with dynamic weather conditions in practical cases and with the analysis of separated radiation direction maps.
Secondly, SVI systems are equipped with ground-based photographic equipment to capture the physical urban environment in a three-dimensional profile view, conveying more detailed visual content and making it possible to calculate the various effects of elemental indicators on people's behaviour and perceptions. Therefore, SVI can further be used to estimate the photovoltaic (PV) potential in densely populated metropolitan regions, to identify areas where solar glare may affect vehicle traffic, and to assess human perceptions of noise. For instance, SVI can be used to quantify the impacts of building façades, courtyards, and streetscapes on noise annoyance and stress levels [78], and to detect traffic noise in urban environments.
Finally, deep learning methods can be used to analyse the features extracted from SVI data, and SVI can be applied to assess air quality. Mobile monitoring (either bicycle-based or GSV-based) has been frequently used to gather real-time air quality measurements to evaluate local air quality and air pollutant exposures [79][80][81], including black carbon [82] and particle count concentrations [83]. Meanwhile, elements of the built environment such as greenery and buildings are gradually becoming crucial points in air quality research [84].
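The sunshine-duration idea mentioned above (projecting the solar trajectory onto a fisheye image) can be illustrated with a short sketch. It assumes an upward-looking fisheye image with an equidistant projection, a square boolean sky mask, north at the top and east to the right, and sun positions supplied at a fixed time step by some external solar-position routine. These conventions and the function names are our assumptions; published workflows differ in projection and orientation.

```python
import math


def sun_to_pixel(azimuth_deg, elevation_deg, size):
    """Project a sun position onto a size x size equidistant fisheye image.

    The zenith maps to the image centre and the horizon to the rim.
    Azimuth is measured clockwise from north (assumed to be 'up').
    """
    centre = (size - 1) / 2.0
    radius = centre * (90.0 - elevation_deg) / 90.0
    az = math.radians(azimuth_deg)
    col = centre + radius * math.sin(az)
    row = centre - radius * math.cos(az)
    return int(round(row)), int(round(col))


def sunshine_hours(sky_mask, sun_positions, step_hours):
    """Sum the time steps in which the sun falls on a sky pixel.

    sky_mask[row][col] is True where the fisheye image shows open sky;
    sun_positions is a list of (azimuth_deg, elevation_deg) pairs.
    """
    size = len(sky_mask)
    hours = 0.0
    for az, el in sun_positions:
        if el <= 0:
            continue  # sun below the horizon
        r, c = sun_to_pixel(az, el, size)
        if sky_mask[r][c]:
            hours += step_hours
    return hours
```

Buildings and trees appear as non-sky pixels, so a sun position that lands on them contributes no sunshine time; weather is ignored in this sketch.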

Emotional Perception
Individuals develop unique sensations of place based on their unique visual surroundings, experiences, and resident activities in the environment. Deep learning models trained with datasets can simulate individuals' emotions about scenes in the built environment to further evaluate the built environment with respect to three main areas: a sense of security, health, and the quality of life.

• Community safety
A sense of security is a high-level attribute of people's perceptions of urban scenes. By revealing the environmental factors associated with crash data, including road conditions [85] and road characteristics [86], the analysis of SVI can provide valuable information for pedestrian and driver safety. In addition, the level of neighbourhood environmental disorder has been considered a strong predictor of neighbourhood crime rates and residents' fear of crime. This involves physical features related to the spatial layout of buildings, street design, and the diversity of land use. Therefore, the use of SVI enables research into the relationship between crime and the physical characteristics of the built environment [87].

• Public health
SVI data represent a significant, publicly available data source that can be utilised to create metrics for the characterisation of the physical environment through machine learning techniques [88]. The current research has suggested that the characteristics of built environments are correlated with mental health and chronic disease. Further research includes concerns regarding well-being [89] and obesity [90]. In addition, architectural characteristics may have an indirect impact on the psychological health of the occupants through factors such as walkability [91], greenery [92], and public open spaces [93]. Stress and mental health are the primary research focal points. Infectious illness research is also vital to health and well-being since disease transmission is directly connected to environmental variables. SVI provides an excellent opportunity to examine the environments in which infectious agents breed, with current studies covering potential dengue breeding environments [94], areas of high risk for COVID-19 [95], and pathogenic environmental factors and their transmission pathways [19].

• Environmental behaviour
The building form and function and human-scale features in the built environment are the main factors influencing the vitality of a street. Because SVI is similar to the human perspective, it has been utilised in a wide range of urban perception studies, with the main extracted features including sidewalk quality [96], recreational facilities [97], and street interface fencing [98]. Using SVI to analyse the quality of life in the built environment also includes identifying potential urban congestion points [99], understanding measures to mitigate near-road pollution [100], predicting the difficulty of driving a car [101], and identifying garbage dumps [102]. Moreover, the built environment can affect the behaviour of people who engage in physical activity [103]. Utilising SVI enables the measurement of residential environments related to walking infrastructure and traffic safety, such as the effect of greenery on walking behaviour [104] and walking infrastructure [105]. Cycling is another type of physical activity that has health and environmental benefits [106]. The images captured via SVI can be used to evaluate the environmental factors influencing cycling behaviour and to determine road cyclability.

Spatial Semantic Speculation
The urban scenes depicted in the streetscape not only convey the visual information in the scene but also implicitly express information about the city's function, history, culture, and the socioeconomic and human activities behind the visual scenes. SVI records the city's physical environment, and the characteristics of the physical environment can predict the non-visual aspects of the city. This information can be combined with spatial material attribute data, such as household income and house price data, to check the predictions and evaluate the economic environment. The income level, education level, and even the political orientation of a neighbourhood can be inferred by identifying parked cars [107], neighbourhood store signs [108], and even vegetation [109]. The relationship between changes in urban physical space and socioeconomic levels can be studied by quantifying how places in neighbourhoods change [110]. Based on the broken window theory of the built environment, house photos, and the condition of a house's surroundings, streetscape pictures can predict crime in the neighbourhood to some extent [111]. Streetscape pictures can also be used to predict house price information and perform electricity consumption assessments [112].

Development Outlook of SVI

Perceived Benefits
As the element that links the street to the city as a whole, the quality of the built environment is essential to the urban environment. SVI is an excellent way to observe the built environment and to examine the relationships between the built environment and its parts. SVI has numerous perceived benefits in the built environment (Table 4).

Benefit: Findings (empirical research, opinion-based)
Wide coverage: 360° panoramic views [113]; views of the entire city at street level [11].
High coverage density: SVI was sampled for the road network [114]; the images of the sampling points contain street scenes of buildings, people, vehicles, trees, roads, billboards, telegraph poles, etc. [115].
Detailed content: Parameters can be adjusted [116]; measuring multiple variables in the micro-built environment [10].
Highly efficient acquisition: Only 7.3 s to rate each item with 360° GSV scenes [117]; automatically extracts information for a more consistent, objective, and large-scale collection [118].
Anthropomorphic perspective: Images are captured by a camera mounted on a car, bike, or backpack [118]; a rich sense of reality and strong messages [119].
Others: Remote access to location capability at a low cost [11]; NZ $0.70 per km for a field researcher and NZ $0.02 per MB for a virtual audit [11]; comparative data at the international level [11]; security considerations [10]; virtual surveys can be conducted year-round, regardless of the season or weather conditions [120].
The significant benefits of SVI include its comprehensive coverage, high coverage density, complex expression level, acquisition efficiency, and anthropomorphic perspective.
Firstly, SVI systems already cover most cities within the coverage area and can be viewed in 360° [121], allowing researchers to analyse the data from a worldwide perspective.
Secondly, in terms of the coverage density, SVI provides high-density coverage of all levels of the road network in the built environment. The visual images between sampling points can be seamlessly combined, giving a complete picture of the physical spaces of urban streets.
Thirdly, regarding the expression content, SVI provides an exhaustive and detailed representation of the actual state of the urban built environment from a human perspective. The continuous availability of high-definition images ensures the fineness of the SVI representations of the physical space in the urban built environment. With the further support of relevant artificial intelligence technologies, the precise extraction of semantic targets and the efficient understanding of a scene's content can be achieved.
Fourthly, regarding the data acquisition efficiency, Google, Baidu, and other map service providers provide commercial and free street view data under certain conditions, which can be accessed and downloaded through applicable APIs, thereby simplifying the procedure and encouraging the use of automated techniques. In addition, the use of artificial intelligence technology dramatically increases the overall audit speed [117], enabling quick and efficient evaluations of large amounts of image data, and allowing researchers to audit more streets in almost half the number of days using SVI [66].
Fifthly, SVI can capture objective cityscapes from a human perspective. The information contained in SVI data can be used to explore the intangible aspects of urban life and people's perceptions of the environment [122]. SVI captures a three-dimensional profile view of the urban streetscape and can record the views or perceived scenes from the ground.
Lastly, this review revealed that SVI is consistently secure, protecting researchers [10] and enabling them to conduct research at a low cost [120], in addition to enabling worldwide data comparisons [11].
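To illustrate the API-based acquisition described above, the following minimal sketch builds request URLs for the Google Street View Static API, using four headings so that the views at one sampling point together cover 360°. The function name, the example coordinates, the default image size, and the placeholder key are assumptions made for illustration and do not come from the reviewed studies. The sketch only constructs URLs and sends no requests; actual use is subject to the provider's terms and pricing.

```python
from urllib.parse import urlencode

# Public endpoint of the Google Street View Static API.
BASE_URL = "https://maps.googleapis.com/maps/api/streetview"


def panorama_request_urls(lat, lon, api_key, headings=(0, 90, 180, 270),
                          size="640x640", fov=90, pitch=0):
    """Build one request URL per heading for a single sampling point.

    With a 90-degree field of view, four headings spaced 90 degrees
    apart cover the full horizontal circle around the point.
    """
    urls = []
    for heading in headings:
        params = {
            "size": size,                 # image size in pixels
            "location": f"{lat},{lon}",   # sampling point (latitude,longitude)
            "heading": heading,           # compass direction of the camera
            "fov": fov,                   # horizontal field of view in degrees
            "pitch": pitch,               # vertical camera angle in degrees
            "key": api_key,               # placeholder, replace with a real key
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # Hypothetical sampling point used only for demonstration.
    for url in panorama_request_urls(22.3193, 114.1694, "YOUR_API_KEY"):
        print(url)
```

Generating the URLs separately from downloading makes it straightforward to count the planned requests before any cost is incurred.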

Dilemmas of SVI Adoption
The adoption and application of SVI in the built environment presents several challenges related to image acquisition, image quality, data spatial distribution, data timing, and analysis methods (Table 5).

Image acquisition: The images of adjacent acquisition points have similarities [118]; user-uploaded images are not available in GSV [123]; user permission restrictions [117].
Image quality: Blurred and inadequate 2D pixels result in low reliability and detection rates [113]; instability and potential bias in extracting relevant streetscape variables [10]; the vector data registration process is complex [118].
Spatial distribution of data: Uneven distribution of countries and urban areas [124]; open spaces and backyards are not covered [97]; vehicles collecting GSV data may not reach every location [114].
Data time: Uncertain image update time: 2-4 years [125]; the information extracted from the images is difficult to match with weather or time data [118].
Analysis method: The depth of the neighbourhood features extracted by the computer vision model may be limited [95]; small containers, such as jars and bottles, are difficult to identify [95,97].
Cost limitations: It takes time and is unsuitable as an online method [126]; computational cost and processing speed become issues [115].

Image Acquisition Challenges
The obstacles to the acquisition of SVI sources can hinder the method's application. First, regarding the restricted access, most SVIs currently originate from map service providers such as Google and Baidu. The accessibility of street view data relies heavily on such companies' business development directions and data provision policies. Moreover, some service providers require that users pay to use their service, thereby increasing the acquisition cost [10]. In addition, the Google website only contains information about the devices used to capture images, the areas currently covered, and the areas currently being imaged. Although the recent inclusion of user-uploaded data (including images) in Google Maps may increase the variability in the image quality and authenticity, user-uploaded data are included in a separate unlinked "photo sphere" that is subject to acceptance criteria [123].
Second, user permission restrictions require that users obtain prior written authorisation to publish any content provided on the map and that they do not advertise or provide instructional information about illegal activities. These regulations may severely limit the opportunities to implement SVI, such as in the field of criminology or for historical update studies of the built environment.
Third, the availability of GSV images varies worldwide because of different political, economic, legal, and technical factors. For instance, no GSV service is available in most parts of Africa, South America, the Middle East, India, China, Southeast Asia, and Russia. In several nations, GSV mostly comprises sporadic, unofficial coverage. This is noteworthy, since it resembles certain other crowdsourced datasets (e.g., Mapillary).
Finally, smaller characteristics, such as door numbers, are occasionally lost or erroneous owing to the "noise" in the acquired images, rendering certain elements unsuitable for recognition using GSV images.

Image Quality Issues
Mapping services are assumed to have established quality assurance mechanisms. Nevertheless, given the number of images, environmental conditions, and geographic coverage, deficiencies in image quality caused by factors such as lighting and weather are inevitable [127,128]. In addition, the objects of interest to the user are often obstructed in the images by passing cars and people [129]. Vegetation seems to be the main obstacle, often obscuring buildings and other objects. While this facilitates the research of greenery in the built environment, it limits what the images can show. For example, even large objects such as buildings can be completely obscured.

Uneven Spatial Distribution
SVI services tend to have geographically dense coverage, but the coverage is unevenly distributed. The spatial distribution of SVI is hotspot-shaped, occurring heavily and frequently in some localised areas or cities, and remaining unavailable in about half of the countries worldwide. Even in countries where the service is available, smaller towns and rural areas may not always be included. This means that the research is tailored to the urban built environment [130]. In addition, the image availability and capture frequency rates vary across cities, with more affluent communities having higher image availability rates and better capture times.

Temporal Instability of Data
In addition to limitations in geographic coverage, the temporal instability of SVI has been criticised as a weakness in its systematic use for observing the built environment. First, the low frequency of updates is a common problem [125]. Its impact may be greater because some elements and features of the streetscape environment change over time, exhibiting both random variations and regular day-, season-, or weather-related fluctuations that introduce measurement errors. These can include the number, features, and activities of pedestrians; parked or moving vehicles; and many markers of physical disorder such as litter. Thus, in some areas, images are collected too infrequently (and are often out of date) to research the current conditions or to perform an updated analysis or longitudinal temporal analysis (e.g., change detection). Second, the image acquisition time is frequently highlighted as a concern. The image capture date may not match the desired research period, and, as in field observations, inconsistencies in the time of day, season, and weather are present. The capture time may also introduce bias or fail to match the periods of other datasets used in the research [116]. For example, collecting streetscape imagery early in the morning may result in lower levels of observed social and pedestrian activity and, depending on the timing of trash removal, may affect the degree of physical disorder measured in the streetscape. In addition, it is noteworthy that different city sections can be captured over different periods. Finally, there is often a dispersion in the timing of the image collection, where some images are taken in winter and others are taken in summer, spring, or fall. Differences in the temporal distribution can easily lead to bias [131]. For example, studies evaluating green spaces in the built environment require images taken during the same period, so it is necessary to examine the images and exclude data that are not from the same period to maintain temporal consistency.
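The exclusion step described above can be sketched as a simple filter on capture dates. This is a minimal illustration that assumes each image record carries a capture date; the record layout, the function name, and the choice of June to August as the leaf-on season are hypothetical and would need to be adapted to the study area and data source.

```python
from datetime import date


def filter_by_capture_window(records, months, min_year=None):
    """Keep only images captured in the given months (and not before min_year).

    records: iterable of (image_id, capture_date) tuples (assumed layout).
    months:  set of month numbers (1-12) treated as the study season.
    """
    kept = []
    for image_id, captured in records:
        if captured.month not in months:
            continue  # outside the study season, e.g. leaf-off months
        if min_year is not None and captured.year < min_year:
            continue  # too old to represent current conditions
        kept.append((image_id, captured))
    return kept


if __name__ == "__main__":
    # Hypothetical records used only for demonstration.
    records = [
        ("img_a", date(2019, 7, 14)),
        ("img_b", date(2019, 1, 3)),
        ("img_c", date(2016, 8, 1)),
        ("img_d", date(2020, 6, 30)),
    ]
    summer = {6, 7, 8}  # assumed leaf-on season for a Northern Hemisphere city
    for image_id, captured in filter_by_capture_window(records, summer, 2018):
        print(image_id, captured.isoformat())
```

Reporting how many images are discarded by such a filter also documents how much spatial coverage is traded for temporal consistency.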

Analysis Method Deficiencies
There are two significant trends in the current use of street view images. The first trend is to directly use pre-trained models based on deep learning to classify or regress street scenes. Such methods can predict explicit semantic information in street scenes, such as identifying objects and scene types. However, the distribution of the pre-trained model's training set often differs from that of the application set, and this mismatch (the problem that "domain adaptation" seeks to address) diminishes the model's accuracy and affects the statistical analysis. Moreover, most research neglects rigorous statistical analyses and causal inferences, such as correlations between SVI visual items, spatial dependencies, and SVI visual objects.
The second trend is to employ deep learning models to extract generic scene features, such as the 512 features retrieved by the ResNet model based on the Places dataset [58], to represent a scene's visual similarity and specificity relative to other scenes in different places and regions. Since the high-dimensional features used in such methods are extracted using deep learning "black box" models, it is still difficult to interpret the semantic content expressed by the features, which can lead to a lack of interpretability of the research conclusions.
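As a hedged illustration of how generic scene features can express visual similarity, the sketch below compares feature vectors with cosine similarity. Cosine similarity is one common choice and is an assumption here, not necessarily the measure used in the cited work; the short vectors stand in for the 512-dimensional features and are invented for demonstration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return the (name, similarity) of the candidate scene closest to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 4-dimensional features; real scene features would have 512 entries.
    scenes = {
        "scene_a": [0.9, 0.1, 0.0, 0.2],
        "scene_b": [0.1, 0.8, 0.3, 0.0],
    }
    print(most_similar([1.0, 0.0, 0.1, 0.1], scenes))
```

The similarity score says how alike two scenes are in feature space but not why, which is the interpretability limitation noted above.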

Cost Limitations
Currently, some service providers require that users pay to use their services, making the acquisition cost higher [10]. The Mapbox API is free of charge if the JavaScript API makes fewer than 50,000 dynamic map calls per month [132]. To date, the cost of GSV starts at USD 0.007 per image (USD 7.00 per 1000), with a usage limit of 30,000 queries per minute [133]. For BSV, the SDKs (including the Location SDK, Map SDK, Navigation SDK, Eagle Eye SDK, etc.) and the JavaScript API are free. WebAPI usage that exceeds the free quota must be purchased for an additional fee. Although BSV offers a variety of service purchase options, 60,000 RMB per month for multiple service options is not cheap [134].
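A rough budget for image acquisition can be derived from the per-image price quoted above. The sketch below assumes a flat rate of USD 7.00 per 1000 requests and four images per sampling point; free quotas, volume discounts, and price changes are deliberately ignored, and the function name is hypothetical.

```python
def estimate_acquisition_cost(n_points, images_per_point=4, usd_per_1000=7.0):
    """Estimate the download cost in USD under a flat per-request price.

    n_points:         number of sampling points along the road network.
    images_per_point: views requested per point (assumed four headings).
    usd_per_1000:     flat price per 1000 requests (USD 7.00 as quoted in [133]).
    """
    if n_points < 0 or images_per_point < 0:
        raise ValueError("counts must be non-negative")
    n_requests = n_points * images_per_point
    return n_requests * usd_per_1000 / 1000


if __name__ == "__main__":
    # 10,000 sampling points with four headings each: 40,000 requests.
    print(f"USD {estimate_acquisition_cost(10_000):.2f}")
```

Under these assumptions, a city-scale study with 10,000 sampling points would cost about USD 280 in image requests.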
In addition, computer vision models using supervised learning methods typically require large training datasets consisting of tens of thousands of manually labelled images to train the models adequately. Thus, the research teams must have enough time and resources to create these large training datasets. For example, the architecture combining Faster R-CNN and ResNet-101, which achieves near-maximum accuracy on the Microsoft COCO object detection dataset, still demands substantial computing performance at runtime [52]. On a PC with a 3.6 GHz i7-7700 processor, 32 GB RAM, and a 1080 Ti graphics card, it took 95 h of processing time to perform object detection on 1 million images [97]. Therefore, the dataset processing time and cost are affected by various aspects, including not only the number of datasets but also the technical facilities available to researchers, and there is a lack of research on the comparative cost of each dataset. Although the related literature explores the differences in categories between different datasets [52], it mainly focuses on the characterisation of datasets and concentrates on the computer domain. It is noteworthy that some work has been carried out to predict the required execution times for a wide range of the most frequently used components of neural networks [135,136]. Although these approaches cannot be used to compare diverse datasets, they can be used to infer the execution time for a batch or an entire epoch and can support a well-informed choice of the appropriate hardware and model [137]. This will contribute to future AI-based SVI research.
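The reported figure of 95 h for 1 million images [97] corresponds to roughly 0.34 s per image, which allows a first-order estimate of the processing time for other dataset sizes. The sketch below assumes that the time scales linearly with the number of images on comparable hardware, which is a simplification; the function names are hypothetical.

```python
REFERENCE_HOURS = 95.0        # processing time reported in [97]
REFERENCE_IMAGES = 1_000_000  # number of images processed in [97]


def seconds_per_image(ref_hours=REFERENCE_HOURS, ref_images=REFERENCE_IMAGES):
    """Average processing time per image implied by the reference run."""
    return ref_hours * 3600 / ref_images


def estimate_processing_hours(n_images, ref_hours=REFERENCE_HOURS,
                              ref_images=REFERENCE_IMAGES):
    """Linear extrapolation of the reference run to n_images (an assumption)."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return n_images * ref_hours / ref_images


if __name__ == "__main__":
    print(f"{seconds_per_image():.3f} s per image")
    print(f"{estimate_processing_hours(250_000):.2f} h for 250,000 images")
```

Such an estimate is only a planning aid, since the actual throughput depends on the model, the image resolution, and the hardware.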

Other Dilemmas
Various other dilemmas are associated with the current use of SVI to analyse the built environment, such as privacy issues and technical costs [115,126]. Images featuring human features must be erased or hidden to protect people's privacy. This may lead to underestimating the neighbourhood environment and to issues in urban safety research, which may bias study conclusions. In addition, this review found a partial lack of data sharing (e.g., code and trained models), which can make some of the corresponding studies difficult to replicate.

Critical Success Factors
Even though this review shows that SVI can be used in a wide range of ways to evaluate the built environment and provide useful urban information that was unknown beforehand, there are still some challenges to be addressed in the current application of SVI. To lower the barriers to the implementation of SVI, the critical success factors (CSFs) developed based on a literature review of case studies deserve attention.

• Selecting an accurate image service provider. Currently, the use of SVI is in the early adoption stage. Although there are several picture resource providers (as mentioned in Section 3.1), their correctness in practical applications must be considered. OpenStreetMap (OSM) can have open data problems such as insufficient coverage or irregular alignments, which must be handled via validation masks. These masks properly filter the samples used to train the model [18]. Therefore, selecting a reliable and image-rich provider of SVI resources is crucial for research on the whole urban built environment.

• Appropriate spatial metrics. The quantitative measures of the built environment are mainly used for the components of the street space, including the street pavement, interfaces, and the enclosed sky view and streetscape [138]. However, SVI provides a vast array of features and scenarios from which to pick, which implies that unless the urban building space is evaluated quantitatively using a specified measure, the image recognition will mismatch the feature points falling on these elements. The use of appropriate geographic areas for estimating environmental exposures is essential to studying the determinants of the built environment; an uncertain geography introduces uncertainty in the spatial extent to which individuals experience their environment, as well as in the timing and duration of these experiences [139]. Consequently, it is difficult to assess whether the elements identified through SVI as spatial metrics are a true reflection of the environments exposed in everyday life. Furthermore, it is equally vital to identify appropriate metrics for activity spaces. Using various methods to define neighbourhoods and activity spaces can produce different results, for example when measuring the GVI of an area through vegetation and street characteristics, which can ultimately affect the assessed quality of the spatial perceptions within the area.

• Data integration and management. The SVI data are large in volume and dynamic, and the identification of SVI elements often requires the evaluation of a massive number of street images. Even though deep learning improves the detection efficiency compared to traditional machine learning, it still depends on the researcher's own equipment. Additionally, the investigation team must have sufficient time and resources to process the dataset. Therefore, the researchers need to identify targets based on the data they expect to generate. Meanwhile, there is a need to provide an effective mechanism to convert the segmented semantic SVI data into accurate and meaningful information, and more importantly to apply the acquired dataset in quantitative evaluation studies of the urban built environment. The statistical analysis mechanisms based on data and space fulfil this need. The statistical analysis techniques can be used to examine causal linkages between evaluation themes and SVI components; the spatial analysis methods can be used to visualise the spatial distribution patterns of the urban built environment's elements and to display the relationships between the urban built environment and these elements (Table 6).
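As an example of converting segmented SVI data into a spatial metric, the sketch below computes the GVI mentioned under the spatial metrics factor as the share of vegetation pixels in a semantic segmentation mask. This pixel-ratio definition, the label names, and the nested-list mask format are assumptions made for illustration; the label set in practice depends on the segmentation model and its training dataset.

```python
# Hypothetical class names treated as vegetation.
VEGETATION_LABELS = frozenset({"tree", "grass", "plant"})


def green_view_index(label_mask, vegetation_labels=VEGETATION_LABELS):
    """Share of pixels in one image whose class label is vegetation.

    label_mask: 2D list of class labels, one per pixel (assumed format).
    """
    total = 0
    green = 0
    for row in label_mask:
        for label in row:
            total += 1
            if label in vegetation_labels:
                green += 1
    if total == 0:
        raise ValueError("label mask is empty")
    return green / total


def mean_green_view_index(masks, vegetation_labels=VEGETATION_LABELS):
    """Average GVI over several views taken at the same sampling point."""
    values = [green_view_index(mask, vegetation_labels) for mask in masks]
    if not values:
        raise ValueError("no masks provided")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy 2 x 4 mask standing in for a segmented street view image.
    mask = [
        ["sky", "tree", "tree", "building"],
        ["road", "grass", "road", "road"],
    ]
    print(green_view_index(mask))  # 3 of 8 pixels are vegetation
```

Because the result depends on which classes count as vegetation and which views are averaged, these choices should be reported alongside the metric.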

Spatial analysis: Graph-based spatial analysis [62]; overlay analysis; buffer analysis; network analysis.
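Of the spatial analysis methods listed in Table 6, buffer analysis is the simplest to sketch: it selects the features that fall within a given distance of a location. The example below uses the haversine great-circle distance on latitude and longitude pairs; the function names, the mean Earth radius constant, and the sample coordinates are assumptions for illustration, and a GIS package would normally be used for projected data.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def points_within_buffer(center, points, radius_m):
    """Return the points that lie within radius_m metres of the centre point."""
    lat0, lon0 = center
    return [
        (lat, lon) for lat, lon in points
        if haversine_m(lat0, lon0, lat, lon) <= radius_m
    ]


if __name__ == "__main__":
    # Hypothetical SVI sampling points around a location of interest.
    centre = (0.0, 0.0)
    samples = [(0.0, 0.001), (0.0, 0.01), (0.002, 0.0)]
    print(points_within_buffer(centre, samples, radius_m=500))
```

Metrics extracted from the images at the selected points, such as the share of vegetation, can then be aggregated per buffer.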

Discussion
In this research, we analysed the present applications of SVI in several respects. Firstly, the systematic description of the numerous applications highlights the adaptability and variety of SVI systems. Secondly, the innovation-decision process framework can help in systematically reviewing the research challenges in the built environment and the reasons why SVI is required as a novel method to acquire data, offering a comprehensive clarification of how SVI is adopted and implemented. This provides valuable guidance for understanding the adoption of and the decision-making process for SVI in the built environment, which is relatively uncommon in other studies, as most are application oriented. Thirdly, this paper summarises the present benefits and challenges of SVI, allowing researchers to make quick judgments. These advantages and limitations are equally instructive in studies of SVI promoting healthy cities, walkability, urban planning, and other related issues.
In this review, we found that experiments and simulations are the primary tools for evaluating the urban built environment. Deep learning is the standard and most sophisticated approach to image processing. It is commonly utilised in research, and it accelerates the extraction of features and the segmentation of pictures, which is crucial for much of the research discussed in Section 3. In addition, due to the advantageous nature of deep learning in terms of semantic speculation, the research on urban environment perception has received increasing attention. The application of SVI analysis methods in urban research has advanced beyond scene categorisation, object-background distinction, and position detection to the physical environments of streets and to spatial perception.
One of the most prevalent current uses of SVI involves plants and greenery. The following factors were considered herein: (1) Street trees, shrubs, lawns, and other forms of greenery have long been known to be essential elements of urban landscape design. Such greenery provides multiple benefits in urban environments, meeting diverse and overlapping goals. (2) Street greenery significantly contributes to the beauty and walkability of residential streets. The presence of plants often enhances the aesthetic assessment of urban settings. (3) Remote sensing imagery has been used to calculate green space percentages, green space/building area ratios, green space densities, and other factors so as to analyse, evaluate, and visualise urban greenery [145]. At the same time, SVI provides an entirely new perspective when assessing the profile view of street greenery. The integration of both data types bridges the gap between the previous studies and provides new research perspectives, with additional research opportunities for urban greenery studies.
In addition, the detection of temporal variation in SVI is becoming increasingly attractive [146], and temporal variation has been intensely discussed in relation to the recent data infrastructure for urban architecture. However, most street view services (including GSV as the most popular service) do not allow the retrieval of historical images through APIs. The few time-series studies have collected data from GSV's web interface (including historical images) or by other means, rather than through APIs [147,148]. Current SVI providers continue to gradually restrict access, which could substantially limit current research in the field but may favour the development of crowdsourcing services that alleviate these problems.
Lastly, the real value of SVI can only be seen when combined with existing semantic segmentation techniques. It is difficult to segregate SVI applications and papers into meaningful groups, since some cover more than one domain, but this shows that the research topic is multidisciplinary.

Future Studies
The future studies on the evaluation of the built environment based on SVI should also focus on the following areas:

• The integration of various data sources, such as remote sensing images, geotagged social media data, cell phone signalling, and bus cards. Attention should also be paid to the use of new methods, including deep learning and big data analysis, to conduct multiangle and multilevel research within a fine-scale perspective on the urban environment and to improve the reliability and accuracy of the evaluation based on SVI in the built environment.

• With the advent of the 5G era, the real-time uploading of SVI data recorded via webcasts, geotagged social media, and traffic loggers will be faster and easier. Street view data stored in the cloud will be more diverse and available in real time. Computer vision technologies and web-based developments also offer the possibility for interactive platforms that enable street-view uploading and analysis. The growing crowdsourcing (CS) platforms, developments in autonomous driving, and urban infrastructure are expected to address the current spatial coverage and temporal sampling frequency issues of street view data with crowdsourced street view sharing platforms such as OpenStreetCam and Mapillary.

• With the recent progress that has been made, the deep learning technology applied in urban research appears to be more accurate. By combing through the previous related research, we identified a trend in semantic segmentation towards faster processing and higher-resolution images. Current semantic segmentation approaches mainly involve learning low-resolution representations or recovering high-resolution representations. With the progress and development of deep learning technology, high- and low-resolution parallel learning is gaining more attention. In addition, the latest semantic segmentation models can provide more possibilities for street view images in urban research.

• As the dense coverage of indoor data at the microscale becomes more available (e.g., the extension of voluntary SVI), we predict that this might bring about enhancements and novelties for applications such as change detection for indoor data.

• In addition to using street maps for elemental measurements [8], the current research advances show the feasibility of generating 3D models from street maps [149-151], which can be combined with a model database to quickly generate virtual cities with certain style requirements and high accuracy.

Restrictions
The limitations of this research are worth discussing and focusing on in future research work:

• Advancements in computer vision and processing capacity are crucial for the future development of SVI. However, the chosen publications do not address the technical research elements, and the investigation of the semantic segmentation approaches for SVI is beyond the scope of this review.

• The concept of the "built environment" in this review may still limit the applications of SVI. For example, this paper does not consider studies of urban parks, trails, and urban agricultural areas. This review also deliberately excludes direct traffic observations through SVI, which may affect the assessment of the perceived quality of life in the built environment.

• The growing interest in SVI research and the corresponding increase in the number of publications has created a need for other researchers to follow this area and contribute to the knowledge base.
In conclusion, SVI research must be given continual and dynamic attention. Next-generation information technologies such as big data, artificial intelligence (AI), cloud computing, 5G, and Internet of Things (IoT) technologies all have dynamic development rules that need to be taken into consideration. SVI research in the built environment should be closely tracked across disciplines to ensure that the literature is complete and to better clarify the development of SVI research in general.

Conclusions
This paper provides a comprehensive answer to adopting and implementing SVI in the built environment by summarising and analysing the papers contained in the Scopus and the WoS databases.
Primarily, SVI can capture elements of the built environment at the line-of-sight level for assessment at a lower cost. With the considerable development of this urban data source and the establishment of supporting infrastructure (e.g., services), the use of SVI for urban analysis has become a trend that will continue to grow for the foreseeable future, as seen in the number of SVI-related studies and applications.
Secondly, with the support of artificial intelligence technology, using the street-level landscape and three-dimensional profile information provided by SVI, representative evaluation elements, such as roads, pedestrians, trees, and buildings, can be selected to analyse and evaluate a specific environmental element or comprehensive environmental elements within the spatial scope of streets, communities, and cities. This enables the quantification of the urban environment, environmental perception detection, and semantic speculation.
Thirdly, SVI adoption is not an easy task, presenting obstacles from both the imagery and technology facets. The data integration and management, the selection of appropriate imagery service providers, and spatial metrics are the critical success factors that can reduce these barriers.
Finally, the supporting infrastructure (e.g., services, volume, coverage of data, and computer vision technology) needs to be further developed and enhanced. The future application trends for SVI are mainly focused on the perception of the urban environment. The research shows that the emergence of SVI provides a practical aid for analysing urban environmental perceptions in terms of the spatiotemporal coverage and granularity, offering the possibility for more refined, efficient, and large-scale depictions of urban forms from physical and social spaces. At the same time, this study provides planning and design policy advice and assistance for sustainable smart city development and the management of residents' health regarding urban design teaching, research, and practice.

Figure 1. Literature collection process and analysis.

Figure 2. SVI adoption and implementation process in the built environment.

Table 1. Overview of the major service providers of SVI.

Table 2. The main comparison of the three types of services.

Table 3. Major open-source training sets.

Table 4. Summary of the perceived benefits of SVI in the built environment.

Table 5. Summary of dilemmas when using SVI in the built environment.

Table 6. Major elemental data analysis methods.