Panoramic Street-Level Imagery in Data-Driven Urban Research: A Comprehensive Global Review of Applications, Techniques, and Practical Considerations

The release of Google Street View in 2007 inspired several new panoramic street-level imagery platforms including Apple Look Around, Bing StreetSide, Baidu Total View, Tencent Street View, Naver Street View, and Yandex Panorama. The ever-increasing global capture of cities in 360° provides considerable new opportunities for data-driven urban research. This paper provides the first comprehensive, state-of-the-art review on the use of street-level imagery for urban analysis in five research areas: built environment and land use; health and wellbeing; natural environment; urban modelling and demographic surveillance; and area quality and reputation. Panoramic street-level imagery provides advantages in comparison to remotely sensed imagery and conventional urban data sources, whether manual, automated, or machine learning data extraction techniques are applied. Key advantages include low-cost, rapid, high-resolution, and wide-scale data capture, enhanced safety through remote presence, and a unique pedestrian/vehicle point of view for analyzing cities at the scale and perspective in which they are experienced. However, several limitations are evident, including limited ability to capture attribute information, unreliability for temporal analyses, limited use for depth and distance analyses, and the role of corporations as image-data gatekeepers. Findings provide detailed insight for those interested in using panoramic street-level imagery for urban research.


Introduction
As a merger of cartographic and photographic forms of representation, the addition of the Street View platform to Google's stable of geolocation applications (also including Maps and Earth) in 2007 not only introduced interactive geolocated panoramas to the masses [1,2], it also advanced the use of street-level imagery for research purposes. Although Google Street View coverage is patchy and incomplete overall, many cities are now included and regularly updated on the platform, particularly those in the global North [3]. Following Google Street View, other digital platforms have released street-level panoramic imagery products including Microsoft Bing StreetSide (US and select European cities), Apple Look Around (select US and international cities), Baidu Total View and Tencent Street View (Chinese cities), Kakao/Daum Road View and Naver Street View (South Korea), and Yandex Panorama (Russia and some Eastern European countries), as well as the corporate-owned crowdsourced platforms KartaView (formerly OpenStreetCam, operated by Telenav) and Mapillary (recently acquired by Facebook). Viewers of these platforms can navigate between images captured at regularly defined intervals along streets, virtually experiencing a diversity of urban streetscapes, built environments, and human activity from an eye-level, 360° perspective. While the interactive, immersive potential of street-level imagery platforms draws in users for informational, educational, and experiential purposes, researchers around the world are drawn to their vast repositories of panoramic imagery as a source of urban big data [4]. This paper provides the first comprehensive, global, state-of-the-art review of literature on the use of panoramic street-level imagery for data-driven urban research.
Corporate street-level imagery platforms typically provide access to panoramic images and their associated metadata (e.g., geographic coordinates, timestamp) to researchers for free or at low cost. As global coverage has increased in recent years, researchers have developed both manual and computational techniques to analyze and extract location-based information from street-level images for a diverse array of urban research topics, from access to greenspace [5] and estimation of pedestrian volumes [6], to perceptions of safety [7], car ownership patterns [8], the relationship between physical attributes of homes and crime risk [9], and between street sign linguistics and area-based socioeconomic status [10]. Considerable recent growth in the analysis of street-level panoramic imagery aligns with surging interest in 'proximate sensing', in which urban datasets are derived from images captured at high resolution near the subject matter, an exciting prospect in the era of data-driven urban analytics.
Despite the rapid growth in use of street-level imagery for research on cities around the world, the few attempts to assess the scope of this research area have focused on Google Street View only, and on narrowly defined application domains (e.g., in health research, see [11,12]). This paper addresses this knowledge gap through a comprehensive global review of research using street-level imagery from corporate panoramic street view platforms around the world. The following overall research question guides this study: how is imagery from panoramic street-level image platforms used in data-driven urban research? To answer this question, we systematically review literature on the use of street-level imagery for research on cities, focusing on identifying the imagery platforms used, research application areas, methods of data extraction and analysis, and benefits and limitations. Key findings are summarized, highlighting the potential value of panoramic imagery for a wide range of urban research applications. Notably, this review identifies increasing interest in the use of street-level imagery as a source of data on urban research problems, via both manual 'virtual audit' and survey approaches applied to small image samples and computationally driven visual analytics and machine learning techniques applied to massive image datasets. The considerable advantages are described in detail, as well as the limitations that researchers should consider before using street-level imagery in urban research projects. Finally, we summarize new advances in panoramic imaging technology poised to further advance the collection and analysis of street-level images.

Methods
An English language literature search was undertaken to capture academic literature on the use of panoramic street-level imagery for urban research published in the 14-year period between 1 January 2007 and 31 December 2020 (from the US release of Google Street View to the completion of this review). This paper follows the scoping review methodology, a comprehensive literature review method undertaken to summarize the scope and extent of an area of knowledge [13]. As a knowledge synthesis approach, scoping reviews provide a means to rapidly consolidate knowledge about an area of research or practice, and are especially useful for delineating the contours, themes, concepts, and key issues pertaining to new and emerging topics, with the objective of shaping future research priorities [14]. Although conducted similarly to systematic reviews, scoping reviews differ in that they aim to summarize the state of a research domain through more expansive inclusion criteria, rather than answer narrow research questions [15]. As such, scoping reviews are ideal for synthesizing an area of research defined by heterogeneity with regard to research design, methods, and application domains [16]. Three academic databases were used to access published academic research from journals, conference proceedings, and university theses: Google Scholar, Web of Science, and ProQuest Dissertations and Theses, respectively. Further, reference lists of key articles were hand-searched to identify relevant citations not acquired through the database searches.
Guided by the project research question and based on a preliminary scan of the published literature, the research team identified 5 key types of information to extract from each item in the scoping review sample:

1. Urban research application area.
2. Study location and street-level imagery platform used.
3. Methods for extracting and analyzing data from the imagery.
4. Benefits and advantages of panoramic imagery.
5. Barriers and limitations of panoramic imagery.
Since the overall aim of the review was to understand how images from street-level imagery platforms are used across a wide variety of urban research topics, an assessment of research design quality or consistency of outcomes across studies was not appropriate and, therefore, was not conducted. The final list of literature search terms used is provided in Table 1. The searches combined one or more terms from Group A with one or more terms from Group B, in an iterative process and with various truncated versions applied. Several exclusion criteria were applied to refine the scope of the review. Articles were excluded if the study used conventional (non-panoramic) street-level images; for instance, studies based on imagery captured from a dashcam and made available on the crowdsourced KartaView or Mapillary platforms. Articles were also excluded if the main focus was not an application to an urban research topic (although most studies were based on imagery of urban settings, since coverage is most complete in cities). Examples of excluded studies include those focused on the development of data extraction algorithms, image processing or enhancement, and the assessment of imagery completeness across different platforms. Some studies were identified that reported on the use of panoramic imagery platforms primarily for immersive experience, orientation, and spatial cognition purposes rather than the extraction of data, and these were thus excluded from the review. Further exclusions were applied to articles in which panoramic imagery was produced for the project rather than accessed from an existing street-level imagery platform, and to articles in which imagery was used solely for illustrative purposes. Both authors reviewed all articles included in the final scoping review sample. In a small number of cases the reviewers disagreed on whether to include an article, and the final decision was resolved through discussion.

Results
The literature search produced a total of 234 articles that fit the inclusion criteria. The majority were published in academic journals, followed by conference proceedings and university dissertations or theses. The final scoping review sample reveals a wide variety of disciplines in which this imagery has been used, including the health and social sciences, urban planning, environmental studies, and computer science. Figure 1 illustrates the year of study publication, highlighting a progressive increase in the use of this imagery in the latter years of the review period. Note that no studies included in this review were published in 2007 or 2008. Table 2 highlights the findings for a selection of 10 articles representing the various areas of urban research and imagery platforms used. The full list of publications reviewed is available at: https://tinyurl.com/panoramic-images-review (accessed on 8 June 2021).

Study Locations and Street-Level Imagery Platforms Used
The scoping review identified seven panoramic street-level imagery platforms used for urban research globally. Google Street View provided the source of imagery in a majority of the studies (166, 71%). The second and third most popular sources of imagery were Baidu Maps Total View (40, 17%) and Tencent Street View (23, 10%), both of which have comprehensive coverage of cities in China only. Bing StreetSide from Microsoft, available primarily in US cities, was used in a small number of studies (5). Two South Korean platforms were used in a handful of studies: Kakao/Daum Road View (2) and Naver Street View (5). Three studies used imagery from Yandex Panorama, the Russian platform available in Russian and some Eastern European cities.
A majority of publications focused on cities in the USA (87, 37% of studies) and China (69, 29%), reflecting the wide coverage of Google Street View and of Baidu Total View/Tencent Street View in those countries, respectively. European cities (primarily in the UK and France) accounted for 64 (27%) studies. A small number were conducted in Canada (7, 3%), New Zealand (6, 3%), Australia (5, 2%), and South Korea (5, 2%). No studies were from cities in Africa, likely due to the very limited coverage on any platform (apart from Google Street View in South Africa). The limited literature in the sample from South America (4 studies, across Chile and Brazil) is likely due to both the exclusion of articles not published in English and the limited availability of imagery there.

Techniques Used for Data Extraction and Analysis
Most of the studies included in the scoping review used street-level imagery as a source of evidence for the presence or absence of various urban features or phenomena, based on the premise that data can be extracted from the imagery using either a manual or computational approach, and that the imagery is a good representation of urban space. Studies extracted both quantitative information (e.g., presence, absence, quantity of a given attribute) as well as qualitative information (e.g., interpretations, subjective scene ratings). Temporal change analyses were conducted in a small number of studies based on a systematic analysis of imagery captured at different time periods (available in some platforms including Google Street View and Tencent Street View). In many studies, data extracted from the imagery were then analyzed in conjunction with additional datasets and evidence to support urban theories or models, from health behaviours to crime prevention (see application areas below). In most cases the urban environments investigated by the researchers had pre-determined parameters (e.g., specific locations of interest, features or phenomena to record), with the rare exception of studies that chose to randomize these parameters [27].
A total of 162 (69%) studies used manual methods of data extraction, largely undertaken as a form of 'virtual audit' or survey. In such cases, imagery was analyzed either directly in the fully 360° environment of the street-level imagery platform, including via the 'drop-and-spin' method [28], or by first obtaining imagery through the platform's application programming interface (API) and then viewing it in digital imaging software. Many studies used custom virtual audit techniques based on a predefined checklist or survey instrument to record the object, feature, or phenomenon of interest. A number of established virtual auditing tools adapted for panoramic image scenes appeared across multiple studies, including FASTVIEW [29], CANVAS [30], and SPACES [31]. These studies largely relied on trained members of the research team to conduct virtual audits, in some cases also evaluating inter-rater reliability [32]. A further subset of studies developed crowdsourced research designs in which distributed participants were trained to carry out audits directly via the Web platform's public interface [33].
The review identified continued growth in the use of computational methods for analyzing street-level images, accounting for 72 (31%) studies. Many such studies aimed to go beyond the descriptive and inventorying objectives of manual approaches, analyzing imagery for more advanced feature recognition, segmentation, modelling, and inference. In most such studies, computational methods were applied to images accessed in bulk through platform APIs, enabling analysis of vast amounts of imagery using statistical modelling and data science methodologies. Computational techniques applied to street-level imagery include a variety of automated and algorithmic methods, in which imagery is analyzed and processed under researcher-supervised conditions. A number of studies also utilized artificial intelligence (AI) and machine learning (ML) methods, particularly 'computer vision' feature recognition and image segmentation techniques based on convolutional neural networks (CNNs) designed to recognize, extract, categorize, and quantify visual information with lower levels of human oversight [34,35]. Some studies accessed street-level imagery from large pre-assembled datasets designed for computational analysis, including the Place Pulse dataset built from images extracted via the Google Street View API [36,37]. To train and test computer vision models, further urban imagery datasets containing pre-labelled and segmented imagery are often used, such as the Cityscapes [38] and ADE20K [39] datasets.
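The scene-level output of such segmentation models is often reduced to simple per-image class proportions before any further analysis. A minimal sketch of that tabulation step, assuming a label map already produced by some segmentation CNN (the class IDs here are illustrative, not any dataset's actual scheme):

```python
import numpy as np

# Hypothetical class IDs, loosely styled after ADE20K-type label schemes.
CLASSES = {0: "sky", 1: "building", 2: "vegetation", 3: "road", 4: "car"}

def scene_composition(label_map: np.ndarray) -> dict:
    """Fraction of pixels assigned to each class in one segmented image."""
    counts = np.bincount(label_map.ravel(), minlength=len(CLASSES))
    return {name: counts[cid] / label_map.size for cid, name in CLASSES.items()}

# Toy 4x4 'segmentation output' standing in for a real CNN prediction.
toy = np.array([[0, 0, 2, 2],
                [1, 1, 2, 2],
                [1, 1, 3, 3],
                [4, 1, 3, 3]])
comp = scene_composition(toy)
```

Per-image proportions of this kind are the typical unit consumed by the downstream statistical models described in the application areas below, e.g., when relating vegetation share to health outcomes.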

Urban Research Areas
The findings of the scoping review reveal a wide range of applications for panoramic street-level imagery in urban research. We refine this heterogeneity into five overarching thematic categories: built environment and land use; health and wellbeing; natural environment; urban modelling and demographic surveillance; and area quality and reputation. As illustrated in Figure 2, the two most common applications of street-level imagery were research on built environment/land use and health/wellbeing, with 91 studies in each category. In general, across all application areas, the objective of most studies was the acquisition of data based on individual aspects of the image (objects, features, or phenomena) or on the whole image, which Kang et al. [12] refer to as 'element'-level and 'scene'-level observation, respectively.

Built Environment and Land Use
With complete or near-complete coverage in many global cities, analysis of the built environment is perhaps the most ready-made application area for street-level imagery. With regard to manual techniques to identify built environment features, Hara et al. [40] recruited participants to identify the presence of accessibility features such as curb ramps in Google Street View imagery using a crowdsourcing approach and manual audit data extraction technique. Plascak et al. [41] produced a high-resolution map of New Jersey's sidewalk conditions using a manual virtual survey approach based on the CANVAS auditing tool. The Russian-owned platform Yandex Panorama has been used in a small number of instances to manually audit built environments, including to identify the locations and aesthetics of 'third-wave' coffeeshops in Istanbul [42], and to evaluate building quality for seismic risk assessments in Sochi [43]. More commonly, however, street-level imagery is analyzed computationally to detect built environment features such as street infrastructure [44,45], as part of a wider area of research applying computer vision methods for urban design and built environment problems. As an example of how features recognized in imagery are used as a basis to explain social processes, Jiang et al. [46] analyzed imagery from Baidu Street View in a CNN model to detect traffic signs in two Chinese cities, enabling the linkage of sign placement in the built environment with the presence of 'traffic violation-prone' areas.
Regarding urban land use, analyses of street-level imagery are often presented as a complement or alternative to land use classification using conventional data sources, particularly remotely sensed data (e.g., satellite imagery). Given the fully panoramic horizontal field of view captured at the level of human interaction with the city, street-level imagery offers an alternative viewpoint for land use analysis which may afford a more detailed classification of urban land use types, which can sometimes be difficult to differentiate from the aerial nadir perspective [47]. Land use applications of street-level imagery typically require classification methods different from those conventionally used in remote sensing given the absence of multiple spectral bands (only RGB are available), so much of the literature draws on the family of machine learning techniques known as computer vision. In essence, these approaches undertake feature recognition and classification based on geometric and topological properties rather than spectral signatures and, thus, can be applied to both remote and street-level imagery. Cao et al. [48] document a process for pixel-level land use classification from both viewpoints using a scene segmentation neural network for New York City, illustrating the potential for higher accuracy in classifying socioeconomic land use categories than with aerial imagery alone (see also an analysis of house prices [34] and patterns of gentrification [49]). Notably, this method relies on a spatial interpolation technique to estimate the pixel values between street-level images (since they are captured intermittently), which adds an additional element of uncertainty when fusing remote and 'proximate' sensing datasets. Srivastava et al. [50] developed an urban land use detection approach that fused aerial remote sensing imagery from Google Maps with street-level imagery from Google Street View to detect element-level land use features, using a CNN model trained on a crowdsource-labelled OpenStreetMap dataset. Illustrating the added value of the horizontal viewpoint of street-level imagery, Li et al. [19] analyzed images from three US cities to classify residential land uses at the block level, based on detailed building facades as opposed to the more generic rooftops analyzed in aerial imagery classification. The resulting land use maps at the individual block level represent a scalar improvement over conventional neighborhood-level land use classification.
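The spatial interpolation step noted above, estimating values between intermittent capture points, can be sketched generically. The following is a plain inverse-distance-weighting estimator along a street, offered as an illustration rather than the specific method of Cao et al. [48]; positions and scores are hypothetical:

```python
def idw_interpolate(samples, query_pos, power=2.0):
    """Inverse-distance-weighted estimate at query_pos.

    samples: list of (position_along_street_m, observed_value) pairs,
             e.g. a per-class score observed at each image-capture point.
    """
    num, den = 0.0, 0.0
    for pos, val in samples:
        d = abs(query_pos - pos)
        if d == 0:                  # exactly at a capture point
            return val
        w = 1.0 / d ** power        # nearer captures dominate the estimate
        num += w * val
        den += w
    return num / den

# Capture points roughly 10 m apart with a hypothetical 'residential' score.
captures = [(0.0, 0.2), (10.0, 0.8)]
midpoint = idw_interpolate(captures, 5.0)   # equidistant, so a simple mean
```

The uncertainty the text mentions is visible here: positions far from any capture point inherit a weighted guess rather than an observation.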

Health and Wellbeing
Some of the earliest analysis of street-level imagery was for urban health and wellbeing studies, a field in which it now represents an established study design, as evidenced by recent reviews [11,12]. Urban health applications are situated in an understanding of health and wellbeing as shaped by social factors and the places in which we live, work, and play, known as the social and environmental determinants of health [51]. Most analyses focus on associating urban natural- and built-environment characteristics with observed health patterns. In the majority of studies, the aim is to examine the influence of potential exposures (e.g., to local environment features) on health and wellbeing, rather than to extract evidence of actual health conditions or health behaviours of individuals or groups captured in the imagery. One notable exception is a study of alcohol in urban streetscapes in Wellington, New Zealand, that included an audit of 'visible alcohol consumption' [52], along with a small number of studies of pedestrian behaviours and volumes [53]. This relationship between health, society, and place has been examined through street-level imagery across a wide range of physical and mental health conditions. A large number of studies have examined the relationship between urban design and health, including between the presence of physical activity infrastructure and rates of obesity [54,55] and mental wellbeing [56], and between street characteristics/infrastructure and walking behaviours [57-59], pedestrian injury [30,60], and cycling safety [23,61]. The relationship between exposure to urban natural environments and a range of health outcomes is a further substantial research focus (e.g., [62,63]), as is the relationship between natural environments and health risk factors such as stress [64].
Based conceptually on notions of salutogenesis (exposure to factors that enhance rather than impair health), a number of studies also explored the impact of urban natural environments in promoting health and a wider sense of wellbeing [65,66].
From a risk identification and mitigation standpoint, studies have analyzed this imagery to identify areas or individual elements of the built environment that could be targeted for intervention. Examples of this approach include a manual audit of Google Street View that identified significantly more 'obesogenic' advertising than other forms of sign advertising within an 800 m radius of schools in Auckland [67]. Nguyen et al. [68] analyzed 164 million Google Street View images in a machine learning model to predict areas at elevated risk of COVID-19, based on recognition of built environment features thought to be associated with elevated virus risk, including non-single-family homes, dilapidated structures, and visible wires.

Natural Environment
The review identified a considerable focus on the use of street-level imagery for scene-level analyses of urban greenery, blue spaces, and open space using computational techniques. In a study of Beijing, a machine learning model (the FCN-8s CNN) was trained against labelled imagery from the ADE20K dataset to recognize blue and green space in Tencent Street View images [69]. Also in Beijing, images from the Tencent Street View API were used to calculate the Green View Index (a measure of green in images as an indicator of vegetation presence) using an automated scene segmentation algorithm based on SegNet, a pixel-level CNN for semantic segmentation [70]. At the element level of analysis, researchers use this imagery to recognize, categorize, and quantify individual natural environment features within the image scene. To answer the question of whether virtual audits can replace in-situ visits by field tree surveyors, volunteers with varying levels of experience were recruited to manually inventory street trees present in Google Street View images of suburban Chicago [71]. Findings indicated that such approaches can replace in-person surveys for basic information such as tree location, but are less suitable where tree species or diameter estimations are required. In many studies focused on natural environments, researchers also attempted to identify correlations with health outcomes, mobility, or socioeconomic status, illustrating considerable crossover between this area of urban research and the other thematic categories used for this review. Notably, researchers have analyzed street-level imagery to make spatial associations between the presence of greenery and various social and structural conditions, including active transportation behaviours [62,72], pollution [73], gentrification [74], and housing prices [75-77], as well as between the availability of open space and poverty [78].
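At its core, the Green View Index described above is the share of vegetation pixels in each segmented view, averaged over the headings sampled at a capture point. A minimal sketch assuming pre-computed label maps (the vegetation class ID is hypothetical, and this is not the exact SegNet pipeline of [70]):

```python
import numpy as np

VEGETATION_ID = 2  # hypothetical label for trees/grass in the segmentation output

def green_view_index(label_maps) -> float:
    """Mean fraction of vegetation pixels across the headings sampled at one site."""
    fractions = [(m == VEGETATION_ID).mean() for m in label_maps]
    return float(np.mean(fractions))

# Toy label maps for two headings at one capture point.
north = np.array([[2, 2], [0, 1]])   # 50% vegetation pixels
south = np.array([[0, 0], [0, 2]])   # 25% vegetation pixels
gvi = green_view_index([north, south])
```

Repeating this per capture point and mapping the resulting scores along the street network yields the kind of vegetation surface these studies correlate with social and health variables.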

Urban Modelling and Demographic Surveillance
This research category includes studies that applied street-level imagery in large-scale computational modelling of urban environments or populations. In this category, the objective of most studies was to examine the potential of this imagery as an alternative or complement to computer-generated 3D media and conventional forms of demographic indicator data. Applying a variety of computational methods, researchers have used street-level imagery to reconstruct urban scenes and scene elements, particularly for use in urban planning. With the aid of Google Street View imagery, Takizawa and Kinugawa [79] reconstructed 3D cityscapes, while Cetiner [80] focused on a specific element of the urban environment, the modelling of bridges. A few studies have examined the characteristics of 'urban canyons'; these include a study that used Baidu imagery to estimate daily sun duration at street level, and another that used Google Street View imagery to model the pedestrian experience in an urban canyon [81]. Across these studies, street-level imagery provided an alternative to methods requiring more costly 3D-modelled reconstructions.
With regard to the use of street-level imagery to infer socioeconomic and demographic characteristics of cities, a number of studies illustrate how it may serve as a proxy for administrative datasets for comprehensive health and demographic surveillance. Suel et al. [82] analyzed over a million Google Street View images in a neural network to categorize areas of London according to common measures of socio-demographic inequality. In comparison to conventional sources of such data (e.g., census), the predicted results demonstrated considerable alignment with observed data, suggesting a new big data source for area-level demographic surveillance. Similarly, a study by Gebru and colleagues [83] used a deep learning model to identify car year and make information from 50 million Google Street View images across 200 US cities, demonstrating how neighbourhood-level socioeconomic status could be inferred at a large scale, with considerable accuracy and time and cost efficiency (see also [84]). In another study, a CNN was trained to predict neighbourhood-level income brackets based on Google Street View images of Oakland, aiming to answer the question "what observable features predispose a locale to low or high poverty levels?" [85].

Area Quality and Reputation
Street-level imagery is also analyzed to assess the quality or reputation of urban areas, based both on subjective human perceptions and on computational approaches that seek to provide objective comparative assessments. As part of a larger research focus on so-called 'neighbourhood effects' linking social outcomes to urban spatial characteristics [86], area quality research using street-level imagery is often based on inferring the reputation of an area through the presence of particular elements within imagery, associating these features with phenomena such as crime, inequality, and 'anti-social behavior'. In general, this research analyzes place characteristics via street-level imagery deductively in an attempt to explain existing urban social phenomena (see [87]). Two distinct objectives can be discerned: analyses aimed at characterizing 'risky' areas (especially related to crime, disorder, and environmental hazards), and analyses aimed at characterizing 'livable' urban environments. Regarding risky urban spaces, researchers have analyzed street-level imagery to identify scenes and individual elements associated with criminal activity, often drawing from established theories in environmental criminology such as broken windows theory [88,89]. For instance, Langton and Steenbeek [9] developed a method to analyze burglary susceptibility through manually auditing images of residential properties in Google Street View. Comparing findings with local crime statistics, results suggested that certain characteristics of properties (e.g., ease of escape, the extent to which a property is shielded from surveillance by neighbours) are associated with increased risk of burglary, allowing for risk assessment at the scale of individual properties rather than as a function of neighbourhood-level wealth.
Computational methods are comparatively less common in this research area due to the more subjective nature of assessing area quality and reputation. A significant attempt to examine subjective perceptions of urban space at a large scale was the Place Pulse project at MIT (https://www.media.mit.edu/projects/place-pulse-new/overview/) (accessed on 8 June 2021). Place Pulse assembled a large database of images from Google Street View and enrolled members of the public to compare pairs of images according to perceptions of wealth, safety, liveliness, and so on. The resulting dataset of labelled and categorized images has been used to train deep learning models to recognize quality of life indicators embedded in urban environments, including research that used this dataset to create the Streetscore measure of urban safety for the US [37], and to identify six quality of life indicators (safe, lively, beautiful, wealthy, boring, depressing) globally [90]. Similarly, Choiri [91] crowdsourced the labelling of 800 street-level images of Amsterdam for perceptions of 'urban attractiveness' and used the resulting dataset to train a CNN model enabling the automated identification of attractive areas in much larger imagery datasets. Several studies used Tencent Street View images to assess the 'visual quality' of streetscapes in Chinese cities, including Tang and Long's study [92] of the historic Hutong areas of Beijing, which accessed imagery captured between 2012 and 2016 from Tencent's 'Time Machine' feature, illustrating the potential for temporal analyses with street-level imagery (see also [93]).
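Pairwise comparisons of the Place Pulse kind are typically converted into per-image scores before model training. One standard option is a Bradley-Terry fit, sketched below as an illustration (not necessarily the scoring method those projects actually used); image labels and vote counts are hypothetical:

```python
def bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from pairwise preference counts.

    wins[(a, b)] = number of comparisons in which image a was preferred to b.
    Returns positive scores (normalized to sum to the number of items);
    higher means more often preferred. Uses the standard MM fixed-point update.
    """
    items = sorted({i for pair in wins for i in pair})
    p = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            w = sum(wins.get((i, j), 0) for j in items if j != i)
            d = sum((wins.get((i, j), 0) + wins.get((j, i), 0)) / (p[i] + p[j])
                    for j in items if j != i)
            new[i] = w / d if d > 0 else p[i]
        s = sum(new.values())
        p = {i: len(items) * v / s for i, v in new.items()}  # rescale (BT is scale-free)
    return p

# Toy votes: image A preferred to B 3 of 4 times; A and C tied 2-2.
votes = {("A", "B"): 3, ("B", "A"): 1, ("A", "C"): 2, ("C", "A"): 2}
scores = bradley_terry(votes)
```

Scores of this kind can then serve as regression targets for a CNN, turning sparse crowd votes into a dense, image-level perception measure.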

Advantages and Benefits of Street-Level Imagery in Urban Research
A majority of studies explicitly considered the benefits of street-level imagery, and in many cases its added value over conventional urban data sources. Notably, however, more recent studies were less likely to explicitly indicate advantages and benefits, which, along with the substantial increase in published studies in recent years, suggests a maturation of street-level imagery as a recognized data source for urban research. From analyzing such statements, this review identified several key benefits and advantages of street-level imagery in two distinct but related areas: (1) research design, and (2) knowledge production.

Research Design
In studies based on manual data extraction, many researchers noted that virtually auditing urban streetscapes enables rapid data collection, across a growing number of cities around the globe covered by street-level imagery, and at a cost deemed considerably lower than in-person visits [94][95][96]. Chang et al. [97] praised Baidu imagery due to its low cost, its coverage of 95% of Chinese cities (three million kilometres of streetscapes), and regular updates which enable spatial and temporal analysis. Since street-level imagery is generally one component of a digital mapping platform, researchers also explicitly identified the ability to extract precise geographic coordinates through the API as a key advantage over other sources of imagery [45,98,99], such as Flickr, which may not include location metadata. Interestingly, however, no studies explicitly detailed the cost of accessing imagery in bulk via an API, although costs are likely to be quite low for all but the largest-scale image acquisitions. The Google Street View Static API currently charges just $5.60 USD per 1000 panoramic images (when up to 500,000 are accessed) via its 'pay-as-you-go' pricing model, which also provides access to image metadata including geographic coordinates and timestamp. Familiarity with street-level imagery platforms was identified as a benefit for studies employing a crowdsourced research design, since participants would not need explicit training in their use [40]. Similarly, researchers noted the potential for enhanced researcher/participant comfort and safety through the 'remote presence' of virtual audits, which may be particularly valued when auditing areas deemed risky or dangerous [100].
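As a concrete illustration of the API-based access discussed above, the sketch below builds request URLs for the Google Street View Static API's image and metadata endpoints. The endpoints and parameters shown follow Google's current documentation, but the wrapper function itself is our own illustrative construction, and a valid API key is assumed:

```python
from urllib.parse import urlencode

STATIC_API = "https://maps.googleapis.com/maps/api/streetview"

def streetview_urls(lat, lon, api_key, heading=0, fov=90, size="640x640"):
    """Build request URLs for the Google Street View Static API.

    Returns (image_url, metadata_url). The metadata endpoint reports the
    panorama's exact capture coordinates, capture date, and panorama ID,
    while the image endpoint returns a flat view extracted from the
    panorama at the given heading and field of view.
    """
    params = {"location": f"{lat},{lon}", "key": api_key,
              "heading": heading, "fov": fov, "size": size}
    image_url = f"{STATIC_API}?{urlencode(params)}"
    meta_params = {"location": f"{lat},{lon}", "key": api_key}
    metadata_url = f"{STATIC_API}/metadata?{urlencode(meta_params)}"
    return image_url, metadata_url
```

Metadata requests are currently free of charge under Google's pricing model, which makes them useful for checking imagery availability and capture dates across a study area before paying for image downloads.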

Knowledge Production
The literature also argues the case for the epistemic advantages of street-level imagery; in other words, its unique affordances enable new knowledge about the world to be produced. A number of studies claim that street-level imagery can enhance our understanding of urban spaces, features, and processes through the unique pedestrian/vehicle point of view (POV), not easily captured in other forms of imagery or representation [62,101,102]. The possibility of new forms of analysis and knowledge production is identified in built-environment feature recognition and land use classification applications, as images are generally captured close to the subject matter, and from a POV that reveals greater detail and variation, especially compared to the aerial POV. With regard to urban modelling and demographic surveillance applications, researchers noted the general ease with which millions of street-level images could be accessed and analyzed in a deep learning environment. This facilitates large-scale comparative analysis and generalization under accelerated timescales, which could substantially improve our understanding of urban form, pattern, and process [83,84].

Limitations and Weaknesses of Street-Level Imagery in Urban Research
Much of the literature also considered limitations and weaknesses. Although only explicitly mentioned in a few studies, a key weakness identified across the studies in this review is the limited dimensionality of data extracted from images, whether using a manual or computational approach (Berland et al., 2019; Meunpong et al., 2019). In other words, many studies used street-level imagery as a source of data on the binary presence/absence of features, objects, or phenomena in geographic space: where things are located as opposed to what their attributes are (e.g., quantities, qualities, values). A number of studies did attempt to infer quality and quantity information about scenes and objects; however, the overall limited ability to definitively capture attribute information reveals a substantial limitation of street-level imagery as a source of data on cities. Estimates and inferences may become more reliable as standardized virtual audit instruments and machine learning techniques develop further and are systematically compared against other data sources. A further weakness concerns image quality: objects such as light poles, vehicles, or pedestrians can obstruct features of interest [103]. This limitation can be further exacerbated by factors intrinsic to image capture and processing, namely lighting, seasonality, weather conditions, and privacy blurring, which can reduce image quality and, therefore, its representational affordances [82]. In particular, this means that street-level images are sometimes unreliable for temporal analyses due to shifts in image quality between years and missing data during particular time periods, in addition to missing time stamps in older imagery [32,104-106].
While some studies based on manual data extraction interacted directly with street-level imagery in the platform's virtual 360° environment, a considerable amount of research is based on analysis of rectangular flat images extracted from the full panorama. Depending on the area of the image captured and the specific projection used, geometrical distortions may affect the accuracy of results in feature recognition applications [46,107,108]. There is currently limited applicability of this 'fake 3D' imagery [109] for analyses based on depth and distance [79]; however, the recent integration of LiDAR sensors into street-level imagery capture processes by Google, Microsoft, and Apple may provide future studies with the ability to undertake distance and depth measurements.

Relatedly, the role of private corporations as gatekeepers of street-level imagery may be a concern for at least two reasons. First, access depends on the company continuing to apply the current low- or no-cost structure for research, but this could easily change. Google, for example, is attempting to diversify its revenue streams away from its current business model based on personal data accumulation and targeted advertising, including by increasing fees to access its APIs [110,111]. Second, corporate platforms decide which cities, and which areas within cities, are captured in street-level imagery, as well as how often updates occur. Platforms make these decisions largely on the basis of economic return on investment, and so relying on street-level imagery as a source of information on cities could produce a further urban inequality: between those areas deemed worthy of street-level imagery coverage (and, therefore, analysis), and those left invisible and unanalyzed [3].
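The geometric relationship behind the flat-image extraction discussed above can be made concrete: each pixel of a rectilinear (perspective) view corresponds to a ray direction, which is mapped back onto the equirectangular panorama, and distortion grows as the field of view widens. The following is a minimal sketch of this resampling, assuming an equirectangular input and using nearest-neighbour lookup for brevity (function name and simplifications are ours):

```python
import numpy as np

def perspective_from_equirectangular(pano, fov_deg, yaw_deg, pitch_deg, out_w, out_h):
    """Sample a rectilinear (perspective) view from an equirectangular panorama.

    pano: H x W (x C) array covering 360 deg horizontally, 180 deg vertically.
    fov_deg is the horizontal field of view; yaw/pitch orient the virtual camera.
    Returns an out_h x out_w (x C) array via nearest-neighbour sampling.
    """
    H, W = pano.shape[:2]
    f = (out_w / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    # Output pixel grid centred on the optical axis
    x = np.arange(out_w) - (out_w - 1) / 2
    y = np.arange(out_h) - (out_h - 1) / 2
    xv, yv = np.meshgrid(x, y)
    # Ray directions in the camera frame (z forward, x right, y down)
    dirs = np.stack([xv, yv, np.full_like(xv, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate rays by pitch (about x) then yaw (about y)
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    Ry = np.array([[np.cos(t), 0, np.sin(t)], [0, 1, 0], [-np.sin(t), 0, np.cos(t)]])
    d = dirs @ (Ry @ Rx).T
    # Convert ray directions to spherical coordinates, then panorama indices
    lon = np.arctan2(d[..., 0], d[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))    # [-pi/2, pi/2]
    u = np.clip(np.round((lon / np.pi + 1) / 2 * (W - 1)).astype(int), 0, W - 1)
    v = np.clip(np.round((lat / (np.pi / 2) + 1) / 2 * (H - 1)).astype(int), 0, H - 1)
    return pano[v, u]
```

Because the mapping stretches the panorama non-uniformly, object shapes near the edges of wide-angle extractions are distorted relative to the scene, which is one source of the accuracy concerns noted in feature recognition studies.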
On the other hand, as seen in some studies described above, large-scale automated classification of urban spaces according to risk categories (e.g., unhealthy, crime-ridden) produces 'hypervisibility', which may have negative consequences for identified social groups and neighbourhoods [112]. While individual privacy is addressed through the blurring of faces and sensitive information, more recent articulations of 'group privacy' [113] suggest that such analyses can have harmful effects, and so the ethical consequences require further attention. Finally, largely missing from the reviewed literature is any substantial awareness of threats to civic and national security arising from the comprehensive, panoramic record of cities and their inhabitants that this imagery provides. For instance, imagery showing critical urban infrastructure, national security sites, and gathering locations for large groups could provide nefarious actors with a source of data for targeting their actions.

Conclusions and Future Directions
This paper provides the first comprehensive state-of-the-art review of the use of panoramic images from street-level imagery platforms around the world, across the full spectrum of research on cities. Results of this scoping review provide a detailed knowledge base for researchers interested in using street-level imagery, by identifying the platforms used around the world for accessing images, key areas of application in urban studies, methods of data extraction and analysis, and the key benefits and limitations. Overall, the results point to an accelerated use of street-level imagery as a source of urban data in recent years, as corporations continue to increase the amount of urban area captured in 360°, and as manual and computational methods of image-based data extraction and analysis become further established in the research community. The review identified considerable advantages and benefits including low-cost, rapid, and widespread data collection, enhanced safety through remote data collection, and a unique pedestrian POV that presents not just an alternative data source but a way to capture cities from the perspective in which people experience them. However, several limitations constrain its potential as an urban data source, including the limited ability to capture information beyond the presence/absence of spatial features, unreliability for temporal analyses, limited use for depth and distance analyses, and the role of corporations as image-data gatekeepers.
The continued growth of research applications for street-level imagery over the 14-year review period suggests expanded use of this technology as new opportunities arise, which may also address some of its current limitations. As indicated above, corporate panoramic imagery platforms are integrating additional sensors into image capture processes, including Google Street View's addition of LiDAR for depth sensing and 3D modelling, higher resolution cameras for automatic object recognition, and Aclima air quality sensors [114]. Public access to data from these sensors is limited, although Google has now made street segment-level air quality data for Copenhagen and London available via its Environmental Insights Explorer program [115]. Further access to these and other sensor data would expand possibilities for 360° urban sensing. Although few studies were identified that used street-level imagery to analyze rural spaces, increasing coverage outside of cities suggests the potential for greater use of this imagery for analysis of rural areas in the future. The use of street-level imagery in virtual reality environments [116] is another area of research likely to see increased attention in the coming years, particularly since it is now easy to create one's own panoramic imagery and immersive environments using low-cost technologies [3]. Consumer-grade 360° cameras can capture not just photos but also panoramic videos and spatial audio, which could enable new forms of urban analytics independent of the restrictions of corporate street-level imagery ecosystems.
This review was conducted systematically following the scoping review approach and stands as a comprehensive synthesis of academic research on the use of street-level imagery for research on cities. However, it is possible that a small number of studies were missed due to non-inclusion in the databases used for the review. Further, some studies may have been unintentionally excluded due to human judgement or error; for instance, deciding whether an article fit the 'urban research' inclusion criterion was not straightforward in a minority of instances. Finally, the focus on English language publications likely excluded some research published in other languages, and so future reviews could consider including research published in a variety of languages.

Conflicts of Interest:
The authors declare no conflict of interest.