Article

Crowdsourcing Street View Imagery: A Comparison of Mapillary and OpenStreetCam

1
Department of Computational and Data Sciences, George Mason University, Fairfax, VA 22030, USA
2
Center for Geoinformatics and Geospatial Intelligence, George Mason University, Fairfax, VA 22030, USA
3
Department of Geography and Geoinformation Science, George Mason University, Fairfax, VA 22030, USA
4
Department of Computer Science, William and Mary, Williamsburg, VA 23187, USA
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2020, 9(6), 341; https://doi.org/10.3390/ijgi9060341
Submission received: 8 April 2020 / Revised: 12 May 2020 / Accepted: 24 May 2020 / Published: 26 May 2020

Abstract

Over the last decade, Volunteered Geographic Information (VGI) has emerged as a viable source of information on cities. During this time, the nature of VGI has been evolving, with new types and sources of data continually being added. In light of this trend, this paper explores one such type of VGI data: Volunteered Street View Imagery (VSVI). Two VSVI sources, Mapillary and OpenStreetCam, were extracted and analyzed to study road coverage and contribution patterns for four US metropolitan areas. Results show that coverage patterns vary across sites, with most contributions occurring along local roads and in populated areas. We also found that a few users contributed most of the data. Moreover, the results suggest that most data are being collected during three distinct times of day (i.e., morning, lunch and late afternoon). The paper concludes with a discussion that while VSVI data is still relatively new, it has the potential to be a rich source of spatial and temporal information for monitoring cities.

1. Introduction

Cities are complex, dynamic systems that require various sources of geographical data for monitoring and assessing their health and wellbeing. Traditionally, data collection was a top-down process with government agencies and private companies collecting most of these data. Now, with the proliferation of location-aware devices and Web 2.0 services and applications, a fundamental shift has occurred in the way that geographical data on cities are being collected and the role that individuals now play in this data collection process [1]. Online users are not only consumers of geographical data but also producers of it. These data can range from personal information about users themselves (e.g., check-ins on online social media platforms such as Swarm and Facebook) to contributed information with much wider societal significance (e.g., mapped locations of damaged buildings following a natural disaster on OpenStreetMap (OSM)). Over time, this has led to increasing amounts of geographical data being contributed by users. Goodchild [2] introduced the term Volunteered Geographic Information (VGI) to contrast this new stream of geographic information with more traditional sources collected by more authoritative bodies. Popular examples of VGI include contributions made to online platforms such as OSM and Wikimapia (e.g., [3,4,5,6,7,8]).
Within the realm of VGI, Street View Imagery (SVI) has emerged in recent years as a novel and rich source of data on cities from which geographic information can be derived. Perhaps the most well-known example of SVI utilization is that of Google Street View (GSV). Offered as a free online service, GSV provides interactive panoramas along mobility corridors such as streets and walking paths, which can be used to virtually explore an area. While SVI has been traditionally collected by governmental agencies and companies alike, we are now also witnessing the emergence of Volunteered Street View Imagery (VSVI), which relies on a crowdsourced effort to provide geotagged street-level imagery coverage of traversable pathways (e.g., a street or trail). Such imagery, similar to GSV, provides detailed information about the location of objects such as cars, road markings, traffic lights and signs and allows for the automatic extraction of features at scale. Such imagery can also be mined using machine learning algorithms to automatically derive points of interest (POI) databases (e.g., locations of coffee shops and fire hydrants) without the intervention of the citizen. As such, this automated process has the potential to generate much more geographical information than ever before. Moreover, there is a growing number of SVI service providers worldwide [9], highlighting an important extension of mapping urban areas from traditional foundational features (e.g., roads and buildings) to finer-scale features at a much higher level of fidelity (e.g., street signs and traffic lights).
This newly emerging type of information is already being slowly harnessed in support of smart and sustainable city initiatives. For example, SVI is being used to update land use and land cover maps (e.g., [10,11]) to assess and monitor physical road conditions (e.g., [12]) and conduct virtual inspections of critical infrastructure and is being used as a tool for assessing property values within cities (e.g., [13]). Moreover, these data are also providing new insights into cities, such as understanding the predictors of urban change (e.g., [14]) and travel patterns within them (e.g., [15]). As SVI, and in particular VSVI, becomes increasingly available, its utility as an important source of geographical data on people, place and society is expected to increase as well.
Motivated by these advancements, this paper examines VSVI data collected from two different platforms: Mapillary [16] and OpenStreetCam (OSC) [17]. These online platforms accept sequences of images captured from mobile devices and uploaded via an app on the device. Images are geolocated using the device’s global positioning system (GPS). OSC users can additionally share their vehicles’ on-board diagnostics to help improve geolocation accuracy in areas with poor GPS signal (e.g., dense tree cover or tunnels). A key premise in this work is that these platforms are providing new sources of information that, together with other sources of information on cities, can be used to provide a better understanding of their evolution and needs (e.g., infrastructure) over time. Using the two VSVI platforms discussed above as a case study, we examine two lines of inquiry:
a)
An examination of the level of spatial coverage of each platform in order to assess the overall potential of such platforms to provide adequate coverage of geographic information.
b)
An examination of user contribution patterns in Mapillary and OSC in order to understand how users are contributing to these platforms.
The remainder of this paper is organized as follows. Section 2 presents a background of previous work on VGI and in particular studies involving the use of SVI. Following this, Section 3 presents the study areas, data and methodology used for our case studies, followed by the results and an analysis of our findings in Section 4. Finally, Section 5 summarizes this paper, outlines its key findings and provides a discussion of how VSVI has the potential to shape the future of geographic information generation with respect to monitoring cities, along with offering areas of further work.

2. Background

Today, VGI comes in many types and from many different sources. One example is that of image data, which are found all over the Internet and are fast becoming one of the most popular sources for deriving geographical information. In large part, this is due to the exponential growth in the number of GPS-enabled mobile phones and social media users who capture and upload massive amounts of image data on a daily basis [18]. Examples of studies that use such data include those using Instagram to explore the relationship between tourist hotspots and safe areas within cities (e.g., [19]), studying people’s perception of their environments (e.g., [20]) and the development of a route-based travel recommendation system using Flickr (e.g., [21]), and exploring the use of Twitter images to map the spatial extent of cities (e.g., [22]). While such platforms provide a rich and useful source of information on cities, they often contain a lot of noise as users are not restricted to any specific location or to capturing any specific feature within the city.
As noted in Section 1, another source of image data on cities is SVI, which is typically captured along roads, walking paths and other mobility corridors. Due to its more focused effort, SVI tends to include less noise and provides greater spatial coverage along roadways compared to images collected from social media platforms. Google has traditionally been the main provider of SVI, using professional cameras mounted onto various platforms (i.e., vehicles, motorcycles, snowmobiles, special backpacks and indoor trolleys [23]). Collecting imagery in this way is both costly and time-consuming, with few companies having the resources necessary to compete in this realm. Some companies such as Apple are expected to unveil their own SVI product in the near future [24]. While other companies (e.g., [25]) are currently expanding efforts towards global coverage, it is still unknown how these companies will measure up to Google as competitors in the provision of SVI services in the mid- to long-term future.
As GSV has become more prevalent in platforms such as Google Maps and Google Earth, many researchers have used such imagery to study different aspects of cities, exploring mental and physical health (e.g., [26,27,28,29,30,31]), gentrification (e.g., [32,33]), classification of different building types (e.g., [34]), quantification of green canopy coverage along streets (e.g., [35]) and generating scenic routes (e.g., [36,37]) within cities. Other research has additionally used deep learning to automatically extract features (e.g., street signs) from GSV to study health inequalities (e.g., [38]) and the demographic makeup of neighborhoods within cities (e.g., [39,40]), for monitoring urban assets (e.g., [41,42,43]) and for land use characterization (e.g., [44]). However, as a commercial product, Google restricts the amount of GSV data that can be downloaded and used [45]. This presents issues when trying to scale up such studies to large geographical areas.
An alternative to commercial SVI is VSVI. Currently, only two publicly available online volunteered platforms exist for SVI: Mapillary and OSC (a subsidiary of Telenav). As these companies rely on mobile devices and the crowd to collect data, they have a lower barrier for data collection. Similar to Google, these companies use SVI for commercial purposes. Specifically, the imagery is being used to extract features of interest (e.g., traffic lights, road signs and crosswalks) to fit specific business needs. In the case of Telenav, OSC information is also being used to improve the quality of its navigation products [46]. VSVI is fundamentally changing the tenets of VGI. VGI contributors, for the most part, have always played a central and active role in the lifecycle of these data. VGI users spend significant amounts of time contributing the data and validating their accuracy online, and are the primary end users of these data. In some cases, they have also assisted with making decisions concerning the future direction of these data (e.g., OSM Foundation board). However, in the case of Mapillary and OSC, users now have a more passive role; they provide the platforms (i.e., smartphones and dashboard cameras) and act as sensors in collecting the imagery. Compared to the time-consuming and labor-intensive point-and-click approach of collecting traditional VGI data, massive amounts of VSVI are being collected in an automated manner and uploaded to these online platforms. Once there, machine learning algorithms are used to extract and mine specific features of interest from these data, which has the potential to support many sustainable and smart city initiatives. For example, the growth and expansion of POIs such as coffee shops can be extracted from SVI and monitored over time as an early warning indicator of gentrification occurring within cities [47,48]. Figure 1 shows user interfaces for Mapillary and OSC, respectively. This figure shows the immersive nature of these data compared to the traditional 2D map view of representing cities.
Specific to VSVI, few studies to date have assessed the usefulness of these data for urban applications. Juhasz and Hochmair [49] compared user contribution patterns and spatial completeness for Mapillary and found that most users contributed on a regular basis, with the largest number of users located in Europe followed by North American countries. This pattern followed an expected power-law relation with very few users doing most of the work (which has also been reported for OSM by Ma et al. [50]). The authors also found that these countries had the largest completeness values. Comparing these results with GSV, they further reported that GSV consistently provided greater completeness. Similar results were also reported by Juhasz and Hochmair [51] assessing the completeness of Mapillary for cities in Germany and Austria. Building on this work, Juhasz and Hochmair [52] further examined the cross-linkage between Mapillary and OSM and reported that most Mapillary tags used within OSM relate to changesets (i.e., groups of edits) rather than to individually edited features. In the same study, they identified that Mapillary was primarily being used to map transportation (i.e., highway, public transport, traffic sign) and leisure (i.e., natural, amenity, tourism) features. More recent work by Quinn and Leon [53] qualitatively compared coverage across GSV, Mapillary and OSC platforms and found that GSV has often taken an all-or-nothing approach to collecting coverage in world cities, whereas contributions from both Mapillary and OSC were more evenly distributed. Further, Ma et al. [54] compared the spatiotemporal patterns of contributor activity at the country level between Mapillary and OSM. They showed that while there was less inequality in contributions from Mapillary as compared to OSM, collection patterns in Mapillary tend to be more seasonal (e.g., users contributed more VSVI data during warmer months of the year, such as June and July, compared to months with lower temperatures, such as December and January).
Compared to the above work on summarizing VSVI data, other work has further explored the utility of these data. Mapillary, for example, has been used as a source of in-situ data for crop identification, rotation and phenology (e.g., [55]). Dev et al. [56] applied deep learning approaches to Mapillary images to extract advertisement billboards. Different approaches to segment and extract features have also been explored (e.g., [57]), along with the positional accuracy of features extracted from such imagery (e.g., [41,42]). These studies all show the potential of using VSVI to support urban applications.
Unlike the previous studies, to the authors’ knowledge, this is the first study to compare the spatial patterns of coverage and contributor activity across two VSVI platforms (Mapillary and OSC) in a systematic and quantitative manner. The closest work to our own is by Quinn and Leon [53]; however, that study only used a qualitative approach to compare road coverage, which is not scalable to larger geographical areas or to the continuous monitoring of cities over time. This is especially important given the growing and dynamic nature of cities, which require up-to-date and timely information in order to provide feedback for adequately managing their health, well-being and growth. Further, this paper also examines a much longer temporal period and compares the types of roads being mapped in Mapillary and OSC to an authoritative data source, TIGER. Moreover, we include a discussion of the future of this emerging trend in VSVI consumption and its utility in shaping the future of smart and sustainable cities (see Section 5), and we would argue that such a discussion of SVI is missing from papers on emerging trends in VGI (e.g., [58,59]).

3. Methodology

Four Census Metropolitan Statistical Areas in the United States were selected as case studies in this research: Washington (District of Columbia), San Francisco (California), Phoenix (Arizona) and Detroit (Michigan). Their selection was primarily based on their geographical dispersion and the availability of data for all three road sources in the United States as shown in Table 1. This table also shows the data source, type of data, mode of acquisition and period for which the data were collected.
Moving from data to methods, Figure 2 outlines the various steps used in our methodology. The first step involved the retrieval of Mapillary and OSC data from their respective online platforms. Data on authoritative roads were extracted from the Topologically Integrated Geographic Encoding and Referencing (TIGER) database as polylines. To extract Mapillary and OSC road data, their online Application Programming Interfaces (APIs) were used to extract point traces. The two APIs accept different user inputs to extract data. In the case of Mapillary, the coordinates of the minimum bounding rectangle extracted from each study area were passed as a query to the Mapillary API. A JSON file containing the sequence id, author name, timestamp and latitude and longitude for each image was then retrieved. The sequence id information was then used to reconstruct road segments. In the case of OSC, a two-part querying process was used to retrieve data. The latitude and longitude for each road intersection in each study area and a search radius value of 5 km were passed as a query to the OSC API. This search radius was used because a larger radius resulted in an API error. The result of this query was a JSON file containing the sequence id of all road sequences within 5 km of a road intersection. Duplicate road sequences were then removed from these data, and the sequence ids were passed to the OSC API once more to retrieve all image locations for each sequence id. The retrieved JSON files contained information on the sequence id, sequence index (ordering of images), author name, timestamp and latitude and longitude information.
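The reconstruction of road segments from retrieved image points can be sketched as follows. This is a minimal illustration and not the code used in the study: the record keys (sequence_id, index, lat, lon) are hypothetical stand-ins for whatever fields the retrieved JSON files expose, and the great-circle (haversine) distance is assumed as the length measure.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_sequences(records):
    """Group image records by sequence id and order them along the trace.

    Each record is a dict with hypothetical keys 'sequence_id', 'index',
    'lat' and 'lon'. Records repeating the same (sequence id, index) pair,
    e.g. from overlapping radius queries, are dropped.
    """
    seen = set()
    grouped = defaultdict(list)
    for rec in records:
        key = (rec["sequence_id"], rec["index"])
        if key in seen:
            continue
        seen.add(key)
        grouped[rec["sequence_id"]].append(rec)
    return {
        sid: [(r["lat"], r["lon"]) for r in sorted(recs, key=lambda r: r["index"])]
        for sid, recs in grouped.items()
    }


def sequence_length_km(points):
    """Sum of distances between consecutive (lat, lon) points of one sequence."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))
```

Deduplicating on the (sequence id, index) pair mirrors the removal of duplicate road sequences described above, since overlapping 5 km search radii would otherwise return the same sequence more than once.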
Step 2 involved the postprocessing of the road and population data. In order to be able to compare the different data sources, each source was clipped into a grid with a cell size of 1 × 1 km (similar to the approaches used in [65,66]). This particular grid size was chosen as it allowed us to match the grid resolution in which the population data is provided. The clipping process of the vector data (i.e., TIGER, Mapillary and OSC) was done using the GeoPandas Python library [67]. Figure 3 shows the spatial distribution of TIGER, Mapillary and OSC polyline data for the study sites. This figure also shows the coverage of Mapillary and OSC to be variable across the sites, with TIGER having the most roads in general. A discussion of the analysis of the magnitude and extent of these patterns in the data after the clipping process is presented in Section 4.
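The gridding step, together with the per-cell road length computed in the next step, can be illustrated with a small sketch. The study used GeoPandas for clipping; the version below is a dependency-free stand-in that assumes polylines are already in a projected coordinate system with units of meters and that the grid is aligned to the coordinate origin. The function names are illustrative only.

```python
import math
from collections import defaultdict

CELL = 1000.0  # grid cell size in meters (1 x 1 km)


def split_segment_by_grid(p, q, cell=CELL):
    """Yield ((col, row), length) for each piece of segment p->q inside one cell."""
    (x0, y0), (x1, y1) = p, q
    ts = {0.0, 1.0}  # parametric positions where the segment is cut
    for a0, a1 in ((x0, x1), (y0, y1)):
        if a0 == a1:
            continue  # no grid-line crossings along this axis
        lo, hi = sorted((a0, a1))
        k = math.floor(lo / cell) + 1
        while k * cell < hi:
            ts.add((k * cell - a0) / (a1 - a0))
            k += 1
    cuts = sorted(ts)
    total = math.hypot(x1 - x0, y1 - y0)
    for t0, t1 in zip(cuts, cuts[1:]):
        tm = (t0 + t1) / 2  # the midpoint of a piece identifies its cell
        mx, my = x0 + tm * (x1 - x0), y0 + tm * (y1 - y0)
        yield (math.floor(mx / cell), math.floor(my / cell)), (t1 - t0) * total


def length_per_cell(polylines, cell=CELL):
    """Total polyline length (in meters) falling inside each grid cell."""
    totals = defaultdict(float)
    for line in polylines:
        for p, q in zip(line, line[1:]):
            for key, length in split_segment_by_grid(p, q, cell):
                totals[key] += length
    return dict(totals)
```

Because no conflation is applied, overlapping traces from different contributors each add to a cell's total, which is consistent with the non-conflated lengths reported in Section 4.1.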
The final step involved the processing of the data and comparing the resulting statistics across sites. For each road layer, the total length of roads per 1 × 1 km grid cell was computed, and their spatial coverage was compared to population density. The spatial coverage of Mapillary and OSC roads was also compared to the TIGER road layer. To compare variables across sites (e.g., road length and number of contributors and images per 1 × 1 km grid cell), data for all variables were stacked, and the Jenks optimization algorithm was used to determine suitable breakpoints for visualizing these data. It should be noted that road conflation was not applied to the VSVI data as it is a significant research challenge that is beyond the scope of this research. While attempts were made to implement a tolerance-based conflation solution with varying tolerance thresholds, these did not yield consistent and reliable results (e.g., cases where one contribution was matched to multiple road segments). This issue is a well-known challenge as automatic road conflation is often considered an algorithmically complex and time-consuming process, with the spatial completeness of results dependent upon various factors, including the accuracy of the map data. The various limitations of existing conflation methods have been widely discussed (see [68,69,70,71,72] for further details). As a result, manual approaches to conflation continue to be widely used today for conflating roads and other map data [73]; however, such methods are not scalable for our study. Given these considerations, in this paper we focus on spatial coverage, which could be more reliably computed, and not spatial completeness. To compare temporal patterns in Mapillary, OSC and TIGER, the timestamps of user-contributed road traces were collected for the different sites. This information was stored in Coordinated Universal Time (UTC) format and was converted to the local time at each site. The day and hourly information of user contributions were then extracted and analyzed.
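The conversion from UTC to local time, and the extraction of day and hour, might look like the sketch below. The timestamp format and the mapping of each study site to a single IANA time zone are assumptions made for illustration; the source does not state how the conversion was implemented.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed mapping of each study site to one IANA time zone.
SITE_TZ = {
    "Washington": "America/New_York",
    "San Francisco": "America/Los_Angeles",
    "Phoenix": "America/Phoenix",
    "Detroit": "America/Detroit",
}

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def local_day_and_hour(utc_iso, site):
    """Convert a UTC timestamp string to (day name, hour 0-23) in the site's local time.

    The input format 'YYYY-MM-DDTHH:MM:SSZ' is an assumption about the data.
    """
    utc_dt = datetime.strptime(utc_iso, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    local = utc_dt.astimezone(ZoneInfo(SITE_TZ[site]))
    return DAYS[local.weekday()], local.hour
```

Using named time zones rather than fixed offsets lets daylight saving time be handled per site; Phoenix, for instance, does not observe it, so a fixed offset shared with another Mountain Time location would be wrong for part of the year.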

4. Results

In this section, we provide the results of this study. Section 4.1 provides information on road coverage patterns between the different data sources. Section 4.2 and Section 4.3 provide spatial and temporal contributor patterns in Mapillary and OSC respectively. Finally, Section 4.4 analyzes the types of roads mapped by contributors.

4.1. Spatial Comparison of Road Network Coverage

An overall summary of road statistics for all three data sources in the study areas is provided in Table 2. As can be seen, relative to TIGER, the coverage of Mapillary and OSC ranges from 14% to 31% and from 14% to 53%, respectively. With the exception of Detroit, both platforms differ on average by about 10% in their road coverage compared to TIGER. Furthermore, for the study areas of Phoenix and Detroit, Mapillary has larger computed total road lengths, which is primarily due to the non-conflation of road traces in this study. In this case, several users may contribute road traces for the same 1 × 1 km grid cell area. Table 3 shows summary statistics for road length, number of unique contributors and the number of images collected per 1 × 1 km grid cell area. The results show variation in all three variables across the four study sites.
Figure 4 shows the total length of roads per 1 × 1 km grid cell area across the four study sites. In Washington and San Francisco, Mapillary has greater spatial coverage than OSC, while for Phoenix and Detroit, OSC has greater coverage than Mapillary. OSC coverage is especially prominent in Detroit, which, unlike other study areas, also contains a large number of rural roads mapped in these data. As Mapillary and OSC are still emerging sources of data, it is understandable at this current time that TIGER, the more authoritative and mature data set, has the most complete coverage at all sites. In all platforms, the highest density of roads was located in urban areas.
In previous work that has studied VGI, population density has been shown to correlate with user contributions (e.g., [74]). Following this line of inquiry, we examined whether the same trend also applies to these new VSVI data sources. To accomplish this, we overlaid LandScan ambient population data with each road data source using a 1 × 1 km grid. The population data for each study site was then normalized for a comparison between the different sites. The normalization process involved dividing the population density in each grid cell by the total population density for that study area. This normalized population density was then divided by the total road length per grid cell (when the total road length was zero, a zero value was assigned to this calculation) in order to examine whether higher road lengths are associated with more densely populated areas. The relationship between normalized population density and road length per grid cell is shown in Figure 5. This figure shows that while population density varies across the study sites, there is a noticeable association between areas with high population densities and the total length of roads contributed in those areas. As was previously suggested for other VGI data such as OSM, this association may be in part due to the large number of users at those locations who map roads there [6]. Unlike OSM, however, where volunteers can contribute data from anywhere in the world as long as they have Internet access, with respect to VSVI, contributors to these platforms have to be onsite to collect the data. Furthermore, another possible explanation for this is that areas with high population density are often accompanied by road segments, and particularly road intersections, that are more likely to be traveled by VSVI contributors.
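The normalization described above can be written out as a short sketch. The dictionary-based data layout and the function name are illustrative assumptions; the arithmetic follows the two steps in the text, including the zero-length rule.

```python
def normalized_density_per_road_length(pop_density, road_length_km):
    """Normalized population density divided by road length, per grid cell.

    pop_density: dict mapping cell id -> population density in that cell.
    road_length_km: dict mapping cell id -> total contributed road length.
    Cells with no road length (or missing from road_length_km) get 0.0.
    """
    total = sum(pop_density.values())
    result = {}
    for cell, density in pop_density.items():
        # Step 1: share of the study area's total population density.
        normalized = density / total if total else 0.0
        # Step 2: divide by road length, assigning zero when no roads are mapped.
        length = road_length_km.get(cell, 0.0)
        result[cell] = normalized / length if length > 0 else 0.0
    return result
```

For example, a cell holding 10% of a site's population density with 2 km of contributed roads yields 0.05, while a cell with no contributed roads yields zero regardless of its population.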
To more quantitatively assess the strength of the associations between population density and both the length of roads mapped and the number of users that have mapped them, three correlation measures were used: Pearson’s r, Kendall’s Tau and Spearman’s rank. These measures have been widely used for examining pairwise associations between variables and provide parametric and non-parametric options for comparison [75,76]. Further, they have been implemented in various statistical software and programming packages (e.g., SAS, SPSS, the R Project and Python) for reproducibility of results. The results of these correlations are shown in Table 4, with the total length of roads per 1 × 1 km grid cell area included in parentheses. Both linear (i.e., Pearson’s r) and non-linear (i.e., Kendall’s Tau and Spearman’s rank) measures suggest a moderate relationship [77] between population density and both the length of roads mapped and the number of contributors. These results are in part related to the fact that there are few unique users who map each city and the variability in the number of roads that each user contributes (this will be discussed further in Section 4.2).
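To make the three measures concrete, the sketch below implements them from their standard definitions using only the Python standard library. It is an illustration rather than the code used in the study; the tau-b variant of Kendall’s Tau is assumed here because per-cell counts are likely to contain ties.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for tied pairs in either variable."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) / 2
    return (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))
```

On a monotonic but non-linear relationship such as y = x², the two rank-based measures equal 1 while Pearson’s r falls below 1, which is why reporting all three gives a fuller picture of the association.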
Delving further into the quantitative differences in spatial coverage across the four study sites, Figure 6 shows the pairwise differences in road coverage between each data source. In Washington and San Francisco, Mapillary has more roads than OSC, whereas for Phoenix and Detroit, the length of roads in OSC is greater (first row of Figure 6). At some specific locations, especially within the urban cores of Washington and San Francisco study sites, we found that Mapillary and OSC have similar amounts of road coverage (i.e., white clusters in the first row of Figure 6).

4.2. Unique Contributors

To further understand distribution patterns in Mapillary and OSC, contributor activity was also analyzed. Figure 7 shows the total number of unique contributors per 1 × 1 km grid area. This figure shows that Phoenix and Detroit have a greater number of unique contributors in OSC. Mapillary, on the other hand, has more unique contributors in Washington and San Francisco. At all sites, however, the number of unique contributors per 1 × 1 km grid area is relatively small (ranging from 1 to 26). This is in addition to the overall total number of unique contributors at each study site, which is also small. For Washington, San Francisco, Phoenix and Detroit, the total number of unique contributors was 192, 143, 44, and 26 for Mapillary, and 25, 32, 56 and 99 for OSC, respectively. This pattern aligns with a similar finding from an earlier study by Juhasz and Hochmair [49], which only explored Mapillary. The small number of contributors may be due to several factors surrounding participation inequality in VGI, as previously discussed in [78], the fact that users must be onsite to collect the data as previously discussed in Section 4.1, or that these new sources of VSVI are still in their infancy and their user base is still evolving (which we explore in more detail in Section 4.3).
Turning to the number of grids mapped by contributors, Figure 8b shows variation across sites in OSC. This variation can be in part explained by the size of the urban areas. As was discussed in Section 4.1 (see Figure 4), urban areas have more mapped contributions at all locations, and as a result, sites with larger urban areas (i.e., Phoenix and Detroit) have more mapped grids. Mapillary (Figure 8a) showed similar patterns in the number of grids mapped by contributors. As suggested in [46], Mapillary makes an explicit attempt to differentiate the spatial coverage of its roads from other SVI service providers, in part related to its business-to-business sales model, which is primarily dependent upon the extraction of urban features from SVI. This is in comparison to OSC, which is mainly concerned with using user contributions for improving the navigation technologies of its already well-established parent company Telenav [53]. Further examination of the average length of roads contributed by individual users on both platforms is shown in Figure 9. In general, a typical Mapillary user is mapping more roads on average compared to a typical OSC contributor. This finding may also help explain the much greater coverage of roads in Mapillary compared to OSC at some sites.
Finally, Table 5 shows, for both platforms, the percentage of all 1 × 1 km grid cells at each site to which users contributed more than once over the entire study period. This table shows substantial duplication at all sites, with most duplication occurring in the larger cities of Phoenix and Detroit.
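The per-cell contributor counts and the share of repeatedly mapped cells discussed in this section can be sketched as below. The input layout (one (cell id, user id) pair per contributed sequence touching a cell) is a hypothetical simplification, and the choice of denominator is left to the caller because the text does not state whether cells without any contribution are counted.

```python
from collections import defaultdict


def contributor_stats(contributions, total_cells=None):
    """Unique contributors per cell and percentage of cells mapped more than once.

    contributions: iterable of (cell_id, user_id), one entry per contributed
        sequence that touches the cell (hypothetical layout).
    total_cells: denominator for the percentage; defaults to the number of
        cells that received at least one contribution.
    """
    users = defaultdict(set)
    counts = defaultdict(int)
    for cell, user in contributions:
        users[cell].add(user)
        counts[cell] += 1
    unique = {cell: len(u) for cell, u in users.items()}
    repeated = sum(1 for c in counts.values() if c > 1)
    denom = total_cells if total_cells is not None else len(counts)
    pct_repeated = 100.0 * repeated / denom if denom else 0.0
    return unique, pct_repeated
```

Note that a cell counts as repeated even when the same user returns to it, which matches the wording "contributed more than once" rather than "contributed by more than one user".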

4.3. Temporal Analysis

Figure 10 shows the total number of image sequences collected by day of the week. This figure shows, in general, that Mapillary has more contributions at each site and for all days of the week compared to OSC. In comparison, with the exception of Detroit, OSC has more contributions most of the week than Mapillary. The spike in activity in Mapillary on Tuesdays and Thursdays for Detroit is primarily due to the contributions of one person who uploaded 15% and 22% of all road data for those days, respectively. No clear patterns can be observed between the two platforms for weekdays vs. weekends.
Moving from a daily to an hourly temporal resolution, Figure 11 depicts the distribution of total user contributions over a period of 24 h in increments of one hour (from 0 h to 23 h local time). Here, only Mapillary contributions were used as roughly 50% of the OSC data did not have timestamp information when it was retrieved from the API. As our analysis is based on the total distribution of user contributions across time (rather than a relative measure), a full comparison of these distributions across Mapillary and OSC is therefore not possible. Figure 11 shows generally higher contributing activity around 8 a.m., 1 p.m., and 5 p.m. at all sites. These times correspond approximately to the daily rhythms of city life (i.e., morning and afternoon commuting and lunch). This finding agrees with a recent study by Ma et al. [54] for several countries (including the US), showing higher levels in contributor activity in Mapillary during the day, but the actual hour in which contributions peaked varied between countries.
Finally, in order to further understand how users contribute to Mapillary and OSC over time, the cumulative lengths of image sequences in both platforms were analyzed. The results of this analysis are shown in Figure 12. This figure shows that, with the exception of several short bursts in collection activity that occur at different dates, collection in Mapillary tends to increase at a relatively consistent rate at each site. In comparison, most OSC contributions, as discussed before in Section 4.2, occurred in Phoenix and Detroit; at these sites a consistent upward increase in contributions is noticeable around mid-2017, beginning at a much later point in time compared to Mapillary. The changes in contributions in Washington and San Francisco in OSC remain relatively constant throughout the study period. Lacking information on users, we are not entirely sure of the specific reasons for these spikes in activity in Mapillary and the growth rate in both platforms. It is important to note that both platforms have at times used their employees to help fill gaps in missing coverage [53]. Both platforms also use gamification as a way of incentivizing their users to contribute data, with Mapillary also having a paid driver program [79]. It is expected that these factors are in part responsible for the difference in contribution patterns at the different sites in Mapillary and OSC.
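A cumulative length curve of the kind analyzed here can be derived with a few lines. The input layout (one ISO-formatted date and one length per image sequence) is an assumption for illustration.

```python
from itertools import accumulate


def cumulative_length(sequences):
    """Running total of sequence length over time.

    sequences: iterable of (iso_date, length_km) pairs, one per image
        sequence; ISO 'YYYY-MM-DD' strings sort chronologically.
    Returns a list of (iso_date, cumulative_km) in date order.
    """
    ordered = sorted(sequences)
    dates = [d for d, _ in ordered]
    totals = accumulate(length for _, length in ordered)
    return list(zip(dates, totals))
```

Short bursts of collection activity appear in such a curve as steep steps, while steady contribution appears as a roughly constant slope.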

4.4. Road Categories

The final part of our analysis compared mapped road categories in Mapillary and OSC to TIGER road data, which contained road categories based on the National Map Feature Road Class (TNMFRC) classification system [80]. In order to do this, each road layer from Mapillary and OSC was first segmented into a list of individual edges. Then, each edge from Mapillary and OSC was assigned the road class of the TIGER road that had the minimum Euclidean distance to that edge. The total length of all associated road categories for Mapillary and OSC was then computed and is summarized in Table 6. Unlike the figures shown before, this table shows the types of roads that are being captured through VSVI. In all study sites in Mapillary, local roads have the most contributions, followed by controlled-access highways. This pattern was also observed for Phoenix and Detroit in the OSC platform. The Washington and San Francisco study sites, on the other hand, have a higher contribution of controlled-access roads followed by local roads in OSC. These road types, as shown in the TIGER column of Table 6, account for most of the road infrastructure at all the study sites. Another possible reason for the greater presence of controlled-access highways and local roads in Mapillary and OSC may be their utility to Mapillary and OSC drivers. More specifically, controlled-access highways provide transportation corridors for the flow of people to and from work as part of their daily commute, whereas local roads, compared to other road types, act as conduits for social connectivity between people.
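The nearest-road class assignment and the length summary can be sketched in plain Python as shown below. This is a simplified illustration under stated assumptions, not the procedure implemented in this study: coordinates are assumed to be planar (projected), each VSVI edge is represented by its midpoint when measuring distance to a reference segment, and all function names and the sample data are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def assign_road_class(edge, reference_roads):
    """Return the class of the reference segment nearest to the edge midpoint.

    Assumption: the midpoint stands in for the whole edge.
    reference_roads: list of (start_point, end_point, road_class) tuples.
    """
    (x1, y1), (x2, y2) = edge
    mid = ((x1 + x2) / 2, (y1 + y2) / 2)
    nearest = min(
        reference_roads,
        key=lambda road: point_segment_distance(mid, road[0], road[1]),
    )
    return nearest[2]


def length_by_class(edges, reference_roads):
    """Sum edge lengths per assigned road class."""
    totals = {}
    for edge in edges:
        road_class = assign_road_class(edge, reference_roads)
        (x1, y1), (x2, y2) = edge
        totals[road_class] = totals.get(road_class, 0.0) + math.hypot(x2 - x1, y2 - y1)
    return totals


# Example with made-up geometry (units are arbitrary planar units)
reference = [
    ((0, 0), (10, 0), "local"),
    ((0, 5), (10, 5), "controlled-access"),
]
edges = [((1, 1), (4, 1)), ((2, 4), (2, 6))]
totals = length_by_class(edges, reference)
```

The linear scan over all reference segments keeps the sketch short; for metropolitan-scale road layers, a spatial index would be the more practical choice.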

5. Discussion

With new types and sources of data continually being added, the nature of VGI is evolving. This has been in large part due to the democratization of the Internet and related technologies that have allowed more people to engage in collecting and contributing geographic information. One such emerging source of VGI is VSVI, which, unlike traditional VGI sources, relies on the volunteered efforts of people using mobile devices and dashboard cameras to collect geotagged street-level imagery along traversable pathways. Such information has the potential to provide new insights into cities, capturing information at the street level where most societal interactions between people and places occur.
In order to better understand the utility of VSVI, and in particular, what feature types can be mapped from such raw data, it is first important to better understand the data characteristics. Towards this goal, this paper has examined the spatial coverage and contributor patterns across two VSVI data sources, namely Mapillary and OSC. To the best of our knowledge, this paper is the first to systematically and quantitatively analyze these two emerging VGI sources in terms of coverage and user contribution patterns in large urban areas in the US. The results of this study indicated that most Mapillary and OSC contributions occurred along local roads and controlled-access highways, and that the overall coverage in these sources is variable in comparison to an authoritative source (i.e., TIGER). This, as we further showed, may be explained at least in part by the large ambient populations in these locations (Section 4). It should be noted that in some of the study areas, in particular Phoenix and Detroit, the length of roads in OSC surpassed that of the authoritative road source. This result suggests that some roads in urban areas such as the ones studied here are being covered multiple times by VSVI, thus providing a potential source of more up-to-date information about roads in urban areas.
This study also highlights some noteworthy contribution patterns of VSVI. Specifically, our results showed that while the number of contributors varied across sites, only a few contributors were responsible for producing most of the raw data. User contribution patterns were also different in Mapillary and OSC. Specifically, we found that while patterns in coverage were variable for the different OSC sites, coverage patterns in Mapillary tended to be similar among sites. This finding may be linked to several factors, including differences in mapping practice or issues with participation inequality, a topic that has been highly researched for other VGI platforms such as OSM but for which research is still lacking within VSVI. Furthermore, user contributions in Mapillary tended to be higher around 8:00 a.m., 1:00 p.m. and 5:00 p.m. (local time). This finding suggests that VSVI contributions tend to coincide with the morning and afternoon commute and the lunch hour of the contributors. Notably, user contribution activity did not exhibit a discernible pattern when examined across the days of the week. In the context of our study, these temporal contribution patterns seem to align with those of working professionals who may be collecting data as part of their daily routine movement in a city. While our study has highlighted that relatively few users are contributing to VSVI, this finding does not necessarily imply poor coverage of streets in dense urban areas. Take, for example, taxis in Manhattan, New York, where research has shown that just 10 taxis cover 33% of all street segments on a daily basis [81].
Overall, the results of this study demonstrate that while VSVI is still a relatively new form of VGI, it can provide a new valuable lens for understanding cities. For example, information derived from raw VSVI data can be used to assess the conditions of city infrastructure (e.g., sidewalks or potholes in need of repair), evaluate the abundance and condition of green vegetation along roads or identify missing or ambiguous road signage (as discussed in Section 2). This information can be used to study the evolution of cities at much higher spatial and temporal scales than was previously possible, providing a near-real-time connection between how micro processes at the street-level (i.e., function) are influencing macro changes at the city level (i.e., form). Such information could have a transformational impact on how we monitor and make decisions within urban environments.
However, as this study has shown, it is first important to understand the characteristics of VSVI before using it for any citywide implementation. One issue that we identified is that of gaps in spatial coverage. While this could be in part due to technical issues with the online hosting platform (e.g., having to process very large volumes of data), other issues could also be at play. For example, since VSVI contributors have to be onsite to collect the data, collection biases may come into play, such as perceived “unsafe areas”, locations with poor road conditions or, more generally, areas that users might wish to avoid.
That being said, compared to more traditional VGI data (e.g., OSM) where contributors typically map specific features that may contain different sources of error (e.g., positional accuracy, selection bias and mislabeling) [82], VSVI captures everything at the street level along the traversed route. This makes VSVI a rich archive of information that can be used to extract new features without the need to collect data again, and as a source for validating and enriching existing data captured on cities. Moreover, these data can be used to understand activity patterns within cities. Take, for example, our finding of increased contributor activity at different times of the day. Such insights can be compared over time to better understand diurnal patterns of activity within cities. This links to the second issue that we identified with VSVI, that of variation in user contributions. It is unclear at this time, due to the lack of information about the users, what motivates them to contribute at specific locations and times of the day. However, should such information become available, it could be used to study movement patterns at the scale of individual users to better understand why certain places within the city may be more or less interesting to them. These examples highlight how cities can benefit from using VSVI towards gaining a greater understanding of what is taking place within them. This could provide actionable information for urban planners and policymakers for making more informed decisions.
As more SVI data sources become available (e.g., from drones and the expected growth of autonomous vehicles), it is important to remain cognizant of some of the potential concerns with the use of these data. Chief among these is the issue of privacy. While efforts are being made to automatically obfuscate personal information such as license plates or human faces in VSVI data, such solutions are not always perfect. Similarly, users who contribute SVI data may unintentionally compromise personal information by repeatedly collecting data along routine commuting routes (e.g., [83]). Other issues of concern include the quality of VSVI data from different sources, the availability of relevant assessment methods [84], location spoofing [85], the uploading of fake data [86] and issues with participation inequality [87,88]; however, these could be overcome as the number and diversity of contributors and data sources increase. In order to advance our understanding of VSVI in light of these issues, and in particular with respect to data trustworthiness and quality, we suggest that additional research that focuses specifically on spatial and temporal user contribution patterns is needed. For instance, as noted in Section 3, while our study did not utilize conflation methods, further research that utilizes conflation and other trajectory matching methods to detect common and uncommon user contribution trajectories in VSVI data is needed. The detection of such contribution patterns could further assist in protecting personal user information and assessing VSVI data quality. We would also recommend carrying out a more detailed analysis in different cities and in different countries, allowing a comparison of spatial coverage among them (e.g., [49,53,54]).
Looking forward, as more people are expected to live in urban areas in the coming decades, and as machine learning tools further evolve and mature, we believe that VSVI will increasingly become a key information source for a better and more timely understanding of both the form and the function of cities. This, we argue, could potentially transform how VGI data are collected and used to meet the ever-growing geographic information needs of cities in the 21st century. Specifically, we argue that while in the past VGI relied solely on humans to collect and extract geographic information on cities, recent changes in data collection platforms are shifting the role of humans to data collection alone, while automated machine learning techniques are largely assuming the role of generating information from those data. As a result of this trend, we foresee that humans will become increasingly removed from the information production process.
Given its potential, we believe that VSVI data in combination with machine learning can play a central role in addressing such needs. As such data become increasingly available, they can be mined through the use of machine learning algorithms (e.g., so-called “deep learning” image analysis) in order to better analyze and understand the underlying patterns and processes that shape cities. Already, some US government departments have realized the importance of utilizing such machine learning approaches. Recently, for example, five US state departments of transportation have uploaded the complete photologs of their road networks, totaling 40 million images and covering over 270,000 miles of roads, to the Mapillary platform [89]. This imagery is being used to help monitor and maintain state assets (e.g., signs) and to assess the safety conditions along roads. Some local governments are also establishing partnerships with Mapillary to benefit from its computer vision expertise in the same way [90]. We expect that many more such examples will occur in the future. From monitoring road infrastructure to deriving up-to-date POI databases, VSVI would allow us to better monitor the health of cities.

Author Contributions

All authors (i.e., Ron Mahabir, Ross Schuchard, Andrew Crooks, Arie Croitoru, Anthony Stefanidis) conceived and designed the research presented in this paper. Ron Mahabir and Ross Schuchard carried out the data collection and curation while Ron Mahabir carried out the data analysis; Ron Mahabir wrote the initial draft while all authors reviewed and edited subsequent drafts and revisions. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhen, F.; Wang, B.; Wei, Z. The rise of the internet city in China: Production and consumption of internet information. Urban Stud. 2015, 52, 2313–2329.
  2. Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal 2007, 69, 211–221.
  3. Uden, M.; Zipf, A. Open building models: Towards a platform for crowdsourcing virtual 3D cities. In Progress and New Trends in 3D Geoinformation Sciences; Pouliot, J., Daniel, S., Hubert, F., Zayadi, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 299–314.
  4. Camargo, C.Q.; Bright, J.; McNeill, G.; Raman, S.; Hale, S.A. Estimating traffic disruption patterns with volunteered geographic information. Sci. Rep. 2020, 10, 1–8.
  5. Gkountouna, O.; Pfoser, D.; Wenk, C.; Züfle, A. A unified framework to predict movement. In Proceedings of the International Symposium on Spatial and Temporal Databases, Arlington, VA, USA, 21–23 August 2017; Springer: Cham, Switzerland, 2017; pp. 393–397.
  6. Barrington-Leigh, C.; Millard-Ball, A. Global trends toward urban street-network sprawl. Proc. Natl. Acad. Sci. USA 2020, 117, 1941–1950.
  7. Mahabir, R.; Croitoru, A.; Crooks, A.; Agouris, P.; Stefanidis, A. News coverage, digital activism, and geographical saliency: A case study of refugee camps and volunteered geographical information. PLoS ONE 2018, 13, e0206825.
  8. Tizzoni, M.; Panisson, A.; Paolotti, D.; Cattuto, C. The impact of news exposure on collective attention in the United States during the 2016 Zika epidemic. PLoS Comput. Biol. 2020, 16, e1007633.
  9. Wikipedia. Available online: https://en.wikipedia.org/wiki/List_of_street_view_services (accessed on 1 March 2020).
  10. Li, X.; Zhang, C.; Li, W. Building block level urban land-use information retrieval based on Google street view images. GISci. Remote Sens. 2017, 54, 819–835.
  11. Cao, R.; Zhu, J.; Tu, W.; Li, Q.; Cao, J.; Liu, B.; Zhang, Q.; Qiu, G. Integrating aerial and street view images for urban land use classification. Remote Sens. 2018, 10, 1553.
  12. Laohaprapanon, S.; Ortleb, K.; Sood, G. Street Sense: Learning from Google Street View. Available online: https://arxiv.org/abs/1807.06075 (accessed on 23 March 2020).
  13. Cyclomedia. Available online: https://www.cyclomedia.com (accessed on 1 February 2020).
  14. Naik, N.; Kominers, S.D.; Raskar, R.; Glaeser, E.L.; Hidalgo, C.A. Computer vision uncovers predictors of physical urban change. Proc. Natl. Acad. Sci. USA 2017, 114, 7571–7576.
  15. Goel, R.; Garcia, L.M.; Goodman, A.; Johnson, R.; Aldred, R.; Murugesan, M.; Brage, S.; Bhalla, K.; Woodcock, J. Estimating city-level travel patterns using street imagery: A case study of using Google street view in Britain. PLoS ONE 2018, 13, e0196521.
  16. Mapillary. Available online: https://www.mapillary.com (accessed on 2 March 2020).
  17. OSC. Available online: https://openstreetcam.org (accessed on 2 March 2020).
  18. Sester, M.; Arsanjani, J.J.; Klammer, R.; Burghardt, D.; Haunert, J.H. Integrating and generalising volunteered geographic information. In Abstracting Geographic Information in a Data Rich World-Methodologies and Applications of Map Generalisation; Burghardt, D., Duchene, C., Mackaness, W., Eds.; Springer: Basel, Switzerland, 2014; pp. 119–155.
  19. Paül i Agustí, D. Tourist hot spots in cities with the highest murder rates. Tour. Geogr. 2020, 22, 151–170.
  20. Dunkel, A. Visualizing the perceived environment using crowdsourced photo geodata. Landsc. Urban Plan. 2015, 142, 173–186.
  21. Sun, Y.; Fan, H.; Bakillah, M.; Zipf, A. Road-based travel recommendation using geo-tagged images. Comput. Environ. Urban Syst. 2015, 53, 110–122.
  22. Zhao, N.; Cao, G.; Zhang, W.; Samson, E.L.; Chen, Y. Remote sensing and social sensing for socioeconomic systems: A comparison study between nighttime lights and location-based social media at the 500 m spatial resolution. Int. J. Appl. Earth Obs. Geoinf. 2020, 87, 102058.
  23. Google. Available online: https://www.google.com/streetview/explore (accessed on 1 April 2020).
  24. Kastrenakes, J. Apple Maps is Getting Its Own Version of Google Maps’ Street View. Available online: https://www.theverge.com/2019/6/3/18650877/apple-maps-ios-13-google-street-view-wwdc-2019-keynote (accessed on 1 April 2020).
  25. HERE. Available online: https://www.here.com/en/drive-schedule (accessed on 23 March 2020).
  26. Badland, H.M.; Opit, S.; Witten, K.; Kearns, R.A.; Mavoa, S. Can virtual streetscape audits reliably replace physical streetscape audits? J. Urban Health 2010, 87, 1007–1016.
  27. Odgers, C.L.; Caspi, A.; Bates, C.J.; Sampson, R.J.; Moffitt, T.E. Systematic social observation of children’s neighborhoods using Google Street View: A reliable and cost-effective method. J. Child Psychol. Psychiatry 2012, 53, 1009–1017.
  28. Griew, P.; Hillsdon, M.; Foster, C.; Coombes, E.; Jones, A.; Wilkinson, P. Developing and testing a street audit tool using Google Street View to measure environmental supportiveness for physical activity. Int. J. Behav. Nutr. Phys. Act. 2013, 10, 103.
  29. Mooney, S.J.; Bader, M.D.; Lovasi, G.S.; Neckerman, K.M.; Teitler, J.O.; Rundle, A.G. Validity of an ecometric neighborhood physical disorder measure constructed by virtual street audit. Am. J. Epidemiol. 2014, 180, 626–635.
  30. Bethlehem, J.R.; Mackenbach, J.D.; Ben-Rebah, M.; Compernolle, S.; Glonti, K.; Bárdos, H.; Rutter, H.R.; Charreire, H.; Oppert, J.M.; Brug, J.; et al. The SPOTLIGHT virtual audit tool: A valid and reliable tool to assess obesogenic characteristics of the built environment. Int. J. Health Geogr. 2014, 13, 52.
  31. Rzotkiewicz, A.; Pearson, A.L.; Dougherty, B.V.; Shortridge, A.; Wilson, N. Systematic review of the use of Google Street View in health research: Major themes, strengths, weaknesses and possibilities for future research. Health Place 2018, 52, 240–246.
  32. Hwang, J.; Sampson, R.J. Divergent pathways of gentrification: Racial inequality and the social order of renewal in Chicago neighborhoods. Am. Sociol. Rev. 2014, 79, 726–751.
  33. Ilic, L.; Sawada, M.; Zarzelli, A. Deep mapping gentrification in a large Canadian city using deep learning and Google Street View. PLoS ONE 2019, 14, e0212814.
  34. Kang, J.; Körner, M.; Wang, Y.; Taubenböck, H.; Zhu, X.X. Building instance classification using street view images. ISPRS J. Photogramm. Remote Sens. 2018, 145, 44–59.
  35. Richards, D.R.; Edwards, P.J. Quantifying street tree regulating ecosystem services using Google Street View. Ecol. Indic. 2017, 77, 31–40.
  36. Quercia, D.; Schifanella, R.; Aiello, L.M. The shortest path to happiness: Recommending beautiful, quiet, and happy routes in the city. In Proceedings of the 25th Conference on Hypertext and Social Media, Santiago, Chile, 1–4 September 2014.
  37. Runge, N.; Samsonov, P.; Degraen, D.; Schöning, J. No more autobahn!: Scenic route generation using googles street view. In Proceedings of the 21st International Conference on Intelligent User Interfaces, Sonoma, CA, USA, 7–10 March 2016.
  38. Suel, E.; Polak, J.W.; Bennett, J.E.; Ezzati, M. Measuring social, environmental and health inequalities using deep learning and street imagery. Sci. Rep. 2019, 9, 6229.
  39. Gebru, T.; Krause, J.; Wang, Y.; Chen, D.; Deng, J.; Aiden, E.L.; Fei-Fei, L. Using deep learning and Google street view to estimate the demographic makeup of neighborhoods across the United States. Proc. Natl. Acad. Sci. USA 2017, 114, 13108–13113.
  40. Diou, C.; Lelekas, P.; Delopoulos, A. Image-based surrogates of socio-economic status in urban neighborhoods using deep multiple instance learning. J. Imaging 2018, 4, 125.
  41. Krylov, V.A.; Kenny, E.; Dahyot, R. Automatic discovery and geotagging of objects from street view imagery. Remote Sens. 2018, 10, 661.
  42. Krylov, V.A.; Dahyot, R. Object geolocation from crowdsourced street level imagery. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Dublin, Ireland, 10–14 September 2018.
  43. Novack, T.; Vorbeck, L.; Lorei, H.; Zipf, A. Towards Detecting Building Facades with Graffiti Artwork Based on Street View Images. ISPRS Int. J. Geo-Inf. 2020, 9, 98.
  44. Srivastava, S.; Lobry, S.; Tuia, D.; Vargas-Muñoz, J.E. Land-use characterization using Google street view pictures and OpenStreetMap. In Proceedings of the 21st Conference on Geographic Information Science, Lund, Sweden, 12–15 June 2018.
  45. Google. Google Maps/Google Earth Additional Terms of Service. Available online: https://www.google.com/help/terms_maps (accessed on 15 December 2019).
  46. Leon, L.F.A.; Quinn, S. The value of crowdsourced street-level imagery: Examining the shifting property regimes of OpenStreetCam and Mapillary. GeoJournal 2019, 84, 395–414.
  47. Barton, M. An exploration of the importance of the strategy used to identify gentrification. Urban Stud. 2016, 53, 92–111.
  48. Kilkenny, K. A Brief History of the Coffee Shop as a Symbol for Gentrification. Pacific Standard. Available online: https://psmag.com/economics/history-of-coffee-shop-as-symbol-for-gentrification (accessed on 27 November 2019).
  49. Juhász, L.; Hochmair, H.H. User contribution patterns and completeness evaluation of Mapillary, a crowdsourced street level photo service. Trans. GIS 2016, 20, 925–947.
  50. Ma, D.; Sandberg, M.; Jiang, B. Characterizing the heterogeneity of the OpenStreetMap data and community. ISPRS Int. J. Geo-Inf. 2015, 4, 535–550.
  51. Juhasz, L.; Hochmair, H. Exploratory completeness analysis of Mapillary for selected cities in Germany and Austria. GI_Forum J. Geogr. Inf. Sci. 2015, 535–545.
  52. Juhász, L.; Hochmair, H.H. Cross-linkage between Mapillary street level photos and OSM edits. In Proceedings of the 19th Conference of the Association of Geographic Information Laboratories in Europe on Geographic Information Science, Helsinki, Finland, 14–17 June 2016; pp. 141–156.
  53. Quinn, S.; León, A.L. Every single street? Rethinking full coverage across street-level imagery platforms. Trans. GIS 2019, 23, 1251–1272.
  54. Ma, D.; Fan, H.; Li, W.; Ding, X. The State of Mapillary: An Exploratory Analysis. ISPRS Int. J. Geo-Inf. 2020, 9, 10.
  55. D’Andrimont, R.; Yordanov, M.; Lemoine, G.; Yoong, J.; Nikel, K.; van der Velde, M. Crowdsourced street-level imagery as a potential source of in-situ data for crop monitoring. Land 2018, 7, 127.
  56. Dev, S.; Hossari, M.; Nicholson, M.; McCabe, K.; Conran, A.N.C.; Tang, J.; Xu, W.; Pitié, F. The ALOS dataset for advert localization in outdoor scenes. In Proceedings of the 11th International Conference on Quality of Multimedia Experience, Berlin, Germany, 5–7 June 2019.
  57. Geus, D.; Meletis, P.; Dubbelman, G. Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network. Available online: https://arxiv.org/abs/1809.02110v2 (accessed on 1 January 2020).
  58. See, L.; Mooney, P.; Foody, G.; Bastin, L.; Comber, A.; Estima, J.; Fritz, S.; Kerle, N.; Jiang, B.; Laakso, M.; et al. Crowdsourcing, citizen science or volunteered geographic information? The current state of crowdsourced geographic information. ISPRS Int. J. Geo-Inf. 2016, 5, 55.
  59. Mocnik, F.B.; Ludwig, C.; Grinberger, A.Y.; Jacobs, C.; Klonner, C.; Raifer, M. Shared data sources in the geographical domain—A classification schema and corresponding visualization techniques. ISPRS Int. J. Geo-Inf. 2019, 8, 242.
  60. USGS. Available online: https://catalog.data.gov/dataset/usgs-national-transportation-dataset-ntd-downloadable-data-collectionde7d2 (accessed on 5 October 2019).
  61. Mapillary. Available online: https://www.mapillary.com/developer (accessed on 2 October 2019).
  62. OSC. Available online: http://api.openstreetcam.org/api/doc.html (accessed on 1 October 2019).
  63. USCB. Available online: https://catalog.data.gov/dataset/tiger-line-shapefile-2018-nation-u-s-current-metropolitan-statistical-area-micropolitan-statist (accessed on 2 October 2019).
  64. ORNL. Available online: https://landscan.ornl.gov/ (accessed on 1 October 2019).
  65. Haklay, M. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703.
  66. Mahabir, R.; Stefanidis, A.; Croitoru, A.; Crooks, A.; Agouris, P. Authoritative and volunteered geographical information in a developing country: A comparative case study of road datasets in Nairobi, Kenya. ISPRS Int. J. Geo-Inf. 2017, 6, 24.
  67. GeoPandas. Available online: https://geopandas.org (accessed on 2 October 2019).
  68. Brakatsoulas, S.; Pfoser, D.; Salas, R.; Wenk, C. On map-matching vehicle tracking data. In Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, 30 August–2 September 2005.
  69. Yang, B.; Zhang, Y.; Luan, X. A probabilistic relaxation approach for matching road networks. Int. J. Geogr. Inf. Sci. 2013, 27, 319–338.
  70. Ruiz, J.J.; Ariza, F.J.; Urena, M.A.; Blázquez, E.B. Digital map conflation: A review of the process and a proposal for classification. Int. J. Geogr. Inf. Sci. 2011, 25, 1439–1466.
  71. Zhang, M.; Yao, W.; Meng, L. Automatic and accurate conflation of different road-network vector data towards multi-modal navigation. ISPRS Int. J. Geo-Inf. 2016, 5, 68.
  72. Daneshgar, F.; Sadabadi, K.F.; Haghani, A. A Conflation Methodology for Two GIS Roadway Networks and Its Application in Performance Measurements. Transp. Res. Rec. 2018, 2672, 284–293.
  73. Lei, T.; Lei, Z. Optimal spatial data matching for conflation: A network flow-based approach. Trans. GIS 2019, 23, 1152–1176.
  74. Mullen, W.F.; Jackson, S.P.; Croitoru, A.; Crooks, A.; Stefanidis, A.; Agouris, P. Assessing the impact of demographic characteristics on spatial error in volunteered geographic information features. GeoJournal 2015, 80, 587–605.
  75. Fredricks, G.A.; Nelsen, R.B. On the relationship between Spearman’s rho and Kendall’s tau for pairs of continuous random variables. J. Stat. Plan. Inference 2007, 137, 2143–2150.
  76. Puth, M.T.; Neuhäuser, M.; Ruxton, G.D. Effective use of Spearman’s and Kendall’s correlation coefficients for association between two measured traits. Anim. Behav. 2015, 102, 77–84.
  77. Ratner, B. The correlation coefficient: Its values range between +1/−1, or do they? J. Target. Meas. Anal. Mark. 2009, 17, 139–142.
  78. Budhathoki, N.R.; Haythornthwaite, C. Motivation for open collaboration: Crowd and community models and the case of OpenStreetMap. Am. Behav. Sci. 2013, 57, 548–575.
  79. Mapillary. Driving with Mapillary: Commonly Asked Questions. Available online: https://help.mapillary.com/hc/en-us/articles/360010392280-Driving-with-Mapillary-commonly-asked-questions#h_1c52655b-a2eb-4dd0-9887-ebaf892bae7f (accessed on 25 April 2020).
  80. USGS. Available online: https://www.usgs.gov/faqs/what-are-code-value-definitions-tnmfrc-attribute (accessed on 5 December 2019).
  81. O’Keeffe, K.P.; Anjomshoaa, A.; Strogatz, S.H.; Santi, P.; Ratti, C. Quantifying the sensing power of vehicle fleets. Proc. Natl. Acad. Sci. USA 2019, 116, 12752–12757.
  82. Basiri, A.; Amirian, P.; Mooney, P. Using crowdsourced trajectories for automated OSM data entry approach. Sensors 2016, 16, 1510.
  83. Gkountouna, O.; Terrovitis, M. Anonymizing collections of tree-structured data. IEEE Trans. Knowl. Data Eng. 2015, 27, 2034–2048.
  84. Senaratne, H.; Mobasheri, A.; Ali, A.L.; Capineri, C.; Haklay, M. A review of volunteered geographic information quality assessment methods. Int. J. Geogr. Inf. Sci. 2017, 31, 139–167.
  85. Zhao, B.; Sui, D.Z. True lies in geospatial big data: Detecting location spoofing in social media. Ann. GIS 2017, 23, 1–14.
  86. Deng, X.; Zhu, Y.; Newsam, S. What is it like down there? Generating dense ground-level views and image features from overhead imagery using conditional generative adversarial networks. In Proceedings of the 26th SIGSPATIAL International Conference on Advances in Geographic Information Systems, Washington, DC, USA, 6–9 November 2018.
  87. Oksanen, J.; Bergman, C.; Sainio, J.; Westerholm, J. Methods for deriving and calibrating privacy-preserving heat maps from mobile sports tracking application data. J. Transp. Geogr. 2015, 48, 135–144.
  88. Bergman, C.; Oksanen, J. Estimating the Biasing Effect of Behavioural Patterns on Mobile Fitness App Data by Density-Based Clustering. In Geospatial Data in a Changing World: Selected Papers of the 19th AGILE Conference on Geographic Information Science; Sarjakoski, T., Santos, M.Y., Sarjakoski, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; pp. 199–218.
  89. Mapillary. Five US Departments of Transportation Upload 270,000 Miles of Road Data to Mapillary to Understand Road Safety. Available online: https://blog.mapillary.com/news/2019/04/17/five-us-dots-upload-photologs-to-mapillary.html (accessed on 14 April 2020).
  90. Mapillary. Helping Cities Across the US to Understand Their Street: Unveiling Our Partnership with IWorQ. Available online: https://blog.mapillary.com/update/2019/11/13/mapillary-partners-with-iworq.html (accessed on 14 April 2020).
Figure 1. Graphical user interfaces of (a) Mapillary and (b) OpenStreetCam (OSC).
Figure 2. Overview of methodology.
Figure 3. Spatial distribution of road networks.
Figure 4. Total length of roads in kilometers per 1 × 1 km grid cell area.
Figure 5. Normalized population density per 1 × 1 km road lengths.
Figure 6. Spatial comparison of roads in kilometers.
Figure 7. Unique contributors per mapped 1 × 1 km grid cell.
Figure 8. Number of 1 × 1 km grid cells mapped by contributors in (a) Mapillary and (b) OSC.
Figure 9. Average length of road sequence contributed per user in (a) Mapillary and (b) OSC.
Figure 10. Number of images contributed during the days of the week in local time in (a) Mapillary and (b) OSC.
Figure 11. The distribution of user contributions over a span of twenty-four hours in Mapillary.
Figure 12. Cumulative lengths of image sequences contributed over time in (a) Mapillary and (b) OSC.
Table 1. Data.
| Data | Source | Data Type | Mode of Acquisition | Date | References |
|---|---|---|---|---|---|
| TIGER roads | US Census Bureau | Polyline | Online geoportal | 2018 | [60] |
| Mapillary road | Mapillary | Point traces | API | Current up to 08/31/2018 | [61] |
| OSC road sequences | OpenStreetCam | Point traces | API | Current up to 08/31/2018 | [62] |
| Metropolitan boundaries | US Census Bureau | Polygon | Online geoportal | 2018 | [63] |
| Population | LandScan | Polygon | Oak Ridge National Laboratory online geoportal | 2018 | [64] |
Table 2. Summary road statistics and coverage.
| | TIGER | Mapillary (% of TIGER) | OSC (% of TIGER) |
|---|---|---|---|
| Washington | | | |
| Cells containing roads (out of 25,430) | 23,643 | 6032 (25.51) | 3409 (14.42) |
| Total road length per dataset (km) | 82,110.13 | 28,371.99 (34.55) | 15,015.77 (18.29) |
| San Francisco | | | |
| Cells containing roads (out of 10,231) | 8244 | 2529 (30.68) | 2060 (24.99) |
| Total road length per dataset (km) | 37,759.49 | 36,719.20 (97.24) | 29,819.88 (78.97) |
| Phoenix | | | |
| Cells containing roads (out of 53,121) | 27,772 | 6173 (22.22) | 9262 (33.35) |
| Total road length per dataset (km) | 77,754.52 | 72,618.34 (93.39) | 257,891.07 (331.67) |
| Detroit | | | |
| Cells containing roads (out of 16,835) | 15,386 | 2284 (14.84) | 8139 (52.90) |
| Total road length per dataset (km) | 59,343.53 | 16,986.84 (28.62) | 504,405.51 (849.98) |
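
The "% of TIGER" values in Table 2 appear to be each VSVI figure divided by the corresponding TIGER figure. The following minimal sketch, assuming that reading, reproduces the road-length percentages for Washington and Detroit. The function name `percent_of_reference` and the dictionary layout are illustrative and not taken from the paper.

```python
# Total road lengths in km, copied from Table 2.
# The dictionary layout is an illustrative assumption, not the authors' format.
ROAD_LENGTH_KM = {
    "Washington": {"TIGER": 82110.13, "Mapillary": 28371.99, "OSC": 15015.77},
    "Detroit": {"TIGER": 59343.53, "Mapillary": 16986.84, "OSC": 504405.51},
}


def percent_of_reference(value: float, reference: float) -> float:
    """Return value as a percentage of reference, rounded to two decimals."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return round(100.0 * value / reference, 2)


# Percentage of the TIGER road length covered by each VSVI source.
coverage = {
    area: {
        source: percent_of_reference(lengths[source], lengths["TIGER"])
        for source in ("Mapillary", "OSC")
    }
    for area, lengths in ROAD_LENGTH_KM.items()
}

for area, shares in coverage.items():
    print(area, shares)
```

Under this reading, a value above 100% (as for OSC in Detroit) means the summed length of contributed sequences exceeds the length of the TIGER network, which would be consistent with the same roads being driven repeatedly.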
Table 3. Summary statistics (mean ± standard deviation) per 1 × 1 km grid cell area.
| | Mapillary | OSC |
|---|---|---|
| Washington | | |
| Mean road length | 1.09 ± 5.12 | 0.56 ± 2.51 |
| Mean contributors | 0.50 ± 1.33 | 0.28 ± 1.09 |
| Mean number of images | 43.68 ± 475.16 | 21.34 ± 133.83 |
| San Francisco | | |
| Mean road length | 3.58 ± 12.90 | 2.79 ± 13.32 |
| Mean contributors | 0.75 ± 2.14 | 0.61 ± 1.87 |
| Mean number of images | 217.56 ± 985.51 | 76.23 ± 388.55 |
| Phoenix | | |
| Mean road length | 1.36 ± 8.91 | 4.77 ± 30.96 |
| Mean contributors | 0.21 ± 0.66 | 0.68 ± 2.03 |
| Mean number of images | 44.73 ± 268.09 | 123.86 ± 785.66 |
| Detroit | | |
| Mean road length | 0.99 ± 4.12 | 29.38 ± 98.886 |
| Mean contributors | 0.23 ± 0.73 | 3.36 ± 6.19 |
| Mean number of images | 130.52 ± 731.86 | 483.93 ± 1580.43 |
Table 4. Correlation between the number of contributors and length of roads (denoted in parentheses) with population density per 1 × 1 km grid area. All correlation values were found to be significant at the 0.01 (i.e., 99%) level.
Cell entries: Number of Unique Contributors (Length of Roads)
| Study Area | Mapillary Pearson | Mapillary Kendall | Mapillary Spearman | OSC Pearson | OSC Kendall | OSC Spearman |
|---|---|---|---|---|---|---|
| Washington | 0.46 (0.40) | 0.28 (0.27) | 0.34 (0.34) | 0.18 (0.18) | 0.23 (0.23) | 0.28 (0.28) |
| San Francisco | 0.51 (0.52) | 0.48 (0.48) | 0.56 (0.56) | 0.32 (0.39) | 0.43 (0.43) | 0.51 (0.51) |
| Phoenix | 0.49 (0.33) | 0.51 (0.50) | 0.54 (0.51) | 0.55 (0.45) | 0.59 (0.59) | 0.63 (0.63) |
| Detroit | 0.38 (0.42) | 0.34 (0.34) | 0.41 (0.42) | 0.54 (0.46) | 0.51 (0.51) | 0.66 (0.66) |
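
As a rough illustration of how the Pearson and Spearman coefficients in Table 4 differ, the sketch below computes both for per-cell values. The input numbers are hypothetical and are not the study data. The helper names (`pearson`, `average_ranks`, `spearman`) are illustrative, and the paper does not state which software or tie-handling the authors used. Kendall's coefficient is omitted for brevity.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-cell values (NOT from the paper): unique contributors
# and population density for five 1 x 1 km grid cells.
contributors = [0, 1, 2, 3, 4]
population_density = [10, 20, 40, 80, 1000]

r_pearson = pearson(contributors, population_density)
r_spearman = spearman(contributors, population_density)
print(f"Pearson: {r_pearson:.2f}, Spearman: {r_spearman:.2f}")
```

In this toy example the relationship is monotonic but not linear, so the rank-based coefficient reaches 1 while the Pearson coefficient stays below it, which is one reason to report both measures for skewed per-cell counts.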
Table 5. Percentage of 1 × 1 km grid cells mapped more than once by the same users.
| Study Area | Mapillary | OSC |
|---|---|---|
| Washington | 73.55 | 57.14 |
| San Francisco | 66.67 | 52.73 |
| Phoenix | 94.25 | 74.24 |
| Detroit | 80.17 | 77.95 |
Table 6. Summary statistics for road categories.
Total road length (km)
| Road Types | TIGER | Mapillary | OSC |
|---|---|---|---|
| Washington | | | |
| Controlled-access Highway | 2077.34 | 8453.33 | 7095.17 |
| Secondary Highway or Major Connecting Road | 2816.07 | 4780.77 | 2521.81 |
| Local Connecting Road | 4042.36 | 4010.08 | 1472.26 |
| Local Road | 69,721.46 | 9986.35 | 3152.07 |
| Ramp | 1379.08 | 1128.50 | 767.10 |
| 4WD | 2072.06 | 0.47 | 5.08 |
| Ferry Route | 0.73 | 0.00 | 0.39 |
| Tunnel | 1.05 | 0.09 | 1.90 |
| Total sum of all categories | 82,110.13 | 28,371.99 | 15,015.77 |
| San Francisco | | | |
| Controlled-access Highway | 1424.80 | 6546.78 | 16,144.67 |
| Secondary Highway or Major Connecting Road | 36.60 | 78.31 | 271.59 |
| Local Connecting Road | 827.54 | 1438.73 | 1229.66 |
| Local Road | 33,965.93 | 27,641.18 | 9492.00 |
| Ramp | 910.96 | 980.79 | 2629.26 |
| 4WD | 577.68 | 0.65 | 0.57 |
| Ferry Route | 0.00 | 0.00 | 0.00 |
| Tunnel | 15.98 | 32.77 | 52.13 |
| Total sum of all categories | 37,759.49 | 36,719.20 | 29,819.88 |
| Phoenix | | | |
| Controlled-access Highway | 1980.84 | 16,401.68 | 88,971.87 |
| Secondary Highway or Major Connecting Road | 904.22 | 3447.70 | 7627.81 |
| Local Connecting Road | 807.13 | 3476.10 | 53,229.91 |
| Local Road | 71,518.16 | 462,218.19 | 135,496.89 |
| Ramp | 920.54 | 303.05 | 20,190.79 |
| 4WD | 1620.63 | 21.95 | 55.87 |
| Ferry Route | 0.00 | 0.00 | 0.00 |
| Tunnel | 3.00 | 17.66 | 217.93 |
| Total sum of all categories | 77,754.52 | 72,618.34 | 257,891.07 |
| Detroit | | | |
| Controlled-access Highway | 1935.92 | 4165.46 | 157,851.13 |
| Secondary Highway or Major Connecting Road | 578.83 | 1368.93 | 22,593.38 |
| Local Connecting Road | 1430.96 | 1637.10 | 63,357.19 |
| Local Road | 54,533.83 | 8872.74 | 231,776.91 |
| Ramp | 849.71 | 936.38 | 28,141.12 |
| 4WD | 5.68 | 0.00 | 0.08 |
| Ferry Route | 2.65 | 0.00 | 0.00 |
| Tunnel | 5.96 | 6.22 | 685.68 |
| Total sum of all categories | 59,343.53 | 16,986.84 | 504,405.51 |

Share and Cite

MDPI and ACS Style

Mahabir, R.; Schuchard, R.; Crooks, A.; Croitoru, A.; Stefanidis, A. Crowdsourcing Street View Imagery: A Comparison of Mapillary and OpenStreetCam. ISPRS Int. J. Geo-Inf. 2020, 9, 341. https://doi.org/10.3390/ijgi9060341

