Understanding the spatio-temporal distribution of pedestrian volume in urban environments is essential for informing urban management and planning decisions that create livable and thriving city centers and mitigate negative effects such as increased traffic and crime rates. While evidence from public health [1], transportation [2], environmental sciences [3], and environmental psychology [4] can provide a theoretical basis for urban design, planners are increasingly using data-driven methods to identify opportunities for infrastructure improvements, analyze the use of design features, and evaluate the impacts of special events or public space redevelopment.
Given the importance of up-to-date and reliable data, researchers and urban planners studying the use of public spaces have been testing and applying data collection methods that vary in the type and geographic extent of the information they provide, the cost of their implementation, and the privacy issues they raise. Labor-intensive manual observation methods are often replaced by automated data collection using a variety of technological approaches, such as counting gates, GPS receivers and accelerometers in smartphones [5] and, more recently, call detail records (CDR) [7] and WiFi probe request data [9]. The challenges of asking people to actively wear sensors and the privacy issues associated with telecommunication data, together with the inaccessibility of data and participants for research, have led researchers to take advantage of crowdsourced, often publicly available, big data from social media networks such as Twitter, Flickr, and Instagram [10]. Geolocated tweets and images have recently been used to study, e.g., urban park visitation and access [11] and to gain insight into the paths of tourists through cities [13]. Strava, a network for tracking athletic activity, provides even more geospatially rich data, which have been used to investigate cycling behavior [15], cycling infrastructure [16], and air pollution exposure of commuting cyclists [17].
One of the increasingly popular methods for public space monitoring uses closed-circuit television (CCTV) footage and webcams to capture and interpret images for a variety of purposes, including security, weather monitoring, and monitoring of pedestrian, bicycle, and motor vehicle traffic [18]. Although webcams do not provide individual mobility data, they represent a rich source of spatio-temporal data about people and their environment, and have been used to study air pollution [20], phenology [21], and beach usage [22]. The existing global network of public webcams allows us to study local phenomena on a global scale, providing ways to link frequent, high-resolution, on-the-ground observations of the environment with typically coarser satellite data [23]. Having recognized the need for an organized, searchable network of webcams, Jacobs et al. [24] established AMOS (Archive of Many Outdoor Scenes), which collected images worldwide between 2006 and 2018 and thus captured the dynamics of urban spaces over many years. As such, AMOS represents a free, unique source of information about pedestrian density, its changes throughout a day, week, month, and year, and how pedestrians react to changes in their environment. Webcams often capture open places, such as plazas, which typically serve multiple distinct purposes (tourism, commuting, commerce, leisure) and where other approaches, specifically those gathering data at point locations (e.g., counting gates) or from a specific population (e.g., social networks), may therefore not be effective.
Unlike data from other methods (counting gates, GPS), webcam data cannot be analyzed directly. Instead, the information, both spatial and non-spatial, first has to be extracted, either manually or automatically using machine learning algorithms, which are getting increasingly better at identifying objects in scenes, whether people, bicycles, or cars. Subsequent spatial analysis typically operates within the coordinate system of the webcam image [18] or focuses on pre-defined activity areas [19], limiting the potential of the data to be integrated into existing Geographic Information System (GIS) databases and thus analyzed within the local context. Because equal distances measured in a webcam image correspond to varying on-the-ground distances, many urban webcam studies have focused on counts and traffic flows rather than spatial mapping [18].
Besides the spatial component of the webcam data, each image is associated with the time when it was captured. Although including the temporal component makes the analysis and visualization of webcam data much more challenging [26], it provides a more complete picture of urban dynamics. The space-time cube (STC) representation has proved to be a useful means of conceptualizing, analyzing, and visualizing spatio-temporal events [27] and trajectories [29]. It has been used for characterizing various urban phenomena, including crime hotspots [30], urban fires [31], and dengue fever [32], as well as for studying human activity patterns [33] and describing big trajectory datasets [35].
In this paper, we build on a previous study by Hipp et al. [25] and present a new method to derive high-resolution spatio-temporal pedestrian density from webcam images. Given the three-dimensional nature of the density, we propose a novel visualization using a continuous space-time cube representation, which aims to provide an at-a-glance view of the dynamics of pedestrian density in space and time. The proposed visualization allows complex hourly, daily, or weekly spatio-temporal patterns to be explored and communicated in an efficient and concise way. To demonstrate the method, we analyzed AMOS webcam images capturing two plazas, one in Germany and one in Australia, each highlighting different aspects of the method.
The remainder of the article is organized as follows. In Section 2, we describe data collection and processing, including georeferencing, subsequent aggregation, and computation of 2D kernel densities, which are then used to construct the STC representation. Section 3 demonstrates the method using two case studies and summarizes the resulting observations. The strengths and limitations of the presented method are discussed in Section 4, and conclusions are given in Section 5.
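As a rough illustration of the processing chain outlined above, the following Python sketch stacks per-time-step 2D kernel density rasters into a space-time cube. It assumes a Gaussian kernel with a fixed bandwidth and georeferenced pedestrian locations already grouped by time step. The function names, grid, bandwidth, and sample coordinates are hypothetical and are not taken from the article, whose actual procedure is described in Section 2.

```python
import numpy as np

def kernel_density_2d(points, x_centers, y_centers, bandwidth):
    """Sum of Gaussian kernels on a regular grid (assumed kernel, persons per unit area)."""
    gx, gy = np.meshgrid(x_centers, y_centers)  # both have shape (ny, nx)
    density = np.zeros(gx.shape, dtype=float)
    for px, py in points:
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        density += np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Normalize so that each pedestrian contributes a unit volume.
    return density / (2.0 * np.pi * bandwidth ** 2)

def build_space_time_cube(points_by_time, x_centers, y_centers, bandwidth):
    """Stack one density raster per time step into an array indexed [time, row, col]."""
    return np.stack([
        kernel_density_2d(pts, x_centers, y_centers, bandwidth)
        for pts in points_by_time
    ])

# Hypothetical input: ground coordinates (in meters) of labeled pedestrians
# for three time steps; the second time step has no pedestrians.
cell = 0.5
x_centers = np.arange(0.0, 40.0 + cell, cell)
y_centers = np.arange(0.0, 40.0 + cell, cell)
points_by_time = [
    [(20.0, 20.0)],
    [],
    [(10.0, 15.0), (12.0, 15.5), (30.0, 25.0)],
]
stc = build_space_time_cube(points_by_time, x_centers, y_centers, bandwidth=2.0)
```

With this normalization, integrating a time slice over the grid approximately recovers the number of pedestrians in that slice, provided the kernels are not truncated by the grid boundary.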
Collecting and analyzing data about the use of public spaces is instrumental for informing urban design and developing city policies that better reflect the needs of local residents and visitors [8]. Increasing walkability, reducing traffic congestion, and identifying underused infrastructure are objectives that may each require different data collection methods and analyses. Moreover, the inherent complexity and spatio-temporal nature of the data call for effective visualizations to extract meaningful information [29]. In this study, we demonstrated the potential of webcams, an inexpensive and easy-to-use technology, to study the current use of public spaces and to evaluate ongoing changes. We introduced a methodology for transforming time series of webcam images into a geospatial representation of pedestrian densities. With the proposed space-time cube representation, we can not only meaningfully visualize urban pedestrian dynamics, but also compare space-time cubes derived from different time periods or extract information for specific locations or time instances.
Although this study dealt with pedestrian densities only, it is straightforward to apply the same method to bicycles or motor vehicles. As with subtracting the space-time cubes from before and after the reconstruction of the plaza, comparing space-time cubes representing these different modes of transportation can reveal their interactions and, e.g., identify places with safety concerns. Even though the webcam data do not provide information about the direction in which pedestrians are walking or the time they spend at one location, we can infer more about the causes of observed patterns by analyzing the densities in the local context, including streets, amenities, or current weather. Using Digital Surface Models of cities would enable the incorporation of results from high-resolution spatial modeling of solar radiation or viewscapes [51].
Given growing city sizes and the spatial and temporal scarcity of human mobility data, pedestrian volumes have been estimated by simulations, often using agent-based models (ABM) [53]. Given the well-known challenges with calibrating and validating ABMs [40], pedestrian densities derived from webcams could replace labor-intensive manual survey methods, especially in open spaces such as plazas, where "gate count" methods [55] would be ineffective.
Like other studies of active transportation behavior, our methodology involves some limitations and assumptions. Webcams are not always located and oriented in an optimal way. When a webcam is installed low above the terrain, the accuracy of the georeferenced coordinates deteriorates quickly with increasing distance from the webcam (see the error ellipses in Figure 1). Therefore, the webcam should be installed as high as possible to capture areas of interest under large view angles. Furthermore, in this work we assumed that the plazas are perfectly horizontal planes, which in many cases is a reasonable assumption. However, where the observed urban area lies on a tilted plane or has more complex topography, the georeferencing method would need to be corrected to take the topography into account. Finally, obstructions in the webcam view can cause missing data, which may distort the analysis. This can be a challenge in urban environments with trees or large monuments. A possible solution is to fill the gaps by integrating georeferenced data from a second webcam capturing the area of interest from a different location.
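To make the planar assumption concrete, the sketch below maps image pixels to ground coordinates with a projective transformation (homography) estimated from control points. This is a generic technique used here for illustration only and is not necessarily the georeferencing procedure of this study. The function names and all control point coordinates are hypothetical.

```python
import numpy as np

def fit_homography(image_xy, ground_xy):
    """Estimate a 3x3 homography from at least 4 point pairs (direct linear transform)."""
    rows = []
    for (x, y), (u, v) in zip(image_xy, ground_xy):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def image_to_ground(h, image_xy):
    """Project pixel coordinates onto the (assumed flat) ground plane."""
    pts = np.asarray(image_xy, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical control points: pixel positions of four plaza corners and
# their ground coordinates in meters (a 20 m x 30 m rectangle).
control_image = [(100, 400), (540, 410), (420, 220), (180, 215)]
control_ground = [(0.0, 0.0), (20.0, 0.0), (20.0, 30.0), (0.0, 30.0)]

H = fit_homography(control_image, control_ground)
# Pixel where a labeled pedestrian touches the ground (hypothetical).
feet_ground = image_to_ground(H, [(320, 300)])
```

Under such a transformation, a one-pixel error far from the camera translates into a much larger ground error than the same pixel error close to the camera, which is consistent with the deteriorating accuracy described above.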
Although a crowdsourcing platform has been used successfully before for labeling pedestrians, bikes, and cars [37], in many cases the accuracy may not be sufficient. Requiring multiple workers to repeat the same task may be too costly, and it does not necessarily help in avoiding certain issues. For example, with low-resolution webcams, certain permanent objects may be easily confused with pedestrians. Fortunately, these cases can be easily detected by inspecting the webcam image time series, and the affected labels can be removed. Furthermore, when webcams capture crowded scenes during an event, the person labeling the scene will not be able or willing to label everything, which likely leads to underestimated densities. Many of these issues can be addressed by using machine learning algorithms to perform the labeling automatically, with the crowdsourced data serving as a training dataset. Although not without errors, detection of people in both sparse and crowded images has been studied and used successfully for many years [56]. If visibility is reduced at night or by weather conditions such as heavy rain or fog, the images are most likely unsuitable for analysis. In our case studies we therefore avoided processing night images, but apart from that we did not encounter any conditions that would significantly affect visibility. To better inform the analysis, future studies may incorporate recorded weather conditions (including rain, temperature, and wind) to disentangle the influence of weather on pedestrian density.
Although the pedestrian density could be visualized in a variety of ways, the presented space-time cube visualization provides an effective way to communicate complex information and offers the flexibility to explore spatio-temporal patterns based on different types of temporal aggregation, where the Z axis can represent hours, days, days of the week, or months, depending on the patterns explored. Moreover, the STC as a 3D raster data structure can be used for operations such as 3D raster algebra or extractions. By time drilling or time cutting [26], we can obtain the temporal behavior of density at a certain location or the spatial density at a certain time, respectively. Using 3D raster algebra, we can easily combine different STCs, e.g., those representing densities of pedestrians, bikes, and cars, to analyze when and where there could be a potentially dangerous conflicting use of public spaces. The proposed visualization should be further tested for usability and efficiency, and to identify which audience it is best suited for.
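The time drilling, time cutting, and 3D raster algebra operations mentioned above can be sketched with plain NumPy arrays. The example assumes each STC is stored as an array indexed [time, row, col]. The function names, thresholds, and randomly generated cubes are hypothetical placeholders and do not represent the study's data or software.

```python
import numpy as np

def time_drill(stc, row, col):
    """Temporal profile of density at one grid cell across all time slices."""
    return stc[:, row, col]

def time_cut(stc, t):
    """Spatial density raster at a single time index."""
    return stc[t, :, :]

def conflict_mask(stc_a, stc_b, threshold_a, threshold_b):
    """3D raster algebra: voxels where both densities exceed their thresholds."""
    return (stc_a > threshold_a) & (stc_b > threshold_b)

# Hypothetical cubes: 24 hourly slices on a 40 x 50 grid, filled with random values.
rng = np.random.default_rng(42)
pedestrians = rng.random((24, 40, 50))
cyclists = rng.random((24, 40, 50))
pedestrians_after = rng.random((24, 40, 50))

profile = time_drill(pedestrians, row=10, col=20)  # one value per hour
noon = time_cut(pedestrians, t=12)                 # one 2D raster
conflicts = conflict_mask(pedestrians, cyclists, 0.9, 0.9)
hours, rows, cols = np.nonzero(conflicts)          # when and where both are high
change = pedestrians_after - pedestrians           # voxel-wise difference of two STCs
```

Storing time as the first axis keeps each time cut contiguous in memory, while a time drill becomes a strided view. Either layout would work for cubes of this size.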