1. Introduction
The form a city takes and the way land is allocated can have both positive and negative consequences for population health and well-being. For example, cities with compact forms have been found to lead to better health outcomes [
1,
2,
3] and reductions in per capita emissions [
4]. City design can also be a factor in encouraging or discouraging the uptake of active transport [
5] leading to better health outcomes [
6] or locking-in car dependency [
7], increasing levels of air pollution [
8], obesity [
9], and road trauma [
10]. Due to the long time scales of urban change and the high stability of city structures [
11], we must consider current cities as both a snapshot in time and a culmination of years of construction. Policy-makers and urban/transport planners have an opportunity to embrace strategies that proactively support safe active transport modes, as facilitated by urban designs seen in some countries around the world. However, the rigid structure of cities makes rapid changes difficult, and changes should be undertaken with due care, as the impacts will be very long-lived [
12,
13]. Planning for change requires methods that can objectively compare cities while also accounting for local context, so as to understand the associations between urban design features, transport networks, and environmental outcomes.
Urban typologies allow a city to be “read”, linking the urban morphological form with the social, demographic, and political uses of these spaces [
14]. These grew from a theoretical basis, examining the division of a city as well as the cities themselves as artefacts [
15]. Krier [
16] endeavoured to quantify how urban space typologies could be derived, starting with basic geometric shapes (squares, triangles, and circles), their distortions, and then their combination into composite street plans. Typologies are also influenced by an urban area’s original development and later growth. Rossi [
15] traced city development in the Americas through two main starting points. Cities in Latin America and New York City were based on a grid system (influenced by the Laws of the Indies [
17]), while the stereotypical Old West town originated as a main street village. Urban development evolved as cities outgrew their (now obsolete) city walls and adapted to the industrial revolution [
16]. Other theoretical typologies considered different urban elements and scales. Argan and Rykwert [
18] defined a number of typologies based on different levels—an urban scale (the configurations of buildings), a building scale (the constructed elements), and a detailed scale (the decorative elements on buildings). These theoretical typologies more often described morphology rather than function.
In order to derive function-based typologies, a number of methods have been devised. Harris [
19] used occupational and employment figures to determine a city’s most important economic activity (such as manufacturing, retail, or tourism). Other studies used economic activity data to find similar sorts of functional typologies [
20]. Finally, Bruce and Witt [
21] used a range of city statistics to group cities into clusters. While these typologies can discover a city’s main functional use, they say nothing of a city’s urban design and its potential impact on residents’ movement patterns or modes.
Contemporary Approaches
New methods to define city typologies emerged in the 1980s and 90s with the growing availability of databases of spatial data and increased computing power. Much of this work focused on road infrastructure in cities, and drew from the structural sociology field, in which groups of people were represented as part of a broader network structure. The “space syntax” of Hillier [
22] established a correlation between configurations of urban forms and variations of human interactions within it. Other recent remote-sensing-based methods depart from the pure network analysis methods to derive urban typologies. Nighttime light data has been used to categorise cities into stages of urbanisation and levels of economic activities [
23]. Urban metrics (road geometry, building dimensions and heights, and vegetation heights) have also been used to classify cities into typologies of differing periods of historical design and urban planning (i.e., 19th Century, 1950s, 1970s, etc.) [
24]. Local climate zones (LCZ) enable urban climate modelling parametrisations using metrics of building heights, street widths, and surface types fractions to classify urban areas [
25,
26].
Building on recent advances in computing power, artificial intelligence, and urban imagery, new approaches have been created to discover unique visual characteristics of cities and how they are used. For example, large numbers of geo-tagged photos have been used to detect patterns of urban usage and public perception of a number of areas’ functional and social attributes [
27,
28]. Place Pulse, a database of urban imagery using crowd-sourced classifications (including safety, beauty, and liveliness) has been built to quantify perceptions of urban areas [
29,
30] and inequality [
31]. Doersch et al. [
32] used a large number of geo-localised street-level images to discover common visual features across a number of cities.
Still, most methods described above require some amount of subjective classification of local input data, the quality and availability of which can vary widely across collection or political districts. We propose to overcome these limitations by using neural networks to find similarities in imagery of urban areas from Google and Baidu. This imagery offers universal coverage of satellite imagery and maps and nearly universal coverage of street view imagery. In addition, it provides a high consistency for map imagery (except in the case of South Korea), and street view imagery is captured using a common methodology and equipment set. This work grows from a series of projects intended to overcome these limitations by utilising a range of different types of imagery to allow analysis of urban design at a global scale. The first trains neural networks to recognise specific cities with digital street map images and to find clusters of city types and attribute the urban design’s contribution to road trauma [
10]. A second extracts block-scale metrics from maps to locate neighbourhood types [
33]. Expanding on those works in this paper, we also use street view and satellite imagery, in addition to the map imagery, to train neural networks. Paris, France is an iconic international city [
34] with widely recognisable visual elements [
32], leading many cities (including Melbourne and Sydney) to claim that they have a "Paris-end" of town [
35], or are a “Paris on the [insert name of local river]” [
36] (e.g., “Paris on the Yarra”). To illustrate the advantages and disadvantages of each imagery type, we therefore use the comparison city of Paris as an exemplar. We perform a case study using two Australian cities, Melbourne and Sydney, and determine if a “Paris-end” of town exists or can be found in these cities using three different types of imagery datasets, as well as determine what scales are most appropriate with these datasets to find typologies and what types of features are best suited towards a particular research question.
3. Results
Validation was performed on each model using 25% of the training data. The neural network models for Google Maps (GM), Google Satellite maps (GS), and Google Street View/Baidu Street view (GSV-BSV) reached a final accuracy of 73.2% (top five: 85.0%), 99.4% (top five: 99.97%), and 43.1% (top five: 69.8%), respectively. These accuracies were calculated at the end of each epoch during the training step, testing the neural network’s skill in identifying the correct city out of the nearly 1700 cities (excluding Melbourne and Sydney).
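As an illustration of how the top-one and top-five accuracies reported above can be computed, the following is a minimal sketch rather than the code used in this study. The function name `top_k_accuracy` and the toy probabilities are assumptions introduced for the example; a real evaluation would run over the validation images and the full set of city classes.

```python
from typing import Sequence


def top_k_accuracy(probabilities: Sequence[Sequence[float]],
                   true_labels: Sequence[int],
                   k: int = 1) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for scores, label in zip(probabilities, true_labels):
        # Class indices ordered from highest to lowest predicted score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy data (hypothetical): four validation images, three city classes.
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
toy_labels = [0, 1, 0, 0]

top1 = top_k_accuracy(toy_probs, toy_labels, k=1)
top2 = top_k_accuracy(toy_probs, toy_labels, k=2)
```

With nearly 1700 classes, a top-five hit is a much weaker requirement than a top-one hit, which is why the two figures can differ widely for the more complex street view imagery.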
The resulting predictions from model inference of the evaluation data were analysed in various ways. First, the top 20 predicted cities for the evaluation points for each imagery dataset were calculated (see
Table 2 for GM, GS, and GSV-BSV).
3.1. Top 20 Predicted Cities
The GM (map view) neural network predictions (
Table 2a) are dominated by other Australian cities (Brisbane, Canberra, Sunshine Coast, Gold Coast, Newcastle and Lake Macquarie, Perth, and Adelaide) as well as a number of cities from Israel, South Africa, and the United States. Alternative Australian cities make up nearly 20% of the top 20 predictions for Melbourne and 17% for Sydney. Melbourne and Sydney also show strong similarities with each other with the neural network considering them similar to the same 12 cities out of the top 20 predictions.
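The tallies behind a table of top predicted cities, and the count of cities shared between two such lists, can be sketched as follows. This is an illustrative example under stated assumptions, not the analysis code of this study: the helper names `top_cities` and `shared_cities` and the short prediction lists are hypothetical, and each evaluation location is assumed to contribute a single top prediction.

```python
from collections import Counter


def top_cities(predicted_cities, n=20):
    """Return the n most frequently predicted cities with their share of locations."""
    counts = Counter(predicted_cities)
    total = len(predicted_cities)
    return [(city, count / total) for city, count in counts.most_common(n)]


def shared_cities(top_a, top_b):
    """Cities that appear in both top-n lists."""
    return {city for city, _ in top_a} & {city for city, _ in top_b}


# Hypothetical top predictions, one per evaluation location.
melbourne_predictions = (["Brisbane"] * 4 + ["Perth"] * 2 + ["Haifa"] * 2
                         + ["Adelaide", "Durban"])
sydney_predictions = (["Haifa"] * 3 + ["Brisbane"] * 2 + ["Canberra"] * 2
                      + ["Perth", "Tucson"])

melbourne_top = top_cities(melbourne_predictions, n=3)
sydney_top = top_cities(sydney_predictions, n=3)
common = shared_cities(melbourne_top, sydney_top)
```

Summing the shares of the Australian entries in such a list gives the kind of percentage quoted above for each evaluated city.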
The GS (satellite view) neural network predictions (
Table 2b) show wider divergences from other Australian cities and between Melbourne and Sydney themselves, with both often matched to Brazilian cities. Melbourne is matched to Brazil in 11% of the evaluation locations while Sydney is matched to Brazilian cities in 15%. Melbourne and Sydney show wider divergences from each other using the GS network in comparison to the GM network, only having 8 of the top 20 predicted cities in common. In diverging predictions, 4.1% of Melbourne is confused with Wellington, New Zealand, while 4.7% of Sydney is considered similar to Sevastopol, Ukraine.
The GSV-BSV (street view) neural network predictions (
Table 2c) show strong similarities between Melbourne and Sydney. In the Melbourne evaluation, just under 18% of predictions (seven of the top nine picks) are other Australian cities, while in Sydney other Australian cities account for 20.5% of the evaluation locations (all of the top seven picks), spread somewhat evenly across these cities. In addition, 15 of the top 20 predicted cities were shared between Melbourne and Sydney.
To explore the identified differences, cities predicted for an evaluation location were plotted on maps of Melbourne and Sydney, with the colour scheme for the plots determined by the latitude and longitude of the predicted city. This colour scheme is shown in
Figure 2. As such, in the following figures, predicted cities in Australia will show up in shades of yellow, the rest of the Southern Hemisphere in greens, Asia in reds, North America and Europe in blues, and the Middle East in blue-greys.
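To make the idea of a location-derived colour scheme concrete, the sketch below maps a predicted city's latitude and longitude to an RGB colour so that geographically close cities receive similar colours. It is a hypothetical mapping chosen for simplicity (longitude drives hue, latitude drives brightness) and does not reproduce the palette of Figure 2; the function names and approximate coordinates are assumptions for illustration only.

```python
import colorsys


def location_rgb(lat, lon):
    """Map latitude/longitude to an RGB triple in [0, 1] (hypothetical scheme)."""
    hue = (lon + 180.0) / 360.0                # longitude sweeps the hue wheel
    value = 0.5 + 0.5 * (lat + 90.0) / 180.0   # higher latitudes are brighter
    return colorsys.hsv_to_rgb(hue, 1.0, value)


def to_hex(rgb):
    """Format an RGB triple as a hex colour string for plotting."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


def colour_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Approximate coordinates, used only for illustration.
melbourne = location_rgb(-37.8, 145.0)
sydney = location_rgb(-33.9, 151.2)
paris = location_rgb(48.9, 2.35)
```

Any scheme of this kind carries the assumption that nearby cities should look alike on the map, which matters when interpreting apparent colour clusters in the prediction plots.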
3.2. Melbourne Evaluation
Figure 3 shows the top predicted cities (>0.1%) plotted against the Melbourne evaluation locations for the GM neural network. “Paris-like” evaluation locations within Melbourne and Sydney are highlighted with black stars (22 in total, but only five with probabilities greater than 50%). Australian cities (in yellow) show strong groupings in the inner and outer suburbs, while the central business district (CBD) region shows no single strong grouping of regions or specific cities. In Melbourne’s far outer suburbs and rural areas, a wide mix of North and South American, South African, European, and Mid-Eastern cities (in greens, blues, and greys) can be seen, with small localised clusters of each. In the CBD, a few locations are predicted as Paris, and are mostly associated with Docklands or parklands.
Figure 4 shows the top predicted cities (>0.1%) plotted against the Melbourne evaluation locations for the GS neural network with “Paris-like” locations again highlighted with a black star (one location, but 0 locations above 50% probability). Other Australian cities (yellows) show a strong grouping in the inner and outer suburbs, while the CBD region shows no single strong grouping of regions or specific cities but with a range of predictions including Miami, United States (blues) and Mendoza, Argentina (greens). In Melbourne’s far outer suburbs and rural areas, a wide mix is seen of North and South American (USA, Brazil, and Argentina), South African, European (Italy and Spain), and Mid-Eastern (Iran and Turkey) cities with small localised clusters of each. Only a single prediction of Paris, France was made by the GS neural network for any evaluation location in Melbourne (but not above a 50% probability).
Figure 5 shows the top predicted cities (>0.1%) plotted against the Melbourne evaluation locations for the GSV-BSV neural network. "Paris-like" locations are predicted in 13 locations (but only two with a probability over 50%). The overall predictions are dominated by other Australian cities (yellows) scattered widely throughout the entire greater Melbourne area. The remaining evaluation locations show no strong groupings of any predicted countries or cities. Common predictions include cities from South Africa (greens), New Zealand (yellows), and the United States and European countries (blues). The CBD again shows a wide scattering of predictions with no dominant single city or country.
3.3. Sydney Evaluation
Figure 3 shows the top predicted cities (>0.1%) plotted against the Sydney evaluation locations for the GM neural network. “Paris-like” areas are predicted in 54 locations (but only 15 above 50% probability). Alternative Australian cities (yellows) appear in the western and southeastern suburbs, while Mid-Eastern cities (greys) tend to appear in northern and southern suburbs. The CBD and central parts of the city show fewer single-city or regional groupings, but stronger, highly localised clusters of each. Some cities commonly represented in the CBD include waterfront cities such as Hong Kong, London, Toulon, and Kaohsiung.
Figure 4 shows the top predicted cities (>0.1%) plotted against the Sydney evaluation locations for the GS neural network. The overall predictions are dominated by cities in Brazil and other South American locations (greens) in the north, west, and central regions, and Ukraine (blues) in the south. Other Australian cities are only predicted in a few locations around the city. In the CBD, predictions continue to be dominated by Brazilian cities with some more scattered predictions of cities from Japan, Haiti, and Mexico. No predictions of Paris, France were made by the GS neural network for any evaluation location in Sydney.
Figure 5 shows the top predicted cities (>0.1%), plotted against the Sydney evaluation locations for the GSV-BSV neural network. Six “Paris-like” locations were predicted (but none with probabilities greater than 50%). The results are very similar to the Melbourne evaluation. Again, the overall predictions are dominated by other Australian cities scattered widely throughout the entire greater Sydney area. The remaining predicted results show no strong groupings of any predicted countries or cities but some of the common predictions include cities from the United States, New Zealand, South Africa, and a number of European countries. The CBD shows a similar scattering of predictions with no single city or country dominating. A summary of the predicted “Paris-like” locations across all three neural networks for each city is presented in
Table 3.
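The two counts reported for each network (locations predicted as Paris, and those predicted with a probability above 50%) can be derived from per-location class probabilities as in the sketch below. This is an illustrative example only: the function name `count_target_locations`, the dictionary-based input format, and the toy values are assumptions, and a location is treated as “Paris-like” when Paris is its highest-probability prediction.

```python
def count_target_locations(location_probs, target="Paris", threshold=0.5):
    """Count locations whose top prediction is `target`, and how many of
    those exceed `threshold` in predicted probability."""
    predicted = 0
    confident = 0
    for probs in location_probs:
        best_city = max(probs, key=probs.get)  # highest-probability city
        if best_city == target:
            predicted += 1
            if probs[target] > threshold:
                confident += 1
    return predicted, confident


# Hypothetical class probabilities for three evaluation locations.
toy_locations = [
    {"Paris": 0.62, "London": 0.20, "Lyon": 0.18},
    {"Paris": 0.35, "Lyon": 0.30, "Brisbane": 0.20, "Perth": 0.15},
    {"Brisbane": 0.80, "Paris": 0.10, "Perth": 0.10},
]

paris_like, paris_confident = count_target_locations(toy_locations)
```

Separating the two counts is useful because, with many classes, a city can be the top prediction while still holding only a modest share of the probability mass.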
3.4. What Cities Are Similar to Paris?
Utilising the confusion each neural network recorded in correctly identifying each city, using an approach from Thompson et al. [
10], we identified which other cities shared similar features with Paris. A confusion matrix [
57] was generated for each neural network and cities were ranked by the frequency that each was incorrectly identified as Paris.
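The ranking step described above can be sketched as follows: build a confusion matrix from true and predicted city labels, then order the other cities by how often their images were labelled as the target city. This is a minimal sketch under stated assumptions, not the implementation of Thompson et al.; the function names and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def build_confusion(true_labels, predicted_labels):
    """Nested counts: confusion[true_city][predicted_city] -> number of images."""
    confusion = defaultdict(Counter)
    for true_city, predicted_city in zip(true_labels, predicted_labels):
        confusion[true_city][predicted_city] += 1
    return confusion


def ranked_confusions_with(confusion, target):
    """Cities ranked by how often their images were misidentified as `target`."""
    counts = {
        true_city: row[target]
        for true_city, row in confusion.items()
        if true_city != target and row[target] > 0
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical validation labels.
true_cities = ["Lyon", "Lyon", "Lyon", "London", "London", "Paris", "Tokyo"]
predicted = ["Paris", "Paris", "Lyon", "Paris", "London", "Paris", "Tokyo"]

confusion = build_confusion(true_cities, predicted)
paris_lookalikes = ranked_confusions_with(confusion, "Paris")
```

A model that makes no errors yields an empty ranking, which is the situation encountered with the fully trained GS network.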
The GM neural network, which achieved an accuracy of 73.2%, most commonly misidentified the following cities as Paris, ranked in order of decreasing frequency: London, GB, Berlin, DE, New York, US, Rome, IT, Los Angeles, US, Tokyo, JP, Zurich, CH, Istanbul, TR, Brasília, BR, and Munich, DE.
The GSV-BSV neural network, which achieved an accuracy of 43.1%, most commonly misidentified the following cities as Paris, ranked in order of decreasing frequency: Toulouse, Lyon, Rouen, Saint-Étienne, Strasbourg, Bordeaux, Lille, Montpellier, Douai-Lens, Grenoble, Nantes, Valenciennes, Tours, and Toulon (all in France).
The GS neural network, which achieved a final trained accuracy of 99.4%, confused no other cities with Paris. In order to add some confusion, we rolled back to an earlier training iteration. At epoch 50, the neural network achieved an accuracy of 74.5% (top five: 92.7%) and most commonly misidentified the following cities as Paris, ranked in order of decreasing frequency: New York, US, Vancouver, CA, Karlsruhe, DE, Brisbane, AU, Colorado Springs, US, Genova, IT, Lisbon, PT, and Montpellier, FR.
4. Discussion
In this study, we used the question “Is there a ‘Paris-end’ of Melbourne or Sydney?” as a means of answering a broader, more important question: how three different sources of available imagery might be used to identify urban typologies. There are a number of different ways to look at the large volume of results produced by the three large datasets.
Figure 3,
Figure 4 and
Figure 5 contain an implicit assumption embedded in the colour scheme that geographically close locations are similar. While this is true for the cities that are like Paris (in
Section 3.4) for the GSV-BSV neural network, it is not the case for the other two neural networks. Conversely, the figures show that Melbourne and Sydney (for the GM and to a lesser extent the GS neural network) show localised clustering of locations that are similar to other (geographically similar) cities while the GSV-BSV network shows little of this localised clustering across Melbourne and Sydney.
In looking at the few locations that are deemed to be “Paris-like”, a number of common characteristics stand out. Galleries of all of the images for Melbourne and Sydney that the GM neural network found to be similar to Paris are presented in
Figure 6 and
Figure 7. There are a number of common elements in these images. Many show large parklands (in green) embedded in the cities. Orange lines of public transport (rail and tram) are also prominent as well as large water bodies (in blue). Large arterial and trunk roads run near smaller (often curving) local roads; however these local roads tend to still be larger and do not reach the small intricate layouts of many Asian cities. In addition, among the cities misidentified as Paris by the neural networks, large Western European and US cities, including London, Berlin, New York, and Rome, also feature large numbers of these elements.
The GM neural network makes predictions based on mapping imagery, capturing characteristics such as the mix and detail of public transport, green space, water bodies, and the road network structure. This includes whether the roads are grid-like, the mix of arterial vs. neighbourhood roads, and their integration with the rest of the urban form. Seven Australian cities were included in the training data (Perth, Brisbane, Sunshine Coast, Gold Coast, Newcastle and Lake Macquarie, Canberra, and Adelaide) and likely share many common planning and design standards with Sydney and Melbourne, influencing the neural network’s predictions.
Using the GS neural network, none of the evaluated locations for Sydney and only one location for Melbourne were predicted to be "Paris-like". From an overhead remote sensing point of view, there is therefore nothing about either Melbourne or Sydney that shares similar visual characteristics with Paris, or at least there are many other cities that are more similar to Paris than Melbourne and Sydney. The GS network is more strongly influenced by larger natural and topographical features than the GM network. Outside of the immediate city centres, both Melbourne and Sydney are highly vegetated, with large percentages of the built-form concealed under tree canopies and having to conform to topography. The colours of the vegetation and soils as well as how the urban form is mixed into the canopies, hillsides, waterways and oceans are highly influential. Melbourne is built around a bay and around a north–south spine of hills, while Sydney is built around the open ocean and ocean waterways as well as hilly terrain throughout the metro area. Some potential limitations in the dataset can be seen in
Figure 4. A strong north–south gradient through the plot of the Melbourne predictions suggests that the neural network detected some artefacts of the satellite imagery gathering process, such as different acquisition times of the imagery, that were not apparent to human observation. The satellite imagery also shows some disadvantages in discovering typologies through a confusion matrix approach. The final trained neural network was too accurate. Finding similarities requires some level of confusion; thus in order to find cities similar to Paris, the neural network had to be rolled back to an earlier, less accurate iteration.
Finally, as the GSV-BSV neural network only picked Paris (at over a 50% probability) for 0.01% of the evaluated locations for Melbourne and 0% for Sydney, we can be confident that, from a street-level view, there is almost nothing about either Melbourne or Sydney that is visually similar to Paris using this type of imagery. Of the images for Melbourne, only two (out of 13) were picked with a probability of over 50% (and none out of six for Sydney). With the GSV-BSV network (galleries of “Paris-like” images for Melbourne and Sydney are shown in
Figure 8), smaller details of the cities will influence predictions. At this level of imagery, many of the natural features influential in the GS network (e.g., types and colours of vegetation or soil) will be important, but smaller details will also weigh in, such as building architecture, the width (or absence) of nature strips or sidewalks, and an overall density of streetscape features. Other influential characteristics are features that are in the urban areas but are not part of the permanent built form. For example, white vans feature in a number of images in the galleries of Paris-like predictions. At this level of imagery, the neural network will be potentially influenced as much by how the urban form is being used (especially how it is used at the time the images are captured) as the form itself. This shows the importance of taking steps in some circumstances to construct abstract features from the source images (e.g., road networks and green space for GM or image segmentation for GSV-BSV). Even with these measures, some caution should be taken with this type of imagery. The rather low accuracy rate for GSV-BSV (43.1%, top five: 69.8%) indicates that larger training datasets or perhaps fewer classification classes are needed with this type of complex imagery. When the low accuracy is analysed with a confusion matrix, it is found that the other cities that are confused for Paris are most often other French cities. This suggests that this group of similar French cities might form a better basis for a typology classification than the single city Paris. It also suggests that care is needed to balance the mix of accuracy and confusion. While the highly accurate GS neural network was unable to misidentify cities, the GSV-BSV neural network was better suited to find similarities through confusion.
Using the GM neural network approach, urban form can be evaluated. Map characteristics that are influential in grouping cities with a particular typology include extents and types of public transportation, urban green space, road network structure, water body inclusion and integration, amounts of informal unplanned open space, and density and topology influences on city structure. Some of the features included in the GM imagery that made cities “Paris-like” were a higher density of trains and trams, large broad sections of urban green space, and an integration of urban green space and waterways. Of course, while Paris was selected as the comparison city of choice, the technique makes it possible to typify the characteristics of any global city where similar imagery is available.
In Thompson et al. [
10] and Nice et al. [
33], highly abstract map imagery was found to lend itself to finding similarities between cities and identifying clusters of similar cities or inner-city neighbourhoods, as well as to associating different urban typologies with public health outcomes, such as road trauma or pollution levels. This type of imagery fits in well with the first urban typology scale of Argan and Rykwert [
18], the arrangements of streets and buildings in urban areas.
Using satellite imagery, natural features and the colour characteristics of rooftops, streets, soil, and vegetation feature prominently in classifying locations within a particular typology. In
Figure 9A, satellite imagery of Melbourne shows a number of colour and terrain similarities with the GS top six predictions, namely Adelaide, Australia; Campinas, Brazil; Jundiaí, Brazil; Miami, USA; Provo, USA; and Wellington, NZ (all shown in
Figure 9). This perhaps shows that natural characteristics are more influential to what the GS neural network considers to make cities similar than the characteristics of built urban form highlighted by the GM model.
The satellite imagery fits less well into a particular Argan and Rykwert [
18] scale. The area covered by the imagery is identical to the map imagery (400 × 400 m), capturing the road and building structure of each area. However, the imagery also contains information suitable for generating typologies based on the constructed elements scale as well as the decorative elements of the area (from a vertical vantage point). To create stronger typologies, additional steps will likely be needed to push the typologies towards one scale or the other. Analysis of areas smaller than 400 × 400 m can allow details more relevant to the building or decorative scales to be emphasized. However, this might require higher resolution imagery than the 15 m resolution imagery provided by Google to ensure sufficient detail. To better capture the street and building structure, pre-processing could be applied (edge detection, mean shifts, or other computer vision techniques) to force a stronger emphasis on the urban structure rather than the details. Further work could also be performed, such as deconstructing and reconstructing imagery, removing features (such as automotive traffic, leaving only structures), and, for targeted outcomes (such as health and social capital), using generative adversarial networks [
58], enabling comparative hypothetical typology scenarios.
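As a concrete illustration of the edge detection pre-processing suggested above, the following sketch applies a Sobel filter to a small greyscale image so that structural boundaries (such as road or building outlines) are emphasised over colour detail. It is a pure-Python toy written for clarity and was not part of this study's pipeline; the function name `sobel_edges` and the 5 × 5 test image are assumptions, and a practical workflow would more likely rely on an image-processing library.

```python
def sobel_edges(image):
    """Gradient magnitude of a 2D greyscale image given as a list of rows.

    Border pixels are left at 0.0 because the 3 x 3 kernels need a full
    neighbourhood.
    """
    kernel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
    kernel_y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(kernel_x[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(kernel_y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            edges[y][x] = (gx * gx + gy * gy) ** 0.5
    return edges


# Toy image: dark on the left (0), bright on the right (1), one vertical edge.
toy_image = [[0, 0, 1, 1, 1] for _ in range(5)]
toy_edges = sobel_edges(toy_image)
```

The response is strong along the boundary between the dark and bright regions and zero inside the uniform areas, which is the behaviour that would push a classifier towards structure rather than surface colour.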
Finally, in examining the results from the GSV-BSV neural network, this micro-scaled level of imagery would arguably capture the visual geography of the streetscape: what most people would say “makes Paris look like Paris”. This imagery fits well into the detailed/decorative Argan and Rykwert [
18] scale. However, as Doersch et al. [
32] found in trying to answer the same question, overall this answer is not based on a small number of famous iconic landmarks (e.g., the Eiffel Tower or the Louvre), but on an array of widespread, smaller features. These features include elements such as cast-iron railings on balconies, grid-like balcony arrangements, distinctive street signs, streetlamps on pedestals, window balustrades, Parisian doorways, six-story Haussmann apartment buildings, and vegetation differences [
59]. Of all these micro-scaled visual elements, neither Melbourne nor Sydney contains enough to truly have a “Paris-like” district.
Additionally, as found in this study, the characteristics that make up a city on a visual street view level are a complex mix. This not only includes bigger structural details, such as buildings, roads, cars, vegetation, and street furniture, but also smaller less-apparent details, such as colours, weather conditions, road markings, and thousands of other small details. The complexity of this imagery and the subsequent low accuracy of the neural networks in identifying individual cities using it indicates that further steps are needed to use this type of imagery reliably. These steps can include training using a smaller pool of cities and using a smaller set of classifications to allow focus on more subtle differences. In addition, more aggressive preprocessing (beyond the mean shift segmentation preprocessing we used) might be needed to further reduce the complexity of the imagery, using techniques such as foreground/background subtraction or inpainting to remove detail from the images less relevant to the intended typology goal.
This project’s intended goal was to demonstrate the ability of this new methodology to compare and cluster entire cities based on the summation of smaller localised details of urban form. As such, the imagery sampling collected imagery from the entire wider city and was not restricted to the perhaps more distinctive city centres. The results reflect that focus and show one of the strengths of this technique, allowing comparisons between entire cities as a whole and allowing linkages to datasets (health, transportation, etc.) that exist at city levels. However, this also demonstrates that appropriate consideration needs to be paid to the goals of a particular analysis, and that, for example, identifying distinctive city centres in other cities might require adjustment of the sampling radius for the training data.
Future work is planned to vary these techniques and further evolve the insights gained. Inner-city comparisons will sample imagery from within cities and help answer questions such as does (wider) Paris look like (the iconic districts of) Paris? Conversely, removing all the other Australian cities from the training data will allow comparisons to be made on a strictly international basis. Cross-comparisons can also assess similarities between individual cities under different contexts (e.g., varying which other cities are included in the pool of comparison cities). Further work, based on Thompson et al. [
10], will use a confusion matrix/graph-based approach to find clusters of urban design types, based on the levels of similarity between cities, and apply these typologies to outcomes related to public health (such as road trauma, pollution levels, and likelihood of engaging in active transport).