The field of artificial intelligence (AI) has made rapid progress in recent years. AI has received tremendous attention from academia, industry, and also the general population. Machine learning (ML) is a subfield of artificial intelligence [1
Machine learning encompasses learning without any type of supervision (unsupervised learning) and learning with full supervision (supervised learning). Unsupervised learning examines uncategorized data to discover patterns. No correct answers are given to the machine during learning. The natural patterns in data are expected to guide the machine in learning to detect key patterns and group data according to those patterns (i.e., unsupervised learning involves machines attempting to learn “on their own”, without assistance from categorized data). Unsupervised learning tasks can be solved as either clustering or association problems, depending on the application [2
]. Deep learning (DL) is a special type of machine learning that leverages multiple layers of nonlinear processing units or neurons. The rapid progress in ML can be attributed to three main reasons: vast quantities of available data, powerful computing capabilities, and improvements in algorithms, such as deep neural networks. The unsupervised deep learning method was selected for this study to find similar maps.
Recently the field of geography has turned towards AI. As a new term, GeoAI expresses the interdisciplinary combination of geography and artificial intelligence. The integration of geography with AI offers novel approaches in addressing a variety of problems in the natural environment and human society [1
Esri, a leading company in GIS software, has invested heavily in these emerging technologies and created a new research and development (R&D) center in New Delhi focused on AI and deep learning applied to satellite imagery and location data [3
]. Esri applies computer vision to geospatial analysis object detection and semantic segmentation, for example, to extract road networks from satellite imagery. The software ArcGIS includes built-in Python raster functions for object detection and classification workflows using CNTK, Keras, PyTorch, fast.ai, and TensorFlow, as mentioned by Rohit Singh, head of the R&D center [4
The use of AI in conservation, for example, is supported by Microsoft AI for Earth. The company also actively awards grants to support projects which use AI for Earth and apply AI in changing how people and organizations monitor, model, and manage the Earth’s natural systems [5
The Copernicus Land Monitoring Service Urban Atlas provides information about comparable land use across Europe and land cover data for functional urban areas (FUAs). A functional urban area consists of a city and its commuting zone. Functional urban areas, therefore, consist of a densely inhabited city and a less densely populated commuting zone whose labor market is highly integrated with the city [6
The older Urban Atlas data set is from 2006, the newer dataset is for 2012 reference year. The Urban Atlas 2018 is recently available, but it is not completed nor validated yet. The presented research used the Urban Atlas 2012 data set. The valid year is recorded in each metadata description of the downloaded set. The valid period varies from 2015 (2014 only for one set) up to 2018 (around 300 datasets).
The European Urban Atlas was designed to compare the land use patterns in major European cities and hence provide benchmarking of these cities. The data are frequently used in various kinds of research such as the identification of the relationship between human connectivity dynamics and land use [7
] or assessing green infrastructure connectivity of European city regions (Manchester, UK; Ruhr, Germany; Copenhagen, Denmark) [8
]. The Urban Atlas data was also used for comparison by patch perimeter metrics of four large metropolitan regions in southern Europe: Lisbon (Portugal), Barcelona (Spain), Rome (Italy), and Athens (Greece) [9
]. Without this cross border data set it is not possible to run these types of research.
Several attempts and methods were used during the investigation to find similar cities. Street networks can be thought of as a simplified schematic view of cities, which capture a large part of their structure and organization [10
]. Research by Louf and Barthelemy (2014) classified cities according to their street patterns [11
]. They created a dendrogram of clusters which identified at a lower level of the classification that most European and American cities in a sample of 131 cities fall in their sub-category; although, they encountered the problem of too high a similarity of street networks in most cities. They tried to solve this problem by focusing on the geometry of the city blocks that the street networks create. As a result of this work, cities were divided into four groups based on the shape and size of city blocks.
Research by Boeing (2018) was also oriented on the street network [12
]. They downloaded and processed 27,000 US street networks from OpenStreetMap at the metropolitan, municipal, and neighborhood scales. The same author subsequently developed a new indicator of orientation-order that quantifies how a city’s street network follows the geometric ordering logic of a single grid. The study examined street network orientation, configuration, and entropy in 100 cities around the world. It measured the entropy of street bearings in weighted and unweighted network models, along with each city’s typical street segment length, average circuity, average node degree, and the network’s proportions of four-way intersections and dead-ends [13
]. Moreover, Boeing suggested common dimensions for measuring the complexity of urban form and design [14
]. The dimensions are temporal, visual, spatial, scaling, and connectivity.
Courtat et al. (2011) made a static analysis of several French towns and hypothesized that the development of a city follows a logic of division or extension of space [15
]. Sandu prepared a comparison of only two cities: Iasi in Romania and Lyon in France. The exploration is based on Urban Atlas data [16
]. The comparison emphasized urban development of these cities under different political and socio-economical regimes after World War II. The urban areas from Romania tried to copy the Western model of the capitalist city, but without a solid legislative and financial back-up and confronted with socialist heritage, it resulted in a hybrid urban development [16
Koperski (1995) presented the use of spatial association rules as one of the data mine techniques for investigation of cities. He looked for association rules for cities in British Columbia, a province in Canada. Frequent occurrences describe the cities and their features like water, roads, and administrative boundaries. The frequent itemsets are based on topological relations (inside, contains, intersect, and equal) of features and the city [17
]. Cities that belong to frequent itemsets can be assumed similar.
The finding of similar cities based on Urban Atlas data was processed by Janousek’s diploma theses [18
]. The areas of towns were divided by circular sectors into eight sections and eight circles. The selected categories from Urban Atlas were intersected with circular sectors to calculate the area index to express the coverage by the category in each sector. Based on the 8 × 8 matrix (together 64) values of indexes, the correlation and hierarchical clustering were processed. The base set was 100 European cities from 50,000 to 200,000 inhabitants.
The article presents the use of the Painters pre-trained neural network as a new method for quick results in identifying similar cities according to land use images without considering the size of the cities, their histories, appearances, layouts or description. The benefit of Orange software is its simple application in this task. Validation of this new concept is expected in the future.
3. Results of Look-Alike Cities from Urban Atlas
After preliminary testing with maps from the map portal, the following experiment using land use of cities was accomplished. The second experiment also discovered that the existing pre-trained neural network Painters, which is outside cartography, is applicable for categorization of land use maps with data mining tasks such as clustering, similarity, and finding the nearest neighbor. A detailed description of the second experiment follows.
GIS software ArcGIS for Desktop v. 10.6 [31
] was used to process the downloaded data from Urban Atlas (shp format). The methodology from Section 2.2
was applied. The circular area was defined using proper central point and buffer operation in ArcGIS. Each city was clipped into a circular shape with a different diameter. The extent of these areas considered all parts of the cities, including continuously connected suburban areas. The definition of diameters was made with punctual consideration to cover an important part of cities including partial surrounding. If the shape of the city was elongated, the extent contained meadows, fields, and forests in the greater portion. The Urban Atlas joins the common unified color legend together with downloaded data. In ArcGIS, the same color legend (UrbanAtlas2012.lyr) for land use categories was applied to the previously prepared circular extracts of cities. Maps were exported into JPG images with a resolution of 300 dpi from ArcGIS. The final data set contained 787 images of circular maps. Then the data mining process continued in Orange software.
For processing in Orange, the same workflow used in preliminary testing was applied (Figure 4
). In Section 2.3
, the workflow was able to find the nearest neighbor pairs (triplets, etc.) of images; for example, map extracts in preliminary testing. This workflow’s advantage is that it may be used with any other data set in Orange, and it is therefore very effective. Only the working directory must be set to process new data in the Import Images widget. Subsequent widgets in the workflow calculate new data automatically. To process around 800 pictures of land use maps of cities, the software requires several minutes because the embedder sends data to a server containing the pre-trained neural network Painters. The image descriptors are then returned shortly after. Image descriptors is the matrix of 787 × 2048 cells where each row is descriptor for one picture.
Previous research investigated only a subset of 100 European cities [26
]. The results from this subset were taken and compared in the present study with a new, larger dataset. The previous processing applied hierarchical clustering with cosine distance and Ward’s method of linkage. The first resultant pair was the city of Maribor (Slovenia) and Bern (Switzerland). They were also the nearest neighbors in the larger dataset in this new investigation. The widget also offers a parameter which can find two or more neighbors. Results are pairs, triplets, quartets, etc., of look-alike cities. The nearest city to this pair was Pontevedra (Spain), where curve-shaped land use areas dominate. Image analysis revealed the curved land use shapes. The next pair of cities was Warwick and Cambridge in England, also found using the Neighbor widget. The next nearest city was Preston in England. These three cities were similar in shape due to their surrounding pastures. History and geographical conditions may also affect the similarity of cities. Frequently, cities from the same country are similar due to their historical evolutions and geographical locations, as well as the specific evolution of industry in that country. Another resultant pair was the Czech cities České Budějovice and Hradec Králové. These cities are typified by a regular mix of continuous urban fabric with discontinuous medium density urban fabric in isolated patterns (former small settlements which have merged to become city districts).
The present study followed with a systematic selection of cities and the identification of similar cities using the Neighbor widget (Figure 4
). The nearest neighbor analysis was selected as the main method in finding look-alike cities. The resultant pairs of cities from different countries were the most interesting. Examples are presented in Figure 8
and the other 12 pairs are in Appendix A
An interesting nearest neighbor pair was Zalaegerszeg (Hungary) and Žilina (Slovak Republic). Both cities contain several large and continuous industrial areas (violet color), and the urban edges touch many forest areas. A continuous urban fabric over an urban fabric with low density with low occurrence prevails in the centers (density > 80%) of these two cities (Figure 8
). Both cities were formerly under a communist regime, which is perhaps a contributory reason for their similarity. In addition, the size of both cities is from the same category under 100,000 inhabitants. Žilina has 81,000 and Zalaegerszeg has 61,600 inhabitants. Both are located near the river and absorbed many surrounding villages in the second part of the 20th century. It is visible by isolated kernels of the continuous urban fabric. Both cities have local importance by their industry.
shows also shows another interesting pair: Novi Sad (Serbia) and Tarbes (France). In both cities, a rectangular array of patterns in the urban fabric dominates. Both cities have only a single, dominant center with a high-density urban fabric (historical center with old buildings). A discontinuous urban fabric (density 50–80%) covers a greater area than a continuous urban fabric. Forested areas are few or absent in the surroundings of these cities. Both cities are surrounded mainly by arable land and fields.
The results for three look-alike cities are shown in Figure 9
. The cities Odense (Denmark), Metz (France), and Münster (Germany) represent cities with a typical sprawl towards several parts (suburbs at the peripheries) and a non-homogenous layout. The non-typical layouts of these cities are due to the existence of industrial and commercial areas in their centers and across the entire city. The industrial areas are not very continuous and are spread out.
A high occurrence of discontinuous medium density urban fabric can be seen in opposition to a continuous urban fabric. This is typical of Western European cities, but not Eastern European cities (e.g., cities of Zalaegerszeg and Žilina). Forests exist in small isolated pockets. The land use pattern is a mosaic of various types. Salzburg (Figure 3
) also approximates the third group. Neighbor cities can thus be explored by defining groups of similar cities according to the typical character of their land use arrangement.
In addition to the aforementioned pairs and triplets, some typical pairs are presented in Appendix A
. Firstly the hierarchical clustering was made using cosine distance and Ward’s method. The selection of 12 pairs was taken from the lowest level of the dendrogram. They have the shortest cosine distance from each other. The pairs are across European countries. The similar patterns of pairs are visible. Principally, the nearest neighbor could be found for each city, as presented in the previous paragraphs of the article. The distance between the nearest pair naturally varies and sometimes is larger than the 12 pairs shown in Appendix A
The findings of similar cities will be beneficial for governmental cooperation between partner cities or twinned towns, as cooperation is frequently established between cities in Europe. Finding new similar couples creates the opportunity to join new ones. The government of the cities share their experiences on how to plan and manage the cities. When the cities are similar from the land use form there is an expectation of similar urban planning tasks related to living, public, commercial, and industry areas. Urban designers can also utilize these findings.
Moreover, the presented research brings an idea to a new branch that is labeled ‘science of cities’ [32
]. This science has the aim of understanding and modeling phenomena taking place in cities. Urban morphology, the activity of inhabitants, the residence location choice, urban sprawl and the evolution of urban networks, and location of industry areas are important processes that have been discussed for a long time. The comparison of look-alike cities brings a new point of view to understand the city structure evolution.
The presented experiment try to find the nearest neighbor cities according to spatial arrangements of urban land use structure. The pre-trained neural network has no preliminary information about cities and classification categories. The neural network also does not need previous information about classification rules. Classification by neural network is titled “unsupervised learning”. It is a big advantage of the neural networks. Other investigations tried to state the metrics or one detailed feature, like streets or building blocks [12
], and evaluate cities according to these metrics and feature. It is evident, that presented findings of nearest neighbor are predominantly based on the size and shapes of areas, colors, and patterns of land use.
Other metrics like circular sectors by Janousek consider orientation [18
]. The circular sector has strict geographical orientation. Subsequently, similar cities are similar by the existence of the same land use in the same orientation. The advantage of using a neural network is that the evaluation does not depend on the direction in which a certain structure is oriented in the city (industrial areas in the south, green areas in the north, etc.). The presented pre-trained neural network does not have this type of limitation. This method takes into account multiple pattern distributions.
Boeing measured the complexity of urban form [14
]. In comparison with Boeing’s research, the presented method belongs to two dimensions offered by him: spatial and scaling dimension. The spatial dimension is described by land patterns and grain, particularly in terms of diversity, block size and shapes, the spatial distribution of urban forms, surface textures, and fractal urban forms. From this point, the land use data from Urban Atlas express static information about the city. Other dimensions for complexity measure, like temporal and visual dimensions, consider human behavior (population growth and decline, traffic jams, etc.) and perception of the built environment (sunlight patterns, tree canopy, etc.).
The reliability of findings by neural network and nearest neighbor depends on the quality of the input data. The first point is setting the center of the circular extract in the city in the data preparation phase. It is a user-defined setting. The same criteria were applied to historical centers, historical buildings, etc., for all cites. Therefore, the influence could be minimal. Higher influence on the reliability of the result considers the surroundings of the cities, especially when the shape of the town is prolonged. The future experiments could prepare data in a smaller diameter to cover only the inner inscribed circle.
Data mining software Orange has intuitive interface thanks to the visual form of design of workflow. The workflows are comprehensible. Orange software and the Image Analyst embedder Painters are very simple to use and useful in data processing. In addition to these tools, the investigative steps used in the present study may be employed as a practical lecture for university studies in geoinformatics and urban planning [33
]. Orange software is a good choice for specialists in geoinformatics that have base knowledge and skills in data mining. Data preparation by ArcGIS software is routine and could be assumed as the expectable advanced skill of geoinformatics.
The use of machine learning in discovering patterns of land use in cities opens up new knowledge. The processing of nearly 800 European cities revealed examples of look-alike cities that could be described according to a typical arrangement of land use as in the presented examples. Cities with mosaic patterns and cities with dominant land use categories (some types of urban fabric) were identified. The arrangement of commercial and industrial areas is also important. Some cities are typified by prevailing curve-shaped areas, while others demonstrate land use according to strict rectangular shapes. Some cities are compact, while others tend to contain more isolated districts and suburbs. The similarity in cities is sometimes determined by the surrounding nature and arrangement of green areas, forests, and arable land. In addition to the nearest neighbor pairs mentioned in the text, some typical pairs are presented in Appendix A
. These pairs were taken from the lowest level of dendrogram as a result of hierarchical clustering.
The presented experiment is an example of the democratization of data and software. The data from the Urban Atlas and pre-trained neural network are examples of freely available data and software (data mining Orange software). When the data and software are freely available, the original producers can suppose much wider and various areas of utilization for problem solutions than could natively imagine. Sometimes the domains of utilizations are unexpectable. The democratization of data and software is beneficial. In the case of Urban Atlas data, the primary aim is comparison and investigation of the land use change patterns. The findings of look-alike cities were not originally supposed.
The author will continue with a deeper investigation of the Copernicus data for Europe. The results of the present study show promise for additional research on urban structures. The author plans to validate these results. Identifying similar cities should be done with care, as some pairs were not especially similar. During analysis, the historical parameters, size, and natural conditions of the cities should be removed. Some results from the land use patterns could be used as an orientation for determining the clusters and groups of common characteristics in cities according to these land use patterns. Clustering, the classification of cities, and allocation of typical representative cities for each category need more work in combining other characteristics of cities. The author assumes that some cities will remain outside clusters because of their specific land use arrangement. The new method applied in the present study is only one method for identifying cities which may be similar.