Clustering Cities over Features Extracted from Multiple Virtual Sensors Measuring Micro-Level Activity Patterns Allows One to Discriminate Large-Scale City Characteristics

Muñoz-Cancino, Ricardo; Ríos, Sebastián A.; Graña, Manuel

doi:10.3390/s23115165

Open AccessArticle

Clustering Cities over Features Extracted from Multiple Virtual Sensors Measuring Micro-Level Activity Patterns Allows One to Discriminate Large-Scale City Characteristics

by

Ricardo Muñoz-Cancino

^1,*

,

Sebastián A. Ríos

²

and

Manuel Graña

¹

Computational Intelligence Group, University of Basque Country, 20018 San Sebastián, Spain

²

Business Intelligence Research Center (CEINE), Department of Industrial Engineering, University of Chile, Beauchef 851, Santiago 8370456, Chile

^*

Author to whom correspondence should be addressed.

Sensors 2023, 23(11), 5165; https://doi.org/10.3390/s23115165

Submission received: 6 March 2023 / Revised: 23 May 2023 / Accepted: 24 May 2023 / Published: 29 May 2023

(This article belongs to the Special Issue Computational Intelligence and Cyberphysical Systems in Sensing)

Download

Browse Figures

Versions Notes

Abstract

The impact of micro-level people’s activities on urban macro-level indicators is a complex question that has been the subject of much interest among researchers and policymakers. Transportation preferences, consumption habits, communication patterns and other individual-level activities can significantly impact large-scale urban characteristics, such as the potential for innovation generation of the city. Conversely, large-scale urban characteristics can also constrain and determine the activities of their inhabitants. Therefore, understanding the interdependence and mutual reinforcement between micro- and macro-level factors is critical to defining effective public policies. The increasing availability of digital data sources, such as social media and mobile phones, has opened up new opportunities for the quantitative study of this interdependency. This paper aims to detect meaningful city clusters on the basis of a detailed analysis of the spatiotemporal activity patterns for each city. The study is carried out on a worldwide city dataset of spatiotemporal activity patterns obtained from geotagged social media data. Clustering features are obtained from unsupervised topic analyses of activity patterns. Our study compares state-of-the-art clustering models, selecting the model achieving a 2.7% greater Silhouette Score than the next-best model. Three well-separated city clusters are identified. Additionally, the study of the distribution of the City Innovation Index over these three city clusters shows discrimination of low performing from high performing cities relative to innovation. Low performing cities are identified in one well-separated cluster. Therefore, it is possible to correlate micro-scale individual-level activities to large-scale urban characteristics.

Keywords:

activity patterns; clustering models; geotagged data; City Innovation Index

1. Introduction

The relationship between the activities conducted by city residents and large-scale city characteristics is a complex phenomenon that has received particular attention from researchers and policymakers [1,2]. Individual-level activities such as transportation preferences [3,4], consumption patterns [5,6], communication patterns and social interactions [7] can significantly impact large-scale city characteristics, such as innovation generation capability. Transportation preferences affect traffic congestion and pollution, affecting the ability of cities to innovate and to compete in a globalized economy. Shopping habits can determine which business areas flourish, thereby boosting the capacity for innovation in related industries. On the other hand, large-scale urban characteristics can constrain or facilitate the activity of their inhabitants. For example, a well-connected public transport network reduces the use of private vehicles. Policies implemented by city managers supporting specific industrial sectors can promote the technical specialization of inhabitants [8]. Furthermore, enhanced access to culture, networking and entertainment can determine social relationships [9,10]. The interplay between micro and macro factors is complicated by their interdependence. The increasing availability of digital data sources, such as social media, mobile phones and a wide variety of virtual sensors, has opened up new opportunities for the quantitative analyisis of the relationship between individual urban activities and large-scale urban characteristics. These data sources are like virtual sensors that capture information on individual behavior providing a more comprehensive and dynamic understanding of urban life [11].

This study investigates the relationship between the activity of the city inhabitants observed through their social media digital traces and large-scale characteristics of the city, such as its capacity to generate innovation. In particular, the following research questions are dealt with in this paper.

1.: Can meaningful city categories (clusters) be detected based on micro-level city activity patterns derived from individual citizen activity data?
2.: Is there any relationship between the clusters obtained and large-scale city characteristics such as the City Innovation Index [12]?

The study analyzes a worldwide city dataset, obtaining city features based on the decomposition of city spatiotemporal activity patterns of social interactions into latent topic patterns. The dataset covers 17 years of city residents’ digital activity gathered from geotagged social media digital traces. The city features are used to cluster cities with similar behaviors. We explore several state-of-the-art clustering algorithms, selecting the optimal one on the basis of the Silhouette clustering quality measure.

The article is organized as follows: Section 2 provides information on previous works and applications. Section 3 describes the data sources used to build the experimental dataset. Section 4 presents the methodology and experimental setup. Section 5 gives the details of the experimental setup. Section 6 presents and discusses the computational experiment results. Section 7 provides a discussion of results in relation with other recent publications. Finally, Section 8 gives our conclusions and some ideas for future research work.

2. Background and Related Work

2.1. Large-Scale Urban Characteristics, Innovation and Public Policies

Large-scale urban characteristics, such as the capability for the generation of innovations, are important for urban development and growth [1,13]. Cities that provide environments fostering innovation are more likely to attract investment and new businesses, which leads to higher job market growth and economic prosperity. Boosting the ranking of the city on these large-scale urban characteristics demands a strategic approach that involves public policies and regulation. For example, governments can create incentives and tax reductions [14,15] and invest in developing infrastructure encouraging innovation, such as high-speed internet access [16,17], co-working spaces [18,19], public transportation and social and recreational areas [20]. City Innovation Index rankings allow one to compare the innovation level of cities across the world [21]. These rankings usually involve indicators such as research and development, entrepreneurship, infrastructure, cultural assets, technology transfer and market quality as factors supporting innovation. These rankings make visible and highlight cities’ strengths and weaknesses regarding innovation that provide valuable information for policymakers in charge of developing public policies that help improve weak aspects of the city.

2.2. Data-Driven Policy-Making Using Digital Traces

Recent literature has shown the potential of multiple data sources to characterize the interaction between humans and their environments. This research uses social media geotagged digital trace data sources to describe individuals’ behavior. In order to understand their ability to identify novel human behavioral patterns, it is important to distinguish how ubiquitous social media platforms are. Mobile phones are the technology with the fastest adoption rate in history, even in developing countries, where ownership spans more than 91% of the population [22,23]. Over 60% of mobile phone users access social media platforms [24]. This percentage is rising in developed countries; for instance, 90% of the US population is actively engaged in social media [25]. Mobile phone and social media data can characterize activities such as transportation habits or the practice of outdoor sports.

Studies have used geo-crowdsourced data from multiple resources, such as Foursquare [26], geotagged tweets [27,28,29], cell phone records [30,31,32,33,34], geotagged Flickr photos [35] and geotagged Chinese social media messages [36], to understand complex human activity patterns. These patterns are important for public policy decisions related to urban phenomena such as public transport [37], traffic flow [38], flood risk management [39] and urban planning [40,41]. Human behavioral patterns have also been used to measure the effectiveness of pandemic policies [42,43].

However, there are methodological challenges in translating massive data sets into valuable insights, such as aggregating raw data into mobility patterns to monitor quarantine policies [43]. Despite these challenges, the potential of using these data sources to support policy-making is tremendous.

3. Data Description

This study uses city activity patterns from the social media dataset described in a recent PhD dissertation [44]. Table 1 gives details of the dataset. It contains over 32 million geotagged urban activity records (tweets, images, check-ins) from various social activity platforms collected over 17 years from 127 cities all over the world. Although each source dataset has a different schema, the available fields are used to avoid duplicate urban activities. The collated data used in this paper are available from the open data repository https://doi.org/10.5281/zenodo.7949307 (accessed on 22 May 2023).

City activity patterns were defined as the count of activity events in a period of a week with a time resolution of one hour; hence, each city has

17 \times 54

weekly activity pattern vectors of size

7 \times 24

. Topic analysis has been applied [44] to obtain topic representative activity patterns. Via optimal topic analysis, three such topic representative activity patterns were found for each time slice of 3 years. Each city was characterized by the mean and standard deviation of the parameters, distributing the linear decomposition of its activity patterns into the topic representative activity patterns. In this way, each city is characterized for a

k \times s

matrix, where

k = 3

is the number of city activity patterns

p e r

time slice and

s = 7

corresponds to the number of time slices.

Additionally, this study uses the City Innovation Index [12], an annual quantitative index to rank the most innovative cities worldwide. The quantitative index is based on cities’ cultural assets, human infrastructure and networked markets. Cultural assets refer to how culture is experienced within cities and considers arts districts, civic institutions, museums, music events, galleries, political protests, books, media, availability of information and sports. Human infrastructure includes the infrastructure deployed in the city for mass transit, finance, universities, hospitals, rail, roads, law, commerce, start-ups, healthcare and telecommunications. Finally, networked markets measure a city’s influence and connections in global markets, considering geography, economics, exports and imports, technology, market size, geo-political aspects and diplomacy.

Finally, this study use the definition of city/town provided by Simplemaps [53], which considers a city/town any inhabited place as determined by U.S. government agencies. The location of cities and their respective centers were obtained from the World Cities Database provided by the same company. We include cities with more than 1 million inhabitants or country capitals in the study. In addition, we define a limit of the ten most populated cities in the country for countries where many cities meet the above-mentioned conditions.

4. Methodology and Experimental Setup

Data Preprocessing

The main steps of the computational process leading to the city clustering and interpretation are graphically depicted in Figure 1. The process starts with the integration of real-world data from open data sources described above. A painfully careful and time-consuming effort has been devoted to curating the data, assessing non-duplicate records of events and transforming them into the weekly activity patterns that are the data used in the topic analysis that is used to identify topic representative activity patterns. Here, topics are equated to generic latent activity patterns which can be interpreted as modeling the behavior of some segment of the population. In this computational exploration, we found topic representative activity patterns for business/office work, leisure activities and shopping time. Figure 2 shows the topic representative activity patterns identified in this process in seven time slices of the data, accounting for time variability of the citizens’ behaviors. After the identification of the topic representative activity patterns, each weekly activity pattern can be expressed as a linear combination of them. The coefficients of this linear combination are the features in the latent space describing the weekly activity patterns. The aggregated city features are computed as the means and standard deviations of the weekly activity pattern features. This methodological approach is discussed in detail elsewhere [44]; here, we focus on the last steps of the process, namely the clustering of the cities on the basis of the aggregated city features.

5. Experimental Setup

The clustering stage seeks to find groups of cities based on their multi-temporal city activity patterns. State-of-the-art clustering models are evaluated based on multiple model training using bootstrap samples in order to select the best model based on its Silhouette Score [54]. In order to provide an interpretation for the identified city clusters, we study the distribution of the City Innovation Index for each cluster discussing the links between individual-level urban activities and large-scale urban characteristics. The sequence of computational processes is as follows.

The first step computes city features as the average and standard deviation of the coefficients of the linear decomposition of the weekly activity patterns and sorts them into the topic representative activity patterns extracted from each time slice of the dataset; thus, each city would be described by $2 \times k \times s$ features, where $k = 3$ is the optimal number of topics selected as described elsewhere [44] and $s = 7$ is the number of time slices. These features are further compressed by averaging over the time slices. Therefore, each city is described by $2 \times k$ features for the ensuing clustering experiments.
The second step is the comparison among state-of-the-art clustering algorithms, namely K-Means, Mini Batch K-Means, Agglomerative Clustering, Spectral Clustering, BIRCH and Gaussian Mixture model. Clustering is repeated 30 time with each algorithm and the number of clusters is set in the range between 2 and 20. The best clustering algorithm and optimal number of clusters are selected using the Silhouette Score over the 30 repetitions. The optimal number of clusters corresponds to the maximum of the average Silhouette Score. The best clustering algorithm is selected based on paired comparisons using the non-parametric Wilcoxon test over the Silhouette Scores achieved along the number of cluster explorations.
The third step is the computation of the distribution of the City Innovation Index and other large-scale urban indices over the cities included in each cluster of the optimal clustering solution found above. The boxplot visualization of the distribution per cluster shows that the clusters effectively discriminate between cities according to large-scale city indices.

6. Results

The presentation of results begins in Section 6.1 with an exploratory analysis of the geotagged digital traces discussing how they represent the human activity in the cities. Then, Section 6.2 reports clustering results based on city features discussed above. This section compares multiple clustering models, selecting the optimal one as discussed above. In Section 6.3, the optimal clusters are characterized based on their geographic location. Finally, Section 6.4 discusses the discrimination of large-scale city indices between the identified optimal clusters.

6.1. Data Exploration

For visual assessment, Figure 3 shows the density of the geotagged urban activities gathered in the experimental dataset for a sample of cities. For each city shown in the figure, only activities within a 30 km radius are considered to be associated with the city. It can be appreciated that registered activities have a high concentration close to the urban centers of each city. For example, in Amsterdam, we observe an area of high activity in the surroundings of the Amsterdam Centraal Railway Station and other locations of high concentration of people, such as Leidseplein Square, a buzzing nightlife hub surrounded by bars and restaurants. In the case of Manhattan, although the activities are distributed throughout practically the entire island, sectors such as Times Square, the Rockefeller Center and the One World Trade Center stand out as high-activity areas. Less touristic cities, such as Tampa, also show concentrations of activity in areas of importance to the city, as seen in the image: Downtown, The Florida Aquarium and Tampa International Airport.

It is necessary to highlight that the computed weekly activity patterns reflect only the behavior of the population that generated the digital traces. This research does not have demographic information on users to profile them. Therefore, it cannot be ensured that the sample of users is fully representative of the city population.

6.2. City Clustering

The study explored the clustering performance of state-of-the-art algorithms, namely K-Means, Mini Batch K-Means, Agglomerative Clustering, Spectral Clustering, BIRCH and Gaussian Mixture model. In order to determine the optimal number of clusters, clustering was carried out varying the number of clusters between 2 and 20 for each algorithm. Furthermore, to assess the robustness of the results, 30 repetitions of the clustering process were carried out for each algorithm and number of clusters.

Figure 4 shows the average and 95% confidence interval of the Silhouette Score obtained for each of the models. These results show that the optimal clustering is achieved with K-Means when the optimal number of clusters is set to three. Overall, K-Means showed the highest Silhouette Score values for all settings of the number of clusters. This result was verified by pairwise comparison against the other models using the Wilcoxon test, where K-Means showed statistically significant differences with p-values well below 0.05. The details of the comparison using the Wilcoxon test are presented in Table 2. The table shows the result of applying the test to compare the Silhouette Score of the K-Means against the other clustering algorithms. The comparison is presented for models with

k = 3

and

k = 2

and the rest are omitted because other scenarios are self-evident.

Additionally, Figure 5 (left) shows the Silhouette Score for each city displayed along the Y axis in decreasing order for each cluster. The Silhouette Score of a city measures its similarity to the cluster it belongs to compared to others. The score ranges from −1 to 1, where 1 indicates that the city is very well fitted to its own cluster and poorly matched to neighboring clusters. A score of −1 indicates the opposite, with the city poorly matched to its own cluster and well-matched to neighboring clusters. A score of 0 indicates that the city is equally similar to its own and neighboring clusters. The optimal K-Means clustering achieves an average score of 0.38, with only a few cities obtaining a score of zero or below. These results indicate that, with a few exceptions, the cities were optimally assigned to their corresponding clusters. On the other hand, Figure 5 (right) shows the spatial embedding of the cities computed via a two-dimensional t-SNE (t-Distributed Stochastic Neighbor Embedding) projection from the six-dimensional space of the city features computed from weekly activity patterns. The separation of the clusters demonstrated in the embedding visualization confirms the quality of the clustering results.

6.3. City Clusters

Figure 6 visualizes the plot of the cities in the convex region defined by three of the city features, namely the averages of the linear decomposition coefficients of activity patterns, discarding the standard deviations for this visualization. Each color indicates a different cluster. In addition, the names of some cities within each cluster are shown. Thus, it can be seen in Figure 6a that Cluster 0 (C0) includes cities such as Campinas (Brazil), Lahore (Pakistan), Jeddah (Saudi Arabia) and Lagos (Nigeria). Cluster 1 (C1) includes cities such as Porto Alegre (Brazil), Athens (Greece), Ecatepec (Mexico) and Atlanta and Boston (USA). In Cluster 2 (C2), we find cities such as Seoul (South Korea), Santo Domingo (Dominica Republic), Nairobi (Kenya), Stockholm (Sweden), Washington (USA) and Birstall (England). In this figure, separation of the red-colored cluster is clearer than the separation of green and blue clusters.

Enhanced cluster separation is shown in Figure 6b. In this figure, cities were filtered based on their Silhouette Score, leaving only those with a score higher than the median score of each cluster. Thus, we visualize the central core of each cluster, enhancing the separation between them.

A geographical representation of the clusters in Figure 7 shows the spatial distribution of the cities that belong to each cluster. The geographical location of Cluster 0 stands out, whose 20 cities are located mainly in the Middle East, South Asia and Africa. The cities corresponding to Cluster 1 and Cluster 2 are distributed in practically the same territories, except that we did not find any of the 32 cities of Cluster 1 in East Asia and Oceania. Finally, Cluster 2 stands out because many of its 92 cities are in central Europe. Cluster 1 appears to include big administrative centers, many of them the country’s capital city.

6.4. City Clusters and The City Innovation Index

Figure 8 shows the distributions of the City Innovation Index ranking for the year 2021 according to the clusters found in our analysis. Figure 8 (left) is a boxplot where each data point corresponds to a city, indicating the position in the innovation ranking (lower is better). Figure 8 (right) corresponds to the cumulative distribution where population percentage is equated to probability. In this figure, the difference between the cities that form Cluster 0 and the rest of the cities stands out. The cities in Cluster 0 are systematically in the last positions of the innovation ranking and half of the cities in this cluster are in the last quintile of the innovation ranking. Concerning the rest of the clusters, no significant differences between them are observed in the location of their cities in the ranking. In both groups, half of the cities are within the first 150 most innovative cities. Our conclusion from this analysis is that it is possible to discriminate poorly from highly performing cities in the City Innovation Index on the basis of the information provided by the activity patterns of the city.

Additionally, we present the distribution of the factors that make up the City Innovation Index for the cities under study. Figure 9 shows how the Cultural Assets (a), Human Infrastructure (b) and Networked Market (c) indexes are distributed. This disaggregation does not offer variations in interpretation. The cities of Cluster 0 are easily discriminated from the rest in relation to those indices. In this case, these cities have less cultural capital, their infrastructure is also far from world standards and their markets need improvement in order to be sufficiently connected and integrated with the rest of the world. On the other hand, Clusters 1 and 2 show little differences between them when compared based on any of these three indicators.

Finally, Table 3 shows the list of the core cities of each cluster (those with Silhouette Score above the median) ranked by their City Innovation Index. The Table shows the name and country for each city, its rank according to the City Innovation Index and the clustering features given by the average linear decomposition of city weekly activity patterns into topics (Columns T0, T1 and T2).

Regarding Cluster 0, the cities that belong to it have the last ranking positions and the coefficients for topic 0 dominate the other coefficients, i.e., T0 > T1 and T0 > T2. Recalling Figure 2, topic 0 has two different activity patterns of working days and weekends. During the week, the activity of this pattern increases as the day progresses and presents two clearly defined peaks. The first activity peak is observed at noon and then descends to a local minimum at 15:00 h. Afterward, the activity reaches its maximum peak around 21:00 h. The main cities of this cluster are located in Africa (South Africa and Nigeria) and India.

Regarding Cluster 1, the first six cities in the table are in the top 100 for innovation. From the seventh onwards, the cities fall in the center of the innovation ranking. However, some cities from developed countries and sectors such as Japan, the USA and Scandinavia already appear as core cities in this cluster. In this cluster, coefficients for topic 1 dominate the other coefficients, i.e., T1 > T0 and T1 > T2. This topic shows relatively low activity during the week and the most significant activity occurs during the weekend. During the week, the activity increases between 09:00 h and 21:00 h without significant variations in activity during this period. During the weekend, the activity increases from 09:00 h, peaking at 15:00 h. and then declining. This fall is more abrupt on Sunday, leaving little activity until dawn the next day.

Finally, Cluster 2 is made up of the world’s greatest cities. This cluster includes Tokyo, New York, London and Shanghai, among other cities of outstanding global importance. For this same reason, the main cities in this cluster are in the top places in the innovation ranking. In this cluster, coefficients for topic 2 dominate the other coefficients, i.e., T2 > T0 and T2 > T1. This topic also shows different activity patterns during the week working days and on the weekend. During the week, the activity is concentrated between 09:00 h and 18:00 h, with a slight drop in activity around noon. During the weekend, the pattern presents a similar structure. However, after the 09:00 h peak, the activity begins to decline over the rest of the day.

To extend the previous analysis, we delve into the relationship between clusters and city population. Table 4 contains, for each cluster, the joint distribution of population and innovation ranking. Both variables are split into quartiles fitted using the whole dataset. The population is shown for thousands of people, while ranking goes from first to last. It is confirmed that Cluster 0 is mainly composed of cities in the last positions of the ranking, with 76% of them in the last quartile of the ranking. Notice that 35.3% of these low ranking cities have between 3 and 8 million inhabitants. On the other hand, Cluster 1 has 66.6% of its cities positioned in the first two quartiles of the innovation ranking. However, most of these are medium/small cities. Finally, Cluster 2 contains the largest cities in the world, encompassing 83% of the cities in the study. Furthermore, although there are no large concentrations, the mega-cities with the highest level of innovation represent 12.8% of this cluster.

7. Discussion

The identification of urban features that can have an impact on the potential for innovation generation as a source for wealth and the general wellbeing of citizens and the financial support of city structures is a matter of concern for city managers, regardless of political affiliation labels [55,56]. The relation of the generation of patents and city wealth measured as the GDP per capita has been established [57] via careful geolocalization of the published patents and fine spatial grid GDP that allows one to compute the metropolitan area GDP with some accuracy. The strong interdependence of scientific production, patent generation and technological productions has already been shown [58,59]; hence, predicting innovation potential from citizen behavior patterns gives additional insights into city potential for sustained growth. The work reported in this paper is in line with works that relate the city innovation value (measured by number of patents) with the activity of people, specifically the mobility of researchers [60] in between companies and the cities that host them. However, we must emphasize that our work does not use information about patents, scientific publications or company localizations in order to produce a prediction of the innovation ranking of the city. As a secondary result, we find that population size induces further discrimination between cities’ innovation rankings in agreement with the literature that recognizes that large cities concentrate innovation resources and results [61,62]. Recent reflections [63] on the sources for sustainable urban innovation point to the salient role of social relation factors, which are indirectly modeled by the activity patterns analyzed in our study that capture the time structure of interactions.

8. Conclusions and Further Research

Understanding the interaction between micro-level urban activity patterns and large-scale city characteristics is crucial for policymakers and researchers in designing effective and adequate public policies. The availability of digital data sources, such as social media and mobile phones, has opened up new opportunities for the quantitative analysis of this relationship. This paper reports the clustering of cities on the basis of activity features extracted from topic analyses of micro-level activity data coming from multiple virtual sensors. The study uses a worldwide city dataset containing spatiotemporal activity patterns obtained from geotagged social media data. The Silhouette Score was used to select the optimal clustering results from the comparison of state-of-the-art clustering algorithms. The selected model found three well-defined clusters, achieving a 2.7% greater Silhouette Score than the next-best model. Finally, the paper explores the relation between the identified city clusters and large-scale city characteristics, i.e., the City Innovation Index.

The examination of the distribution of the City Innovation Index for each cluster reveals that it is possible to discriminate low performing cities for middle/top performing cities by their cluster assignment. Therefore, the study establishes a link between micro-level activity patterns and large-scale city characteristics that can be further exploited in order to improve discrimination accuracy and sort the items into finer categories of the city innovation ranking. These findings contribute to the growing literature on using digital traces to support urban planning and public policy, providing insights into the relationship between individual-level urban activities and large-scale city characteristics, such as the city ranking in the City Innovation Index. Future work will refine this study by using different areas within the city as the spatial analysis level instead of the entire city.

Author Contributions

Conceptualization, R.M.-C. and S.A.R. and M.G.; methodology, R.M.-C. and S.A.R.; software, R.M.-C.; validation, M.G. and R.M.-C.; formal analysis, R.M.-C.; investigation, R.M.-C. and S.A.R.; resources, M.G.; data curation, R.M.-C.; writing—original draft preparation, all authors; writing—review and editing, all authors; visualization, R.M.-C.; supervision, S.A.R.; funding acquisition, M.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work would not have been accomplished without the financial support of CONICYT-PFCHA/DOCTORADO BECAS CHILE/2019-21190345. The last author received research funds from the Basque Government as the head of the Grupo de Inteligencia Computacional, Universidad del Pais Vasco, UPV/EHU, from 2007 until 2025. The current code for the grant is IT1689-22. Additionally, the author participates in Elkartek projects KK-2022/00051 and KK-2021/00070. The Spanish MCIN has also granted the author a research project under code PID2020-116346GB-I00.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data sources are publicly accessible and this article details the references for each dataset. The collated data used in this paper is available from open data repository https://doi.org/10.5281/zenodo.7949307 (accessed on 22 May 2023).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript or in the decision to publish the results.

References

Johnson, B. Cities, systems of innovation and economic development. Innovation 2008, 10, 146–155. [Google Scholar] [CrossRef]
Appio, F.P.; Lima, M.; Paroutis, S. Understanding Smart Cities: Innovation ecosystems, technological advancements, and societal challenges. Technol. Forecast. Soc. Chang. 2019, 142, 1–14. [Google Scholar] [CrossRef]
Guvensan, M.A.; Dusun, B.; Can, B.; Turkmen, H.I. A Novel Segment-Based Approach for Improving Classification Performance of Transport Mode Detection. Sensors 2018, 18, 87. [Google Scholar] [CrossRef] [PubMed]
Wang, Y.; Cao, G.; Yan, Y.; Wang, J. Does high-speed rail stimulate cross-city technological innovation collaboration? Evidence from China. Transp. Policy 2022, 116, 119–131. [Google Scholar] [CrossRef]
Borrego-Jaraba, F.; Garrido, P.C.; Garcia, G.C.; Ruiz, I.L.; Gomez-Nieto, M.A. A Ubiquitous NFC Solution for the Development of Tailored Marketing Strategies Based on Discount Vouchers and Loyalty Cards. Sensors 2013, 13, 6334–6354. [Google Scholar] [CrossRef]
Schröder, P.; Vergragt, P.; Brown, H.S.; Dendler, L.; Gorenflo, N.; Matus, K.; Quist, J.; Rupprecht, C.D.; Tukker, A.; Wennersten, R. Advancing sustainable consumption and production in cities-A transdisciplinary research and stakeholder engagement framework to address consumption-based emissions and impacts. J. Clean. Prod. 2019, 213, 114–125. [Google Scholar] [CrossRef]
Bencardino, M.; Greco, I. Smart Communities. Social Innovation at the Service of the Smart Cities. TeMA-J. Land Use Mobil. Environ. 2014, 40–51. [Google Scholar] [CrossRef]
Leckel, A.; Veilleux, S.; Dana, L.P. Local Open Innovation: A means for public policy to increase collaboration for innovation in SMEs. Technol. Forecast. Soc. Chang. 2020, 153, 119891. [Google Scholar] [CrossRef]
Marasco, A.; Errichiello, L. The role of networking in the development of new city services-A framework for exploring smart public-private service innovation networks. Rev. Eur. d’Écon. Manag. Serv. 2016, 2016, 65–100. [Google Scholar]
Vrana, V.; Das, S. Dynamic Restructuring of Digital Media and Entertainment Sector: Role of Urbanization, Industrial Innovation, and Technological Evolution. In Digital Entertainment as Next Evolution in Service Sector: Emerging Digital Solutions in Reshaping Different Industries; Springer Nature: Singapore, 2023; pp. 15–34. [Google Scholar] [CrossRef]
Granell, C.; Ostermann, F.O. Beyond data collection: Objectives and methods of research using VGI and geo-social media for disaster management. Comput. Environ. Urban Syst. 2016, 59, 231–243. [Google Scholar] [CrossRef]
2thinknow Innovation Cities Index 2021. Innovation Cities Index 2021: Top 100 World’s Most Innovative Cities. 2021. Available online: https://innovation-cities.com/worlds-most-innovative-cities-2021-top-100/25477/ (accessed on 23 October 2021).
Acs, Z.J. Innovation and the Growth of Cities. In Urban Dynamics and Growth: Advances in Urban Economics; Contributions to Economic Analysis; Elsevier: Amsterdam, The Netherlands, 2004; Volume 266, pp. 635–658. [Google Scholar] [CrossRef]
Roessner, J.D. Incentives to innovate in public and private organizations. Adm. Soc. 1977, 9, 341–365. [Google Scholar] [CrossRef]
Qi, Y.; Peng, W.; Xiong, N.N. The effects of fiscal and tax incentives on regional innovation capability: Text extraction based on python. Mathematics 2020, 8, 1193. [Google Scholar] [CrossRef]
Naphade, M.; Banavar, G.; Harrison, C.; Paraszczak, J.; Morris, R. Smarter cities and their innovation challenges. Computer 2011, 44, 32–39. [Google Scholar] [CrossRef]
Zygiaris, S. Smart city reference model: Assisting planners to conceptualize the building of smart city innovation ecosystems. J. Knowl. Econ. 2013, 4, 217–231. [Google Scholar] [CrossRef]
Capdevila, I. Co-working spaces and the localised dynamics of innovation in Barcelona. Int. J. Innov. Manag. 2015, 19, 1540004. [Google Scholar] [CrossRef]
Berbegal-Mirabent, J. What do we know about co-working spaces? Trends and challenges ahead. Sustainability 2021, 13, 1416. [Google Scholar] [CrossRef]
Engel, J.S.; Berbegal-Mirabent, J.; Pique, J.M. The renaissance of the city as a cluster of innovation. Cogent Bus. Manag. 2018, 5, 1532777. [Google Scholar] [CrossRef]
Chen, Y.; Li, W.; Yi, P. Evaluation of city innovation capability using the TOPSIS-based order relation method: The case of Liaoning province, China. Technol. Soc. 2020, 63, 101330. [Google Scholar] [CrossRef]
Wigginton, C.; Curran, M.; Brodeur, C. Global Mobile Consumer Trends: Second Edition. 2017. Available online: https://www2.deloitte.com/us/en/pages/technology-media-and-telecommunications/articles/global-mobile-consumer-trends.html (accessed on 29 April 2023).
BankMyCell. How Many Smartphones Are in the World? 2022. Available online: https://www.bankmycell.com/blog/how-many-phones-are-in-the-world (accessed on 13 October 2022).
Datareportal. Global Social Media Statistics. 2023. Available online: https://datareportal.com/social-media-users (accessed on 29 April 2023).
DemandSage. Social Media Users in the World—(2023 Demographics). 2023. Available online: https://www.demandsage.com/social-media-users/ (accessed on 29 April 2023).
Noulas, A.; Scellato, S.; Mascolo, C.; Pontil, M. Exploiting Semantic Annotations for Clustering Geographic Areas and Users in Location-based Social Networks. In Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain, 17–21 July 2011; Volume 5, pp. 32–35. [Google Scholar] [CrossRef]
Frias-Martinez, V.; Soto, V.; Hohwald, H.; Frias-Martinez, E. Characterizing Urban Landscapes Using Geolocated Tweets. In Proceedings of the 2012 International Conference on Privacy, Security, Risk and Trust, PASSAT 2012, and 2012 International Conference on Social Computing, SocialCom 2012, Amsterdam, The Netherlands, 3–5 September 2012; pp. 239–248. [Google Scholar] [CrossRef]
Fujisaka, T.; Lee, R.; Sumiya, K. Exploring urban characteristics using movement history of mass mobile microbloggers. In Proceedings of the Eleventh Workshop on Mobile Computing Systems & Applications, HotMobile ′10, Annapolis, MD, USA, 22–23 February 2010; ACM: New York, NY, USA, 2010; pp. 13–18. [Google Scholar] [CrossRef]
Wakamiya, S.; Lee, R.; Sumiya, K. Urban area characterization based on semantics of crowd activities in Twitter. In Proceedings of the 4th International Conference on GeoSpatial Semantics, GeoS 2011, Brest, France, 12–13 May 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 108–123. [Google Scholar]
Ríos, S.A.; Muñoz, R. Land Use detection with cell phone data using topic models: Case Santiago, Chile. Comput. Environ. Urban Syst. 2017, 61, 39–48. [Google Scholar] [CrossRef]
Madhawa, K.; Lokanathan, S.; Maldeniya, D.; Samarajiva, R. Land use classification using call detail records. In Proceedings of the NetMob 2015 (Fourth Conference on the Scientific Analysis of Mobile Phone Datasets), MIT Media Lab, Cambridge, MA, USA, 8–10 April 2015. [Google Scholar] [CrossRef]
Soto, V.; Frías-Martínez, E. Robust Land Use Characterization of Urban Landscapes Using Cell Phone Data. 2011. Available online: https://www.researchgate.net/publication/228923406_Robust_Land_Use_Characterization_of_Urban_Lanscapes_using_Cell_Phone_Data (accessed on 29 April 2023).
Lenormand, M.; Picornell, M.; Cantú-Ros, O.G.; Tugores, A.; Louail, T.; Herranz, R.; Barthelemy, M.; Frías-Martínez, E.; Ramasco, J.J. Cross-Checking Different Sources of Mobility Information. PLoS ONE 2014, 9, 1–10. [Google Scholar] [CrossRef]
Lenormand, P.; Cantú-Ros, O.G.; Tugores, A.; Louail, T.; Herranz, R.; Barthelemy, M.; Frías-Martínez, E.; Ramasco, J.J. Influence of sociodemographic characteristics on human mobility. Sci. Rep. 2015, 5, 10075. [Google Scholar] [CrossRef] [PubMed]
Crandall, D.J.; Backstrom, L.; Huttenlocher, D.; Kleinberg, J. Mapping the world’s photos. In Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, 20–24 April 2009; ACM: New York, NY, USA, 2009; pp. 761–770. [Google Scholar] [CrossRef]
Wang, Y.; Wang, T.; Tsou, M.H.; Li, H.; Jiang, W.; Guo, F. Mapping Dynamic Urban Land Use Patterns with Crowdsourced Geo-Tagged Social Media (Sina-Weibo) and Commercial Points of Interest Collections in Beijing, China. Sustainability 2016, 8, 1202. [Google Scholar] [CrossRef]
Du, H.; Mulley, C. The short-term land value impacts of urban rail transit: Quantitative evidence from Sunderland, UK. Land Use Policy 2007, 24, 223–233. [Google Scholar] [CrossRef]
Dong, H.; Wu, M.; Ding, X.; Chu, L.; Jia, L.; Qin, Y.; Zhou, X. Traffic zone division based on big data from mobile phone base stations. Transp. Res. Part C Emerg. Technol. 2015, 58, 278–291. [Google Scholar] [CrossRef]
Dasgupta, A. Floods and poverty traps: Evidence from Bangladesh. Econ. Political Wkly. 2007, 42, 3166–3171. [Google Scholar]
Hong, L.; Lee, M.; Mashhadi, A.; Frias-Martinez, V. Towards Understanding Communication Behavior Changes During Floods Using Cell Phone Data. In Proceedings of the Social Informatics—10th International Conference, SocInfo 2018, Proceedings, Part II, St. Petersburg, Russia, 25–28 September 2018; Staab, S., Koltsova, O., Ignatov, D.I., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 97–107. [Google Scholar]
Darabi, H.; Choubin, B.; Rahmati, O.; Haghighi, A.T.; Pradhan, B. Urban flood risk mapping using the GARP and QUEST models: A comparative study of machine learning techniques. J. Hydrol. 2019, 569, 142–154. [Google Scholar] [CrossRef]
Gao, S.; Rao, J.; Kang, Y.; Liang, Y.; Kruse, J. Mapping county-level mobility pattern changes in the United States in response to COVID-19. SIGSpatial Spec. 2020, 12, 16–26. [Google Scholar] [CrossRef]
Muñoz-Cancino, R.; Rios, S.A.; Goic, M.; Graña, M. Non-Intrusive Assessment of COVID-19 Lockdown Follow-Up and Impact Using Credit Card Information: Case Study in Chile. Int. J. Environ. Res. Public Health 2021, 18, 5507. [Google Scholar] [CrossRef]
Muñoz-Cancino, R. On the Use of Multi-Sensor Digital Traces to Discover Spatio-Temporal Human Behavioral Pattems. Ph.D. Thesis, University of Basque Country, San Sebastián, Spain, 2023. [Google Scholar]
Cho, E.; Myers, S.A.; Leskovec, J. Friendship and Mobility: User Movement in Location-Based Social Networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’11, San Diego, CA, USA, 21–24 August 2011; Association for Computing Machinery: New York, NY, USA, 2011; pp. 1082–1090. [Google Scholar] [CrossRef]
Le Falher, G.; Gionis, A.; Mathioudakis, M. Where is the Soho of Rome? Measures and algorithms for finding similar neighborhoods in cities. In Proceedings of the Ninth International Conference on Web and Social Media, ICWSM 2015, Oxford, UK, 26–29 May 2015. [Google Scholar]
Yang, D.; Zhang, D.; Zheng, V.W.; Yu, Z. Modeling User Activity Preference by Leveraging User Spatial Temporal Characteristics in LBSNs. IEEE Trans. Syst. Man Cybern. Syst. 2015, 45, 129–142. [Google Scholar] [CrossRef]
Mousselly-Sergieh, H.; Watzinger, D.; Huber, B.; Döller, M.; Egyed-Zsigmond, E.; Kosch, H. World-Wide Scale Geotagged Image Dataset for Automatic Image Annotation and Reverse Geotagging. In Proceedings of the Multimedia Systems Conference 2014, MMSys ′14, Singapore, 19–21 March 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 47–52. [Google Scholar] [CrossRef]
Cheng, Z.; Caverlee, J.; Lee, K. You Are Where You Tweet: A Content-Based Approach to Geo-Locating Twitter Users. In Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM 2010, Toronto, ON, Canada, 26–30 October 2010; Association for Computing Machinery: New York, NY, USA, 2010; pp. 759–768. [Google Scholar] [CrossRef]
Lamsal, R. Design and analysis of a large-scale COVID-19 tweets dataset. Appl. Intell. 2021, 51, 2790–2804. [Google Scholar] [CrossRef]
Kejriwal, M.; Melotte, S. A Geo-Tagged COVID-19 Twitter Dataset for 10 North American Metropolitan areas over a 255-day period. Data 2021, 6, 64. [Google Scholar] [CrossRef]
Yelp Open Dataset. 2021. Available online: https://www.yelp.com/dataset (accessed on 26 October 2021).
Simple Maps. Basic World Cities Database. 2021. Available online: https://simplemaps.com/data/world-cities (accessed on 3 September 2021).
Starczewski, A.; Krzyżak, A. Performance Evaluation of the Silhouette Index. In Proceedings of the Artificial Intelligence and Soft Computing—14th International Conference ICAISC 2015, Proceedings, Part II, Zakopane, Poland, 14–28 June 2015; Springer International Publishing: Cham, Switzerland, 2015; pp. 49–58. [Google Scholar]
Markatou, M.; Alex, R.E. Urban System of Innovation: Main Agents and Main Factors of Success. Procedia-Soc. Behav. Sci. 2015, 195, 240–250. [Google Scholar] [CrossRef]
Zhang, R.; Ji, C.; Tan, L.; Sun, Y. Evaluation and construction of the capacities of urban innovation chains based on efficiency improvement. PLoS ONE 2022, 17, E0274092. [Google Scholar] [CrossRef] [PubMed]
Straccamore, M.; Bruno, M.; Monechi, B.; Loreto, V. Urban economic fitness and complexity from patent data. Sci. Rep. 2023, 13, 3655. [Google Scholar] [CrossRef]
Pugliese, E.; Cimini, G.; Patelli, A.; Zaccaria, A.; Pietronero, L.; Gabrielli, A. Unfolding the innovation system for the development of countries: Coevolution of Science, Technology and Production. Sci. Rep. 2019, 9, 16440. [Google Scholar] [CrossRef]
Miao, L.; Murray, D.; Jung, W.S.; Larivière, V.; Sugimoto, C.R.; Ahn, Y.Y. The latent structure of global scientific development. Nat. Hum. Behav. 2022, 6, 1206–1217. [Google Scholar] [CrossRef]
Bonaventura, M.; Aiello, L.M.; Quercia, D.; Latora, V. Predicting urban innovation from the US Workforce Mobility Network. Humanit. Soc. Sci. Commun. 2021, 8, 10. [Google Scholar] [CrossRef]
Ball, P.A.; Jara-Figueroa, C.; Petralia, S.G.; Steijn, M.P.; Rigby, D.L.; Hidalgo, C.A. Complex economic activities concentrate in large cities. Nat. Hum. Behav. 2020, 4, 248–254. [Google Scholar] [CrossRef]
Dubrovina, E. Concentrated cities. Nat. Phys. 2020, 16, 123. [Google Scholar] [CrossRef]
Zhang, J.X.; Cheng, J.W.; Philbin, S.P.; Ballesteros-Perez, P.; Skitmore, M.; Wang, G. Influencing factors of urban innovation and development: A grounded theory analysis. Environ. Dev. Sustain. 2023, 25, 2079–2104. [Google Scholar] [CrossRef]

Figure 1. Proposed methodology for discovering city activity patterns and then interpreting the results through city clustering.

Figure 2. Topic representative activity patterns obtained using Dynamic Topic Models [44] over virtual sensors of digital traces of people activities. Each row corresponds to a time slice of the data; the years are indicated in brackets. Each column corresponds to a topic that has consistent interpretation over time.

Figure 3. Example of cities included in the dataset and their spatial densities of geotagged digital traces. Yellow indicates a larger activity frequency, while purple indicates a smaller one. Each figure has an independent scale. Map tiles by Stamen Design are under CC BY 3.0, Data by OpenStreetMap contributors are under ODbL.

Figure 4. Silhouette Score over groups’ multi-temporal city activity patterns. Each line shows the average Silhouette Score and its 95% confidence interval.

Figure 5. Silhouette analysis for K-Means clustering results with

k = 3

. The left figure plots the Silhouette Score for each city in each city in each cluster. The right figure is the t-SNE embedding of the clusters.

Figure 5. Silhouette analysis for K-Means clustering results with

k = 3

. The left figure plots the Silhouette Score for each city in each city in each cluster. The right figure is the t-SNE embedding of the clusters.

Figure 6. Visualization of city clusters on the convex space determined via the average linear coefficients of each topic representative activity pattern.

Figure 7. Geographical representation of city clusters.

Figure 8. City Innovation Index 2021 distribution over the optimal clusters found via K-Means.

Figure 9. Distribution of innovation disaggregated indices over the optimal clusters found by K-Means.

Table 1. Description of the geotagged digital traces dataset. The Source column provides information about the origin of the geotagged digital traces. The dataset references indicate the sources from which the data were obtained and the Events column shows the total number of records.

Source	Dataset Reference	Events
Brightkite checkins	[45]	1,639,399
Foursquare checkins	[46]	7,515,201
Foursquare checkins	[47]	1,099,826
Geotagged Images	[48]	4,998,865
Geotagged Tweets	[49]	2,041,262
	[50]	187,802
	[51]	47,337
	[50] (Exact Location)	184,547
	[50] (Inferred Location)	2,604,233
Gowalla checkins	[45]	1,992,082
Weeplaces checkins	[45]	4,176,673
Yelp checkins	[52]	5,695,209

Table 2. Comparison of clustering models against the K-Means model with

k = 3

. Clusters indicate the number of clusters extracted. The p-value column results from the Wilcoxon test comparing the Silhouette Scores. The Difference column is the percentage increase in the Silhouette Score of K-Means against the comparison algorithm.

Table 2. Comparison of clustering models against the K-Means model with

k = 3

. Clusters indicate the number of clusters extracted. The p-value column results from the Wilcoxon test comparing the Silhouette Scores. The Difference column is the percentage increase in the Silhouette Score of K-Means against the comparison algorithm.

Algorithm	k	p-Value	Difference (%)
Mini Batch K-Means	3	$4.16 \times 10^{- 4}$	12.0%
Agglomerative Clustering	3	$1.83 \times 10^{- 3}$	4.3%
Spectral Clustering	3	$3.72 \times 10^{- 5}$	17.4%
BIRCH	3	$3.88 \times 10^{- 4}$	15.3%
Gaussian Mixture	3	$1.73 \times 10^{- 6}$	19.8%
K-Means	2	$9.71 \times 10^{- 5}$	5.2%
Mini Batch K-Means	2	$3.16 \times 10^{- 3}$	8.7%
Agglomerative Clustering	2	$5.71 \times 10^{- 2}$	2.7%
Spectral Clustering	2	$1.40 \times 10^{- 2}$	3.7%
BIRCH	2	$2.16 \times 10^{- 4}$	9.0%
Gaussian Mixture	2	$2.41 \times 10^{- 4}$	12.1%

Table 3. List of the core cities within each cluster ordered via the ranking established by the City Innovation Index. Only cities whose Silhouette Score is higher than the cluster median are included. T0, T1 and T2 columns correspond to clustering features as explained in the text.

CLUSTER C0					CLUSTER C1					CLUSTER C2
City	Ranking	T0	T1	T2	City	Ranking	T0	T1	T2	City	Ranking	T0	T1	T2
Campinas (Brazil)	134	0.63	0.16	0.19	Saitama (Japan)	1	0.31	0.40	0.28	Tokyo (Japan)	1	0.32	0.24	0.43
Vereeniging (South Africa)	372	0.40	0.23	0.35	Boston (USA)	2	0.25	0.47	0.26	New York (USA)	3	0.32	0.27	0.40
Lahore (Pakistan)	403	0.60	0.22	0.17	Atlanta (USA)	13	0.28	0.47	0.24	Sydney (Australia)	4	0.23	0.29	0.46
Pune (India)	436	0.45	0.29	0.24	Oslo (Norway)	25	0.32	0.48	0.19	Dallas (USA)	6	0.28	0.29	0.41
Pretoria (South Africa)	461	0.41	0.24	0.34	Helsinki (Finland)	41	0.33	0.43	0.22	Houston (USA)	8	0.24	0.31	0.44
Lagos (Nigeria)	468	0.45	0.20	0.33	Copenhagen (Denmark)	54	0.29	0.42	0.27	Chicago (USA)	9	0.24	0.31	0.44
Jaipur (India)	473	0.50	0.27	0.22	Guarulhos (Brazil)	134	0.37	0.41	0.20	London (UK)	11	0.24	0.33	0.41
Lucknow (India)	487	0.44	0.28	0.27	Lisbon (Portugal)	158	0.33	0.42	0.23	Shanghai (China)	15	0.30	0.30	0.39
Cawnpore (India)	495	0.44	0.27	0.27	Ecatepec (Mexico)	161	0.35	0.50	0.13	Los Angeles (USA)	20	0.23	0.33	0.42

Table 4. City distribution by population and innovation ranking. Each entry within each cluster row shows the distribution by population and innovation ranking quartiles.

		City Innovation Ranking
Cluster	City Population	(0, 41]	(41, 161]	(161, 311]	(311, 500]
C0	(0–1508]	-	5.9%	5.9%	11.8%
	(1508–3002]	5.9%	-	-	17.6%
	(3002–8154]	-	5.9%	-	35.3%
	(8154–37,977]	-	-	-	11.8%
C1	(0–1508]	11.1%	25.9%	7.4%	3.7%
	(1508–3002]	-	7.4%	7.4%	-
	(3002–8154]	18.5%	-	3.7%	-
	(8154–37,977]	3.7%	-	7.4%	3.7%
C2	(0–1508]	3.2%	6.4%	7.4%	2.1%
	(1508–3002]	2.1%	10.6%	5.3%	9.6%
	(3002–8154]	8.5%	5.3%	5.3%	3.2%
	(8154–37,977]	12.8%	4.3%	7.4%	6.4%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Muñoz-Cancino, R.; Ríos, S.A.; Graña, M. Clustering Cities over Features Extracted from Multiple Virtual Sensors Measuring Micro-Level Activity Patterns Allows One to Discriminate Large-Scale City Characteristics. Sensors 2023, 23, 5165. https://doi.org/10.3390/s23115165

AMA Style

Muñoz-Cancino R, Ríos SA, Graña M. Clustering Cities over Features Extracted from Multiple Virtual Sensors Measuring Micro-Level Activity Patterns Allows One to Discriminate Large-Scale City Characteristics. Sensors. 2023; 23(11):5165. https://doi.org/10.3390/s23115165

Chicago/Turabian Style

Muñoz-Cancino, Ricardo, Sebastián A. Ríos, and Manuel Graña. 2023. "Clustering Cities over Features Extracted from Multiple Virtual Sensors Measuring Micro-Level Activity Patterns Allows One to Discriminate Large-Scale City Characteristics" Sensors 23, no. 11: 5165. https://doi.org/10.3390/s23115165

APA Style

Muñoz-Cancino, R., Ríos, S. A., & Graña, M. (2023). Clustering Cities over Features Extracted from Multiple Virtual Sensors Measuring Micro-Level Activity Patterns Allows One to Discriminate Large-Scale City Characteristics. Sensors, 23(11), 5165. https://doi.org/10.3390/s23115165

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Clustering Cities over Features Extracted from Multiple Virtual Sensors Measuring Micro-Level Activity Patterns Allows One to Discriminate Large-Scale City Characteristics

Abstract

1. Introduction

2. Background and Related Work

2.1. Large-Scale Urban Characteristics, Innovation and Public Policies

2.2. Data-Driven Policy-Making Using Digital Traces

3. Data Description

4. Methodology and Experimental Setup

Data Preprocessing

5. Experimental Setup

6. Results

6.1. Data Exploration

6.2. City Clustering

6.3. City Clusters

6.4. City Clusters and The City Innovation Index

7. Discussion

8. Conclusions and Further Research

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI