Understanding Smart City—A Data-Driven Literature Review

This paper systematically reviews the top 200 Google Scholar publications in the area of smart city with the aid of data-driven methods from the fields of natural language processing and time series forecasting. Specifically, our algorithm crawls the textual information of the considered articles and uses the created ad-hoc database to identify the most relevant streams “smart infrastructure”, “smart economy & policy”, “smart technology”, “smart sustainability”, and “smart health”. Next, we automatically assign each manuscript to these subject areas by means of several interdisciplinary scientific methods. Each stream is evaluated in a deep-dive analysis by (i) creating a word cloud to find the most important keywords, (ii) examining the main contributions, and (iii) applying time series methodologies to determine the past and future relevance. Owing to the large body of literature considered, an in-depth evaluation of each stream is possible, which ultimately reveals strengths and weaknesses. We find that smart sustainability will come to the fore in the coming years; this confirms the current trend, as minimizing the required input of energy, water, food, and waste as well as the output of heat and air pollution is becoming increasingly important.


Introduction
The journey of smart cities dates back to the mid-1970s, when Los Angeles launched the first large-scale urban data project [201]. According to Gartner, the world's leading research and advisory company, the concept of smart city develops holistic solutions in the field of urban ecosystems using data collected from different types of electronic internet of things (IoT) sources [202]. For this purpose, information about buildings, citizens, devices, and assets is processed to efficiently manage urban flows via real-time responses. The vast majority of the literature defines the term "smart city" as applications and technologies that satisfy the following three characteristics: (i) the target group consists of cities and communities, (ii) the way of living and working in the region is improved, and (iii) information and communication technologies (ICT) are deployed.
Since the mid-2000s, interest has surged significantly as a consequence of technological advances as well as an increasing number of people living in urban areas. It is an ever-growing challenge to supply populations with basic resources such as clean water, secure food supply, and sufficient energy, while also ensuring overall economic, social, and environmental sustainability [203,204]. Today, Google Scholar lists more than 2,500,000 studies in the field of "smart city", ranging from theoretical models to empirical frameworks. Considering the diversity and dynamics of manuscripts in this domain, a systematic literature review is essential in order to capture the current state of research, including substantive findings as well as trends and upcoming innovations. It is surprising that no academic study has conducted a large-scale overview based on quantitative approaches. Cocchia [205] investigates how the concepts smart city and digital city were born and how they have developed; the author then elaborates on the common features as well as the significant differences. The question of which fundamental theories, models, and concepts in research reflect phenomena related to smart city is addressed in Anthopoulos [206]. The author notes that this question is crucial since interdisciplinary studies investigate the smart city and view this topic from different perspectives. Meijer and Bolivar [182] focus on smart cities and their governance by analyzing a corpus of 51 publications and their mappings.
In the present paper, we fill this void by systematically surveying the top 200 Google Scholar publications in the area of smart cities with the aid of data-driven methods from the fields of natural language processing and time series forecasting. In the first step, the implemented algorithm crawls the textual information of the considered manuscripts and uses the created ad-hoc database to identify the most relevant streams "smart infrastructure", "smart economy & policy", "smart technology", "smart sustainability", and "smart health". The corresponding articles are automatically classified into these subject areas by means of an interdisciplinary set of scientific methods. In the second step, the streams are discussed in detail, that is, we (i) create a word cloud representing the most important keywords, (ii) perform a deep-dive analysis of the main contributions, and (iii) apply time series methodologies to evaluate the past and future relevance of the research area. In doing so, we recognize that the topic of smart sustainability will come to the fore in the coming years; this is not surprising since minimizing required inputs of energy, water, food, and waste as well as the output of heat and air pollution is becoming more and more significant. Drawing on a large set of literature, consisting of the top 200 Google Scholar papers, an in-depth evaluation of each stream is possible, which ultimately reveals strengths and weaknesses. Consequently, we deliver a number of key takeaways and implications for theoreticians and practitioners.
The rest of the paper is organized as follows. Section 2 applies quantitative methods to categorize the most important smart city literature into the identified streams. In Section 3, we review smart infrastructure which aims to intelligently connect buildings and environments. Section 4 about smart economy & policy covers all actions in the field of sophisticated decision-making ecosystems and governance. Technical devices and systems in the context of smart technology are described in Section 5. We consider concepts for the effective use of commodities and resources by smart sustainability in Section 6. Smart health in Section 7 includes the digitization to adapt and evolve the way we live and work. Finally, Section 8 summarizes our paper and proposes directions for further research.

Categorization of Smart City Literature
This section analyzes the existing literature on the term "smart city", both by examining its historical relevance and by classifying the top 200 Google Scholar publications into the most important research streams. In contrast to existing review literature, we apply data-driven methods from the fields of time series forecasting and natural language processing. This manuscript uses Google Scholar as the underlying database because it (i) provides a freely accessible web search engine, (ii) indexes the full text and metadata of scientific literature in a range of publication formats and disciplines, and (iii) represents the world's largest academic search engine, containing more than 380 million documents [207]. Furthermore, the underlying ranking is meaningful and informative since it orders results with a combined ranking algorithm in the way researchers naturally do, by weighting the author, the full text of each article, the publication in which the article occurs, and the number of citations in the scientific literature. In general, Google tries to understand what a page is about by analyzing and cataloging the content of a page. This information is stored in the Google importance index, which consists of a huge database. Moreover, Google processes each search query based on numerous factors to find the most appropriate answer.
The entire methodology and all relevant analyses are implemented in the programming language Python [208]. For computation-intensive calculations, we use both the general-purpose programming language C++ and on-demand cloud computing platforms with virtual computer clusters that are available 24/7 via the internet. Figure 1 represents the number of yearly published articles in the field of smart city. We observe that the first research on smart cities was conducted in the 1970s after the Community Analysis Bureau of Los Angeles used cluster analysis and infrared aerial photography to generate data reports on demographics and housing quality [201]. In the following decades, research in this area was still restrained, with approximately 5000 contributions annually. The next milestone was in 1994, when Amsterdam introduced the virtual digital city; for the first time, internet access became available to a larger group of citizens [209]. Smart city research really gained momentum in the 2000s when multinational technology conglomerates spent hundreds of millions of dollars to gain profound knowledge and expertise on urban topics [210]. The first Smart City Expo World Congress in 2011 focused on liveable cities, integrated vision, sustainable cities, and urban mobility; afterwards, the 100,000 article barrier was broken. From that time, we notice exponential growth with currently about 200,000 publications per year. In 2019, the G20 Global Smart Cities Alliance on Technology Governance was founded in order to define principles for the responsible and ethical use of technologies for intelligent cities. This is a strong sign that the disciplines of sustainability and responsibility are increasingly coming into the spotlight. Finally, we analyze the future importance of smart city.
For this purpose, the number of articles at Google Scholar is predicted based on a polynomial regression, that is, we model the relationship between the independent variable and the dependent variable as an nth-degree polynomial. Our forecast method predicts a doubling of the annual articles in the next 10 years. This prediction is in line with the trend of urbanization: up to 70% of the world's population will live in cities, which shows the growing importance of optimizing the efficiency and sustainability of city operations and services [107]. Next, we classify the top 200 Google Scholar publications into the most important research streams with the aid of natural language processing. To be more specific, we (i) create a document-oriented database using a web crawler, (ii) transform the underlying manuscripts into a document-term matrix, and (iii) assign the publications to the identified streams. This subsection describes the three-step logic outlined above in detail [211].
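The polynomial regression forecast mentioned above can be illustrated with a minimal sketch. The polynomial degree, the yearly counts, and the helper names `fit_polynomial` and `predict` are assumptions made for illustration only; the paper does not report the degree or the raw data.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns c with y ~ c[0] + c[1]*x + ... + c[degree]*x**degree."""
    n = degree + 1
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical yearly article counts (in thousands); x = years since 2010
# so that the powers of x stay small and the system stays well conditioned.
years = [0, 2, 4, 6, 8, 10]
counts = [60.0, 85.0, 115.0, 150.0, 178.0, 200.0]
coeffs = fit_polynomial(years, counts, degree=2)
forecast_2030 = predict(coeffs, 20)  # extrapolated value for 2030
```

Shifting the year axis before fitting is a design choice of this sketch; it avoids very large powers of calendar years in the normal equations.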
The first step builds up a database based on different sources of information from the internet. For this purpose, we implement an algorithm to crawl the considered publications in the portable document format (PDF), the most common file format for presenting text and images [212]. The gained content is stored, retrieved, and managed in a document-oriented database. This flexible NoSQL schema allows all information for a particular object to be saved in a single instance. Moreover, each saved object can be of a different type, which simplifies the process of loading new documents into the database.
The second step focuses on transforming the database into a numerical representation. Following Boiy et al. [213], unstructured text data usually contains a lot of noise. Consequently, the task of automatically analyzing text is very challenging, which is why we build the document-term matrix in four steps. First, we extract the relevant text information from the PDF files. Specifically, we use the text of the abstract, keywords, the main text with figures and tables as well as references to gather as much information as possible. Our goal is to let the data speak for itself, that is, we do not manually set weights to give preference to certain parts of the data. Second, some data-preprocessing functions are applied to eliminate uniform resource locators, punctuation marks, and stop words. Third, we stem the adapted words using Porter's algorithm, a standard procedure in text processing [214]. Fourth, we create the document-term matrix, which describes the frequency of terms that arise in the collection of our data. The rows characterize the documents and the columns describe the stemmed words. We use term frequency-inverse document frequency (tf-idf) weights to reflect how important a word is to a document in a collection or corpus. We only consider terms that occur in at least 5 documents to reduce the noise in the data. The whole document-term matrix consists of 200 rows (documents) and 26,534 columns (unique terms). We also consider n-grams, that is, coherent sequences of n words.
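The construction of the document-term matrix can be sketched as follows. This is a simplified illustration rather than the authors' implementation: the stop-word list, the toy documents, and the function names are hypothetical, Porter stemming and n-grams are omitted for brevity, and the tf-idf variant (raw term frequency times log(N/df)) is one common choice that the paper does not specify. The document-frequency threshold is set to 2 instead of 5 because the toy corpus is tiny.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "in", "for", "to", "is"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, drop URLs and punctuation, and remove stop words."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOP_WORDS]


def tfidf_matrix(docs, min_df=2):
    """Return (vocabulary, matrix); rows are documents, columns are terms."""
    tokens = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokens for term in set(toks))
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    n_docs = len(docs)
    rows = []
    for toks in tokens:
        tf = Counter(toks)
        rows.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


# Hypothetical miniature corpus standing in for the 200 crawled manuscripts.
docs = [
    "Smart grid metering for the smart home",
    "Smart governance and policy in the city, see https://example.org",
    "Energy and water use in the sustainable city",
    "Grid energy for sustainable mobility",
]
vocab, matrix = tfidf_matrix(docs)
```

In this toy corpus only terms that appear in at least two documents survive the threshold, which mirrors the noise-reduction step described above.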
The third step identifies the most relevant streams in the context of smart city and classifies the corresponding publications into the found subject areas. For the first part, we apply the k-means algorithm, an unsupervised learning method in which the hyperparameter k defines the number of clusters (streams). This approach categorizes objects into multiple groups (clusters), such that we simultaneously have a high intra-class similarity (objects within the same cluster are as similar as possible) and a low inter-class similarity (different clusters are as dissimilar as possible). The hyperparameter k is specified by applying the average silhouette approach, where a large width indicates good clustering. Therefore, the optimal number of clusters k* is the one that maximizes the average silhouette over a range of possible values for k. Specifically, the silhouette coefficient for one object is defined as $s = \frac{b - a}{\max(a, b)}$, where a describes the average intra-cluster distance and b represents the average nearest-cluster distance. Figure 2 illustrates the average silhouette coefficients for different numbers of clusters k. We observe that the optimal number of clusters is given by k* = 5. To categorize the top 200 papers, we choose the k-means clustering algorithm because each article should be assigned to one, and only one, cluster. However, another possibility would be topic modelling, which relies on mixture models, that is, each document is allocated a probability of belonging to a latent theme or "topic". Figure 3 represents the output of the algorithm outlined above, with each bubble characterizing one of the top 200 Google Scholar manuscripts. The number inside corresponds to the respective one in the bibliography. In our context, the name of each stream is given by the most common topic of each cluster.
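The selection of k via the average silhouette can be sketched as follows. This is a minimal illustration under stated assumptions: the two-dimensional toy points stand in for the tf-idf document vectors, Euclidean distance is assumed, and the deterministic farthest-first initialisation is a choice made here for reproducibility; the paper does not describe its initialisation or distance measure.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, iters=50):
    """Plain k-means; returns one cluster label per point."""
    # Farthest-first initialisation (assumption, keeps the sketch deterministic).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Average of s = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # average intra-cluster distance
        b = min(  # average distance to the nearest other cluster
            sum(dist(p, q) for q, l in zip(points, labels) if l == other) / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, candidates):
    """Pick the k with the largest average silhouette."""
    return max(candidates, key=lambda k: mean_silhouette(points, kmeans(points, k)))


# Hypothetical toy data: three well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]
k_star = best_k(points, [2, 3, 4, 5])
```

For the toy data, the average silhouette peaks at the number of visibly separated groups, which is the same criterion the text uses to arrive at k* = 5 for the 200 manuscripts.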
Specifically, our data-driven review algorithm (i) divides the publications into the streams "smart infrastructure", "smart economy & policy", "smart technology", "smart sustainability", and "smart health" and (ii) assigns each manuscript to the stream based on the highest pairwise similarity. We find that smart infrastructure is the largest stream with 47 publications, followed by smart economy & policy (46), smart technology (45), smart sustainability (42), and smart health (20). In order to carry out an in-depth analysis, we take the following three principles into account when creating Figure 3:

1. Bubbles which are closer together show a higher similarity to each other. Consequently, Bellini et al. [25] and Leccese et al. [119] possess a high resemblance, whereas Nowicka [1] and Li et al. [2] have no connection with each other.

2. The size of the bubbles indicates how often the underlying manuscript is cited. Therefore, the largest bubble is Hollands [54] with more than 2500 citations, while the smallest one is given by Deren et al. [22] (20 citations).

3. Bubbles that are closer to the center of this graph show a higher similarity to the term smart city. The highest pairwise resemblance is provided by Paskaleva [66], whereas Sanchez-Iborra et al. [17] is relatively far away.
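The notion of pairwise similarity used in these principles, and in the assignment of each manuscript to the stream with the highest similarity, can be sketched as follows. The paper does not name its similarity measure; cosine similarity between tf-idf vectors is assumed here because it is a common choice for text data. The vectors, the stream names used as dictionary keys, and the function `assign_stream` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_stream(doc_vector, stream_vectors):
    """Return the name of the stream whose vector is most similar to the document."""
    return max(stream_vectors, key=lambda name: cosine_similarity(doc_vector, stream_vectors[name]))


# Hypothetical three-term vectors standing in for stream centroids.
streams = {
    "smart infrastructure": [1.0, 0.2, 0.0],
    "smart sustainability": [0.0, 0.9, 0.8],
}
stream = assign_stream([0.1, 0.7, 0.9], streams)
```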
The following lines describe the characteristics and attributes of the five streams. We pay particular attention to the special features of each stream in order to gain an overview and first insights.
Smart infrastructure connects buildings and environments in an intelligent manner. We observe that the density of manuscripts around [6,43] is particularly high, indicating that these publications represent the centre of a research subarea. This statement is further supported by the above-average bubble size of both publications. Finally, manuscripts with a high affinity to smart city, for example, Su et al. [38], Al Nuaimi et al. [44], and Dameri [29], are cited more often than those with less similarity, for example, Bhati [15], Sanchez-Iborra et al. [17], and Rios [18].
Smart economy & policy encompasses all strategies in the field of sophisticated decision-making ecosystems and governance. It seems that this stream possesses a strong impact on smart city research: Albino et al. [48], Chourabi et al. [49], Bakici et al. [51], and Hollands [54] are among the most cited manuscripts of all considered 200 publications. Furthermore, Paskaleva [66] shows an extraordinarily high similarity to smart city. As expected, publications by the same author are closer to each other: Anthopoulos and Reddick [53] and Anthopoulos and Fitsilis [55] exhibit a short distance.
Smart technology focuses on technical devices and systems. The overall similarity to the term smart city as well as the number of citations is average for the vast majority of the manuscripts. We observe widely distributed bubbles; the only exception is Jin et al. [100], which provides a medium cluster centre. This is in line with the existing literature: smart technology offers a large variety of research topics ranging from smart computing to smart phones [94,138].
Smart sustainability uses commodities and resources in an efficient and responsible way. We observe widely spread and equally large bubbles, indicating that no flagship manuscript exists to date. This is not surprising since smart sustainability research represents the youngest stream, with considerable dynamics and diversity. Notably, this stream possesses the highest affinity to smart city, for example, Kitchin [154] and Dameri and Rosenthal [175].
Smart health considers the digitization to adapt and evolve the way we live and work. The bubbles, all small to medium-sized, are relatively far away from the center smart city. Although Nam and Pardo [181] provide the largest bubble with more than 2000 citations, it is at the edge of this stream. We may carefully conclude that this manuscript covers several topics, including smart health.
Finally, we analyze the historical relationship between the five streams by considering the average publication year of the articles. We find that smart technology is the oldest research area with an average release year of 2013. This statement is confirmed by the fact that the oldest publication [124] is located in this stream. This is not surprising, since smart city highly relies on technology as a tool to make cities smarter and more sustainable. The more modern fields of research are smart infrastructure (2014) and smart economy & policy (2014). As expected, smart sustainability and smart health describe the latest topics with 2015 as average release year. On the basis of the knowledge gained so far, we can cautiously conclude that the more recent streams use the knowledge and insights of the older streams.
The following sections conduct a deep-dive analysis of the streams "smart infrastructure", "smart economy & policy", "smart technology", "smart sustainability", and "smart health".

Smart Infrastructure
Smart infrastructure describes a hybrid physical-digital and intelligent infrastructure that uses data-driven technologies to adapt to changes in the environment [215,216]. Common topics in the smart infrastructure literature include smart grids and mobility systems [217,218,219].
Analyzing the word cloud (Figure 4) suggests that smart infrastructure can be segregated by its application site, namely indoor and outdoor applications. Detailed weights of the words can be found in Table A1. The indoor category comprises words such as "home", "living", "metering", and "meter". For the outdoor category, the most important terms are "grid", "environment", "mobility", and "urban". Overall, smart infrastructure tends to focus more on outdoor applications, as our framework highlights more terms from that category.
The most influential papers, that is, those with the highest number of citations, cover a broad range of topics. Kitchin [6] focuses on the application of smart infrastructure to produce real-time data for cities. Furthermore, the author analyzes real projects that utilise big data and real-time analytics, such as Dublinked, which provides operational data from Dublin. Nam and Pardo [5] discuss the smart city in the context of management and policy by developing a framework of smart city innovations that spans the dimensions innovation, risk, way to success, technology, organization, policy, and context. Infrastructure is seen as a vital aspect of successful innovations in this framework. Centenaro et al. [3] and Su et al. [38] look at smart infrastructure from a technology point of view. Centenaro et al. [3] examine low-power wide area networks, a low-rate, long-range transmission technology in unlicensed sub-GHz frequency bands, and their application for IoT and the smart city. Su et al. [38] study the construction of application systems for different aspects of smart city, for example, wireless city, smart tourism, and smart transportation, and look at key features and difficulties in the development of smart cities.
Looking at Figure 3, it seems that Su et al. [38], Manville et al. [37], Monzon [41], Komninos et al. [42], Hashem et al. [43], and Al Nuaimi et al. [44] form the dominant cluster within smart infrastructure. Those publications shed light on infrastructural topics from two perspectives: analyzing existing concepts based on a variety of frameworks, and a technological point of view with a focus on big data. Monzon [41] describes the ASCIMER (Assessing Smart Cities in the Mediterranean Region) concept that was developed by the Universidad Politecnica of Madrid and analyzes existing smart city projects in line with this framework. Manville et al. [37] examine, in a stock-taking exercise for the European Parliament, smart cities in the EU and explain how existing smart cities work. In a similar manner, Komninos et al. [42] look at improving the effectiveness of smart city applications by creating an ontology for existing applications. Al Nuaimi et al. [44] and Hashem et al. [43] focus on big data topics within this cluster. Al Nuaimi et al. [44] examine the use case of big data in smart city and identify general benefits of deploying big data in the design and support of smart city applications. Similarly, Hashem et al. [43] explore visions of big data analytics to support smart cities and propose a future business model that can manage big data for smart cities.
Finally, we analyze the trend of this stream over the last years. For this purpose, we consider the most important paper of the last three years based on the number of citations and the relevance to smart city. Sharma et al. [19] propose a blockchain-based vehicle network architecture for the smart city, which describes a reliable and secure architecture that operates in a distributed way to build a new distributed transport management system. In general, a blockchain makes it possible to transmit information in a forgery-proof manner using a decentralized database shared by many participants, so that copies are impossible.
The standardized Google Scholar trend index (Figure 4), which depicts the ratio of yearly publications to the number of articles in the year 2000, indicates that interest in smart infrastructure has picked up since 2000 and outpaced smart city in 2005. In the following years, the number of published articles on smart infrastructure grew substantially more strongly than for smart city, resulting in a divergence in the number of articles. However, our forecast expects that academic interest in smart city will remain strong until 2030, while the number of new articles about smart infrastructure will level off.
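The standardized trend index described above is a simple ratio, which can be sketched as follows. The yearly counts are hypothetical placeholders, not the values behind Figure 4, and the function name `trend_index` is introduced here for illustration.

```python
def trend_index(counts_by_year, base_year=2000):
    """Divide each yearly publication count by the count of the base year."""
    base = counts_by_year[base_year]
    return {year: n / base for year, n in sorted(counts_by_year.items())}


# Hypothetical yearly publication counts for one stream.
counts = {2000: 1000, 2005: 1500, 2010: 3000}
index = trend_index(counts)
```

Because every series equals 1.0 in the base year, streams of very different absolute size can be compared on the same axis.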

Smart Economy & Policy
Smart economy & policy are two main building blocks of smart city. Smart economy can be described as the use of ICTs in companies' activities, new smart business processes, and smart technology sectors. This economy is characterized by business growth, job creation, improvement of the staff's qualification, and efficiency gains [220]. Smart policy represents a strategic agenda that supports the development of smart city and ultimately contributes to urban welfare [221].
This smart city sub dimension deals mainly with smart city topics concerning "project(s)", "governance", "tourism", "region(s)", "government(s)", "investment(s)", and "indicator(s)" (Figure 5). This is expected, as most of the policy discussions are about setting the right governance and incentive structures at the government level to support smart city projects. Indicators are often a topic in smart economy papers, as these study the real-time management of smart economies, which is based on different indicators, for example, gross domestic product (GDP) and unemployment.
Our algorithm classifies [48,49,54,67] as the most important papers in smart economy & policy. Albino et al. [48] examine the main dimensions and elements that characterize smart city through a literature review of academic studies and official documents of international institutions. The authors see smart economy as a vital component of a smart city. Chourabi et al. [49] propose an integrative framework based on eight critical factors of smart city initiatives, namely management and organization, technology, governance, policy context, people and communities, economy, built infrastructure, and natural environment, that helps to envision smart city endeavours. What is really behind the smart city label is examined by Hollands [54] through analyzing self-designated smart cities. The early publication of the paper and its roughly 2600 citations make it one of the most important articles in the smart economy & policy category. In contrast to the other important studies, which focus on frameworks, labeling, and the main dimensions of smart city, Neirotti et al. [67] conduct an empirical analysis of the diffusion of smart initiatives across 70 cities. The paper explains the distinct coverage of smart initiatives with a regression analysis that considers a variety of economic and environmental regressors, such as GDP per capita and CO2 emissions.
The prevailing group consists of Albino et al. [48], Chourabi et al. [49], Bakici et al. [51], Gasco-Hernandez [52], and Anthopoulos and Reddick [53] (see Figure 3). Similar to Albino et al. [48] and Chourabi et al. [49], which are discussed in the previous paragraph, Anthopoulos and Reddick [53] conduct an extensive literature review with a focus on government and policy-making within their theoretical framework. Bakici et al. [51] and Gasco-Hernandez [52] take a closer look at Barcelona's smart city initiative, which transformed the city's strategy from being purely focused on e-government and e-governance to having smart city at the center. Key lessons from Barcelona's smart city strategy are outlined in Gasco-Hernandez [52].
Lytras and Visvizi [80] characterize the trend in the last years. The authors conduct interdisciplinary smart city research by looking at the smart cities debate from different perspectives. On the one hand, the manuscript considers citizens' awareness of smart applications and solutions and, on the other hand, their ability to use these applications and solutions.
Examining the Google Scholar trend (Figure 5), it becomes apparent that smart economy & policy is a niche research area of smart city. Only from 2009 to 2014 did the academic community seem more interested in smart economy & policy than in smart city. After 2014, the number of publications in smart economy and smart policy plateaued, and we expect the interest in that area to slow down further until 2030.

Smart Technology
Smart technology refers to a product, condition, or motion of technology that possesses the ability to be aware of current circumstances and to react in an intelligent way to changes in its environment. Such technologies can, for example, adapt their functionalities to enhance performance, efficiency, and endurance, or to reduce operating costs [222].
Research in the smart technology space is often conducted with a focus on "digital", "device(s)", and "phone(s)" (see Figure 6). Nowadays, most of our devices and phones use digital networks to communicate with each other. Therefore, it is not astonishing that those words are the highest ranked terms in our word cloud. Surprisingly, "material" and "sensing", which made the emergence of smart technology possible in the first place, play only a subordinate role in smart technology articles.
Among the most influential papers in the smart technology area, Giordano et al. [121] and Mitton et al. [133] examine tech architectures for smart city applications, while Jin et al. [100] develop a framework for the development of smart cities with the aid of technology. Giordano et al. [121] present the Rainbow architecture, which consists of three layers (physical, distributed middleware, and cloud), for deployment in smart city applications. The decentralized approach uses a combination of multi-agent systems and fog computing to create smart services that exploit the principles of swarm intelligence. A new technology architecture that is based on sensor web enablement standard specifications and makes use of the Contiki operating system for accomplishing IoT is proposed by Mitton et al. [133]. This new structure would rely on the software as a service business model, which could foster innovative, ubiquitous, and value-added applications. Jin et al. [100] present a framework for the development of smart cities through IoT technology. The authors focus on the technology aspect of smart city and consider multiple facets of IoT in a smart city context in their framework, from the sensory level to cloud computing and data management.
Jin et al. [100] is also part of the dominant category in smart technology, which includes Jin et al. [100], Al-Hader et al. [101], Szabo et al. [99], Jin et al. [102], and Menouar et al. [98]. Jin et al. [100], Szabo et al. [99], and Jin et al. [102] deal with the topic of IoT, while Al-Hader et al. [101] and Menouar et al. [98] focus on technological topics related to infrastructure systems. Szabo et al. [99] introduce a framework, based on the publish-subscribe communication model and the use of XMPP, as the foundation of a unifying open architecture for crowd-sourcing based smart city applications and provide use cases for their framework. Jin et al. [102] present four IoT network architectures and define the corresponding network quality of service requirements. Geographic information systems (GIS) operational platforms for managing infrastructure-related systems with a focus on available utility networks are discussed in Al-Hader et al. [101]. Menouar et al. [98] study the next generation of smart transportation systems, the UAV-enabled intelligent transport system, from a technological standpoint by highlighting applications and challenges.
Interestingly, Menouar et al. [98] also depict the trend in this stream. The performance analysis of potential and risk becomes more and more important in the world of smart city. This preventive measure is used to evaluate existing or new processes, process changes, and equipment.
The topic of smart technology experienced a similar research interest up to 2020 as smart cities (see Figure 6). From 2010 onwards, the growth in publications related to smart technology was even stronger than in smart city. In both areas, academic interest will grow in the future. Nonetheless, we expect that smart city will outperform the sub dimension technology going forward.

Smart Sustainability
In the context of smart city, smart sustainability is often characterized as the usage of ICT in smart urban structures in order to achieve sustainable development, meaning that the needs of today are met without sacrificing the needs of future generations with respect to economic, social, and environmental aspects [223].
Research in smart sustainability often deals with commodities and resources, such as "energy", "electricity", "water", "lighting", and "waste", but also with sustainable "growth", "initiative[s]" and "concept[s]" (Figure 7). Another prevailing topic seems to be "security" and "surveillance" that mostly relates to the conservation of resources for a sustainable future. However, the "security" of infrastructure and urban developments is also in focus, for example, Rodríguez-Gaviria et al. [224] construct a vulnerability indicator for low-income flood-prone urban areas that can be used for disaster risk management and enhance infrastructure protection.
Our algorithm deemed Söderström et al. [149], Ahvenniemi et al. [140], and Zygiaris [143] the most relevant within smart sustainability. All three publications deal with different aspects of sustainability. Following the official registration of "smarter cities" as a trademark by IBM in 2011, Söderström et al. [149] analyze IBM's smarter city campaign, which focused on efficient and sustainable cities. The campaign is ultimately aimed at establishing the company as a key player in the implementation of urban technology. Ahvenniemi et al. [140] closely examine the differences between sustainable and smart cities. The authors analyze 16 city assessment frameworks, 8 for sustainable cities and 8 for smart cities. In total, 958 indicators are classified into 12 sectors, such as transport and energy, and three impact categories (environmental, economic, and social). Ahvenniemi et al. [140] find that smart city frameworks focus more on modern technologies and on social and economic aspects, while sustainable city frameworks emphasize the environment and sustainability. A smart city reference model to conceptualize the development of smart city innovation ecosystems is proposed by Zygiaris [143]. The planning framework consists of seven layers, including the "green city layer", which focuses on the environment and considers factors such as water conservation and green building policies. Sustainability is the focal point of the introduced model.
In smart sustainability, the dominant category includes Taylor and While [147], Söderström et al. [149], Colding and Barthel [152], Dameri and Cocchia [150], Falconer and Mitchell [148], Anthopoulos et al. [145], Zygiaris [143], Crivello [142], Ahvenniemi et al. [140], Hollands [141], and Aoun [144]. Two of the three most influential papers are present in that cluster ([140,149]). Taylor and While [147] and Crivello [142] analyze different smart city initiatives: Crivello [142] studies the implementation of smart city in Turin, Italy, by examining the involved actors, processes, and networks, while Taylor and While [147] examine how the UK government, which promotes sustainable business growth, facilitates urban technological innovation through a variety of initiatives, such as the Technology Strategy Board Future Cities Demonstrator Competition. Different smart city concepts are examined by Anthopoulos et al. [145] and Dameri and Cocchia [150]. Conceptualization, benchmarks, and evaluations of smart city frameworks, in which sustainability and resilience often play key roles, are reviewed by Anthopoulos et al. [145]. Dameri and Cocchia [150] investigate the evolution of smart city and digital city concepts by conducting a deep literature survey and delineate the contents and the boundaries of each of these urban development concepts. Smart city visions and models are criticized by Hollands [141] and Colding and Barthel [152]. Colding and Barthel [152] address the need to enhance the current smart city model, as it often does not meet the needs of urban sustainability and inadequately considers resilience, power relations, and health issues. Hollands [141] critiques the corporate smart city vision, which incorporates sustainable and technological elements, and sheds light on sociological questions. Finally, Falconer and Mitchell [148] and Aoun [144] outline strategies to make cities smarter.
A 5-step approach for converting urban developments into more efficient and sustainable places to live is presented by Aoun [144]. The road map consists of setting the vision, deploying technology, bringing in integration, adding innovation, and driving collaboration. Falconer and Mitchell [148] propose a smart city framework that helps stakeholders and city participants to understand the operations of cities, define objectives and stakeholder roles, and understand the role of ICT to ultimately help cities to become smarter and more sustainable.
Because the research field is highly dynamic, Ahvenniemi et al. [140] discuss the differences between sustainable and smart cities. The authors mention that there has been a shift from sustainability assessment to smart city goals in the 21st century. Smart city frameworks place a much stronger focus on modern technologies and smartness than urban sustainability frameworks do.
Publications about smart sustainability have increased substantially in recent years (see Figure 7). Smart sustainability and smart technology are the only sub dimensions that persistently outperform smart city. The exponential growth in academic interest in sustainability even picked up from 2015 to 2020. One main driver of this outstanding increase could be the ongoing debate about climate change and the newly established green movement "Fridays for Future", which imprinted the topic of sustainability on the public's mind. We expect the explosive growth in interest in sustainability to continue. By 2030, the number of publications about smart sustainability is estimated to be 175 times that of 2000.
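To put the factor of 175 into perspective, it can be converted into an average annual growth rate. The following minimal sketch assumes a constant year-over-year growth rate between 2000 and 2030, which is a simplifying assumption made for illustration and not the forecasting model used in this paper; the function name `implied_annual_growth` is hypothetical.

```python
def implied_annual_growth(factor: float, years: int) -> float:
    """Return the constant yearly growth rate that compounds to `factor` over `years`.

    Assumption (illustrative only): growth is the same in every year.
    """
    if factor <= 0 or years <= 0:
        raise ValueError("factor and years must be positive")
    return factor ** (1.0 / years) - 1.0


# Factor of 175 between 2000 and 2030, as stated in the text.
rate = implied_annual_growth(175.0, 2030 - 2000)
print(f"Implied average growth per year: {rate:.1%}")
```

Under this assumption, the factor of 175 corresponds to a yearly growth of a little under 19 percent, compounded over 30 years.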

Smart Health
The provision of health services through the sensing and adaptive infrastructure of smart cities is often defined as smart health. Smart health is seen as a subset of electronic health, which uses ICT (not necessarily related to smart city) to reduce costs and increase efficiency [225]. Figure 8 reports the word cloud with the most important terms in smart health. The top ranks are occupied by "person", "community", "citizen", and "healthcare". Those terms are clearly expected in a health context. Notably, "education", "culture", and "school" are also common topics that are discussed in smart health. "Nation", "society", and "public" rank sixth to eighth.
Both articles classified as important [181,194] examine the core components of smart city. Nam and Pardo [181] identify technology, human, and institutional factors as the core components of smart city and offer strategic principles to make a city smart. The human factor focuses on creativity but also considers the accessibility of education irrespective of language, culture, and disabilities. Gil-Garcia et al. [194] propose an integrative and comprehensive conceptualization based on their analysis of smart city core components. They divide the components into three categories (physical environment, society, and government) that are all influenced by technology and data in a smart city. Health and social services are attributed to the government component, as government agencies are accountable for those services in most countries. The accessibility of hospitals is of particularly high importance and requires an extensive data-driven analysis that considers dynamic changes in transport costs and travel time in order to provide universal access to hospital services for the whole population [226].
The prevailing smart health subcategory ([194,197-199]) also includes one of the most important papers ([194]). In a similar manner to Gil-Garcia et al. [194], Monfaredzadeh and Krueger [197] and Kumar et al. [198] divide smart city into multiple dimensions. Monfaredzadeh and Krueger [197] discuss the topic of people and communities in the context of smart city by exploring the sustainability, social, and human dimensions. All possible services among various city dimensions, which enable cities to be smart, are explored by Kumar et al. [198]. Calzada and Cobo [199] critically challenge smart city with a focus on social aspects based on a ten-dimension framework.
Finally, we consider the trend of the last years. Here, Brauneis and Goodman [191] test the limits of transparency of large data analyses by contributing to the literature on algorithmic accountability with a thorough investigation of the opacity of state prediction algorithms. The authors focus on the use of prediction algorithms by local and state authorities using open records processes.
The Google Scholar trend index illustrates that the attention for smart health has stalled since 2016. Around 2014, the sub dimension health enjoyed a stronger growth in publications compared to smart city. This relationship reversed in the following years, with smart city growing more strongly than its building block smart health. Similar to all other sub dimensions except sustainability and technology, the research interest in smart health is expected to slow down in the future compared to smart city. Smart city is expected to substantially outgrow smart health by 2030.

Conclusions
This manuscript systematically reviews the top 200 Google Scholar publications in the area of smart cities with the aid of data-driven methods from the fields of natural language processing and time series forecasting. In this respect, we provide three main contributions to the existing literature.
The first contribution is the newly developed algorithm, which crawls the text information of the regarded publications and uses the created ad-hoc database to identify the most relevant streams "smart infrastructure", "smart economy & policy", "smart technology", "smart sustainability", and "smart health". The corresponding manuscripts are automatically classified into these subject areas using an interdisciplinary scientific methodological approach.
The second contribution is the deep-dive analysis of each stream, which provides a holistic view of the research field from different angles. To this end, we create a word cloud representing the most important keywords and analyze the main contributions. Furthermore, time series methodologies are applied to assess the past and future relevance of each research area.
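The keyword ranking behind a word cloud can be illustrated by simple term counting. The sketch below is a minimal illustration under our own assumptions and not the pipeline used in this paper; the stop-word list, the tokenization rule, the sample texts, and the function name `top_keywords` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list for illustration.
STOP_WORDS = {"the", "and", "of", "in", "a", "to", "for", "is", "are", "with"}


def top_keywords(texts, n=5):
    """Count lowercase alphabetic tokens across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        # Skip stop words and very short tokens (fewer than three letters).
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(n)


# Made-up sample texts, not taken from the reviewed publications.
sample = [
    "Energy and water management in the smart city",
    "Smart lighting reduces energy use",
    "Waste and water services for the city",
]
print(top_keywords(sample, n=3))
```

In a word cloud, the resulting counts would typically determine the font size of each term.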
The third contribution relies on the data-driven identification of dynamic changes in the relevance of the streams. Here, we find that the topic of smart sustainability will come to the fore in the coming years. This is not surprising, as minimizing the required inputs of energy, water, and food as well as the outputs of waste, heat, and air pollution is becoming increasingly important.
For further investigations in this research area, first, a supervised machine learning approach may be set up to classify documents based on a trained model. Second, the number of citations could be considered together with the number of years since the paper was published. Next, a multivariate framework could be implemented in order to account for common interactions between the manuscripts. Finally, the implemented algorithm might be applied to other research areas, such as climate change or water pollution.
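The second suggestion, relating citations to the age of a paper, could be sketched as a citations-per-year score. The example below is one possible reading of that idea and not an implemented part of this study; the function names, the minimum age of one year, and the paper records are hypothetical.

```python
def citations_per_year(citations: int, published: int, current: int) -> float:
    """Average citations per year since publication.

    Assumption: papers published in the current year count as one year old,
    which avoids a division by zero.
    """
    age = max(current - published, 1)
    return citations / age


def rank_by_citation_rate(papers, current_year):
    """Sort (name, citations, year) records by citations per year, highest first."""
    scored = [
        (name, citations_per_year(cites, year, current_year))
        for name, cites, year in papers
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up records: (identifier, total citations, publication year).
papers = [
    ("paper_a", 1200, 2008),
    ("paper_b", 600, 2017),
    ("paper_c", 90, 2019),
]
for name, score in rank_by_citation_rate(papers, current_year=2020):
    print(f"{name}: {score:.1f} citations per year")
```

Such a normalization would reduce the advantage that older papers gain simply by having had more time to accumulate citations.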
Author Contributions: J.S. conceived the research method. The experiments were designed and performed by J.S. The analyses were conducted by J.S. and reviewed by L.S. The paper was initially drafted and revised by J.S. and L.S. It was refined and finalized by J.S. and L.S. All authors have read and agreed to the published version of the manuscript.

Funding:
We are grateful to the "Open Access Publikationsfonds", which has covered 75 percent of the publication fees.

Acknowledgments:
We are further grateful to three anonymous referees for many helpful discussions and suggestions on this topic.

Conflicts of Interest:
The authors declare no conflict of interest.