Machine Learning Technologies for Sustainability in Smart Cities in the Post-COVID Era

Abstract: The unprecedented urban growth of recent years requires improved urban planning and management to make urban spaces more inclusive, safe, resilient and sustainable. Additionally, humanity faces the COVID pandemic, which especially complicates the management of Smart Cities. Machine Learning techniques are a possible way to address these two problems (environmental and health) in Smart Cities. One objective of our work is to thoroughly analyze the link between Smart Cities, Machine Learning techniques and their applicability. To this end, an exhaustive study of the relationship between Smart Cities and the applicability of Machine Learning (ML) techniques is carried out with the aim of optimizing sustainability. The ML approaches are studied from the point of view of models, techniques and applications. The areas and dimensions of sustainability addressed are analyzed, and the Sustainable Development Goals (SDGs) are discussed. The main objective is to propose a model (EARLY) that allows us to tackle these problems in the future. An inclusive perspective on applicability, sustainability scopes and dimensions, SDGs, tools, data types and Machine Learning techniques is provided. Finally, a case study applied to an Andalusian city is presented.


Introduction
By 2030, the number of people living in cities is estimated to reach 5 billion, and by 2050, two-thirds of humanity will live in cities [1].
In recent years, trends such as the lack of natural resources, overcrowding in urban areas, the enormous production of waste and the pollution generated in these areas have been highlighted [2]. Technological developments can help alleviate these effects by taking data and analyzing them to provide recommendations for improving sustainability parameters. The digital transformation is helping cities to become smart spaces.
The United Nations has proposed a working agenda known as the Sustainable Development Goals (SDGs) [3]. The SDGs emerged from the relative success of the Millennium Development Goals [4]; they have a time horizon ranging from 2015 to 2030 [3]. The SDGs are a first-level challenge, and require the coordinated participation of multiple actors (public administration, businesses, non-governmental organizations) to achieve them [4,5].
Sustainable Development Goal number 11 (SDG 11) focuses on the sustainability of communities and cities [3]. The unprecedented urban growth of recent years requires improved urban planning and management to make urban spaces more inclusive, safe, resilient and sustainable [3]. One of the keys to the SDG agenda is its integrative nature: the SDGs are interconnected, so it is not possible to move forward with any of them in isolation [5]. The standardization carried out by the AENOR technical [...]
The reality of cities will be strongly shaped by the management of the pandemic and by compliance with the SDGs. We have analyzed what has been done so far in order to identify, prospectively, where work on promoting sustainability in cities can continue. This analysis is set in the context of managing a pandemic that undoubtedly affects how cities function.
The main objective of this work is to propose a conceptual framework that helps researchers to address sustainability problems in Smart Cities in the post-COVID era, making use of current Machine Learning techniques.
As far as we know, this approach is not yet widespread. Our review of the state of the art revealed a research gap, which we address by proposing a conceptual framework that can help in tackling these types of problems.
We aim to analyze experiences in applying Machine Learning techniques to improve sustainability and to address the following research question: How can machine learning tools contribute to the sustainability of Smart Cities in the current pandemic context?
In this work, an exhaustive study of the relationship between Smart Cities and the applicability of Machine Learning (ML) techniques is carried out with the aim of optimizing sustainability. To this end, the ML approaches are studied from the point of view of models, techniques and applications. The areas and dimensions of sustainability addressed are analyzed, and the SDGs addressed are discussed.
An integrative perspective on applicability, sustainability domains and dimensions, SDGs, tools, data types and ML techniques is provided.
In the following sections, the methodology followed is detailed, the results obtained are shown, the results are discussed, including a case study, and conclusions are drawn.
A comprehensive review of the literature available in the Web of Science (WOS) database was conducted. Specific criteria were used to select publications that develop a case study in which Machine Learning techniques are applied to city sustainability.
The results comprise a bibliometric analysis of the network of keywords used in the articles, which identifies the most used terms; a co-citation and co-occurrence analysis that reveals the most cited authors and the organizations with the greatest impact; and an analysis of the WOS categories of the publications, which shows the areas in which the research is carried out.

Materials and Methods
A systematic review of the literature [31,32] was carried out taking into account the main areas of the study: "Smart City"/"Smart Cities", "Machine Learning" and "Open Data" (Figure 2). The literature search was developed in four phases, as can be seen in Figure 3.
In phase 1, "Initial Research", the design parameters were defined. The WOS database was selected because of its high scientific impact, the diversity of databases that make it up and the variety of document types that it supports, ranging from scientific articles, books and book chapters to conference presentations and bibliographic reviews. Numerous works use the WOS as a source of information [33].
The search in this phase was carried out as follows. A first search with the terms "Smart City" OR "Smart Cities" yielded 12,589 records. Two refined searches were then made, one with the term "machine learning" and the other with the term "open data". Both refinements were restricted to the last five years (2015-2019), to the document type "article" and to the "science technology" database. The results obtained were:
A. Smart City/Cities + Machine Learning + 2015-2019 + articles + science technology = 170 publications.
B. Smart City/Cities + Open Data + 2015-2019 + articles + science technology = 91 publications.
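The refinement logic of searches A and B can be sketched in a few lines of Python. This is only an illustration of the Boolean structure of the query, not the WOS search interface: the `Record` class, its fields and the four sample records are hypothetical, and the "science technology" database filter is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout, assumed for illustration only.
    title: str
    topic: str      # free text in which the search terms are looked up
    year: int
    doc_type: str


BASE_TERMS = ("smart city", "smart cities")


def matches(record, refine_term, years=(2015, 2019), doc_type="article"):
    """True if the record satisfies the base search plus one refinement."""
    text = record.topic.lower()
    return (
        any(term in text for term in BASE_TERMS)   # "Smart City" OR "Smart Cities"
        and refine_term.lower() in text            # refinement term
        and years[0] <= record.year <= years[1]    # time period
        and record.doc_type == doc_type            # document type
    )


# Made-up records, not taken from the review.
records = [
    Record("A", "Smart city traffic forecasting with machine learning", 2018, "article"),
    Record("B", "Open data portals for smart cities", 2016, "article"),
    Record("C", "Machine learning for smart cities", 2013, "article"),
    Record("D", "Smart city machine learning platform", 2019, "proceedings paper"),
]

search_a = [r.title for r in records if matches(r, "machine learning")]
search_b = [r.title for r in records if matches(r, "open data")]
print(search_a, search_b)
```

Records C and D are rejected by the year and document-type restrictions respectively, which mirrors how each refinement step narrows the initial result set.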
After a first analysis and reading of the abstracts of the results of search B, most of the publications of this search were discarded for being out of scope. Thus, the 170 publications of search A constituted the basis of our work.
In the second phase, "1st restriction", the titles and abstracts of all publications were read in depth, applying the following discard criteria:
• The content of the article was outside the scope of the study.
• The paper did not describe a data application.
After this analysis, 60 publications were discarded because they did not meet the requirements, and 110 articles were retained.
Subsequently, the full texts of the 110 publications were analyzed using the same exclusion criteria as in the previous stage. Eleven were discarded, and 99 moved on to the next phase (in-depth analysis). For some of these 99 articles, it was doubtful whether they met the requirements of the study. The research team therefore carried out a second analysis and agreed on additional inclusion and exclusion criteria (for example, excluding health issues and methodological developments based on simulations without real data).
Finally, 65 publications were considered for phase 4, called "final papers". In this phase, four aspects of each publication were analyzed in more depth: the SDGs to which the study refers, the Machine Learning techniques used, the data used in the case study, and the sustainability application in Smart Cities that it addresses. The application was the last aspect to be studied, because a wide variety of application areas was obtained that needed to be classified. A Pareto diagram [34] was developed to group the documents into clusters.
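The counts reported for the successive screening stages can be cross-checked with simple arithmetic. The sketch below uses only the stage totals given in the text (170, 110, 99 and 65); the stage labels are paraphrased, and the number excluded in the last step (34) is derived from those totals, not stated in the text.

```python
# Stage totals as reported in the text; labels are paraphrased.
stages = [
    ("search A (phase 1)", 170),
    ("after title/abstract screening", 110),
    ("after full-text screening", 99),
    ("final papers (phase 4)", 65),
]


def exclusions(stages):
    """Return (stage label, number of records dropped on reaching that stage)."""
    return [
        (name, prev_count - count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


dropped = exclusions(stages)
for name, n in dropped:
    print(f"{name}: {n} excluded")

total_excluded = sum(n for _, n in dropped)
print(f"total excluded: {total_excluded} of {stages[0][1]}")
```

The first two differences reproduce the 60 and 11 discarded publications mentioned above.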
In phase 1, "Initial Research", a search was carried out with the terms "Smart City" or "Smart Cities", refined with the term "machine learning", the time period of the last five years (2015-2019), the "article" document type and the "science technology" database. The search returned 170 publications, which formed the basis of our work and of the bibliometric analysis whose results are shown in Section 3.1.
In the following phases, the publications were screened, discarding those whose content was outside the scope of the study or that did not describe a data application; 65 publications remained in phase 4, called "final papers". Figure 3 summarizes the results of each stage of the review. The following section presents the analysis of these 65 papers, on which the proposed tool is based.

Results
In this section, each block of results is presented, starting with the bibliometric analysis of the publications database. Based on a co-citation and co-occurrence analysis, it examines the most used keyword networks, the organizations and countries with the greatest impact, and the WOS categories in which the publications were classified.
Next, the EARLY model is presented. The objective of this model is to provide an analysis structure that will allow us to address the problems (environmental and health) in Smart Cities in the future. It provides an inclusive perspective on the applicability, domains and dimensions of sustainability, SDGs, tools, data types and machine learning techniques.
After this, the analysis of the three main groups of results of the 65 final papers analyzed is carried out: SDG-Sustainability Relationship, Application-Sustainability-Data and Application-Sustainability-Machine Learning Techniques.
Finally, the proposed model is applied in a case study of the city of Malaga.
The main contributions of this work, in the context of the post-COVID era, can be summarized as:

1. A bibliometric analysis of the publications database.
2. A conceptual framework that provides an inclusive perspective on the applicability, domains and dimensions of sustainability, SDGs, tools, data types and machine learning techniques.
3. A case study with the proposed model applied to the city of Malaga.

Bibliometric Analysis
A bibliometric analysis of the database selected in phase 1, "Initial Research", was performed. It allowed us to examine the current state of publications and to obtain a map of the most used keywords, the most representative authors, the organizations with the greatest number of publications and the greatest impact, and the WOS categories into which the articles were classified.
The methodological structure was based on the proposal of Zhao and Strotmann [35]. This strategy consists of four stages: definition of search keywords and the database, data cleaning and formatting, initial data analysis, and in-depth analysis (networks and results).
As a basis for the bibliometric study, the 170 articles obtained in the first phase of the methodology described in the previous section were used. The search in this phase was restricted to the last five years (2015-2019). This period was chosen after analyzing previous bibliometric work in the field of Machine Learning [36,37], because of the growing interest in this field in recent years and because the SDGs were published in 2015 [38], which links the period directly to their applications to date.
In the next step, the data was refined. Some of the fields were not standardized, which could affect the reliability of the analyses, since variations in the nomenclature of an author, for example, would be interpreted as being two independent authors. Given the volume of the sample, it was necessary to perform the data refinement with the support of the open source software OpenRefine© [39], version 3.3, through which the author and organization variables were normalized.
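The text does not state which OpenRefine operation was used for the normalization. As an illustration of the underlying idea, the sketch below groups name variants with a fingerprint-style key (lowercase, accents and punctuation removed, tokens sorted and deduplicated), which is one common key-collision approach. The function names and the sample strings are assumptions made for this example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a normalization key so that spelling variants collide."""
    # Drop accents by decomposing characters and discarding non-ASCII marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase, replace punctuation with spaces, split on whitespace.
    tokens = re.sub(r"[^\w\s]", " ", ascii_name.lower()).split()
    # Sort and deduplicate tokens so that word order does not matter.
    return " ".join(sorted(set(tokens)))


def cluster(names):
    """Group raw names by their fingerprint key."""
    groups = defaultdict(list)
    for raw in names:
        groups[fingerprint(raw)].append(raw)
    return dict(groups)


# Illustrative variants only; not the actual records of the sample.
names = ["Univ Ottawa", "Ottawa Univ.", "univ ottawa", "Sejong Univ", "García, J.", "Garcia J"]
clusters = cluster(names)
for key, variants in clusters.items():
    print(key, "<-", variants)
```

Each group can then be mapped to a single canonical label, so that one author or organization is counted once in the subsequent network analyses.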
After refining the data, a bibliometric analysis was carried out to determine the incidence and the established networks of keywords, authors and their reference organizations, sources and countries.
Data processing was performed using the VOSviewer© [40] software, version 1.6.15. This software, focused on the analysis of bibliometric and sociometric networks, supports the analysis and mapping of co-authorship (by authors, organizations or countries), co-occurrence of keywords, bibliographic coupling, and citation and co-citation (of documents, sources, authors, organizations or countries) for the selected bibliographic sample. Bibliometric and sociometric networks are usually visualized using one of three basic approaches: distance-based, graph-based and time-based [41]. VOSviewer© uses the distance-based approach with association strength, placing closer together those nodes that are more strongly related, i.e., those with a smaller geodesic distance [42]. In general, the smaller the distance between two nodes, the greater their relationship, i.e., their similarity. The input for the network calculation is a normalized co-occurrence matrix, on which the association strength (or proximity index) is calculated from the observed number of co-occurrences between nodes and the number of co-occurrences expected if the occurrences of the nodes were independent [43]. Thus, the similarity $S_{ij}$ between two nodes i and j can be calculated using Equation (1):
$$S_{ij} = \frac{C_{ij}}{W_i W_j} \qquad (1)$$
where $C_{ij}$ is the number of co-occurrences of nodes i and j, and $W_i$ and $W_j$ are the total number of occurrences of nodes i and j, respectively, or the total number of co-occurrences of these nodes.
The following analyses were carried out [40,44]:
• Co-occurrence analysis: applied to the "author keywords", it measures their relationship by the number of documents in which they appear together.
• Bibliographic coupling analysis: this type of analysis was applied to the authors of publications. The relationship between authors is measured according to the number of bibliographic references they share among their publications.
• Co-authorship analysis: this analysis measures the relationship of the items based on the number of co-authored documents. It was applied to the reference organizations of the authors of the publications analyzed.
• Citation analysis: this analysis, performed for the "source" variable, i.e., the journals and publishers in which the documents are published, determines the relationship based on the number of times they are cited.
• Co-citation analysis: applied to the "sources", this analysis explains the relationships according to the number of times the "sources" are cited together.
The result of this analytical process is a complete map of the scientific production related to the search terms, in which, first of all, the resulting network of keywords can be seen (Figure 4).
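The co-occurrence count and the association strength described above can be illustrated with a small Python sketch. It is not VOSviewer's implementation: it assumes that the weight of a node is the number of documents containing the keyword, and the four keyword lists are made up for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count keyword occurrences and pairwise co-occurrences per document."""
    occ = Counter()   # W_i: number of documents containing keyword i
    co = Counter()    # C_ij: number of documents containing both i and j
    for keywords in docs:
        unique = sorted(set(keywords))        # each keyword counted once per document
        occ.update(unique)
        co.update(combinations(unique, 2))    # unordered pairs, alphabetical order
    return occ, co


def association_strength(occ, co):
    """S_ij = C_ij / (W_i * W_j) for every pair that co-occurs at least once."""
    return {(i, j): c / (occ[i] * occ[j]) for (i, j), c in co.items()}


# Made-up keyword lists, one per document.
docs = [
    ["machine learning", "smart city", "big data"],
    ["machine learning", "smart city"],
    ["smart city", "internet of things"],
    ["machine learning", "big data"],
]

occ, co = cooccurrence(docs)
strength = association_strength(occ, co)
for pair, s in sorted(strength.items(), key=lambda kv: kv[1], reverse=True):
    print(pair, round(s, 3))
```

Dividing by the product of the weights means that a pair of rare keywords that always appear together can score as high as a pair of frequent keywords, which is the purpose of normalizing the raw co-occurrence counts.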
As can be inferred from Figure 4, the keywords with the highest occurrence in the publications coincide with the search terms: "Machine Learning" with 62 documents, "Smart City" with 43 documents, and "Smart Cities" with 39 documents. After this first group, there is a second group of keywords related to the internet and digitalization, with similar occurrences: "internet" appears in 24 documents, "big data" in 23, and "internet of things" in 17 publications. This first network also allows the link strength between terms to be analyzed. The terms with the greatest link strength, which are therefore the most central and connected, are again the search terms, followed by the second group of words related to Industry 4.0 and digitization [45,46].
With regard to the analysis of the organizations carrying out research in the field of Smart Cities and the application of Machine Learning techniques in case studies with available data, there are no networks between them, that is, they are disconnected and the research is isolated.
Tables 1 and 2 show two analyses of the organizations: a ranking by number of documents and a ranking by number of citations. By number of documents, the University of Ottawa is at the top with 6 documents, followed by Sejong University with 4. Because the organizations are disconnected, there are focuses of influence in the USA, Europe and Asian countries, and it is not possible to describe a research trend related to the organizations' countries of origin. With regard to the number of citations of published works, two universities stand out, located in Australia and North America, which makes these the regions with the greatest influence: La Trobe University with 93 citations, and the University of Ottawa with 88 citations. They are well ahead of the European and Asian organizations: Imperial College London (65 citations), King Abdulaziz University (57 citations), Canadian University of Dubai (45 citations), and Lulea University of Technology and the University of Edinburgh with 44 citations.
Subsequently, the Web of Science categories were analyzed. Figure 5 shows two main areas in which the largest number of publications are concentrated: "Engineering, Electrical & Electronic" and the area of computer science, with the categories "Computer Science, Information" and "Telecommunications". Peripheral clusters with little connection can also be observed, related to "Chemistry, Analytical", transport and civil engineering, and "Environmental Sciences". Finally, with regard to the years of publication, the main categories were concentrated in 2017 and 2018, whereas the categories related to sustainability were the most recent focus of study, creating an emerging research niche with the WOS categories "Environmental Sciences" and "Green & Sustainable Science & Technology".

Bibliographic Analysis for the Conceptual Framework Proposal
Articles were grouped according to the application to which they refer. An analysis of these applications showed that most of them focus on a few topics (Figure 6). After applying a Pareto analysis (see Figure 6), it was concluded that these 65 papers were mainly grouped into five large clusters:
1. Transport
2. Energy
3. Water and Air
4. Location
5. Social transformation
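A Pareto analysis of this kind ranks categories by frequency and tracks the cumulative share, so that the few categories covering most of the items become visible. The sketch below shows the calculation under stated assumptions: the category labels and counts are hypothetical placeholders and are NOT the counts behind Figure 6.

```python
def pareto(counts):
    """Return (label, count, cumulative percentage) rows, largest count first."""
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for label, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += n
        rows.append((label, n, 100 * cumulative / total))
    return rows


# Hypothetical counts for illustration only; not the paper's data.
counts = {"topic C": 8, "topic A": 20, "topic E": 3, "topic B": 15, "topic D": 4}

rows = pareto(counts)
for label, n, cum_pct in rows:
    print(f"{label:8s} {n:3d} {cum_pct:6.1f}%")
```

Reading the cumulative column from the top shows how many categories are needed to reach a chosen share of the sample, which is the basis for keeping a small number of large clusters.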

A New Conceptual Framework for Implementing Machine Learning Techniques in Smart Cities. EARLY
As mentioned in the previous sections, a possible solution is sought to address these two problems (environmental and health) in Smart Cities through the use of Machine Learning techniques.
The main objective is to propose a model (EARLY) (see Figure 7) that allows us to address these problems in the future. An inclusive perspective on applicability, sustainability applications, scopes and dimensions, SDGs, tools, data types and machine learning techniques is provided.
The process of optimizing sustainability in Smart Cities can be long and complex if the preparatory techniques are not properly used. The EARLY method is a graphical tool that helps to translate the sustainability needs of the Smart City into technical specifications on data, applications, Machine Learning techniques and SDGs.
The EARLY model is a graphic representation with several zones that represent the results of the study discussed above.
In principle, it consists of four zones, each with its own function, although its structure can vary greatly depending on the type of use and the amount of data available.
The central zone corresponds to the sustainability application addressed in the Smart City. Cutting across the application, in this initial stage of defining the optimization problem, are the SDGs that we seek to promote.
The list of data must include the aspects that we know, or would need to know, to apply artificial intelligence techniques successfully. At this point, the more data the better: no aspect should be forgotten or neglected, since the less relevant data will be discarded later.
Next, the appropriate Machine Learning techniques must be defined to address the optimization problem with the data considered.
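One way to make the zones concrete is to record them in a small data structure. The sketch below assumes that the four zones are the application, the SDGs, the data and the Machine Learning techniques, as suggested by the description above. The class `EarlyCanvas`, its field names and the example entries are hypothetical and are not part of the EARLY model as published.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EarlyCanvas:
    """Hypothetical container for the zones of an EARLY diagram."""
    application: str                                         # central zone
    sdgs: List[int] = field(default_factory=list)            # SDGs to promote
    data: List[str] = field(default_factory=list)            # known or needed data
    ml_techniques: List[str] = field(default_factory=list)   # candidate techniques

    def missing_zones(self) -> List[str]:
        """Names of the zones that have not been filled in yet."""
        zones = {
            "sdgs": self.sdgs,
            "data": self.data,
            "ml_techniques": self.ml_techniques,
        }
        return [name for name, content in zones.items() if not content]


# Example entries are placeholders chosen for illustration.
canvas = EarlyCanvas(application="Energy", sdgs=[11, 7])
canvas.data.append("smart meter readings")
print(canvas.missing_zones())
```

Filling the zones in this order follows the text: the application and the SDGs are fixed first, the data list is made as complete as possible, and the techniques are chosen last.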

Relationship SDG-Sustainability
The Sustainable Development Goals emerged from the relative success of the Millennium Development Goals [4,5]. The SDGs constitute a working agenda that aims to set the course for world governments in the period between 2015 and 2030 [3]. They are a set of 17 Goals with 169 targets and 263 indicators [3].
One of the goals, SDG 11, is specifically dedicated to building sustainable cities and communities [3]. Although the concept of the Smart City (SC) started out closely linked to the development of technological features, it soon incorporated the sustainability dimension [47]. In this way, SCs can contribute to the achievement of the Sustainable Development Goals [48].
One of the characteristics of Agenda 2030 is its indivisible nature [3]. The SDGs are closely related to each other [5,49], and the achievement of one has consequences for others. This is the case of education [49], which, apart from being related to SDG 4, is also related to others such as SDG 5 (Gender Equality) or SDG 12 (Sustainable Consumption and Production). Something similar occurs with the construction and implementation of SCs [48]. The multidisciplinary nature of SCs means that their development is linked to other aspects such as the promotion of renewable energies and energy efficiency (SDG 7), air quality (SDG 3) and water quality (SDG 6), or the reduction of inequalities (SDG 10).
In line with previous studies on the relationship between SDGs and SCs, Machine Learning methodologies can contribute to the development of the SDGs. We consider that all of the articles analyzed are related to SDG 11. In addition, we found articles related to SDG 3 (five articles), SDG 6 (two articles), SDG 7 (eleven articles) and SDG 10 (three articles).
Among the latter SDGs, SDG 7 had the largest number of associated articles. There is no doubt that the development of SCs is closely linked to energy saving and the promotion of renewable energies [48]. The articles related to this SDG deal with topics such as the study of energy needs and consumption [50][51][52][53][54], thermal comfort in buildings [55,56], simulations [57] and measurements by intelligent electrical systems [58].
Sustainable Development Goal 3 seeks to ensure a healthy life and promote universal well-being. Among its goals is to reduce the number of deaths due to air and water pollution. A large number of the studies analyzed deal with air quality [59][60][61][62][63], while others seek to promote physical activity [64] or use machine learning techniques for health assessment [65].
Sustainable Development Goal 10 attempts to reduce the level of inequality. Among the studies that have been carried out, one analyzes the risk of chronic social exclusion [66], and another searches for inequalities between schools according to their location [67]. NGDOs play a very important role in reducing these inequalities, and Gong et al. (2019) analyzed the participation of NGDOs in the construction of SCs [68]. SDG 6 seeks to ensure access to clean water and sanitation. Among the articles analyzed, two study water quality [69,70].

Application of Sustainability-Data Relationship
As shown in [46], data are of great importance in the implementation of machine learning techniques. In many works, the data had to be requested because few data were readily available. The study carried out in [71] used data on materials, types of buildings and their geometry, as well as various data in the field of structures. In [72], data were taken from a business database to assess the environmental impact of roads, and in [73], mobility studies were carried out in the city of Chicago.
In [74,75], randomly collected data related to agriculture (seeds, fertilizers or pesticides) were used to estimate the environmental impact of rice and sugarcane farms, respectively.
The use of data from simulations is also very common. In [76], it was done with various factors on existing buildings. It is also possible to work with data generators, validated with experimental data, as was done in [77].
For both products and buildings, data can be collected at the conceptual design [78] or initial design [79] stages, for use in subsequent environmental assessment [77,78].
In view of the articles studied, it was considered essential to take into account the necessary data. In a first approximation, the data can be numerical (discrete or continuous) or categorical (hierarchical or not). Additionally, it is necessary to determine the necessary sample size and the number of characteristics to consider.
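As an illustration of this first characterization of the data, the following minimal Python sketch records whether each variable is numerical (discrete or continuous) or categorical (hierarchical or not), and reports the sample size and the number of characteristics. The `Variable` class, the variable names and the three records are hypothetical and are not taken from any of the studies reviewed.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One characteristic of the data set (hypothetical helper type)."""
    name: str
    kind: str     # "numerical" or "categorical"
    subtype: str  # "discrete"/"continuous" or "hierarchical"/"flat"


# Allowed subtypes for each kind, following the classification in the text.
ALLOWED = {
    "numerical": {"discrete", "continuous"},
    "categorical": {"hierarchical", "flat"},
}


def validate(variables):
    """Raise ValueError if a variable has an unknown kind or subtype."""
    for v in variables:
        if v.subtype not in ALLOWED.get(v.kind, set()):
            raise ValueError(f"invalid type for {v.name}: {v.kind}/{v.subtype}")


def summarize(variables, records):
    """Return sample size, number of characteristics and count per kind."""
    validate(variables)
    return {
        "sample_size": len(records),
        "n_features": len(variables),
        "by_kind": dict(Counter(v.kind for v in variables)),
    }


# Hypothetical schema and records, for illustration only.
SCHEMA = [
    Variable("temperature_c", "numerical", "continuous"),
    Variable("occupants", "numerical", "discrete"),
    Variable("building_type", "categorical", "hierarchical"),
    Variable("sensor_location", "categorical", "flat"),
]
RECORDS = [
    (21.5, 3, "residential/apartment", "indoor"),
    (18.0, 0, "commercial/office", "outdoor"),
    (23.1, 5, "residential/house", "indoor"),
]

SUMMARY = summarize(SCHEMA, RECORDS)
```

Such a summary makes it easier to judge, before any technique is chosen, whether the sample size is reasonable for the number of characteristics considered.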
Images taken from videos were used as the source for the analyses in [80]; these are non-hierarchical categorical data. The initial sample consisted of 72,012 images of size 224 × 224 × 3. In that work, seven publicly accessible videos were used for testing.
In [81], four groups of audio files were used. The first included files collected in a crowded city center (24,109, 15 classes). The second data set represented the sounds of household appliances (18,615, 7 classes). The third data set contained sounds from homes (20,238, 33 classes), representing human actions at home. Finally, the fourth data set included sounds of human actions in different places (17,221, 20 classes).
In [50], a previously published dataset was used. The data set contained 35 different variables covering meteorological information (temperature, humidity, pressure, wind speed, visibility, dew point), the energy consumption of household appliances and lights, and temporal data. These data were collected indoors and through a network of exterior sensors, from a two-story building and a nearby airport.
Public data from a bus company and the Alibaba platform have been used to predict the use of public transport [82]. Finally, the electricity consumed in homes by LCD monitors, heaters, lamps, refrigerators, printers, and smart TVs was studied by Rodríguez-Fernández et al. [54].

Relationship between Sustainability Applications and Machine Learning Techniques
The different works studied address the problem with different machine learning tools. These tools can be very diverse. For example, depending on the application, these can be supervised learning (univariate, multivariate regression, classification, etc.), unsupervised (for example, dimensionality reduction, segmentation) or reinforcement. Based on the models, one can speak of linear regression, logistic regression, Support Vector Machine (SVM), k-means and neural networks, among others.
Convolutional neural networks have been used to detect fire [80]. These networks had been used for an image classification tool with 1000 categories; in this case, they only had to distinguish four categories: no fire, no fire with mist, fire with mist, and fire without mist. The results were compared with those of AlexNet [83] and GoogLeNet [84].
Logistic regression, Gradient-Boosted Decision Trees (GBDTs) and Random Forest were used as baselines. For passenger flow forecasting in [82], Autoregressive Moving Average (ARMA), a single-layer Artificial Neural Network (ANN) and linear regression were used.
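To make the idea of a linear time-series baseline concrete, the sketch below fits a first-order autoregressive model, y[t] = a + b · y[t-1], by ordinary least squares and uses it to forecast the next value. This is a deliberately simpler stand-in than ARMA and is not the method of [82]; the function names and the passenger counts are hypothetical and synthetic.

```python
def fit_ar1(series):
    """Fit y[t] = a + b * y[t-1] by ordinary least squares."""
    x = series[:-1]  # lagged values y[t-1]
    y = series[1:]   # current values y[t]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged series is constant; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast(series, a, b, steps=1):
    """Iterate the fitted recursion `steps` times from the last observation."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = a + b * last
        predictions.append(last)
    return predictions


# Synthetic hourly passenger counts (assumption, for illustration only).
HOURLY_PASSENGERS = [10, 12, 16, 24, 40]
A, B = fit_ar1(HOURLY_PASSENGERS)
NEXT_HOUR = forecast(HOURLY_PASSENGERS, A, B, steps=1)
```

A baseline of this kind is mainly useful as a reference point: a more complex model is only worth its cost if it clearly improves on it.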
On other occasions, Naïve Bayes, Random Forest and Decision Tree techniques were used [58]. The work presented in [85] was based on Least Squares (LS), Multilayer Perceptron (MLP) and k-nearest neighbor (kNN).
In [86], ANN, kNN and Random Forest (RF) were employed for classification.
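Since k-nearest neighbor appears in several of the works cited, a minimal sketch of the technique may be helpful. It classifies a query point by majority vote among its k closest training points, using Euclidean distance. The two features (daily consumption in kWh and peak power in kW), the labels and the data are hypothetical; they do not come from [85] or [86].

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort (distance, label) pairs so the nearest neighbors come first.
    neighbors = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical households: (daily consumption in kWh, peak power in kW).
TRAIN_POINTS = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
    (6.0, 6.5), (7.0, 6.0), (6.5, 7.5),
]
TRAIN_LABELS = ["low", "low", "low", "high", "high", "high"]

PRED_LOW = knn_predict(TRAIN_POINTS, TRAIN_LABELS, (1.8, 1.7), k=3)
PRED_HIGH = knn_predict(TRAIN_POINTS, TRAIN_LABELS, (6.4, 6.2), k=3)
```

In practice the features would normally be scaled before distances are computed, since kNN is sensitive to the units of each variable.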

Contribution Discussion
This work is based on a detected opportunity gap. The objective of optimizing sustainability has not been, as far as we know, sufficiently explored through the use of Machine Learning algorithms. This paper aims to explore this path in cutting-edge research.
This work starts from the triple coincidence between the sectors of "Smart City"/"Smart Cities", "Machine Learning" and "Open Data" (see Figure 2). During the refinement of the searches carried out, the search for "Open data" was eliminated, since the results obtained from the searches of these pairs of terms were very numerous, and not all of them were relevant.
The main contributions of this work, in the context of the post-COVID era, can be summarized as:

1. A bibliometric analysis of the publications database.
2. A conceptual framework that provides an inclusive perspective on the applicability, domains and dimensions of sustainability, SDGs, tools, data types and machine learning techniques.
3. A case study with the proposed model applied to the city of Malaga.
The analysis of the existing bibliography shows that, in the last 5 years, there has been an exponential growth in studies with practical applications aimed at the application of ML techniques to improve aspects of the sustainability of cities. Studies have been developed for application in the areas of energy management [87][88][89] that predict consumption and optimize its use thanks to the use of IoT, creating a sustainable and efficient city; there has been research focusing on traffic management to improve the lifestyle and pollution of vehicles in cities [90,91]; and there have been applications of sustainable constructions within cities promoting the use of IoT to significantly promote and accelerate the development of smart buildings with low energy consumption in the future, which implies the creation of sustainable construction environments [92][93][94]. Likewise, there is a line of research development focused on the use of ML techniques that help the planning of the main lines in a Smart City project in line with the spatial and urban peculiarities and the interactions of the subsystems creating a sustainable city since its conception [94][95][96].
The analysis of published studies revealed a deficit of research that addresses the relationship between Smart Cities and the applicability of Machine Learning (ML) techniques to optimize sustainability while jointly considering applicability, the domains and dimensions of sustainability, the SDGs, the types of data and the ML techniques used to process the data. Hence the need arises to create a model that encompasses all these factors in order to provide smart solutions for cities that optimize sustainability in any area of the urban structure. Likewise, in the current pandemic situation, which calls for the intelligent management of health issues associated with society and cities, the proposed model could improve the development of applications for data management and strategy development aimed at improving the situation that affects us.
This work aims to satisfy the need for a holistic model for the development of a Smart City that covers all aspects derived from sustainability (dimensions and applications), data (types of data, how they are obtained and the ML techniques that exist) and the involvement of the SDGs in their development. Currently, there are no publications that encompass all these aspects, which represents an important gap, given the new scenario that significantly influences the concept of city and society that we currently have: COVID-19. This model could be a starting point for the early management of future situations with characteristics similar to the current pandemic, through strategies and methods that make intelligence an active part of cities and support solutions for managing new applications in society and its urban structure.
The bibliographic study carried out delved deeply into those investigations in which the use of machine learning tools and their application to Smart Cities converge, and the results obtained are shown in the corresponding section.
The main contribution of this work is the proposal of a model (EARLY) that will be useful when addressing problems in Smart Cities in the future. As mentioned in the introductory section, these problems encompass environmental and health issues, gaining special relevance in the post-COVID era.
The proposed model aims to provide an inclusive perspective on the applicability, scope and dimensions of sustainability, SDGs, tools, data types and machine learning techniques.
To illustrate the applicability of the proposed model, a specific application case is shown, taking as a case study the Smart City of Malaga (Spain). This developed case study can be found in the next section.

Case Study. Malaga City
The Strategic Plan for Technological Innovation of the city of Malaga [97] establishes that, although new technologies are essential in a Smart City, neither these technologies nor the large volumes of information they generate can by themselves make a city smart; these elements contribute intelligence only to the extent that they effectively satisfy the needs of its citizens. Therefore, an approach is established by which the city and all actions carried out in it follow a scheme articulated in six strategic axes (sustainable and safe habitat, citizen services, smart mobility, digital transformation, ICT infrastructures and innovation economy).

Application
The application considered in the case study is described below. This application arises from the need to integrate a higher level of renewable energy resources to achieve the objectives set by the EU. This objective faces some challenges, such as the need to actively control devices connected to the grid, the flattening of the charging curve and the incorporation of electric vehicles. In this case study, the medium-voltage lines in the city of Malaga (Spain) were analyzed for the last five years, with the aim of optimizing the charging of electric vehicles and thus maintaining the stability of the distribution network. The actual implementation within the Smart City Malaga Project [97] included the installation of smart meters for all customers and new automation and communication systems across the network, connected by a broadband power line communication network (see Figure 8). New management, control and integration systems were developed for all elements of energy consumption, production and storage throughout the distribution network. This case study is an example of how, through automation, the installation of new information and communication technology (ICT) systems, the use of smart sensors, distributed energy resources (DER) and demand control, solutions are provided toward the European 20-20-20 target.
More specifically, potential is found in the areas of:
• Use of infrastructure.
• Fast charging at authorized points with exhaustive control.
• Energy optimization by using night hours to recharge electric vehicles.
• Active demand management, both for electric vehicle recharging processes and for other types of consumption.
• Quality of supply and service and guarantee of system stability.
One of the conclusions of this case study, obtained from the analysis of the charging curves, is the maximum capacity potentially available to charge electric vehicles at night. The remaining capacity is estimated to be sufficient to charge almost 5000 vehicles under optimal conditions [99,100].
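The reasoning behind an estimate of this kind can be sketched as simple arithmetic: the spare capacity at night is the rated capacity minus the highest night-time load, and dividing it by the power drawn by one charger gives the number of vehicles that could charge simultaneously. The sketch below is only an illustration of that reasoning. The load curve, the rated capacity and the charger power are hypothetical, and the result does not reproduce the figure reported in [99,100].

```python
def night_headroom_kw(load_kw_by_hour, rated_capacity_kw, night_hours):
    """Spare capacity (kW) guaranteed during every one of the night hours."""
    return min(rated_capacity_kw - load_kw_by_hour[h] for h in night_hours)


def max_simultaneous_vehicles(headroom_kw, charger_kw):
    """Number of chargers of power charger_kw that fit in the headroom."""
    if charger_kw <= 0:
        raise ValueError("charger power must be positive")
    return max(0, int(headroom_kw // charger_kw))


# Hypothetical night-time load curve for one line: hour of day -> load in kW.
LOAD_KW = {23: 4600, 0: 4200, 1: 3900, 2: 3700, 3: 3600, 4: 3650, 5: 3900}
NIGHT_HOURS = (23, 0, 1, 2, 3, 4, 5)
RATED_CAPACITY_KW = 10000  # assumed rating, not a value from the case study
CHARGER_KW = 3.7           # assumed slow home charger

HEADROOM = night_headroom_kw(LOAD_KW, RATED_CAPACITY_KW, NIGHT_HOURS)
VEHICLES = max_simultaneous_vehicles(HEADROOM, CHARGER_KW)
```

Using the minimum headroom over the night is a conservative choice: it counts only vehicles that could charge during the whole period without exceeding the rated capacity at any hour.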
Another of the objectives addressed in this case study is the reduction of carbon dioxide emissions. Likewise, the project works with smart meters, and the data flows are adapted to the different use cases in such a way that data management is facilitated, taking into account the heterogeneity and variety of the data [101].

SDGs
Regarding the alignment of the main municipal plans with the city's strategy and their link with the SDGs [98], it is worth noting the more comprehensive nature of the Urban Agenda, which includes 15 of the 17 SDGs, as well as the municipal government program (with 13 of the 17 SDGs). The general urban planning plan follows them transversally, with nine integrated SDGs.
Once all the information from the different agents of the city, both public and private, had been collected and the progress report on the SDGs had been prepared, the city model of the Strategic Plan was aligned, and the priorities stated in the existing plans were compared with those that actually apply when actions are carried out. An alignment of the Operational Program 2018-2021 of the city strategy was then proposed, as well as other general criteria for the rest of municipal planning [98].
The surveys requested information on the actions that the different agents and groups carried out on a regular basis and that were part of their essential competencies or objectives, and the results were used to determine their current priorities. Tabulating this information gives an overview of the SDGs that are currently a priority for all the agents as a whole, and also for each one of them [98] (see Figure 9). It can be seen that SDG 5, on gender equality, is the one with the highest number of actions by all agents, followed by SDG 17 (partnerships for the goals) and SDG 4 (quality education). These are closely followed by actions linked to SDG 16 (peace, justice and strong institutions), SDG 10 (fight against inequalities) and SDG 3 (healthy life) [98].
Rereading the city model under the prism of the 2030 Agenda demonstrates how the proposed development model is sustainable, since it incorporates the 17 SDGs as a whole. In addition, it establishes as a priority the axis linked to the sustainable and coastal nature of the metropolis, giving prominence to the Urban Agenda 2050 of the Malaga City Council.
Each of the star projects designed to execute the strategies of this four-axis model affects different SDGs, so that, until 2030, these are the priorities of the Strategic Plan (see Figure 10). This process of localizing the 2030 Agenda has led to the establishment of an Operational Program of the Strategic Plan that already incorporates these SDG priorities. The Program, designed to run until 2021, establishes the set of actions to be carried out. Of the total of 42 actions collected, SDG 8 (economic growth and decent work) and SDG 11 (a resilient and sustainable city) are the ones that will mark the actions of the next three years in the metropolis [98].
The plan makes a final proposal for monitoring and promoting this localization process in such a way that each year progress is reported, and everyone is gradually involved so that no one is left behind.

Data
The ideal Smart City model needs to collect all the data necessary to make the city smart. These data are a fundamental part of keeping the city connected and informed, and of making each subsystem fulfill its function [101].
The project of making Malaga a Smart City is focused on energy management. The integration of renewable sources into the electricity grid has been chosen, with the aim of increasing efficiency and reducing carbon dioxide emissions. The intention is that this energy control system also reaches homes.
The installation of more than 17,000 smart meters has been carried out, and a sample of 50 of these users have energy efficiency solutions for the home. Emblematic buildings in the area have energy efficiency solutions installed at their headquarters, with which they can monitor their consumption and control some of their loads [101].
Advanced automation systems have been installed in more than 20 transformation centers, and a total of 72 centers communicate through a broadband PLC (Power Line Communication) network, which connects any point of the electrical network to the network control center, where these assets are monitored [101].

Machine Learning
An example of using Machine Learning to promote sustainability in Malaga is the PASTORA Project (Preventive Analysis of Smart Grids with Real Time Operation and Renewable Assets Integration). This project [100] combines artificial intelligence and Big Data to create intelligent networks that allow improved real-time control and preventive maintenance of the distribution network that reaches homes [99].
In this project, big data technologies and machine learning techniques based on deep learning and artificial intelligence are used to exploit the millions of data points provided by the intelligent network. This information will allow the development of predictive models of how the network will behave in order to improve its operation, avoid incidents and increase the quality of service to end customers [102].
The goal of these massive data analysis and machine learning tools is to enable the integration of renewables and electric vehicles by predicting the state of the grid and anticipating where incidents may occur, so that maintenance work and investments can be directed to the points where they are most needed.
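The idea of anticipating incidents from historical readings can be illustrated with a much simpler technique than the deep learning models used in the project: a rolling z-score that flags readings far from the recent average. The sketch below is an assumption-laden illustration, not the PASTORA method; the function name, the window, the threshold and the transformer load readings are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices whose reading deviates strongly from the previous window.

    A reading is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the `window` readings before it.
    """
    if window < 2:
        raise ValueError("window must contain at least two readings")
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is unusual.
            if readings[i] != mu:
                flagged.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical transformer load readings (percent of rating), one spike.
LOAD_PERCENT = [50, 51, 49, 50, 52, 51, 50, 90, 51, 50]
ANOMALIES = flag_anomalies(LOAD_PERCENT, window=5, threshold=3.0)
```

Note that, once a spike enters the window, it inflates the standard deviation and makes the following readings harder to flag; more robust statistics or learned models address this limitation.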
Smart transformers with integrated sensors, real-time information processing tools based on big data, and the analysis of historical data series to prevent and predict incidents are among the tools that will be tested through the PASTORA Project in the SmartCity Malaga Living Environment Lab [99].
In summary, deep learning and all the projects that include it in the energy sector seek intelligent and innovative solutions for the development of smart grids, while trying to respond to new needs with the integration of renewable energies, new self-consumption models and the progressive incorporation of the electric car [103].

Evaluation
SmartCity Malaga is recognized worldwide as one of the largest projects in the field of Smart Grids, due to both its size and the multiplicity of work areas involved. This is demonstrated by the numerous visits by public and private authorities and organizations that have wanted to take part in this pioneering, sector-leading initiative.
From the start of the SmartCity Malaga project in 2009 to its completion in 2013, the main advances have been: a saving of more than 25% in the electricity consumption of the implantation area thanks to the use of energy efficiency systems with monitoring, control and active demand management for industrial and residential users. In the case of residential collaborators, 42% decreased energy consumption by more than 10% thanks to the use of domestic energy efficiency kits that allowed them to manage their spending from anywhere in the world through a smartphone. In addition, 4,500 tons of CO2 emissions per year have been avoided [97].

Conclusions
Humanity is facing the SARS-CoV-2 (COVID-19) pandemic, which especially complicates the management of Smart Cities. It is necessary to analyze the post-COVID era from an urban point of view and to make progress on how Smart City networks should work to share more data in the event of outbreaks or disasters, leading to better globalization, understanding and management of the data.
In this work, the link between the concepts of Smart Cities, Machine Learning techniques and their applicability was exhaustively analyzed. As far as we know, there are no previous works on this. A comprehensive study of the relationship between Smart Cities and the applicability of machine learning (ML) techniques has been carried out with the aim of optimizing sustainability. With this intention, Machine Learning models were studied from the point of view of models, techniques and applications. The areas and dimensions of sustainability were analyzed, and the SDGs were discussed.
A model (EARLY) was proposed that allows us to address these challenges in the future. These challenges encompass environmental and health issues, taking on special relevance in the post-COVID era. An inclusive perspective on applicability, sustainability scopes and dimensions, SDGs, tools, data types and machine learning techniques was provided. Finally, a case study applied to an Andalusian city was presented.
A disconnect between organizations conducting research was detected. Given the current difficulty in obtaining data, this opens the possibility of improving research by connecting institutions, so that some can apply analysis techniques to the data provided by others, creating clusters of work and specialization.
On the basis of the bibliographic analysis carried out, it was concluded that the main areas of work on this subject according to the WOS categorization are "engineering, electrical & electronic", "computer science, information" and "telecommunications". There were other clusters that were not very connected and with a lower density of publications. After performing the temporal analysis of these small clusters, it was concluded that the categories "environmental sciences" and "green & sustainable science & technology" (related to sustainability) were those found in recent years (2018-2019) that were emerging and cross-cutting themes in the field of Smart Cities.
On the other hand, many of the documents studied were discarded, since there are few papers that contain an applicable theme. Additionally, some have little to do with Smart Cities.
The Pareto analysis carried out detected five main clusters in the topics addressed: transport, energy, water and air, location and social issues, highlighting the main areas of research interest at this time.
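For readers unfamiliar with the technique, a Pareto analysis orders the categories by frequency and accumulates their shares, so that the few categories accounting for most of the items stand out. The following minimal sketch shows the computation. The five cluster names come from the text, but the publication counts are invented for illustration and are not the results of this study.

```python
def pareto_table(counts):
    """Return (category, count, cumulative share) rows, largest count first."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    table = []
    running = 0
    for name, count in ordered:
        running += count
        table.append((name, count, running / total))
    return table


def vital_few(counts, cutoff=0.8):
    """Return the top categories needed to reach the cumulative cutoff."""
    selected = []
    for name, _, cumulative in pareto_table(counts):
        selected.append(name)
        if cumulative >= cutoff:
            break
    return selected


# Hypothetical publication counts per cluster (illustrative values only).
CLUSTER_COUNTS = {
    "transport": 30,
    "energy": 25,
    "water and air": 20,
    "location": 15,
    "social issues": 10,
}

TABLE = pareto_table(CLUSTER_COUNTS)
TOP_CLUSTERS = vital_few(CLUSTER_COUNTS, cutoff=0.8)
```

The 80% cutoff is a common convention rather than a fixed rule; the threshold should be chosen according to how the ranking will be used.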
The importance of having quality data is confirmed: many of the papers work with public data, previously published data or simulated data, or have no data at all. It is crucial that data are available, and future Smart City projects should treat this as a priority.
This work is part of a broader line of research on optimization under sustainability criteria in Smart Cities through Machine Learning technologies, which aims to promote the SDGs and the design and development of products (especially, although not exclusively, focused on Smart Cities) for the promotion of sustainability. We trust that this work will serve as a basis for future research related to this topic. One of the main future tasks, on which we are already working, is the application of Machine Learning techniques to the design of sustainable products, possibly framed in Smart Cities.