Application of Quantitative Methods to Identify Analogous Cities: A Search for Relevant Experiences in the Development of Smart Cities for Implementation in Kazakhstan

Urdabayev, Marat; Digel, Ivan; Kireyeva, Anel

doi:10.3390/smartcities8030092


by Marat Urdabayev ¹, Ivan Digel ² and Anel Kireyeva ³,*

¹ Department of Economics, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
² Integrated Energy Systems, FB 16 Electrical Engineering/Computer Science, University of Kassel, 34121 Kassel, Germany
³ Department of Information and Implementation of Research Results, Institute of Economics of the Ministry of Science and Higher Education of RK, Almaty 050010, Kazakhstan
* Author to whom correspondence should be addressed.

Smart Cities 2025, 8(3), 92; https://doi.org/10.3390/smartcities8030092

Submission received: 7 April 2025 / Revised: 14 May 2025 / Accepted: 27 May 2025 / Published: 29 May 2025


Highlights

What are the main findings?

A three-method quantitative approach (PCA, cluster analysis, t-SNE) reliably pinpoints the most similar smart cities to Almaty and Astana.
Ottawa and Denver emerge as the closest matches, with Ankara and Phoenix also forming a second tier of comparable cities.

What is the implication of the main finding?

Focusing on these analogous cities can streamline the transfer of their successful smart city strategies to Almaty and Astana.
The combined use of these statistical methods can serve as a replicable framework for other cities seeking evidence-based reference points for smart city development.

Abstract

Rapid urban growth and the spread of the smart city concept create a growing need to understand how cities become “smart” and to apply their experience where it will best take root. Identifying which experience is most suitable is not a trivial task and requires labor-intensive analysis. This study aims to identify the smart cities that are most similar to Almaty and Astana in terms of key indicators by applying quantitative methods. Using a sample of smart cities, this paper successively employs three methods: principal component analysis, hierarchical cluster analysis, and t-distributed stochastic neighbor embedding. The results show that Denver and Ottawa are the closest to Almaty and Astana, followed by Ankara and Phoenix. The proposed methodology assesses the similarity of urban development conditions, on the assumption that similar development conditions shape approaches to smart city development and therefore indicate how relevant the experience of other smart cities worldwide is to Almaty and Astana. This approach is intended to make the transfer of advanced smart city solutions to the context of Kazakhstan more effective. The conclusions can be used to form recommendations for the development strategy of Almaty and Astana, as well as of other cities facing similar challenges.

Keywords:

smart cities; quantitative methods; cluster analysis; PCA; t-SNE; city similarity; Almaty; Astana

1. Introduction

Modern cities are rapidly changing under the influence of global urbanization, digitalization, and social transformation processes. One of the most promising areas of development is the concept of smart cities, where emphasis is placed on the efficient use of resources, the introduction of innovative technologies, and the improvement of residents’ quality of life [1]. However, despite the high popularity of this concept, there is no unified interpretation of a smart city in the scientific environment: different schools and directions distinguish technological, social, environmental, and organizational aspects, relying on different assessment methods and sets of criteria [2,3]. Nevertheless, common to most approaches is that the focus is on people and their well-being, and technological and administrative infrastructures are considered tools to achieve this goal [4,5].

In the context of Kazakhstan, interest in smart cities has been growing in recent years, especially concerning the two largest cities—Almaty and Astana [6,7]. Both cities have high growth rates and developed infrastructure and are interested in the application of digital technologies to solve urban problems [8,9]. At the same time, they do not have access to the sea, which affects logistics and foreign economic opportunities [10,11], and also forms the specificity of spatial development and tourist attractiveness [12]. Another factor distinguishing them from many other megacities is the continental climate and peculiarities of intra-urban migration [13,14]. All these circumstances create the need to search for reference smart cities whose development conditions are most comparable to the realities of Kazakhstan’s megacities.

The development of smart cities in Kazakhstan represents a strategic priority with multifaceted practical significance that extends beyond technological modernization. Economically, smart city initiatives are critical for diversifying Kazakhstan’s resource-dependent economy, with Almaty and Astana serving as innovation hubs that can drive the growth of knowledge-intensive sectors. OECD data indicate that Kazakhstan’s economy remains heavily concentrated in extractive industries (over 29% of the GDP) [15], making urban technology ecosystems essential for economic diversification. Smart urban management systems can significantly enhance resource utilization efficiency in these cities, where utility costs consume approximately 5–10% of average household budgets [16]. Socially, smart solutions address pressing challenges in Kazakhstan’s rapidly growing urban centers, where internal migration has increased urban populations by 16–23% in the past decade [17], straining infrastructure and public services. Digital public service delivery is particularly crucial, given Kazakhstan’s vast territory and dispersed population centers, with potential to reduce administrative burdens that currently cost businesses an estimated 42 h per month in bureaucratic procedures [15]. From an environmental perspective, smart technologies can mitigate the specific challenges facing Kazakhstan’s cities—high per capita carbon emissions (about 2.2 times the EU average) [18], water scarcity issues affecting sustainable development [19], and severe air quality problems (with Almaty regularly exceeding WHO pollution standards by 15–17 times during winter months) [20]. Furthermore, these developments align with Kazakhstan’s national strategic programs, including “Digital Kazakhstan 2025” and the “Kazakhstan-2050” strategy, which explicitly prioritize smart urban development as a vehicle for sustainable growth, improved governance, and enhanced quality of life [21,22]. 
By identifying analogous smart cities for Kazakhstan’s urban centers, this research provides a foundation for evidence-based policy transfer that can accelerate progress across these economic, social, and environmental dimensions while avoiding costly trial-and-error approaches to urban innovation.

In the world, there are numerous projects that focus on smart cities, reflecting different aspects of “smart” solutions [23,24,25]. They cover such areas as the digitalization of public services, the development of public transport, environmental monitoring, intelligent energy management systems, and many others [26,27]. However, the successful transfer of such solutions directly depends on the similarity of the transfer conditions, since even the most effective cases can be irrelevant in the presence of significant differences in the size and configuration of the urban area, population dynamics and structure, or natural and climatic features [28,29].

The question arises of how to determine which successful smart cities’ experiences suit cities that are only beginning to become smart. Answering it raises three further questions. First, how should smart cities be defined? There are many definitions and discussions around them, and one approach has to be chosen. Second, what criteria should be used to analyze similarity? How can we tell that two cities have similar development conditions that shaped their strategies for becoming smart cities? How can we separate indicators that reflect these conditions from indicators that show the results of implementing smart city strategies? Third, how can these indicators be analyzed objectively? Qualitative analysis, although capable of producing good results, is in practice limited to a small sample (a large one is difficult to process) and cannot avoid researcher bias in determining similarities.

To answer the first question, the sample was taken from the International Institute for Management Development (IMD). Its interpretation is multidimensional: it treats various directions of city development as smart city development. To answer the second question, a set of indicators relevant to smart city development was defined and divided into two groups: resulting indicators, which reflect the implementation of smart city strategies, and defining indicators, which set the conditions under which these strategies are formed. The similarity of cities is determined on the basis of the latter. To answer the third question, quantitative analysis was chosen, since it can process many variables simultaneously in a large sample. To offset the limitations of individual quantitative methods, several methods were used together so that each compensates for the others’ weaknesses.

This approach is intended to serve as a first step in analyzing the experience of smart cities with a view to transferring it to other cities. This step would allow us to select, from a large sample of cities, the most promising ones for a deep, comprehensive analysis of their strategies.

The research question of this article is the following: which smart cities are most similar to Almaty and Astana in terms of the conditions of their development?

To answer this question, a sample of smart cities was formed, along with a set of variables defining the conditions of their development, to which the following three statistical methods were then applied:

Principal component analysis (PCA), which captures the overall structure of multivariate data.
Hierarchical cluster analysis (Ward’s method), which helps to group cities into clusters based on similarities in selected attributes and thus form logical groups for further detailed study.
t-distributed stochastic neighbor embedding (t-SNE), which provides visualization of nonlinear relationships and reveals local “clusters” of cities that may not be obvious when using purely linear algorithms.

Using three methods at once reveals patterns in the distribution of cities more reliably: if the results of different algorithms converge, confidence grows that the selected groups share many common features. This approach is intended to facilitate the selection of similar cities for experience analysis, save the effort of “manually” analyzing each smart city, and provide a solid basis for determining similarities between cities.
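The convergence check described above can be sketched as a simple vote across methods. This is a minimal illustration, not the authors' procedure: the function name `consensus_peers`, the `min_votes` threshold, and the neighbour lists are all hypothetical placeholders.

```python
from collections import Counter

def consensus_peers(neighbor_lists, min_votes=None):
    """Rank candidate cities by how many methods list them as neighbours."""
    if min_votes is None:
        min_votes = len(neighbor_lists)  # default: every method must agree
    votes = Counter()
    for neighbours in neighbor_lists.values():
        votes.update(set(neighbours))    # each method votes once per city
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(city, n) for city, n in ranked if n >= min_votes]

# Hypothetical neighbour lists for one target city; NOT the paper's results.
neighbours_of_target = {
    "pca": ["City A", "City B", "City C"],
    "ward": ["City A", "City B", "City D"],
    "tsne": ["City B", "City A", "City C"],
}
unanimous = consensus_peers(neighbours_of_target)              # all three agree
majority = consensus_peers(neighbours_of_target, min_votes=2)  # at least two agree
```

Cities returned by the unanimous call are the ones on which all methods converge; lowering `min_votes` gives a looser second tier.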

This article consists of an introduction, literature review, materials and methods, results, discussion, and conclusion.

2. Literature Review

The concept of smart cities is central to modern urban planning. Despite the term’s widespread popularity, there is still no single universally accepted definition, and approaches to assessing the success of smart cities vary depending on the academic school and research paradigm [1]. Researchers consider smart cities through the prism of technology, sustainable development, resource management, and citizen engagement, emphasizing the need for a human-centered approach to improve the population’s quality of life [2].

The literature identifies several main approaches to defining smart cities: technological, sustainable, component, and socio-technical approaches [3]. The technological approach focuses on application of the Internet of Things (IoT), real-time data analysis, and other advanced technologies [26]. The sustainable approach emphasizes the rational use of resources and environmental responsibility, and the component approach involves analysis of interconnected domains of the urban environment (transportation, energy, economy, management, buildings, citizens) and their integration into a single ecosystem [27]. The sociotechnical approach, in turn, allows social factors to be considered, including the needs of different groups of citizens and their interaction with urban services [30].

Based on these different approaches to understanding smart cities, researchers also identify key criteria for the successful development of smart cities. While the approaches set the perspective of consideration and the methodological basis for developing smart cities, the criteria allow assessment of how effectively implementing these approaches in practice leads to the desired results [31]. One of the most important criteria for the success of smart cities is quality of life (QoL), where citizen satisfaction is directly related to social connections, environmental and material well-being, and the level of digital literacy [4]. In addition, citizen engagement, governance efficiency, information and communications technology (ICT) utilization, and intellectual capital formation play key roles [32]. Active public participation in decision-making and transparent governance mechanisms are considered the most crucial components in the structure of smart cities [5].

Studies examining city-level cases in Zurich and Canberra emphasize the critical role of public participation in the successful development of smart cities. These studies identify significant contextual urban factors, including governance models and demographic characteristics, which strongly affect public engagement strategies and the effectiveness of smart city initiatives [28]. Comparative research among Zurich, Oslo, and Copenhagen also illustrates that advanced technological deployments alone do not guarantee increased social equity or heightened civic involvement. Instead, successful smart city strategies require the integration of inclusive, locally adapted social frameworks alongside technological innovations [29]. Additionally, fostering digital literacy across urban communities has been recognized as a vital component for the effective implementation and sustained success of smart city initiatives [33]. Strategic investment in digital skills and comprehensive educational programs can significantly enhance citizen participation and the utilization of smart technologies in urban areas [34]. Consideration of successful practices demonstrates how specific criteria are reflected in real smart city projects. For example, Barcelona implements innovative solutions aimed at increasing social inclusion [23]; Amsterdam uses a business-oriented approach [24]; and Dubai and Abu Dhabi emphasize technological optimism and leadership initiatives [24]. In addition, cities such as Warsaw, Gdynia, Copenhagen, and Malmö confirm that the degree of citizen involvement and governance features significantly influence the success of projects [25].

In Kazakhstan, the study of smart cities has received considerable attention, as evidenced by the growing volume of work devoted to digital transformation, sustainable development of the urban environment, and integration of modern technologies into municipal management. The main topics of these studies are the potential of cities for introducing innovative technologies, the state of digital infrastructure, and the impact of public policy on the development of smart cities [35,36,37,38]. Existing studies show that Almaty and Astana have the greatest potential for smart city development in Kazakhstan [6,39]. Another study applying quantitative methods demonstrates that these cities have the highest potential for transformation into smart cities, unlike other municipalities that require additional investments in digital infrastructure [7]. Regional analysis confirms their leadership, revealing disparities in innovation capacity among different areas [8,9]. Despite existing obstacles, such as shortages of qualified personnel and funding, successful projects implemented in these cities (e.g., implementation of smart grids in Astana) demonstrate that overcoming these difficulties is possible [35,36]. Moreover, studies on the development potential of smart cities in Central Asia emphasize that Almaty and Astana are leaders in introducing sustainable management technologies, which serve as a catalyst for long-term growth focused on effective digital transformation and rational use of resources [37,38]. It therefore makes sense to analyze foreign experience in smart city development first and foremost for Almaty and Astana. Given their socio-economic and geographical specifics, it is important to find examples of smart cities that are as similar as possible in terms of the key indicators affecting the development and manageability of urban space.

In the context of the above, when there is a need to analyze and adapt foreign experience of “smart cities”, especially for such specific conditions as those in Kazakhstan, quantitative approaches begin to play a key role. Indeed, the smart city concept is attracting increasing attention, but methodological approaches to systematic comparison and classification of cities—which is necessary to identify relevant analogs—vary significantly [40,41]. Despite this diversity, in recent years researchers have increasingly applied quantitative methods to more objectively categorize, evaluate, and identify peer cities [6,40,42], moving away from exclusively qualitative descriptions, especially when working with different emphases and methodological frameworks. The inherent complexity and saturation of smart cities with multidimensional data make the application of such quantitative methods not just desirable, but often a prerequisite, for their in-depth analysis and objective evaluation [42].

The literature presents a number of studies that use quantitative methods to analyze the level of “smartness” of cities and create urban typologies [42,43,44]. Cluster analysis methods such as k-means and hierarchical clustering are often applied. For example, the k-means method has been used in combination with principal component analysis (PCA) to classify Chinese smart cities [42] and to evaluate the information security of smart cities out of a sample of 38 cities [40]. Hierarchical clustering, including Ward’s method, was used to group the cities of Kazakhstan according to the potential of smart city development [41].

PCA is often used as a preliminary step before clustering to reduce the dimensionality of multidimensional data and identify key factors that determine differences between cities. This approach, where PCA is applied first, followed by cluster analysis (k-means or hierarchical), is demonstrated in a number of works [44,45,46]. This allows the formation of clusters based on the fundamental underlying smartness dimensions, reducing noise and increasing the interpretability of the results [47]. Cities are grouped based on a wide range of indicators covering technological sophistication (e.g., ICT infrastructure), governance (e.g., public investment), economy, and environment [6,42].
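The PCA-then-clustering pattern described above can be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, the helper `pca_scores` is a hypothetical name, PCA is computed through NumPy's SVD on z-scored indicators, and Ward's linkage comes from SciPy. It does not reproduce the settings of any cited study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def pca_scores(X, n_components=2):
    """Project z-scored data onto its leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each indicator
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt are component axes
    explained = (S ** 2) / np.sum(S ** 2)             # share of variance per component
    return Z @ Vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in for a cities-by-indicators matrix:
# two groups of 6 "cities" described by 5 indicators.
X = np.vstack([rng.normal(0.0, 1.0, (6, 5)), rng.normal(6.0, 1.0, (6, 5))])

scores, explained = pca_scores(X, n_components=2)
tree = linkage(scores, method="ward")               # Ward's minimum-variance linkage
labels = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into two clusters
```

Clustering on the component scores rather than on the raw indicators is what reduces noise in this pattern; the number of retained components and the cut level are analyst choices.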

Other studies focus on the construction of composite indices of smart cities. Such indices integrate various indicators into a single value for ranking using a variety of normalization, weighting (e.g., equal weights approach, I-distance statistical approaches, expert judgment), and KPI aggregation methods [48,49].
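As a minimal sketch of how such a composite index is assembled, the example below applies min-max normalization and equal weights, then aggregates by a weighted sum. The indicator names and values are hypothetical, and the cited studies may use other normalization, weighting, or aggregation schemes.

```python
def min_max(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant indicator carries no information
    return [(v - lo) / (hi - lo) for v in values]

def composite_index(table, weights=None):
    """Weighted sum of normalized indicators.

    table maps indicator name -> list of values, one per city.
    """
    names = list(table)
    if weights is None:
        weights = {name: 1.0 / len(names) for name in names}  # equal weights
    normalized = {name: min_max(table[name]) for name in names}
    n_cities = len(table[names[0]])
    return [
        sum(weights[name] * normalized[name][i] for name in names)
        for i in range(n_cities)
    ]

# Hypothetical indicator values for three cities.
indicators = {"broadband": [20.0, 60.0, 100.0], "e_services": [1.0, 3.0, 2.0]}
index_values = composite_index(indicators)
```

With these inputs the second and third cities tie, which shows how a single aggregated value can hide very different indicator profiles.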

Machine learning techniques are also finding application. Algorithms such as random forest and support vector machines (SVM) are used to classify cities according to their level of development or potential as smart cities, as well as to predict their level of smartness [6,43]. Ensemble methods like random forest often show high accuracy due to their ability to handle multidimensional data and complex interactions between features [43,50].

To visualize complex, nonlinear relationships in multivariate urban data, researchers are turning to techniques such as t-distributed stochastic neighbor embedding (t-SNE). This method complements linear approaches by identifying local structures and clusters and is particularly important for data lying on several different but connected low-dimensional manifolds [51,52,53]. t-SNE focuses on preserving the local neighborhood structure of data points, which allows identification of subtle groupings or manifolds corresponding to different types of cities or urban settings [54].
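A short sketch of this use of t-SNE, assuming scikit-learn's `TSNE` and synthetic data: the perplexity value, the random seed, and the helper `nearest_in_embedding` are illustrative choices, not settings taken from the cited work.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in: 30 "cities" with 6 indicators, drawn from two groups.
X = np.vstack([rng.normal(0.0, 1.0, (15, 6)), rng.normal(8.0, 1.0, (15, 6))])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize indicators

# Perplexity must be smaller than the number of samples; small samples
# call for small values. The value 5.0 here is an assumption.
embedding = TSNE(
    n_components=2, perplexity=5.0, init="pca", random_state=0
).fit_transform(Z)

def nearest_in_embedding(points, index, k=3):
    """Return the indices of the k points closest to points[index]."""
    distances = np.linalg.norm(points - points[index], axis=1)
    order = np.argsort(distances)
    return [int(i) for i in order if i != index][:k]

neighbours = nearest_in_embedding(embedding, index=0, k=3)
```

Because t-SNE preserves local neighbourhoods rather than global distances, only the nearest neighbours of a point in the embedding are meaningful; distances between far-apart groups should not be interpreted.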

Despite the valuable contributions of these works, several key gaps and limitations can be identified in the existing quantitative literature on city classification and smartness analysis:

- Focus on outcome measures: Most analyses focus on assessing the outcomes of smart technology adoption or the level of smartness achieved (e.g., level of technological adoption, governance performance indicators), rather than the underlying development conditions that shape the preconditions and determine the specificity of smart city strategies. This makes it difficult to identify truly comparable cities for experience transfer, especially if their current level of “smartness” is different.
- Limited use of integrated methodologies: While individual quantitative methods are widely used, few studies use a combination of several complementary methods to validate results and compensate for the limitations of individual approaches.
- Lack of attention to specific contexts: Most studies focus on large, often metropolitan or seaside, cities in economically developed regions of Europe, North America, and East Asia. Cities located inland, especially in developing countries and regions with low population densities and specific geographical contexts, remain understudied.
- Weak links to practical recommendations for transfer of experience: Often, studies end up constructing rankings or typologies without offering clear mechanisms or criteria for identifying the most relevant peer cities for targeted exchange of specific policies and strategies.

This study aims to address these gaps and contributes to the existing literature as follows:

- Focus on fundamental development conditions: A key original aspect of our approach lies in shifting the focus from the resulting indicators of “smartness” to the analysis of defining variables. We focus on the underlying socio-demographic, geographic, and climatic characteristics that form the baseline conditions for smart city development. This approach allows us to identify peer cities based on the fundamental preconditions for their development, which is necessary for correct and effective transfer of experience, since these conditions determine the applicability of certain smart city strategies and the associated infrastructure costs. This avoids distortions associated with comparing cities that have already achieved different levels of “smartness” or cities with fundamentally different starting opportunities.
- Application of a comprehensive three-stage quantitative methodology: We sequentially use three complementary quantitative methods: principal component analysis (PCA), hierarchical cluster analysis (Ward’s method), and t-distributed stochastic neighbor embedding (t-SNE). PCA is used to reduce dimensionality and identify global patterns in the data; hierarchical cluster analysis is used to group cities in detail and identify the most similar pairs; t-SNE is used to visualize nonlinear relationships and confirm local structures. The use of such a combination of methods allows for more reliable, robust, and comprehensively valid results than using a single method, increasing confidence in the objectivity of the identified similarities.
- Adaptation to the specific context: The methodology is tested on a specific task: the search for relevant foreign experience for the development of “smart cities” in Kazakhstan (Almaty and Astana), which are continental cities with specific development conditions. In doing so, we control for geographical factors such as landlockedness, which makes our approach particularly valuable for regions with similar constraints.
- Creating a basis for targeted knowledge and policy transfer: By identifying peer cities based on similarities in fundamental conditions, our study lays the groundwork for a more effective and targeted exchange of experiences and specific strategies for building smart cities. This enables a shift from general rankings to a practice-oriented search for relevant solutions.

Thus, this study not only offers a methodological tool for identifying peer cities, but also contributes to the development of approaches to comparative urban analysis focused on the practical needs of shaping smart city development strategies in a variety of contexts.

3. Materials and Methods

This methodology includes analysis of international smart city rankings, pre-filtering of selected megacities by a geographic criterion, and in-depth analysis using indicators reflecting characteristics most influential for the development of smart city strategies. Data sources, city selection stages, and in-depth analysis methods are described in detail below. The IMD Smart City Index 2024 (hereinafter—IMD Index), which annually generates a list of leading smart cities worldwide, was used as a starting point. Initially, we considered all 142 cities included in the index [55]. To increase comparability with the development conditions of Almaty and Astana, we applied a multi-stage procedure for selecting peer cities.

In the first stage, cities with a Human Development Index (HDI) lower than that of Almaty and Astana were filtered out, as a significant difference in the level of socio-economic development would make it difficult to apply their experience in the Kazakhstani context [56,57].

The second filtering step was based on access to the world’s oceans. According to several studies, access to the ocean provides a range of economic benefits, including access to international maritime trade, increased tourist attraction, port infrastructure development, etc. [10,11,12]. Consequently, the use of the experience of cities known to have significantly different conditions of development will not be relevant to the “continental” cities of Astana and Almaty, as they cannot rely on benefits provided by the access to sea routes. Therefore, cities with direct access to the ocean were excluded from the list. Additionally, during the preliminary data analysis, it became clear that some cities have extremely high values of several indicators, making them “statistical outliers” and distorting the results, even after different transformations. In particular, Medina, Mecca, Beijing, and Riyadh stand out significantly in terms of their population size, growth rates, and precipitation, which are not typical for other cities under study. In order to minimize the risk of anomalies and to ensure more uniform comparison conditions, it was decided to exclude these cities from further analysis.

Of the 142 originally selected cities, 32 remained (see Appendix A), which are more comparable in terms of development conditions to Kazakhstan’s megacities.
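The selection procedure can be sketched as three successive exclusion rules. Everything concrete in this example is hypothetical: the `City` record, the field names, the HDI threshold, and the sample values are placeholders rather than the paper's data. In particular, the source does not state how a single HDI floor is derived from the two reference cities, so it is left as a parameter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class City:
    name: str
    hdi: float
    coastal: bool

def select_candidates(cities, hdi_floor, outliers=frozenset()):
    """Apply the three exclusion rules in order; return surviving city names."""
    kept = []
    for city in cities:
        if city.hdi < hdi_floor:   # stage 1: HDI below the reference cities
            continue
        if city.coastal:           # stage 2: direct access to the ocean
            continue
        if city.name in outliers:  # stage 3: manually flagged statistical outliers
            continue
        kept.append(city.name)
    return kept

# Hypothetical records; names and values are placeholders.
sample = [
    City("Inland A", hdi=0.90, coastal=False),
    City("Port B", hdi=0.92, coastal=True),
    City("Inland C", hdi=0.70, coastal=False),
    City("Inland D", hdi=0.88, coastal=False),
]
candidates = select_candidates(sample, hdi_floor=0.80, outliers={"Inland D"})
```

Keeping the outlier list as an explicit argument makes the manual exclusion step visible and easy to vary in a sensitivity check.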

As mentioned earlier, the indicators related to the development of smart cities were divided into resulting and defining ones. The resulting ones included those that reflect implementation of the strategies of these smart cities. They include both economic (for example, GRP per capita) and technological (access to the Internet, its speed, availability of digital services, etc.) indicators. Comparison of cities by resulting variables allows us to assess how “smart” a city is, but does not provide an understanding of the initial conditions in the city before and during its development into a smart city. An attempt to apply quantitative methods to these indicators leads to grouping smart cities separately from “non-smart” ones and does not provide any useful information. On the other hand, if it were possible to define a set of indicators that are independent of implementation of the smart city strategy, but at the same time influence its formation to a significant degree, then it would be possible to analyze all cities together and identify their similarities, and thereby find a set of those smart cities whose experience is most similar to the “non-smart” ones and can be more effectively transferred to them. In the case of this article, these indicators were the following:

1. The area of a city, defined as the territory within the city’s administrative boundaries, is an important factor affecting its development [58]. A larger city area provides more opportunities for infrastructure development, the location of industrial plants and residential areas, and the creation of green areas and recreational spaces [59]. Rosenthal and Strange (2004) found that doubling the area of a city while maintaining the same population density leads to an increase in labor productivity by 2–10% [60]. Duranton and Puga (2013) emphasize that a city’s area affects housing affordability and transportation infrastructure, which in turn affect the city’s economic growth [61].

2. The population of a city, defined as the number of residents living within the administrative boundaries of the city, is an essential determinant of its potential [62]. A large population creates demand for goods and services, stimulates business development, and attracts investments [63]. Organisation for Economic Co-operation and Development (OECD) studies have shown that, for every doubling of a city’s population, the level of productivity increases by 2–5% due to better organization of work, education, entrepreneurship, and dissemination of ideas [64]. According to the World Bank, the concentration of the population in cities promotes economic growth through agglomeration effects that increase productivity and create jobs [65].

3. An agglomeration area includes a city and its suburbs, which are functionally linked [66]. A large agglomeration area promotes transport infrastructure development, expands the labor market, and creates new economic opportunities [67]. Urban agglomerations become important spatial carriers of economic development [68]. Fu et al. (2022) empirically analyzed the impact of urban agglomeration on urban economic development in China, emphasizing the significance of agglomeration effects and their spatial scale [69].

4. The agglomeration population includes the combined population of a city and its suburbs [70]. According to a study [71], a large agglomeration population not only creates a large-scale market for business, attracts investment, and promotes infrastructure development, but also has an impact on economic growth and sustainable economic development. Agglomeration reduces transportation costs, develops local markets, and creates a larger and more specialized workforce [72].

5. Average annual population growth rate in percent. High population growth rates can indicate economic growth and the attractiveness of a city for migrants [73]. The United Nations Department of Economic and Social Affairs (UN DESA) emphasizes that rapid population growth increases the need for investment in social and economic infrastructure [74]. Population growth can both stimulate economic growth and create resource supply problems [75]. Kremer (1993) argues that population growth promotes technological progress and productivity [76]. It also increases tax revenues, which allows cities to invest in infrastructure, transportation, education, and social programs [77].

6. City density is an important factor in sustainable urban development. High density promotes economic growth through agglomeration effects, enhancing innovation and creating specialized labor markets [78]. It also contributes to improving the energy efficiency of buildings and reducing carbon emissions through compact development and public transportation [79]. Dense cities demonstrate more efficient use of resources: centralized water supply, heating, and other infrastructures work better with high population concentrations [80]. In addition, density promotes the development of social infrastructure and public spaces, improving access to schools, health facilities, and recreational areas [81]. Indicators of successful development show that, if well managed, high density can be combined with a high quality of life [82]. However, it also comes with challenges: overcrowding, rising housing prices, transportation congestion, and social tensions [83]. The key to successful density is good urban planning and integration with infrastructure, public spaces, and affordable housing [84].

7. Agglomeration density. High agglomeration density promotes public transportation, reduces transportation costs, and improves infrastructure utilization [85]. A study of the Guangzhou agglomeration showed that agglomeration population density plays a significant role in economic growth, especially in areas with transportation infrastructure, which enhances the effect of density on neighboring areas [86]. High urban agglomeration density has increased household income, creating significant economic benefits in Africa. The density elasticity of income was 0.6, emphasizing the key role of density in transforming the economy [87]. A study of the Yangtze River Delta showed that increasing population density reduces CO₂ emissions, confirming the benefits of agglomeration in reducing environmental costs and increasing resource efficiency [88].

8. Physical topography has a significant impact on urban development, determining opportunities and constraints of a city’s infrastructure and resource management strategies [89]. The importance of considering topographical features such as minimum and maximum elevation and height difference is emphasized in the context of sustainable urban development [90]. Infrastructure planning is closely related to topography, which affects the economy and efficiency of projects, especially in transportation infrastructure and water management [91]. Cities at low altitudes face flood risks and require specialized engineering solutions for water protection and management, as shown in Guangzhou and South Asian cities [92]. High maximum altitudes increase construction complexity and costs, especially for water supply and communication systems, such as in mountainous cities in China [93]. In smart cities, topography also influences sensor placement, network architecture, and infrastructure operation, making geospatial technologies an important part of urban planning [94].

In this study, the topography of the city will be described by the following characteristics:

-: Lowest point (m)—the minimum absolute height within the administrative boundaries of the city;
-: Highest point (m)—the maximum absolute height within the administrative boundaries of the city;
-: The ratio of the difference between the highest and the lowest point to the city area is an indicator that characterizes the intensity of height change per unit area of the urban area.

9. Climatic conditions play a key role in the formation and development of cities, having a significant impact on urban planning, architectural design, energy consumption, quality of life, and urban infrastructure [95]. Mean annual temperature, annual precipitation, and temperature amplitude act as the most important climatic indicators that determine the need for urban adaptation to climate change [96,97]. Rising temperatures increase the demand for air conditioning and reduce the need for heating, while exacerbating health problems and straining the availability of urban services due to the urban heat island effect [98]. Intense precipitation events require infrastructure upgrades, including roads, bridges, and water and electricity systems, to minimize damage from extreme weather events [99]. Thus, integrating climate factors into urban development strategies is a prerequisite for sustainability and improving the quality of life in cities [100]. In this study, the climatic parameters of the city will be determined by the following characteristics:

-: Average annual temperature (°C)—the average value of air temperature for the year within the urban area;
-: Average annual precipitation (mm)—total annual precipitation within the urban area;
-: Temperature difference (°C)—the difference between the average temperature of the warmest and the coldest month of the year within the urban area.

The selection of variables and exclusion criteria for this study was guided by their differential impact on urban development conditions and smart city strategies. It is important to clarify why access to seaports was used as an exclusion criterion, while others (such as topography and climate) were included as analytical variables. Access to a seaport is a binary trait that puts a city in a different economic class: maritime logistics, customs zones, and coastal spatial patterns create a growth path unavailable to landlocked agglomerations [101]. Comparing such cities with Almaty or Astana would therefore import a structural advantage we cannot control for, so port cities were excluded. Topography and climate, by contrast, vary along continuous scales that every city must manage. Their effects are incremental—steeper slopes raise infrastructure costs; colder winters increase energy demand—but not categorical. Retaining these variables lets the model show how otherwise similar cities adapt smart-city solutions to local physical constraints, such as calibrating sensor networks for snow load or optimizing water reuse in arid zones [102,103]. Removing seaport access thus eliminates an incomparable advantage, while keeping topography and climate as criteria preserves the natural heterogeneity shared by all landlocked cities in our sample and remains directly relevant to Kazakhstan’s continental context.

All variables containing outliers were log-transformed to reduce the influence of those outliers. The variables for the highest point and population growth were not transformed, because they contained no outliers.
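The transformation step can be illustrated with a minimal Python sketch. The paper does not state how outliers were detected, so Tukey's 1.5 × IQR fences are an assumption made here purely for illustration; the function names `has_outliers` and `log_transform_if_outliers` are hypothetical, and the authors' own R implementation is the one in Appendix B. The density values are the first rows of Table A1, while `toy_no_outliers` is an invented column.

```python
import numpy as np

def has_outliers(values, k=1.5):
    """Return True if any value lies outside Tukey's fences (assumed rule)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return bool(np.any((values < q1 - k * iqr) | (values > q3 + k * iqr)))

def log_transform_if_outliers(columns):
    """Log-transform each strictly positive column that contains outliers."""
    transformed = {}
    for name, values in columns.items():
        values = np.asarray(values, dtype=float)
        if has_outliers(values) and np.all(values > 0):
            transformed[name] = np.log(values)  # natural logarithm
        else:
            transformed[name] = values  # left unchanged
    return transformed

columns = {
    # City density for the first 14 cities of Table A1
    "city_density": [3206, 1299, 1786, 4037, 4354, 2772, 1300,
                     5948, 7230, 3210, 1152, 1810, 2903, 12971],
    # Invented column without extreme values (not from the paper)
    "toy_no_outliers": [1.0, 2.0, 3.0, 4.0, 5.0],
}
transformed = log_transform_if_outliers(columns)
```

In this excerpt, the very high density of Geneva triggers the transformation of the density column, while the toy column is passed through unchanged. A different outlier rule could lead to a different set of transformed variables.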

Limiting our analysis to key socio-demographic and geographic indicators is an attempt to identify precisely those factors that influence decision-making and smart city strategy formation but are not (or only in a very limited way are) influenced by these strategies [104]. This will create a bridge between the conditions in already established smart cities and those that are only planning to become smart cities. The chosen geographic, spatial, and demographic variables that form the initial conditions of a city’s development significantly influence, but are not the outcome of, smart technology adoption opportunities. Our two-stage methodological approach—first determining similarities in the baseline conditions of development (socio-demographic and geographic), then analyzing the results of the implementation of smart technologies—allows us to avoid the methodological paradox in which cities that have already successfully implemented smart city strategies would be recognized as “different” from Almaty and Astana precisely because of the differences that we seek to overcome through studying their experience.

The indicators and their values used for analysis in this study are listed in Appendix A. Demographic and spatial data were obtained from the City Population database [105], climate data were sourced from Climate-Data [106], and topographical characteristics were collected from Topographic-map [107].

The analysis was carried out using R software with a set of libraries. The complete algorithm used for the calculations is given in Appendix B. Three quantitative methods were applied to the sample to identify the most similar cities to Astana and Almaty. The first method is principal component analysis (PCA). PCA was employed as a dimensionality reduction technique to explore city data structure. Let X represent the n×p data matrix, where n is the number of observations and p is the number of variables. Each variable is standardized such that the elements of X are expressed in standard units with mean zero and standard deviation one.

The sample covariance matrix Σ in that case is given by

Σ = \frac{1}{n - 1} X^{T} X

(1)

PCA proceeds by solving the eigenvalue problem for the covariance matrix:

Σ v_{j} = λ_{j} v_{j}

(2)

where λ_j is the j-th eigenvalue, and v_j is the corresponding eigenvector. Eigenvalues are non-negative, indicating the variance explained by each principal component. Eigenvectors represent the directions of the principal components. Eigenvalues are arranged in descending order, λ_1 ≥ λ_2 ≥ … ≥ λ_p ≥ 0. The first k eigenvectors corresponding to the largest eigenvalues define the k-dimensional subspace capturing the most significant variance in the data. In our case, we used the first two dimensions so that the cities could be plotted conveniently in a plane.

Mapping the cities in the space defined by the leading principal components enables assessment of their relative positions based on underlying patterns in their development characteristics. Cities located in proximity within this reduced space exhibit structural similarities in their development conditions, while those positioned further apart reflect more divergent profiles. This “spatial” representation of cities assists in identifying groups with comparable attributes.
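The PCA steps described above (standardization, covariance matrix, eigendecomposition, projection onto the leading components) can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' R code from Appendix B; the function name `pca` is hypothetical, and the six-city, three-variable excerpt of Table A1 is far too small to reproduce the published results.

```python
import numpy as np

def pca(X, k=2):
    """PCA via eigendecomposition of the covariance of standardized data."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Standardize each variable: mean zero, standard deviation one
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Sample covariance matrix, as in Equation (1)
    cov = Z.T @ Z / (n - 1)
    # Solve the symmetric eigenvalue problem, as in Equation (2)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; reorder to descending
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :k]          # coordinates in the first k components
    explained = eigvals / eigvals.sum()  # share of variance per component
    return scores, eigvals, eigvecs, explained

# Population 2023, area (km²), average annual growth (%) from Table A1:
# Almaty, Ankara, Astana, Berlin, Denver, Canberra
X = [
    [2191300, 684, 2.93],
    [5186002, 3991, 1.30],
    [1423726, 797, 4.67],
    [3596999, 891, 0.81],
    [716577, 396, 1.36],
    [452670, 393, 1.50],
]
scores, eigvals, eigvecs, explained = pca(X, k=2)
```

Because the variables are standardized, the eigenvalues sum to the number of variables, and `explained[:2].sum()` gives the share of variance retained by a two-dimensional map. The columns of `eigvecs` hold the loadings of each variable on each component, up to an arbitrary sign.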

The next method was agglomerative cluster analysis. It is widely used in social and economic research and allows objects (cities) to be combined into groups based on their similarity across a set of indicators [108]. Agglomerative clustering represents a hierarchical approach to clustering, wherein data points exhibiting similarity are successively grouped. The procedure initiates with each data point constituting an individual cluster. Subsequently, clusters are iteratively merged according to a similarity measure until a single cluster encompasses all points. This approach relies on a proximity measure that quantifies the similarity or dissimilarity between clusters. The similarity can be determined using various distance metrics, including Euclidean distance or the correlation coefficient. Ward introduced a merging criterion in 1963 [109], which remains widely applied in agglomerative clustering. His method aims to minimize the within-cluster variance by merging the pair of clusters that yields the smallest increase in the sum of squared deviations from the mean of the newly formed cluster.

The function D(X,Y) that calculates the distance between clusters X and Y measures the increase in the error sum of squares (ESS, also referred to as the “sum of squared errors”, SSE) after merging the two clusters.

D(X, Y) = \mathrm{ESS}(X \cup Y) - [\mathrm{ESS}(X) + \mathrm{ESS}(Y)]

(3)

where ESS(.) takes the form

\mathrm{ESS}(X) = \sum_{i = 1}^{N_{X}} \left| x_{i} - \frac{1}{N_{X}} \sum_{j = 1}^{N_{X}} x_{j} \right|^{2}

(4)

where N_X is the number of elements in cluster X, and x_i and x_j are the cluster elements. At each step, the method merges the pair of clusters with the smallest D(X,Y), i.e., the smallest increase in the error sum of squares.
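Ward's merging cost from Equations (3) and (4) can be checked on a toy example. The sketch below is a direct NumPy transcription of the two formulas under the assumption that the merged cluster is the union of X and Y; the function names `ess` and `ward_distance` and the three toy clusters are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def ess(cluster):
    """Error sum of squares: squared deviations from the cluster centroid."""
    cluster = np.asarray(cluster, dtype=float)
    centroid = cluster.mean(axis=0)
    return float(((cluster - centroid) ** 2).sum())

def ward_distance(X, Y):
    """Increase in ESS caused by merging clusters X and Y."""
    merged = np.vstack([X, Y])
    return ess(merged) - (ess(X) + ess(Y))

# Three toy clusters of two points each (invented values)
A = np.array([[0.0, 0.0], [0.0, 2.0]])
B = np.array([[4.0, 0.0], [4.0, 2.0]])
C = np.array([[1.0, 0.0], [1.0, 2.0]])

d_ab = ward_distance(A, B)  # 16.0: centroids are 4 units apart
d_ac = ward_distance(A, C)  # 1.0: centroids are 1 unit apart
```

Since `d_ac` is smaller than `d_ab`, Ward's criterion would merge A with C first. The values also agree with the equivalent closed form N_X N_Y / (N_X + N_Y) times the squared distance between the two centroids.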

The final method used was t-distributed stochastic neighbor embedding (t-SNE). It is a nonlinear dimensionality reduction technique designed primarily to visualize high-dimensional data. It aims to map the data into a lower-dimensional space, typically two or three dimensions, while preserving the local structure of the data by maintaining the relative similarities between neighboring points. Unlike linear techniques such as PCA, t-SNE is particularly well-suited for datasets where the structure is nonlinear or when the goal is to reveal clusters that are not necessarily aligned with directions of maximum variance.

The algorithm models pairwise similarities between data points in the original high-dimensional space using conditional probabilities based on a Gaussian distribution. It constructs a similar probability distribution in the lower-dimensional space but employs Student’s t-distribution with a single degree of freedom to measure pairwise similarities. The objective is to minimize the Kullback–Leibler divergence between the two distributions, ensuring that points that are close in high-dimensional space are also mapped closely in lower-dimensional space, while distant points are allowed to remain apart.
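The objective described above can be made concrete with a small NumPy sketch that computes the two similarity distributions and their Kullback–Leibler divergence. It is a simplification: a single fixed Gaussian bandwidth `sigma` is assumed here, whereas t-SNE itself tunes a bandwidth per point to match the chosen perplexity, and no gradient-descent optimization is performed. All function names and the toy points are hypothetical.

```python
import numpy as np

def squared_distances(A):
    """Matrix of pairwise squared Euclidean distances."""
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(axis=-1)

def high_dim_affinities(X, sigma=1.0):
    """Symmetrized Gaussian similarities P (fixed bandwidth: an assumption)."""
    logits = -squared_distances(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbor
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)      # conditional probabilities p_{j|i}
    return (cond + cond.T) / (2.0 * X.shape[0])  # joint probabilities, sum to 1

def low_dim_affinities(Y):
    """Student-t (one degree of freedom) similarities Q in the embedding."""
    w = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(w, 0.0)
    return w / w.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), summed over pairs with non-zero P."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Two tight pairs of points in 3D (invented data)
X = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
              [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]])
Y_good = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])  # pairs kept together
Y_bad = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]])   # pairs split apart

P = high_dim_affinities(X)
kl_good = kl_divergence(P, low_dim_affinities(Y_good))
kl_bad = kl_divergence(P, low_dim_affinities(Y_bad))
```

The embedding that keeps original neighbors together has a much lower divergence than the one that separates them, which is the quantity the t-SNE optimizer tries to reduce.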

All three methods have limitations, and using them together allows us to compensate for them to some extent, resulting in a more stable picture. PCA is limited by its assumption of linear relationships among variables. It prioritizes directions of maximum variance, which may not always correspond to meaningful group structures in the data. Furthermore, the method may fail to reveal complex, nonlinear patterns. However, PCA excels at preserving global relationships and providing a continuous representation of similarities among observations. Its goal is thus to identify global patterns of similarity.

Ward’s hierarchical clustering, in contrast, is sensitive to noise and outliers, as it progressively merges clusters in a deterministic manner without the ability to correct initial misclassifications. Since the method seeks to minimize within-cluster variance at each step, early-stage merging decisions strongly influence the final structure. Despite these limitations, Ward’s method offers an explicit cluster structure, making it suitable for identifying distinct groups within the data. The goal of cluster analysis is to identify detailed similarities to explicitly group cities based on the outcome.

t-SNE overcomes PCA’s limitation of assuming linear relationships, as it preserves local neighborhood structures and can reveal nonlinear manifolds in the data. However, t-SNE struggles with maintaining global relationships, meaning that distant points in the original space may not be mapped proportionally in the lower-dimensional representation. The method is also computationally expensive, requires tuning of hyperparameters, and does not provide a formal clustering structure. The purpose of this method is to check whether the other two methods miss significant nonlinear relationships between cities.

Thus, the three selected methods should cover the basic needs of similarity determination—searching for global patterns, detailed group-wise comparison, and control of nonlinear relationships. If all three methods give similar results, we can say that the cities are indeed similar; if different methods give different results, then it is necessary to reconsider the approach, be it a set of variables or a set of methods. The motivation for using the three methods is to double-check and correct the results, but deeper synergy is not expected.
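The cross-checking logic can be expressed as a simple vote across methods. The following Python sketch is only one possible formalization, which the paper does not prescribe; the function name `consensus`, the `min_votes` parameter, and the placeholder labels "A" to "E" are hypothetical and do not correspond to the cities in the sample.

```python
from collections import Counter

def consensus(candidates_by_method, min_votes=None):
    """Return candidates named as similar by at least `min_votes` methods."""
    methods = list(candidates_by_method)
    if min_votes is None:
        min_votes = len(methods)  # default: all methods must agree
    votes = Counter(
        city
        for method in methods
        for city in set(candidates_by_method[method])  # one vote per method
    )
    return sorted(city for city, count in votes.items() if count >= min_votes)

# Placeholder candidate lists for one target city (not the paper's output)
candidates = {
    "pca": ["A", "B", "C"],
    "ward": ["A", "C", "D"],
    "tsne": ["A", "C", "E", "B"],
}
unanimous = consensus(candidates)               # ['A', 'C']
majority = consensus(candidates, min_votes=2)   # ['A', 'B', 'C']
```

Candidates confirmed by all three methods would form the first tier, while those supported by only two methods could be treated as a second tier or examined further.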

In the case of this paper, if the analysis reveals cities that are consistently similar to Almaty and Astana, they can be confidently used for further analysis of their experience and the formation of recommendations for the development of Almaty and Astana as smart cities.

4. Results

The results of the PCA allowed us to draw initial conclusions about similarities among the cities (see Figure 1). It is easy to see that Almaty and Astana (marked red) are close to each other, i.e., one’s experience may be suitable for the other in the future. Based on global patterns, the closest ones are Ottawa, Zaragoza, and Denver (marked green). One can also discuss Phoenix and Ankara as similar to Astana, and Sofia, Prague, Zagreb, and Vilnius as similar to Almaty (marked blue). They form a “second tier” of similarities.

According to the results, the first two principal components explain 58% of the sample variance, i.e., they cover the dynamics somewhat, but might need further detailed analysis. Adding a third component improves the indicator to 70% but also complicates the visualization. One can also note the empty space around these cities, which can be interpreted as a kind of “cluster”. Detailed clustering, however, comes next.

After the main direction of the analysis was marked with the PCA, cluster analysis was conducted to confirm or reject the results of the PCA. The results of the cluster analysis somewhat clarify the PCA grouping, but do not contradict it. The closest city to Astana is Ottawa, and the closest to Almaty is Denver. Ankara and Phoenix are also in the same cluster as the previous two. Similarities with other clusters are found only at a fairly large distance. The cluster analysis does not show proximity to Zaragoza, which suggests that the PCA missed important differences between the cities of interest and Zaragoza.

Thus, two of the three approaches converge in terms of their results to a certain extent. The remaining method—t-SNE—was used to identify possible similarities of a nonlinear nature among cities, which would allow us to select additional data points for analysis. Also, if it turned out that the previously selected cities were very different, then it would be necessary to reconsider our approach to the analysis.

The results from the t-SNE are approximately the same as those of the two previous approaches. Once again, we can talk about the proximity of Ottawa, Denver, and Phoenix to Almaty and Astana. Proximity of Vilnius to the two cities of interest is observed, and Ankara, Madrid, Zaragoza, and Canberra also appear as “second-tier” cities. When comparing the PCA and t-SNE results, their assessment of “second-tier” cities differs somewhat, but the closest cities remain the same. This may indicate possible nonlinear similarities at longer distances or similarities between different combinations of variables, since these methods work differently.

The quantitative analysis results reveal that Denver (USA) demonstrates the greatest similarity with Almaty, according to all three methodological approaches. The PCA plots (Figure 1) position Denver and Almaty in close proximity along both principal component dimensions, while the hierarchical cluster analysis (Figure 2) places them in the same cluster, with minimal joining distance. This similarity is further confirmed by t-SNE visualization (Figure 3), which preserves local neighborhood structures while accounting for potential nonlinear relationships.

Similarly, our multi-method analysis identifies Ottawa (Canada) as the most analogous city to Astana. The PCA results position Ottawa and Astana in the same quadrant of the two-dimensional space, with the cluster analysis confirming this proximity by grouping them in the earliest merging branches of the dendrogram. The t-SNE visualization further substantiates this similarity by placing these cities as nearest neighbors even after accounting for potential nonlinear relationships in the data.

5. Discussion

Transferring one city’s experience to another’s conditions is always a non-trivial task, since the conditions of city development are so diverse that it is difficult to identify critical similarities that would allow us to confidently assert the applicability of one city’s experience in the context of another.

Existing approaches to such transfer usually come down to either the trial and error method or “manual” selection of practices from the world according to a chosen criterion. The use of quantitative statistical methods in this article is aimed at facilitating the selection of targets for analyzing experience from a different point of view: the general similarity of development characteristics.

It can be said unequivocally that such an approach allows us to process a larger number of “candidates” and identify those that have comparable geographic and socio-demographic conditions. Thoroughly analyzing each of the 142 cities and comparing them with Almaty and Astana “manually”, in theory, could provide a deeper understanding of the similarities and differences, but this task would require much more time than the selection of variables for quantitative analysis. In addition, the use of statistical methods allows us to identify patterns that might otherwise escape the attention of an observer.

The results of our three-method approach yield findings that both build upon and differ from previous quantitative studies on smart cities. Unlike works by Zhu et al. [42] and Cantuarias-Villessuzanne et al. [44], which clustered cities based on current “smartness” levels or sustainability outcomes, our approach identifies analogues based on fundamental development conditions. This methodological distinction directly addresses a key limitation we identified in the literature—focusing on outcome variables rather than defining conditions that shape smart city development.

The identification of Denver and Ottawa as the most appropriate analogues for Almaty and Astana challenges regional assumptions in smart city research. Previous studies have rarely connected continental North American cities with Central Asian contexts, despite the similar development conditions revealed by our analysis. This extends Urdabayev et al.’s [6] work on Kazakhstan’s cities by providing specific international reference points.

Our methodological triangulation represents an advancement over single-method studies like that by Han [41], which relied solely on k-means clustering. By confirming our findings through three complementary approaches, we demonstrate the robust nature of the identified similarities—something Da Silva Lopes et al. [53] highlighted as crucial for complex urban data analysis.

The observed similarities in spatial, demographic, and climatic characteristics between Denver/Almaty and Ottawa/Astana provide empirical support for Akande et al.’s [47] theoretical framework on the relationship between urban form and technological adaptability. The continental climate similarities suggest that smart solutions for infrastructure resilience in extreme temperature variations could be particularly transferable.

Unlike composite index approaches [48,49] that typically produce rankings without clear knowledge transfer pathways, our methodology creates direct connections to reference cities with concrete experiences potentially applicable to Kazakhstan’s urban centers. This addresses the “weak link to practical knowledge transfer” gap we identified in the literature, providing a foundation for targeted policy learning rather than generic benchmarking.

The most controversial point here is the selection of variables. To what extent does this set reflect the conditions of urban development, especially in cities on their way to becoming smart cities? There is no perfect answer to this question, but we tried to find as much evidence as possible of the relevance of each variable for the development of a smart city and conducted several rounds of variable selection. It would be interesting to enter into a discussion in general about which parameters reflect the conditions of urban development and how these parameters can be expressed and analyzed. The approach of this article suggested that the use of the most global, complex, and long-changing parameters can serve as some support for the conclusions of the analysis. Of course, the set of variables for quantitative analysis can be expanded further, but this will require additional arguments.

In addition, some variables were quite difficult to classify into one of two groups: resulting or defining. For example, where does the quality of infrastructure belong? It can be both a result of the implementation of a smart city strategy and a determining factor for this strategy.

A separate interesting group of defining parameters are cultural parameters. Cultural practices can certainly influence the priorities of a smart city strategy and determine attitudes towards technologies and urban innovations. Using cultural parameters, however, raises difficult questions. How can attitudes toward innovation be quantified? For example, how can the level of techno-optimism or techno-pessimism of a given city’s society be determined? Which of the many cultural characteristics of a given society are relevant for smart city development?

The next point of discussion was the choice of methods. Since the goal of the study was to determine similarities, the focus was on methods that allow one to determine different “distances” between data points according to some set of variables. Therefore, the PCA method arose first. It is (relatively) simple, has no stochastic components, and gives a good first impression of the data patterns. It does not take into account nonlinear relationships, and the 2D visualization somewhat hides useful information, so it would be unwise to limit ourselves to it. Factor analysis based on it does not fit the purpose, because we are not interested in latent factors. MANOVA, regression models, or canonical correlation analysis do not answer the question of similarities at all, and we are not interested in the interaction among the variables that we selected—we are interested in the similarity of data points on these variables.

Therefore, the next logical step was to find a method capable of comparing larger groups of data points with each other without losses of information. The best candidate for this was hierarchical cluster analysis, which decomposes the union of data points at all levels, starting from the bottom. In addition, it is as visual as PCA, but does not hide information due to dimensionality reduction. Finally, t-SNE emerged as an answer to the question—what if there are nonlinear relationships between data points that are not captured by the other two methods? Its added benefit is that it allows comparison of the image with the PCA results. There are risks in using t-SNE—the stochastic nature of the method implies some instability of the results, which can be partially compensated by selecting the right parameters. The values of the parameters were selected so that the result was relatively stable, i.e., no abrupt movements of the points relative to each other. On top of that, the values were based on the logic of the study. For example, for perplexity, a value of 10 (quite common) seemed too large for a dataset of 32 cities, and it did not allow us to obtain stable results. A value of five was chosen as the first that gives stable results, but also does not make the comparison too local. The number of iterations (max_iter) and learning rate (eta) were chosen so as to ensure stability of the outcomes. Selection of the number of iterations began with 100 in increments of 100, and the first suitable value was 1000. The learning rate started with 100 and was sufficient right away.
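The notion of "stable results" used for parameter selection can be quantified in several ways; the paper describes it qualitatively as the absence of abrupt movements of points. One possible measure, sketched below in Python, is the average overlap of each point's k nearest neighbors between two embeddings. The function names, the choice k = 3, and the random stand-in embeddings of 32 points are assumptions for illustration and do not come from the authors' R code.

```python
import numpy as np

def knn_sets(Y, k):
    """Set of the k nearest neighbors (by index) for every point in Y."""
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # exclude the point itself
    idx = np.argsort(d2, axis=1)[:, :k]
    return [set(row.tolist()) for row in idx]

def neighbour_overlap(Y1, Y2, k=3):
    """Mean share of k nearest neighbors preserved between two embeddings."""
    s1, s2 = knn_sets(Y1, k), knn_sets(Y2, k)
    return float(np.mean([len(a & b) / k for a, b in zip(s1, s2)]))

rng = np.random.default_rng(0)
Y = rng.normal(size=(32, 2))           # stand-in for one 2D embedding of 32 cities
Y_mirrored = -Y[:, ::-1]               # same layout, axes swapped and flipped
Y_shuffled = Y[rng.permutation(32)]    # point identities scrambled

same_layout = neighbour_overlap(Y, Y_mirrored)  # neighborhoods unchanged
scrambled = neighbour_overlap(Y, Y_shuffled)    # neighborhoods mostly lost
```

An overlap close to 1 across repeated runs with different random seeds would indicate that a given combination of perplexity, number of iterations, and learning rate produces stable neighborhoods, regardless of rotations or reflections of the map.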

Using multiple methods increases the reliability of the results—the results from one approach can help validate or refute the results from another. Choosing methods with different strengths and weaknesses allows them to “cover up” each other’s weaknesses and increase the amount of useful information. For example, PCA and t-SNE showed proximity to Zaragoza, but more detailed cluster analysis indicated more noticeable differences, which reduced the priority of this city for further analysis.

At this point, the question arises—how to translate the results of the analysis into specific recommendations or strategies for Almaty and Astana? That is, the analysis showed the closest “neighbors” for each of the two cities as a whole, and this is already a contribution to the transfer of useful experience. However, is it possible to draw some additional knowledge from the obtained results?

On the one hand, yes. For example, one could study the loadings of the variables into the principal components in the PCA in more detail and try to interpret each group of variables in order to then, based on this interpretation, pay closer attention to the related smart city implementation experience. This idea has limitations. For example, Figure 4 shows the loadings of all variables into principal component 1. This component mainly contains area, terrain, and climate variables, and it would be reasonable to look for solutions related to these conditions in smart city strategies and translate them into recommendations for policy development. However, population size also has an important loading, and is not easily interpretable with other variables of PC1.

On the other hand, the approach of this work does not imply the development of recommendations for policy development. The aim of the work was to first identify variables important for the development of a smart city, which at the same time would not depend on the stage of development of the city as a smart city, and then analyze the similarity of cities as a whole, using a set of these variables and the discussed methods. Only then, based on the results of the quantitative analysis, is qualitative analysis in order, which is expected to reveal the most promising experience of the most similar cities in different directions.

This leads to the following discussion—how true is the assumption that similar urban development conditions create similar problems and lead to similar solutions, which can then be transferred to similar cities, so that if one of two similar cities becomes “smart”, the other can more easily use its experience, going down the same road? Confirmation or refutation of this assumption will occur at the next stage—qualitative analysis of the selected cities and the formation of recommendations. An improvement or an alternative to the approach used in this paper seems to be the use of neural network algorithms to determine the similarity of cities. Such an approach would allow taking into account the entire diversity of conditions for the development of smart cities and processing even larger data samples. In theory, it could completely replace the approach used in this paper, although it would be somewhat less transparent.

6. Conclusions

The purpose of this study was to identify smart cities that are closest in their development conditions to Almaty and Astana in order to facilitate the transfer of relevant experience. The main challenge was to determine such similarities between the cities. Three quantitative methods were used to solve this problem: PCA, cluster analysis, and t-SNE. Each of the methods allowed us to identify similarities to a certain extent, but also has weaknesses. The simultaneous use of three methods was designed to compensate for these weaknesses and increase the reliability of the results. The analysis identified two cities with the greatest similarity to Almaty and Astana: Ottawa and Denver. Ankara and Phoenix are also suitable as a second cohort. Study of the experience of these cities, as well as detailed analysis of the digital and management solutions implemented there, seems to be the most promising direction for further adaptation to Kazakhstan’s megacities.

The most significant limitation of the approach is the difficulty of substantiating the selected variables, which can be improved in further studies. The results from this study can be used for further in-depth analysis of the selected smart cities to identify the best practices for their development as smart cities, the formation of recommendations for decision-makers, and the development of strategies for turning Almaty and Astana into the first smart cities in Kazakhstan.

Thus, this study contributes to the development of a methodology for selecting reference cities by applying a comprehensive statistical approach. Such a strategy can be useful not only in the context of Almaty and Astana, but also in any other case when it is necessary to formally determine similarity between cities or regions using a multidimensional set of indicators. The results obtained can become a basis for making more informed decisions in the field of urban planning, management, and implementation of “smart” technologies, considering local specifics while drawing on the best international practices.

Author Contributions

Conceptualization, M.U.; Methodology, M.U. and I.D.; Software, I.D.; Validation, M.U., I.D. and A.K.; Formal analysis, M.U. and I.D.; Investigation, I.D.; Resources, A.K.; Data curation, M.U. and I.D.; Writing—original draft, M.U. and I.D.; Writing—review & editing, M.U. and I.D.; Visualization, I.D.; Supervision, M.U. and A.K.; Project administration, A.K.; Funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant “Strategy for sustainable regional development based on the principles of forming a smart and digital ecosystem of cities in Kazakhstan” No. AP19574739).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Demographic and spatial characteristics of the cities.

City	Population 2023	Area (km²)	City Density (people/km²)	Agglomeration Population 2023	Agglomeration Area (km²)	Average Annual Population Growth Rate (%)	Agglomeration Density (people/km²)
Almaty	2,191,300	684	3206	3,357,100	9395.00	2.93	357
Ankara	5,186,002	3991	1299	5,803,482	24,521	1.3	237
Astana	1,423,726	797	1786	1,596,600	7100.00	4.67	225
Berlin	3,596,999	891	4037	4,679,500	1189.00	0.81	3936
Birmingham	1,166,049	267.8	4354	2,927,631	1027	0.99	2851
Bologna	390,518	140.9	2772	1,018,346	3702	0.34	275
Bratislava	478,040	367.6	1300	732,757	2053.00	0.18	357
Brussels	196,828	33	5948	1,884,358	826.90	1.70	2279
Bucharest	1,719,958	237.9	7230	2,303,505	1821	−0.74	1265
Budapest	1,685,342	525	3210	3,019,479	6917	−0.23	437
Canberra	452,670	393	1152	503,402	517.00	1.50	974
Denver	716,577	396	1810	2,963,821	21,763.67	1.36	136
Dusseldorf	631,217	217	2903	11,300,000	7110.00	0.38	1589
Geneva	206,635	16	12,971	628,478	536.50	0.45	1171
Krakow	806,201	326.9	2466	1,498,499	4065.11	0.25	369
Lausanne	144,160	41	3484	449,874	773.50	0.93	582
Ljubljana	284,293	164	1736	537,893	2334.00	−0.46	230
Luxembourg	134,697	51	2618	207,650	238.50	2.40	871
Madrid	3,340,176	606	5512	6,871,903	8028.00	0.39	856
Milan	1,371,850	181.7	7550	3,247,764	1575	0.81	2062
Munich	1,510,378	311	4857	6,200,000	27,700.00	0.84	224
Ottawa	1,114,316	2790	399	1,488,307	6767.41	2.10	220
Paris	2,087,577	105	19,806	10,896,433	2853.00	−0.72	3819
Phoenix	1,650,070	1342	1230	5,186,958	37,731	0.79	137
Prague	1,301,432	496	2623	2,264,690	4822.00	0.25	470
San José	352,381	44.62	7897	2,158,898	2044	0.82	1056
Santiago	5,220,161	651.5	8013	7,112,808	15,403	0.77	462
Sofia	1,196,806	492	2433	1,667,314	10,738	−0.04	155
Vienna	2,005,760	415	4836	2,339,538	1110.00	1.87	2108
Vilnius	602,430	401	1502	747,864	2529.00	2.70	296
Warsaw	1,861,599	517	3599	3,269,510	6100.00	0.03	536
Zagreb	663,592	306	2169	1,189,279	4930	−0.35	241
Zaragoza	691,037	973.8	710	739,788	2288.8	0.47	323
Zurich	433,989	88	4938	1,460,999	1305.00	0.95	1120
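The density columns in Table A1 appear to be derived as population divided by area. The following minimal check, on five rows copied from the table, assumes rounding to the nearest integer; the data frame `tab` and its column names are introduced here for illustration only.

```r
# Five rows copied from Table A1: population, area (km^2), reported city density.
tab <- data.frame(
  city     = c("Ankara", "Astana", "Berlin", "Denver", "Ottawa"),
  pop      = c(5186002, 1423726, 3596999, 716577, 1114316),
  area_km2 = c(3991, 797, 891, 396, 2790),
  reported = c(1299, 1786, 4037, 1810, 399)
)

# Recompute density as people per square kilometre and compare with the table.
tab$computed <- round(tab$pop / tab$area_km2)
tab$match <- tab$computed == tab$reported
print(tab)
```

Not every row of the table reproduces exactly under this assumption (for Almaty, 2,191,300 / 684 is about 3204 against a reported 3206), which may reflect different source years or rounding in the underlying data.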

Table A2. Climatic and topographic characteristics of the cities.

City	Average Annual Temperature (°C)	Average Annual Precipitation (mm)	Temperature Difference (°C)	Elevation Difference-to-Area Ratio	Lowest Point (m)	Highest Point (m)
Almaty	6.5	650	28.4	1.76	500	1700
Ankara	12.6	407	23.4	0.06	850	1088
Astana	3.5	297	35.1	0.08	347	407
Berlin	12	640	24	0.11	28.1	122
Birmingham	23	1343	32.2	0.61	152	315
Bologna	17	671.3	21.9	1.92	29	300
Bratislava	11.1	565	20.5	1.06	126	514
Brussels	10	837.2	21.6	3.51	13	129
Bucharest	11	648.1	35.6	0.15	55.8	91.5
Budapest	9.7	516	22.2	0.82	96	529
Canberra	13.6	632.6	22.5	0.86	550	888
Denver	10.2	363	40	0.43	1560	1730
Dusseldorf	12	800	18	0.73	28	186
Geneva	12.1	928	18.4	5.46	370	457
Krakow	8	663	24	0.60	187	383.6
Lausanne	11.3	1132.2	20.1	13.61	372	935
Ljubljana	11	1400	22.5	2.53	261	676
Luxembourg	8.3	950.8	20	3.46	230	408
Madrid	15.2	455	26	0.26	582	742
Milan	19.3	1186	23.5	0.67	44	166
Munich	10	1000	23.4	0.40	24	148
Ottawa	6	1166	30.3	0.22	224	824
Paris	12	646	23.2	2.11	177	399
Phoenix	24.2	183	33.6	0.10	1044	1176
Prague	8.6	676.5	20	0.60	400	700
San José	15.9	2788	11.2	4.46	500	699
Santiago	16.1	320	21.9	0.60	151	542
Sofia	10.9	625.7	28.8	0.37	112	294
Vienna	10	620	21.1	0.09	78	116
Vilnius	7.3	675	22	2.28	122	1035
Warsaw	9	481.7	26.1	0.92	169	646
Zagreb	12	1050	23	1.57	392	871
Zaragoza	18	328.8	29.9	0.56	250	800
Zurich	9.9	1022	18.6	5.20	620	1077

Appendix B

R Script Used for Calculations

library(readxl)        # For reading Excel files
library(cluster)       # For clustering functions
library(factoextra)    # For visualization of clustering results
library(ggplot2)       # For visualization of PCA results
library(Rtsne)         # For t-distributed Stochastic Neighbor Embedding
library(umap)          # For UMAP
library(scatterplot3d) # For static 3D scatter plots
library(plotly)        # For interactive 3D plots
library(ggrepel)       # For non-overlapping text labels

##### DATA PREPARATION

# Load the data
data <- read_excel("cities_data.xlsx")
data <- as.data.frame(data) # Convert into a data frame

# Set the "city" column as row names without removing it from the data frame
row.names(data) <- data$city

# Exclude the "city" column for clustering
data_for_clustering <- data[, !names(data) %in% c("city")]

# Apply log transformation
cols_to_log <- c("pop", "area", "agglomeration_pop", "agglomeration_area", "agglomeration_density",
                 "city_density", "avg_precipitation", "lowest_point", "height_area_ratio")
data_trans <- data_for_clustering
data_trans[cols_to_log] <- log10(data_trans[cols_to_log])

# Ensure all columns are numeric and scaled
data_scaled <- scale(data_trans) # Normalizes all variables to mean = 0 and SD = 1

##### Z-SCORE OUTLIER CHECK

# Compute z-scores (standardized values)
z_scores <- scale(data_trans)

# Find which entries have any z-score > 3 or < -3 (potential outliers)
outlier_matrix <- abs(z_scores) > 3

# Summarize which cities and variables are affected
outlier_summary <- which(outlier_matrix, arr.ind = TRUE)

if (nrow(outlier_summary) > 0) {
  cat("Potential outliers detected:\n")
  for (i in 1:nrow(outlier_summary)) {
    city_name <- rownames(data_for_clustering)[outlier_summary[i, 1]]
    variable_name <- colnames(data_for_clustering)[outlier_summary[i, 2]]
    z_value <- z_scores[outlier_summary[i, 1], outlier_summary[i, 2]]
    cat(sprintf(" - %s in '%s' (z = %.2f)\n", city_name, variable_name, z_value))
  }
} else {
  cat("No z-score outliers (|z| > 3) detected.\n")
}

# Calculate a distance matrix based only on numeric columns
dist_matrix <- dist(data_scaled, method = "euclidean")

graphics.off() # Reset plots

##### CLUSTERING

### WARD

# Perform hierarchical clustering
hc_w <- hclust(dist_matrix, method = "ward.D2")

# Plot dendrogram to visualize clusters with city names
plot(hc_w, labels = row.names(data), main = "Ward", xlab = "Cities", sub = "", cex = 0.9)

# Optional step: cut the dendrogram into clusters
clusters <- cutree(hc_w, k = 4)

# Add the cluster labels to your data
# data$Cluster <- clusters

# Save the clustered data
# write.csv(data, "ward_clusters.csv", row.names = TRUE)

##### PCA

# Perform PCA
pca_result <- prcomp(data_scaled, scale. = TRUE)
summary(pca_result) # Provides explained variance for each component

var_info <- get_pca_var(pca_result)
head(var_info$contrib[, 1])
head(var_info$contrib[, 2])

fviz_contrib(pca_result, choice = "var", axes = 1) +
  ggtitle("Variable contributions to PC1")
fviz_contrib(pca_result, choice = "var", axes = 2) +
  ggtitle("Variable contributions to PC2")

# Cumulative explained variance plot
explained_variance <- pca_result$sdev^2 / sum(pca_result$sdev^2)
plot(cumsum(explained_variance), type = "b", xlab = "Number of Components", ylab = "Cumulative Explained Variance")

# Print loadings for the first few components
loadings <- pca_result$rotation[, 1:3] # Adjust the number of components as needed
print(loadings)

# Extract PCA results for individuals (cities)
pca_ind <- get_pca_ind(pca_result)

# Create a data frame for plotting: keep the first 3 dimensions
pca_data <- as.data.frame(pca_ind$coord[, 1:3])
colnames(pca_data) <- c("Dim.1", "Dim.2", "Dim.3")
pca_data$city <- data$city # Add city names as a separate column

##############################################################################
# 1) 2D PCA PLOT (PC1 vs PC2) - Using ggplot2
##############################################################################

pca_data$color_group <- with(pca_data,
  ifelse(city %in% c("Almaty", "Astana"), "red",
    ifelse(city %in% c("Ottawa", "Zaragoza", "Denver"), "green",
      ifelse(city %in% c("Phoenix", "Ankara", "Sofia", "Prague", "Zagreb", "Vilnius"), "blue",
        "other"))))

ggplot(pca_data, aes(x = Dim.1, y = Dim.2, label = city, color = color_group)) +
  geom_point(size = 3) +
  geom_text_repel(
    min.segment.length = 0,
    box.padding = 0.5,
    max.overlaps = Inf
  ) +
  scale_color_manual(
    values = c(red = "red", green = "green", blue = "blue", other = "black"),
    guide = "none"
  ) +
  labs(
    title = "PCA of Cities",
    x = "PC1",
    y = "PC2"
  ) +
  theme_minimal()

##############################################################################
# 2) INTERACTIVE 3D PCA PLOT (PC1, PC2, PC3) - Using plotly
##############################################################################

plot_ly(
  data = pca_data,
  x = ~Dim.1,
  y = ~Dim.2,
  z = ~Dim.3,
  text = ~city, # hover text
  type = "scatter3d",
  mode = "markers+text", # points plus text labels
  marker = list(size = 4),
  textposition = "top center"
) %>%
  layout(
    title = "PCA of Cities (Interactive 3D)",
    scene = list(
      xaxis = list(title = "PC1"),
      yaxis = list(title = "PC2"),
      zaxis = list(title = "PC3")
    )
  )

##### t-SNE

# Perform t-SNE
graphics.off() # Reset plots

tsne_result <- Rtsne(data_scaled, dims = 2, perplexity = 5,
                     eta = 100, max_iter = 1000, theta = 0,
                     check_duplicates = FALSE, pca = FALSE)

# Convert the t-SNE result to a data frame and add city names
tsne_data <- as.data.frame(tsne_result$Y)
tsne_data$city <- row.names(data) # Add city names as a separate column

tsne_data$color_group <- with(tsne_data,
  ifelse(city %in% c("Almaty", "Astana"), "red",
    ifelse(city %in% c("Ottawa", "Denver", "Phoenix"), "green",
      ifelse(city %in% c("Ankara", "Vilnius", "Madrid", "Zaragoza", "Canberra"), "blue",
        "other"))))

# Plot with ggplot2, adding city labels without clustering colors
ggplot(tsne_data, aes(x = V1, y = V2, label = city, color = color_group)) +
  geom_point(size = 3) +
  geom_text_repel(
    min.segment.length = 0,
    box.padding = 0.5,
    max.overlaps = Inf
  ) +
  scale_color_manual(
    values = c(red = "red", green = "green", blue = "blue", other = "black"),
    guide = "none"
  ) +
  labs(
    title = "t-SNE of Cities",
    x = "Dimension 1",
    y = "Dimension 2"
  ) +
  theme_minimal()

References

Macke, J.; Casagrande, R.M.; Sarate, J.A.; da Silva, K.A. Smart city and quality of life: Citizens’ perception in a Brazilian case study. J. Clean. Prod. 2018, 182, 717–726. [Google Scholar] [CrossRef]
Moser, C.; Wendel, T.; Carabias-Hütter, V. Scientific and Practical Understandings of Smart Cities; ZHAW Zurich University of Applied Sciences: Winterthur, Switzerland, 2014. [Google Scholar] [CrossRef]
Radchenko, K. Modern Foreign Approaches to Defining the Concept of Smart City. Manag. Econ. Theory Pract. Chumachenko’s Ann. 2022, 2022, 174–188. [Google Scholar] [CrossRef]
Kim, C.; Kim, K.A. The Institutional Change from E-Government toward Smarter City. J. Open Innov. Technol. Mark. Complex. 2021, 7, 42. [Google Scholar] [CrossRef]
Meijer, A.; Bolívar, M. Governing the smart city: A review of the literature on smart urban governance. Int. Rev. Adm. Sci. 2016, 82, 392–408. [Google Scholar] [CrossRef]
Urdabayev, M.; Kireyeva, A.; Vasa, L.; Digel, I.; Nurgaliyeva, K.; Nurbatsin, A. Discovering smart cities’ potential in Kazakhstan: A cluster analysis. PLoS ONE 2024, 19, e0296765. [Google Scholar] [CrossRef]
Nurbatsin, A.; Kireyeva, A.; Gamidullaeva, L.; Abdykadyr, T. Spatial analysis and technological influences on smart city development in Kazakhstan. J. Infrastruct. Policy Dev. 2023, 8, 3012. [Google Scholar] [CrossRef]
Digel, I.; Mussabalina, D.; Urdabayev, M.; Nurmukhametov, N.; Akparova, A. Evaluating development prospects of smart cities: Cluster analysis of Kazakhstan’s regions. Probl. Perspect. Manag. 2022, 20, 319–330. [Google Scholar] [CrossRef]
Mendybayev, B. Imbalances in Kazakhstan’s Smart Cities Development. Environ. Urban. Asia 2022, 13, 389–402. [Google Scholar] [CrossRef]
Mudronja, G.; Jugović, A.; Škalamera-Alilović, D. Seaports and Economic Growth: Panel Data Analysis of EU Port Regions. J. Mar. Sci. Eng. 2020, 8, 1017. [Google Scholar] [CrossRef]
Landlocked Developing Countries. UNIS Vienna. Available online: https://unis.unvienna.org/unis/topics/related/2014/landlocked-developing-countries.html (accessed on 5 February 2025).
Bayar Çağlak, S.; Aydın, G.; Alkan, G. The Impact of Seaport Investments on Regional Economics and Developments. Int. J. Bus. Manag. Stud. 2011, 3, 333–339. [Google Scholar]
Astana vs. Almaty: Which City Is Best for Connected Cars and EVs? Available online: https://astanahub.com/ru/blog/astana-vs-almaty-kakoi-gorod-luchshe-vsego-podkhodit-dlia-connected-cars-i-evs (accessed on 6 February 2025).
Kazakhstan Sees Positive Shift in Migration Dynamics—The Astana Times. Available online: https://astanatimes.com/2025/01/kazakhstan-sees-positive-shift-in-migration-dynamics/ (accessed on 7 February 2025).
OECD. Insights on the Business Climate in Kazakhstan; OECD Publishing: Paris, France, 2023; Available online: https://www.oecd.org/content/dam/oecd/en/publications/reports/2023/05/insights-on-the-business-climate-in-kazakhstan_60af2af3/bd780306-en.pdf (accessed on 8 May 2025).
Kursiv Media. Where in Kazakhstan Are Utilities the Most Expensive? Kursiv Media, 3 March 2025. Available online: https://kz.kursiv.media/2025-03-03/print1073-kchl-tarif (accessed on 9 May 2025).
Eurasian Research Institute. A Spatial Analysis of Internal Migration in Kazakhstan; Eurasian Research Institute: Almaty, Kazahstan, 2023; Available online: https://www.eurasian-research.org/publication/41789/ (accessed on 9 May 2025).
TheGlobalEconomy. Carbon Dioxide Emissions per Capita, European Union (Average of 27 Countries). 2023. Available online: https://www.theglobaleconomy.com/rankings/Carbon_dioxide_emissions_per_capita/European-union/ (accessed on 9 May 2025).
United Nations Development Programme. Water Management in Kazakhstan: A Systems Approach for a Secure Future. Blog Post, 18 March 2025. Available online: https://www.undp.org/kazakhstan/blog/water-management-kazakhstan-systems-approach-secure-future (accessed on 10 May 2025).
World Bank. Cleaner Residential Heating Key to Reducing Air Pollution in Kazakhstan’s Cities. Press Release. 28 March 2022. Available online: https://www.worldbank.org/en/news/press-release/2022/03/28/cost-effective-air-quality-management-in-kazakhstan (accessed on 10 May 2025).
AWEX Almaty. Smart Cities Kazakhstan: Fact Sheet on the “Digital Kazakhstan” State Programme. February 2022. Available online: https://www.awex-export.be/files/library/Fiches-Pays/Kazakhstan/Kazakhstan-fiche-sectorielle-smart-cities.pdf (accessed on 10 May 2025).
Kogabayev, T.; Banerjee, S. Smart Governance in Kazakhstan: A Systematic Review and Analysis of Development, Challenges, and Future Directions. Smart Cities Reg. Dev. (SCRD) Prepr. 2024, 1. [Google Scholar] [CrossRef]
Atanasova, A.; Naydenov, K. The Innovative Approaches for the Development of Smart Cities. In Key Challenges in Geography; Nedkov, S., Zhelezov, G., Eds.; Springer: Cham, Switzerland, 2019; pp. 351–367. [Google Scholar] [CrossRef]
Noori, N.; Hoppe, T.; de Jong, M. Classifying pathways for smart city development: Comparing design, governance and implementation in Amsterdam, Barcelona, Dubai, and Abu Dhabi. Sustainability 2020, 12, 4030. [Google Scholar] [CrossRef]
Jabłońska, A. Smart Cities in Practice: A Comparative Case Study Between Warsaw, Gdynia, Copenhagen and Malmö. Master’s Thesis, Lund University, Lund, Sweden, 2018. Available online: https://lup.lub.lu.se/luur/download?fileOId=8959739&func=downloadFile&recordOId=8959738 (accessed on 25 February 2025).
Nunes, S.A.S.; Ferreira, F.; Govindan, K.; Pereira, L. “Cities go smart!”: A system dynamics-based approach to smart city conceptualization. J. Clean. Prod. 2021, 313, 127683. [Google Scholar] [CrossRef]
Neirotti, P.; Marco, A.; Cagliano, A.C.; Mangano, G.; Scorrano, F. Current trends in Smart City initiatives: Some stylised facts. Cities 2014, 38, 25–36. [Google Scholar] [CrossRef]
Hedegaard, M.; Kuzior, A.; Tverezovska, O.; Hrytsenko, L.; Kolomiiets, S. Smart City Projects Financing. Socioecon. Chall. 2024, 8, 286–309. [Google Scholar] [CrossRef]
Oyadeyi, O.A.; Oyadeyi, O.O. Towards Inclusive and Sustainable Strategies in Smart Cities: A Comparative Analysis of Zurich, Oslo, and Copenhagen. Res. Glob. 2025, 10, 100271. [Google Scholar] [CrossRef]
Vidiasova, L. Conceptualization of the “Smart City” concept: Socio-technical approach. Int. J. Open Inf. Technol. 2017, 5, 52–57. Available online: http://injoit.org/index.php/j1/article/viewFile/506/480 (accessed on 8 May 2025).
Shi, F.; Shi, W. A Critical Review of Smart City Frameworks: New Criteria to Consider When Building Smart City Framework. ISPRS Int. J. Geo Inf. 2023, 12, 364. [Google Scholar] [CrossRef]
Kogan, N.; Lee, K. Exploratory Research on the Success Factors and Challenges of Smart City Projects. Asia Pac. J. Inf. Syst. 2014, 24, 141–189. [Google Scholar] [CrossRef]
Kim, M.; Ko, J. Becoming a Smart Citizen, a Study of Digital Literacy and Perceptions of Smart Life Changes. Korea Real Estate Policy Assoc. 2024, 25, 49–66. [Google Scholar] [CrossRef]
Berawi, M.A.; Sari, M.; Miraj, P. Citizen and Technology: The Core in Developing Human-Centric Smart Cities. CSID J. Infrastruct. Dev. 2024, 7, 363–365. [Google Scholar] [CrossRef]
Urdabayev, M.; Utkelbay, R. SWOT analysis of smart city projects in capital cities of Russia and Kazakhstan. R-Economy 2021, 7, 235–247. [Google Scholar] [CrossRef]
Toxanov, S.; Neftissov, A.; Abzhanova, D.; Kazambayev, I. The concept of the smart city of Astana: Energy-efficient technologies and solutions for sustainable development. Sci. J. Astana IT Univ. 2022, 11, 74–86. [Google Scholar] [CrossRef]
Irnazarov, F.; Kayumova, M. Toward Smart City Development in Central Asia: A Comparative Assessment. Cent. Asian Aff. 2017, 4, 51–82. [Google Scholar] [CrossRef]
Turgel, I.; Bozhko, L.; Ulyanova, E.; Khabdullin, A. Implementation of the smart city technology for environmental protection management of cities: The experience of Russia and Kazakhstan. Environ. Clim. Technol. 2019, 23, 148–165. [Google Scholar] [CrossRef]
Makhatov, N.B.; Alzhanov, A. Human capital development in «smart cities» of Kazakhstan: Networks and «live laboratories». Cent. Asian Econ. Rev. 2022, 3, 100–112. [Google Scholar] [CrossRef]
Mupfumira, P.; Mutingi, M.; Sony, M. Smart City Frameworks SWOT Analysis: A Systematic Literature Review. Front. Sustain. Cities 2024, 6, 1449983. [Google Scholar] [CrossRef]
Han, H. Adoption of K-Means Clustering Algorithm in Smart City Security Analysis and Mythical Experience Analysis of Urban Image. PLoS ONE 2025, 20, e0319620. [Google Scholar] [CrossRef]
Zhu, J.; Gianoli, A.; Noori, N.; de Jong, M.; Edelenbos, J. How Different Can Smart Cities Be? A Typology of Smart Cities in China. Cities 2024, 149, 104992. [Google Scholar] [CrossRef]
Hammoumi, L.; Rhinane, H. Machine learning (ai) for identifying smart cities. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2024, XLVIII-4/W9-2024, 221–228. [Google Scholar] [CrossRef]
Cantuarias-Villessuzanne, C.; Weigel, R.; Blain, J. Clustering of European Smart Cities to Understand the Cities’ Sustainability Strategies. Sustainability 2021, 13, 513. [Google Scholar] [CrossRef]
Ding, C.; He, X. Principal Component Analysis and Effective K-Means Clustering. In Proceedings of the 2004 SIAM International Conference on Data Mining; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2004; pp. 497–501. [Google Scholar] [CrossRef]
Abdulhafedh, A. Incorporating K-means, Hierarchical Clustering and PCA in Customer Segmentation. J. City Dev. 2021, 3, 12–30. Available online: https://pubs.sciepub.com/jcd/3/1/3/ (accessed on 26 May 2025).
Akande, A.; Cabral, P.; Casteleyn, S. Assessing the Gap between Technology and the Environmental Sustainability of European Cities. Inf. Syst. Front. 2019, 21, 581–604. [Google Scholar] [CrossRef]
Bogdanov, O.; Jeremić, V.; Jednak, S.; Čudanov, M. Scrutinizing the Smart City Index: A Multivariate Statistical Approach. Zb. Rad. Ekon. Fak. Rij. 2019, 37, 777–799. [Google Scholar] [CrossRef]
Kourtzanidis, K.; Angelakoglou, K.; Apostolopoulos, V.; Giourka, P.; Nikolopoulos, N. Assessing Impact, Performance and Sustainability Potential of Smart City Projects: Towards a Case Agnostic Evaluation Framework. Sustainability 2021, 13, 7395. [Google Scholar] [CrossRef]
Gaurav, A.; Gupta, B.B.; Arya, V.; Attar, R.W.; Bansal, S.; Alhomoud, A.; Chui, K.T. Smart waste classification in IoT-enabled smart cities using VGG16 and Cat Swarm Optimized random forest. PLoS ONE 2025, 20, e0316930. [Google Scholar] [CrossRef]
Yang, Z.; Li, D.; Nai, W. T-SNE Based on Halton Sequence Initialized Butterfly Optimization Algorithm. In Proceedings of the 2023 IEEE 13th International Conference on Electronics Information and Emergency Communication (ICEIEC), Beijing, China, 14–16 July 2023; pp. 1–5. [Google Scholar] [CrossRef]
Ivanov, D.; Smith, J.; Petrov, A.; Wang, J.; Volkov, S.; Zhao, D. Integrating IoT Sensors into Smart City Data Visualization Systems. Preprint 2025. [Google Scholar] [CrossRef]
Da Silva Lopes, M.A.; Doria Neto, A.D.; De Medeiros Martins, A. Parallel T-SNE Applied to Data Visualization in Smart Cities. IEEE Access 2020, 8, 11482–11490. [Google Scholar] [CrossRef]
Huang, S. Image Data Visualization Using T-SNE for Urban Pavement Disease Recognition. J. Phys. Conf. Ser. 2023, 2547, 012013. [Google Scholar] [CrossRef]
IMD. Smart City Index 2024. Available online: https://www.imd.org/smart-city-observatory/home/rankings/ (accessed on 3 January 2025).
Global Data Lab. Available online: https://globaldatalab.org (accessed on 1 April 2025).
Bafail, O. Optimizing Smart City Strategies: A Data-Driven Analysis Using Random Forest and Regression Analysis. Appl. Sci. 2024, 14, 11022. [Google Scholar] [CrossRef]
UN-Habitat. What Is a City? Definition of Cities; United Nations Human Settlements Programme: Nairobi, Kenya, 2020; Available online: https://unhabitat.org/sites/default/files/2020/06/city_definition_what_is_a_city.pdf (accessed on 7 January 2025).
Archova Visuals. Why Urban Development Is Important. 2024. Available online: https://archovavisuals.com/why-urban-development-is-important/ (accessed on 4 January 2025).
Rosenthal, S.S.; Strange, W.C. Chapter 49 Evidence on the Nature and Sources of Agglomeration Economies. In Handbook of Regional and Urban Economics; Elsevier: Amsterdam, The Netherlands, 2004; pp. 2119–2171. [Google Scholar] [CrossRef]
Duranton, G.; Puga, D. The Growth of Cities; CEPR Discussion Paper No. 9590; Centre for Economic Policy Research: London, UK, 2013; Available online: https://real-faculty.wharton.upenn.edu/wp-content/uploads/~duranton/Duranton_Papers/Current_Research/urban_growth.pdf (accessed on 6 January 2025).
Ritchie, H.; Samborska, V.; Roser, M. Urbanization: The World Population Is Moving to Cities. Why Is Urbanization Happening and What Are the Consequences? Our World in Data. 2024. Available online: https://ourworldindata.org/urbanization (accessed on 9 January 2025).
Aziz, A.; Makkawi, B. Relationship between Foreign Direct Investment and Country Population. Int. J. Bus. Manag. 2012, 7, 63. [Google Scholar] [CrossRef]
European Commission. Increasing Importance of Cities. Knowledge for Policy. 2020. Available online: https://knowledge4policy.ec.europa.eu/foresight/topic/continuing-urbanisation/increasing-importance-cities_en (accessed on 11 January 2025).
World Bank. Urban Development Overview. 2023. Available online: https://www.worldbank.org/en/topic/urbandevelopment/overview (accessed on 12 January 2025).
Bolshakov, V. Boundary delimitation of Chelyabinsk agglomeration. IOP Conf. Ser. Mater. Sci. Eng. 2018, 451, 012134. [Google Scholar] [CrossRef]
Bolter, K.; Robey, J. Agglomeration Economies: A Literature Review; W.E. Upjohn Institute for Employment Research: Kalamazoo, MI, USA, 2020; Available online: https://research.upjohn.org/reports/252 (accessed on 14 January 2025).
Wibowo, Y.; Kudo, T. Agglomeration and Urban Manufacture Labor Productivity in Indonesia. Signifikan J. Ilmu Ekon. 2019, 8, 145–158. [Google Scholar] [CrossRef][Green Version]
Fu, W.; Luo, C.; He, S. Does Urban Agglomeration Promote the Development of Cities? An Empirical Analysis Based on Spatial Econometrics. Sustainability 2022, 14, 14512. [Google Scholar] [CrossRef]
United Nations. Population in Urban Agglomerations of More than 1 Million (EN.URB.MCTY). World Development Indicators, World Bank. 2018. Available online: https://databank.worldbank.org/metadataglossary/world-development-indicators/series/EN.URB.MCTY (accessed on 15 January 2025).
Yao, Y.; Liu, L. Research on Population Mobility and Sustainable Economic Growth from a Communication Perspective. Front. Psychol. 2022, 13, 935606. [Google Scholar] [CrossRef] [PubMed]
Beyer, S. The 4 Benefits of Urban Agglomeration. Market Urbanism Report. 2019. Available online: https://www.marketurbanist.com/blog/the-4-benefits-of-urban-agglomeration (accessed on 16 January 2025).
Buch, T.; Hamann, S.; Niebuhr, A.; Rossen, A. What Makes Cities Attractive? The Determinants of Urban Labour Migration in Germany. Urban Stud. 2014, 51, 1960–1978. [Google Scholar] [CrossRef]
Wilmoth, J.; Menozzi, C.; Bassarsky, L. Why Population Growth Matters for Sustainable Development; UN DESA Policy Brief No. 130; United Nations Department of Economic and Social Affairs: New York, NY, USA, 2022; Available online: https://www.un.org/development/desa/pd/sites/www.un.org.development.desa.pd/files/undesa_pd_2022_policy_brief_population_growth.pdf (accessed on 17 January 2025).
Filipenco, D. The Impact of Population Growth on Sustainable Development. DevelopmentAid, 21 March 2024. Available online: https://www.developmentaid.org/news-stream/post/163665/population-growth-and-sustainable-development (accessed on 17 January 2025).
Kremer, M. Population Growth and Technological Change: One Million B.C. to 1990. Q. J. Econ. 1993, 108, 681–716. [Google Scholar] [CrossRef]
Wogan, J.B. Population Growth Means a City Is Thriving, or Does It? Public Officials and Reporters Alike Adopt the Myth That Bigger Is Better. Governing Magazine, 29 August 2017. Available online: https://www.governing.com/archive/gov-population-city-growth-thriving.html (accessed on 18 January 2025).
Duranton, G.; Puga, D. The Economics of Urban Density. J. Econ. Perspect. 2020, 34, 3–26. [Google Scholar] [CrossRef]
Güneralp, B.; Zhou, Y.; Ürge-Vorsatz, D.; Gupta, M.; Yu, S.; Patel, P.L.; Fragkias, M.; Li, X.; Seto, K.C. Global Scenarios of Urban Density and Its Impacts on Building Energy Use through 2050. Proc. Natl. Acad. Sci. USA 2017, 114, 8945–8950. [Google Scholar] [CrossRef]
Schiller, G. Urban Infrastructure: Challenges for Resource Efficiency in the Building Stock. Build. Res. Inf. 2007, 35, 399–411. [Google Scholar] [CrossRef]
Brown, J.; Barber, A. Social Infrastructure and Sustainable Urban Communities. Proc. Inst. Civ. Eng.-Eng. Sustain. 2012, 165, 99–110. [Google Scholar] [CrossRef]
Takano, T.; Morita, H.; Nakamura, S.; Togawa, T.; Kachi, N.; Kato, H.; Hayashi, Y. Evaluating the Quality of Life for Sustainable Urban Development. Cities 2023, 142, 104561. [Google Scholar] [CrossRef]
Rahman, M.M.; Najaf, P.; Fields, M.G.; Thill, J.-C. Traffic Congestion and Its Urban Scale Factors: Empirical Evidence from American Urban Areas. Int. J. Sustain. Transp. 2021, 16, 406–421. [Google Scholar] [CrossRef]
Livingstone, N.; Short, M.; Fiorentino, S.; Bunce, S. Editorial: Density, Sustainability and the Governance of Urban Futures. Front. Sustain. Cities 2023, 5, 1277926. [Google Scholar] [CrossRef]
Oh, S.; Chen, N. Do Public Transit and Agglomeration Economies Collectively Enhance Low-Skilled Job Accessibility in Portland, OR? Transp. Policy 2022, 115, 209–219. [Google Scholar] [CrossRef]
Chen, L.; Yu, L.; Yin, J.; Xi, M. Impact of population density on spatial differences in the economic growth of urban agglomerations: The case of Guanzhong Plain Urban Agglomeration, China. Sustainability 2023, 15, 14601. [Google Scholar] [CrossRef]
Henderson, J.V.; Nigmatulina, D.; Kriticos, S. Measuring urban economic density. J. Urban Econ. 2021, 125, 103188. [Google Scholar] [CrossRef]
Yu, X.; Wu, Z.; Zheng, H.; Li, M.; Tan, T. How urban agglomeration improves emission efficiency? A spatial econometric analysis of the Yangtze River Delta urban agglomeration in China. J. Environ. Manag. 2020, 260, 110061. [Google Scholar] [CrossRef]
Jia, Y.; Tang, L.; Zhang, P.; Xu, M.; Luo, L.; Zhang, Q. Exploring the scaling relations between urban spatial form and infrastructure. Int. J. Sustain. Dev. World Ecol. 2022, 29, 665–675. [Google Scholar] [CrossRef]
Akinyemi, A.D.; Egogo-Stanley, A.O.; Ibrahim, O.M.; Ezeamii, G.C. Assessing the role of topographical analysis in sustainable urban development: Insights from GIS-driven terrain studies. EPRA Int. J. Econ. Growth Environ. Issues 2025, 13, 9–22. [Google Scholar] [CrossRef]
Ilham, I. Impact of regional structure and topography on the effectiveness of public transportation services. West Sci. Soc. Humanit. Stud. 2024, 2, 2083–2092. [Google Scholar] [CrossRef]
Kuang, M.; Zheng, Y.; Deng, X.; Yang, Y.; Wang, J.; Sui, X.; Peng, Y. Flood risk management in planning and construction of city: The Guangzhou experience. Proc. IAHS 2024, 386, 277–283. [Google Scholar] [CrossRef]
She, Y.; Shen, L.; Jiao, L.; Zuo, J.; Tam, V.W.Y.; Yan, H. Constraints to achieve infrastructure sustainability for mountainous townships in China. Habitat Int. 2018, 73, 65–78.
Syed Abdul Rahman, S.A.F.; Abdul Maulud, K.N.; Ujang, U.; Wan Mohd Jaafar, W.S.; Shaharuddin, S.; Ab Rahman, A.A. The digital landscape of smart cities and digital twins: A systematic literature review of digital terrain and 3D city models in enhancing decision-making. Sage Open 2024, 14, 21582440231220768.
Elagiry, M.; Kraus, F.; Scharf, B.; Costa, A.; Delotto, R. Nature 4 Cities: Nature-Based Solutions and Climate Resilient Urban Simulation with Greenpass® Tool and On Site Validation. A Case Study in Segrate/Milano/IT. In Proceedings of the 16th IBPSA International Conference and Exhibition, Rome, Italy, 2–4 September 2019.
Das, S.; Choudhury, M.R.; Chatterjee, B.; Das, P.; Bagri, S.; Paul, D.; Bera, M.; Dutta, S. Unraveling the Urban Climate Crisis: Exploring the Nexus of Urbanization, Climate Change, and Their Impacts on the Environment and Human Well-Being–A Global Perspective. AIMS Public Health 2024, 11, 963–1001.
Dhar, T.K.; Khirfan, L. Climate Change Adaptation in the Urban Planning and Design Research: Missing Links and Research Agenda. J. Environ. Plan. Manag. 2016, 60, 602–627.
Creutzig, F.; Baiocchi, G.; Bierkandt, R.; Pichler, P.-P.; Seto, K.C. Global Typology of Urban Energy Use and Potentials for an Urbanization Mitigation Wedge. Proc. Natl. Acad. Sci. USA 2015, 112, 6283–6288.
Leal Filho, W.; Abeldaño Zuñiga, R.A.; Sierra, J.; Dinis, M.A.P.; Corazza, L.; Nagy, G.J.; Aina, Y.A. An Assessment of Priorities in Handling Climate Change Impacts on Infrastructures. Sci. Rep. 2024, 14, 14147.
Gürçam, S. Paving the Way for Climate Resilience through Sustainable Urbanization: A Comparative Study. Lectio Socialis 2024, 8, 17–34.
Shan, J.; Yu, M.; Lee, C.-Y. An Empirical Investigation of the Seaport’s Economic Impact: Evidence from Major Ports in China. Transp. Res. Part E Logist. Transp. Rev. 2014, 69, 41–53.
Marzouki, A.; Nefzi, M.; Mellouli, S.; Hajji, A.; Rekik, M. Transforming City Initiative into a Smart City Initiative. In Proceedings of the 17th International Digital Government Research Conference on Digital Government Research, Shanghai, China, 8–10 June 2016; pp. 536–537.
Huang-Lachmann, J.-T. Systematic Review of Smart Cities and Climate Change Adaptation. SAMPJ 2019, 10, 745–772.
Geraldo Bastías, P.; Brand, J.E. Causal Inference. Sociology 2020.
City Population. Available online: https://www.citypopulation.de/ (accessed on 25 February 2025).
Climate Data. Available online: https://en.climate-data.org (accessed on 1 April 2025).
Topographic-map. Available online: https://topographic-map.com (accessed on 1 April 2025).
Jaeger, A.; Banks, D. Cluster analysis: A modern statistical review. Wiley Interdiscip. Rev. Comput. Stat. 2023, 15, e1597.
Ward, J.H., Jr. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 1963, 58, 236–244.

Figure 1. PCA of chosen smart cities, Almaty, and Astana.

Figure 2. Cluster analysis of chosen smart cities, Almaty, and Astana.

Figure 3. t-SNE of chosen smart cities, Almaty, and Astana.

Figure 4. Contribution of variables to principal component 1. The red dotted line shows the average expected contribution of each variable.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
