Application of Quantitative Methods to Identify Analogous Cities: A Search for Relevant Experiences in the Development of Smart Cities for Implementation in Kazakhstan
Round 1
Reviewer 1 Report (Previous Reviewer 2)
Comments and Suggestions for Authors
I have reviewed the new, improved version of this paper and the authors' answers to my comments. From my point of view, the paper has been significantly improved. I accept that it could be published in Smart Cities.
Author Response
Dear Reviewer,
We sincerely thank you for taking the time to review our manuscript and for your positive assessment. We appreciate your recognition that our paper has been significantly improved. We are grateful for your recommendation that the paper could be published in Smart Cities.
We have also addressed the areas marked in the checklist as "Can be improved" or "Must be improved" throughout the revised manuscript to enhance clarity, contextualization, and the robustness of our arguments and conclusions.
Reviewer 2 Report (New Reviewer)
Comments and Suggestions for Authors
In the abstract, it is not clear the expression: “and thus the relevance of foreign experience”. Foreign meaning “other cities”?
The following sentence should not be included in the literature review as it is a result/finding of the paper: “Cluster analysis demonstrates that these cities have the highest potential for transformation in smart cities, unlike other municipalities that require additional investments in digital infrastructure [7].”
I think the literature review should focus on the papers that have already used quantitative methods to classify or cluster cities based on the level of smartness. And, in this respect, the paper should stress on the main contribution and originality of this paper compare to the rest.
Including physical topography among the indicators is a decision of the researchers that should be explained since another decision was to avoid cities which are seaports, or that are near the sea. The same criteria for avoiding seaports could be leading to avoid cities with certain topography. The same applies to climate conditions.
I would not refer in the methodology section to causality. The authors seem to express that the methodology use is a kind of causality experiment. They say: “Limiting our analysis to basic sociodemographic and geographic indicators is not a shortcoming of the study, but a methodological necessity based on the fundamental principles of causal inference.” However, only experimental techniques or pseudo experimental such as propensity score matching can control for endogeneity and are methods of causality.
I think the cluster analysis should not focus on a pair wise analysis, because by doing this way, we impede large clusters and finding similarities with more cities. I would focus on the three clusters defined around height 6 from figure 2.
On the other side, variables/indicators included for conducting the Principal components analysis could not necessarily be the same as those included in the cluster analysis. For a cluster analysis to maximize differences among clusters, there must be statistical differences in the means/media (anova) for each indicator. The authors should include information of the variables/indicators and their weight in each methodology used.
In this respect, the authors say in page 12, line 508: “ These cities are comparable in terms of total area, population, as well as growth rates and agglomeration density. At the same time, both agglomerations are characterized by similar urbanization dynamics and urban environment structure.” These result of the paper should be a result of any the quantitative methodologies employed and not a statement based on qualitative or exploratory/descriptive analysis independent of the methodologies employed.
I would like to see how the similarities between the pair of cities are traduced into particular or specific public policies or strategies. For instance, since the pair of cities are more similar in terms of dimension 1 than dimension 2 , then we can say that variables concerning dimension1 should be applied in the same direction for the pair of cities. But, then, which are the variables with more communality or variance in dimension 1? Please explain better in the results.
On the other side, the discussion section is not well built. There is no comparison at all with previous papers or finding, where are the similarities and differences of your results with other papers mentioned in the literature review?
I cannot see the complement between methodologies in defining the similarities between cities. In isolation, the different methodologies are explained, but not as a complement to each other.
Author Response
Dear Reviewer,
We greatly appreciate your thorough and constructive comments on our manuscript. Your feedback has been extremely valuable in improving the quality and clarity of our work. Please find below our responses to each of your comments.
Comment 1: "In the abstract, it is not clear the expression: “and thus the relevance of foreign experience”. Foreign meaning “other cities”?"
Response 1: We agree that the wording could be clearer. "Foreign experience" was indeed intended to mean the experience of "other smart cities worldwide." We have revised the sentence in the abstract to reflect this.
- Location: Page 1, paragraph 1, lines 34-36.
- Revised text: "The proposed methodology allowed us to assess the similarity of urban development conditions, with an assumption that similar development conditions determine approaches to the development of smart cities, and thus the relevance of experiences from other smart cities worldwide that could be applied to Almaty and Astana." (Previously: "...and thus the relevance of foreign experience.")
Comment 2: "The following sentence should not be included in the literature review as it is a result/finding of the paper: “Cluster analysis demonstrates that these cities have the highest potential for transformation in smart cities, unlike other municipalities that require additional investments in digital infrastructure [7].”"
Response 2: Our initial intention was to cite existing work [7] that had already made this demonstration. To prevent the sentence from being read as a finding of our current paper, we have rephrased it within the Literature Review so that it is clearly attributed to another quantitative study. We did not move it to Results, because our paper does not replicate that specific cluster analysis but builds on the general potential identified by others.
- Location: Page 5, paragraph 2, lines 201-204.
- Revised text: "The results of existing studies show that Almaty and Astana have the most tremendous potential for the development of smart cities in Kazakhstan [6, 39]. Another research applying quantitative methods demonstrates that these cities have the highest potential for transformation in smart cities, unlike other municipalities that require additional investments in digital infrastructure [7]."
Comment 3: "I think the literature review should focus on the papers that have already used quantitative methods to classify or cluster cities based on the level of smartness. And, in this respect, the paper should stress on the main contribution and originality of this paper compare to the rest."
Response 3: We have expanded the Literature Review. A new thematic block about quantitative classifications of smart cities now summarizes key studies that use quantitative methods related to the topic of smart cities. We then explicitly position our paper and its contribution to filling the gaps in existing research.
- Location: Pages 5-7, lines 216-314. This includes discussions of prior works and our specific contributions (focus on fundamental conditions, comprehensive three-stage methodology, adaptation to specific context, and creating a basis for targeted knowledge transfer).
Comment 4: "Including physical topography among the indicators is a decision of the researchers that should be explained since another decision was to avoid cities which are seaports, or that are near the sea. The same criteria for avoiding seaports could be leading to avoid cities with certain topography. The same applies to climate conditions."
Response 4: We have added a new explanatory paragraph in Materials and Methods to justify this decision. We clarify that seaport access is considered a categorical economic advantage, making direct comparison with landlocked Almaty/Astana difficult. In contrast, topography and climate are continuous constraints that all landlocked cities manage, making them relevant analytical variables for understanding adaptation.
- Location: Page 10, paragraph 2, lines 462-477.
- Added text excerpt: "The selection of variables and exclusion criteria for this study was guided by their differential impact... Access to a seaport is a binary trait... Topography and climate, by contrast, vary along continuous scales... Retaining these variables lets the model show how otherwise similar cities adapt smart-city solutions to local physical constraints..."
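The distinction drawn in Response 4 (a binary trait used as an exclusion criterion, continuous traits kept as model inputs) can be illustrated with a minimal sketch. The city records, the `seaport` flag, and the `elevation_m` field below are hypothetical placeholders, not the manuscript's data; only the name `diff_temp` appears in the responses. The population standard deviation is used here for simplicity, which may differ from the scaling applied in the authors' R script.

```python
from statistics import mean, pstdev

# Hypothetical records: names and values are invented for illustration only.
cities = [
    {"name": "A", "seaport": False, "elevation_m": 350.0, "diff_temp": 40.0},
    {"name": "B", "seaport": True, "elevation_m": 10.0, "diff_temp": 15.0},
    {"name": "C", "seaport": False, "elevation_m": 800.0, "diff_temp": 30.0},
    {"name": "D", "seaport": False, "elevation_m": 1250.0, "diff_temp": 20.0},
]


def select_and_scale(records, features):
    """Drop seaport cities, then z-score each continuous feature."""
    # Binary trait: acts only as an exclusion filter, never as a model input.
    kept = [r for r in records if not r["seaport"]]
    scaled = {r["name"]: {} for r in kept}
    for f in features:
        values = [r[f] for r in kept]
        mu, sigma = mean(values), pstdev(values)
        for r in kept:
            # Continuous traits stay in the model on a common scale.
            scaled[r["name"]][f] = (r[f] - mu) / sigma
    return scaled


scaled = select_and_scale(cities, ["elevation_m", "diff_temp"])
```

In this sketch the seaport city never enters the scaled table, while topography and climate remain as comparable, unit-free inputs for the later similarity methods.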
Comment 5: "I would not refer in the methodology section to causality. The authors seem to express that the methodology use is a kind of causality experiment... However, only experimental techniques or pseudo experimental such as propensity score matching can control for endogeneity and are methods of causality."
Response 5: We have removed all direct references to "causal inference" and "collider variables." The revised text now emphasizes a "comparative, explanatory approach" focused on using baseline, pre-treatment-like conditions to avoid selection bias when identifying comparable cities, without making explicit causal claims.
- Location: Page 10, the last paragraph, lines 482-494.
- Revised focus: "Limiting our analysis to key socio-demographic and geographic indicators is an attempt to identify precisely those factors that influence decision-making... but are not... influenced by these strategies [104]."
Comment 6: "I think the cluster analysis should not focus on a pair wise analysis, because by doing this way, we impede large clusters and finding similarities with more cities. I would focus on the three clusters defined around height 6 from figure 2."
Response 6: While our primary aim remains finding the most similar cities for focused experience transfer (where tight pairs are often most practical), we have adjusted our presentation.
- Location: Page 13, the last paragraph, lines 613-617.
- Adjusted text: “The closest city for Astana is Ottawa, for Almaty - Denver. Ankara and Phoenix are also in the same cluster as the previous two. Similarities with other clusters are found at a fairly large distance. The cluster analysis does not show proximity to Zaragoza, which means that the PCA missed important differences between the cities of interest and Zaragoza.”
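The two readings of the dendrogram discussed in Comment 6 and Response 6 (closest single analogue versus whole clusters obtained by cutting at a chosen height) can be contrasted in a small sketch. It assumes complete linkage on Euclidean distances, which the responses do not state, and uses invented two-dimensional points rather than the manuscript's scaled indicators.

```python
from math import dist

# Hypothetical scaled coordinates; labels are placeholders, not real cities.
points = {
    "T": (0.0, 0.0), "P": (1.0, 0.0), "Q": (0.0, 2.0),
    "R": (10.0, 0.0), "S": (11.0, 0.0), "U": (10.0, 10.0),
}


def agglomerate(points, cut_height):
    """Complete-linkage clustering, stopped once the closest pair of
    clusters is farther apart than cut_height (a dendrogram cut)."""
    clusters = [[name] for name in points]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist(points[x], points[y]) for x in a for y in b)

    while len(clusters) > 1:
        height, i, j = min(
            (linkage(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if height > cut_height:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def nearest(points, target):
    """Pairwise view: the single closest other point to target."""
    return min(
        (n for n in points if n != target),
        key=lambda n: dist(points[n], points[target]),
    )


clusters_at_3 = agglomerate(points, 3.0)
closest_to_t = nearest(points, "T")
```

Here the pairwise view returns one analogue for `T`, while the cut returns every member of its cluster; raising or lowering `cut_height` changes how many candidates are retained for experience transfer.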
Comment 7: "On the other side, variables/indicators included for conducting the Principal components analysis could not necessarily be the same as those included in the cluster analysis... The authors should include information of the variables/indicators and their weight in each methodology used."
Response 7: We used the same set of scaled variables for all three methods to ensure that their outputs are comparable. We have added Figure 4 ("Variable contributions to PC1") to support the discussion of variable loadings. However, we do not consider ANOVA necessary in this paper, because its purpose does not match the goal of the study: the paper has no treatment or experimental groups on which to run a test of mean differences, of which ANOVA is one form. Nor does it seem reasonable to divide the cities into groups solely in order to apply ANOVA.
- Location: Figure 4 on page 18.
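One common way to obtain per-variable contributions to the first principal component is to take the squared entries of the leading unit eigenvector of the correlation matrix and express them as percentages. The sketch below follows that definition under stated assumptions: it uses power iteration instead of a full eigendecomposition, the three data columns are invented, and the authors' R script may compute Figure 4 differently.

```python
from statistics import mean, pstdev


def standardize(columns):
    """Z-score each variable (one list per variable)."""
    out = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        out.append([(x - mu) / sigma for x in col])
    return out


def correlation_matrix(z):
    """Correlation matrix of already standardized variables."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z] for ci in z]


def leading_eigenvector(m, iterations=500):
    """Power iteration: repeatedly apply m and renormalize."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def pc1_contributions(columns):
    """Percent contribution of each variable to PC1 (squared loadings)."""
    v = leading_eigenvector(correlation_matrix(standardize(columns)))
    return [100.0 * x * x for x in v]


# Hypothetical indicators: the first two move together, the third does not.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 11.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
contributions = pc1_contributions(data)
```

Because the eigenvector has unit length, the contributions sum to 100, so each value can be read as the share of PC1 attributable to that variable.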
Comment 8: "In this respect, the authors say in page 12, line 508 [original]: “These cities are comparable in terms of total area, population, as well as growth rates and agglomeration density...” These result of the paper should be a result of any the quantitative methodologies employed..."
Response 8: The phrasing has been adjusted. The revised text now directly cites the outputs of the analysis. The summary of results for identified similar cities is now explicitly linked to quantitative outputs.
- Location: Page 15, paragraph 1, lines 632-644.
Comment 9: "I would like to see how the similarities between the pair of cities are traduced into particular or specific public policies or strategies... which are the variables with more communality or variance in dimension 1?"
Response 9: While a full policy transfer analysis is beyond the scope of this methodological paper, we have added a new passage to the Discussion dedicated to this topic.
- Location: Page 17, paragraph 3, lines 751-772.
Comment 10: "On the other side, the discussion section is not well built. There is no comparison at all with previous papers or finding, where are the similarities and differences of your results with other papers mentioned in the literature review?"
Response 10: We agree and have substantially revised the Discussion section. It now explicitly contrasts our findings and methodological approach with specific previous studies mentioned in the literature review.
- Location: Page 15, paragraph 5, lines 662-679 and Page 16, paragraph 1, lines 680-690.
Comment 11: "I cannot see the complement between methodologies in defining the similarities between cities. In isolation, the different methodologies are explained, but not as a complement to each other."
Response 11: Unfortunately, we cannot add more to the discussion of method integration than is already stated in the text, specifically on page 12, paragraph 3, lines 566-586, and on page 17, paragraph 2, lines 745-750.
Reviewer 3 Report (New Reviewer)
Comments and Suggestions for Authors
See Attachment
Comments for author File: Comments.pdf
Author Response
Dear Reviewer,
Thank you very much for your constructive feedback and for recognizing the innovative and practical value of our methodological framework. We appreciate your thoughtful comments, which have helped us improve the paper. Below are our responses to your specific suggestions.
Comment 1 (Introduction): "Indicating that the research contributes to the transfer of experience... but lacks in-depth analysis of the practical significance of Kazakhstan's urban development in specific fields such as economy, society, and environment..."
Response 1: To address this, we have added a dedicated paragraph in the Introduction (revised manuscript, page 2, lines 64-90) that provides a quantitative and referenced analysis of the practical significance of smart city development for Kazakhstan. This new text details the economic benefits (e.g., diversification from extractive industries, >29% of GDP [15]), social imperatives (e.g., managing effects of internal migration, 16-23% urban population increase [17]; reducing administrative burdens [15]), and environmental urgencies (e.g., high per-capita carbon emissions, 2.2x EU average [18]; severe air pollution in Almaty [20]). This addition directly links to Kazakhstan's national strategies [21-22] and thereby better underscores the importance and practical relevance of our research.
- Location: Page 2, paragraph 3, lines 64-90.
Comment 2 (Introduction): "The description of local features such as 'continental climate' and 'intra urban migration' can be associated with variable selection."
Response 2: We agree. The revised Introduction (Page 2, lines 59-60) mentions these features. Their connection to variable selection is now supported more explicitly by the rationale for variables such as 'temperature difference' (diff_temp), 'average annual population growth rate', and 'city_density', described in Materials and Methods (pages 8-10). Additionally, the new text in "Materials and Methods" (Page 10, lines 463-477) discussing topography and climate explicitly links these to Kazakhstan's continental context.
Comment 3 (Introduction): "Use specific cases to illustrate the significant impact of seaports on the economy."
Response 3: In our justification for excluding seaport cities (Materials and Methods), we have referred to existing literature [10-12, 101] that details these economic impacts. Some specific illustrations, such as Rotterdam's maritime logistics premium (citing Mudronja et al., 2020), have been added to this section.
- Location: Page 10, paragraph 3, lines 463-478.
Comment 4 (Literature review): "When elaborating on the current research status of smart cities in Kazakhstan... The logical sequence is not reasonable enough..."
Response 4: We have reordered the paragraph in the Literature Review discussing Kazakhstan (page 5, paragraph 1, lines 199-212) to follow a more logical sequence: overall potential of Almaty/Astana -> confirmation by other studies -> leadership examples -> existing obstacles -> need for relevant experience.
Comment 5 (Literature review): "Lack of specialized discussion on "urban similarity" research"
Response 5: As in our responses to Reviewer 2 (Comments 2-3), we have extended the review of quantitative methods to cover seminal works in urban similarity and comparative urban analysis, highlighting the methodological approaches they use.
Comment 6 (Materials and Methods): "Supplementary parameter selection criteria (reason for setting the percolaty [perplexity] value of t-SNE to 5)."
Response 6: We have added a detailed justification for t-SNE parameter selection (perplexity=5, eta=100, max_iter=1000) in the Discussion section.
- Location: Page 17, paragraph 1, lines 738-744.
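For readers unfamiliar with the perplexity parameter, the sketch below shows what a target value of 5 means in the standard t-SNE formulation: for each point, the Gaussian kernel width is tuned until the neighbour distribution has perplexity 2^H equal to the target, which can be read as roughly five effective neighbours. The squared distances are hypothetical (18 neighbours, as would be the case for one city among 19), and the search routine is an illustrative bisection, not the code of any particular t-SNE library.

```python
from math import exp, log2


def perplexity(sq_dists, beta):
    """Perplexity 2**H of the Gaussian neighbour distribution of one point,
    where beta = 1 / (2 * sigma**2)."""
    weights = [exp(-beta * d) for d in sq_dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * log2(p) for p in probs if p > 0)
    return 2 ** entropy


def find_beta(sq_dists, target, tol=1e-6, max_steps=200):
    """Search for the beta whose perplexity matches the target."""
    lo, hi = 0.0, None
    beta = 1.0
    for _ in range(max_steps):
        p = perplexity(sq_dists, beta)
        if abs(p - target) < tol:
            break
        if p > target:
            # Too many effective neighbours: sharpen the kernel.
            lo = beta
            beta = beta * 2 if hi is None else (beta + hi) / 2
        else:
            # Too few effective neighbours: widen the kernel.
            hi = beta
            beta = (lo + beta) / 2
    return beta


# Hypothetical squared distances from one city to its 18 neighbours.
sq_dists = [0.5 * k for k in range(1, 19)]
beta = find_beta(sq_dists, target=5.0)
```

With few observations, a small perplexity keeps each point's neighbourhood local; at `beta = 0` the kernel is flat and the perplexity equals the number of neighbours.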
Comment 7 (Results): "The chart display is complete, but the visualization effect needs to be optimized. Key points can be marked with different colors."
Response 7: Figures 1 (PCA) and 3 (t-SNE) have been recreated with color-coding (e.g., Almaty/Astana in red; closest analogues like Denver/Ottawa in green; second-tier cities in blue), as detailed in the R-script (Appendix B, lines 930-957 and 987-1017).
- Change in revised manuscript: Figures 1 and 3 updated with color-coding.
Comment 8 (Overall): "Suggest the author to check and revise each item to enhance the rigor and readability of the paper."
Response 8: We have thoroughly reviewed and revised the entire manuscript, incorporating all feedback to enhance its rigor, clarity, and readability.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Very interesting article from a hypothesis perspective (cities with similar development conditions would likely benefit from similar smart city strategies), well-written and with interesting references! Mainly regarding the methodology, I noted some questions and problems:
- (1) Are there enough quantitative indicators to define the "development conditions" in each city?
Spatial characteristics - The physical size and configuration of the urban area (city area, agglomeration area)
Demographic characteristics - Population size, distribution, and growth patterns (population, gender ratio, growth rate)
Urban density - The concentration of people within the urban space (agglomeration density)
For example, other factors such as whether the city is flat (elevation within the city, hilliness/steepness) are not taken into account. According to their measurements, two cities may seemingly have the same result in development conditions, one being flat with bike lanes and the other built on hills, with narrow streets, without the possibility of even creating a sidewalk. This would result in a smart city strategy that assumes a plan for green transportation will be equally effective in both cities.
- (2) It is not specified what they define as "smart city strategies".
The spectrum is too wide to conclude that if the development conditions are similar, then any smart city strategy will perform the same. Because of my PhD, the example I chose in observation (1) concerns sustainable transportation, but the indicators they use may be suitable for some smart city applications (e.g. digital services or deciding where recycling stations should be placed).
- (3) They have not taken into account the state of infrastructure, but only whether it exists or not.
For example, some areas of the city may have no possibility of good internet speed, or the buildings may be old. They measure population density without checking whether this population lives in a city where the buildings cannot support photovoltaics, for example, or other systems.
- (4) They do not take qualitative indicators at all - they simply see the result of urban development in each city and do not examine why.
This poses a risk of proposing smart city strategies in cities that seemingly have similar urban development but whose residents operate completely differently in terms of their movement, waste management, energy consumption, etc.
In summary, my concerns are about the internal validity (whether authors measure development conditions correctly-notes 1 and 3) and the external validity (whether what they propose can actually be applied-notes 2 and 4) of the research.
Assuming that the indicators they measure could be proxy variables, their conclusions may be acceptable.
Reviewer 2 Report
Comments and Suggestions for Authors
In this paper, the authors analyze different approaches for associating two cities in Kazakhstan with other cities around the world from the point of view of “smart cities”. Seven key indicators are considered to apply three statistical methods.
Introduction
- In your introduction, you haven’t compared smart cities with digital twins. Why?
- How did you select the three statistical methods (PCA, HCA, t-SNE) for identifying the most relevant cities in the world compared to Astana and Almaty? Do you have references? (line 74)
- How did you identify and create your “sample of smart cities”? (line 74)
Literature review
- Why do you consider studies based on countries, like Belgium, Japan and Sweden, if your works are focused on cities? It is not the same geographical level. (line 119)
- When you mention small cities (line 126), what does it mean? Is it based on population or territory size?
- More details should be given for each reference in the paragraph at line 125, to improve the understanding of these specific projects.
Materials and Methods
- In this section, the authors describe the methodology for accomplishing their study.
- It is difficult to understand why only the cities in the Top 50 from IMD Smart City Index 2024 are considered. A top 50 is not a credible criterion for me. Maybe the more relevant city is ranked at position 55, but you don’t consider this!? Why don’t you apply a first selection or filter based on territorial characteristics (access to oceans)?
- As in the introduction, you don’t justify the usage of three statistical methods to analyze your data. Why these methods compared to others (regression, …)? Why three methods and not two or four? Do you have references to justify this methodology based on the analysis of the results of applying these methods to 17 cities?
- The selection of the key indicators is based on what (line 202)? Do you have references?
- What do the acronyms OECD (line 215), PMC (line 227), GDP (line 244) mean?
- You only selected seven key indicators. It seems like a restricted selection. Why not consider more criteria or variables?
- You mention that the methods are combined in sequence (line 345) : PCA, Clustering and then t-SNE. Is it really a combination of methods? For instance, could you apply the clustering method before the PCA method?
Results
- This section presents the results of each statistical method applied to 19 cities (Astana and Almaty included). An analysis is given for each result. We can easily see on the diagrams that Ottawa (Canada) is close to Astana; and Madrid (Spain) and Warsaw (Poland) are close to Almaty.
- In figure 1, what do PC1 and PC2 mean as the axes of the diagram?
Discussion
- I agree with the mentioned limitations.
As a conclusion, I like the global approach and the idea of defining strategies to identify similarities between cities in a smart city context. But I am frustrated with the key indicators (number and type) and the label of “smart city” associated with this work. Your indicators are standard and not associated with “smart city”. We can apply your methodology to other domains (business, commerce, and so on.). When we read the literature review, many indicators are mentioned from different categories linked to smart cities or not. But you only consider geography and population (and gender). Economic and technological aspects could be considered. And what about “quality of life” and “well-being”? And the population age? Meteorological aspects? …
I understand the limitation mentioned in the section “Discussion”, but I would like to be informed about that at the beginning of the paper. That’s why, I think that the goal of your paper is not clear. It is not really linked to “smart city”. Maybe it could be better to give us lessons about the mechanisms, strategies and methods to compare cities in different domains like smart cities. “Smart cities” becomes the applicative context. So, it could be better to criticize the difficulty of identifying key indicators for a specific domain.