Peer-Review Record

Contrastive Learning with Image Deformation and Refined NT-Xent Loss for Urban Morphology Discovery

ISPRS Int. J. Geo-Inf. 2025, 14(5), 196; https://doi.org/10.3390/ijgi14050196

by Chunliang Hua¹, Daijun Chen², Mengyuan Niu¹, Lizhong Gao¹, Junyan Yang² and Qiao Wang¹,³,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Submission received: 14 March 2025 / Revised: 20 April 2025 / Accepted: 29 April 2025 / Published: 8 May 2025

(This article belongs to the Special Issue Advances in AI-Driven Geospatial Analysis and Data Generation (2nd Edition))

Round 1

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

I reviewed this manuscript at IJGI in January of this year. There were some improvements in this manuscript, but there were still a lot of issues that were not clear enough. Also, the references are missing.

(1) In line 60, the author mentioned: "These differences result in significant visual distinctions." Furthermore, are the significant visual differences due to differences in scale?

(2) In lines 96-97, the author mentioned: "Eventually, each pipeline categorized these samples into five classes." Please provide a detailed description of these five categories and the reasoning behind the classification.

(3) In lines 198-201, the author mentioned: "However, due to the fact that our dataset contains many samples that are far from the central urban area and have low building density, we achieved unsatisfactory results." Since your model is better than the one mentioned in line 194, please provide evidence in the manuscript, such as comparison data, to demonstrate the superiority of the new model.

(4) In lines 477-479, the author mentions: "This is because Shanghai has a high degree of urbanization, which does not perfectly align with our assumptions about the dataset. In contrast, Beijing and Chongqing have lower urbanization rates." Is it true that Beijing and Chongqing have lower urbanization rates than Shanghai? How was this determined? Additionally, this may be related to the spatial distribution of the data samples.

(5) In lines 484-485, the author mentions: "Extracting the 10 samples closest to the cluster centers for each cluster." Why were only 10 samples selected? Are there other similar studies that used this algorithm?

(6) In lines 490-491, the author mentions: "We repeated the above experimental process ten times. The statistic is shown in Table 2. It can be seen that D3 is almost always less than D2." Table 2 only shows averages, so it cannot be concluded that D3 is always less than D2. Additionally, in the "Frequency of getting a smaller value" section, D3 of Shanghai is greater than D2. Please revise this sentence.

(7) Regarding Figure 8 (Page 15), there is a lack of samples from the urban center, which raises concerns about the model's validity. This seems more like a study of the morphology of suburban settlements. This also raises concerns about the sample selection in Beijing and Chongqing.

Author Response

Comments 1: In line 60, the author mentioned: "These differences result in significant visual distinctions." Furthermore, are the significant visual differences due to differences in scale?

Response 1: Thank you for pointing out this deficiency. You are correct. We believe that the significant visual differences are due to differences in scale and density. Sample 1 has more buildings, which are concentrated in the center of the image. In Sample 2, there are fewer buildings and they are scattered. These differences result in significant visual distinctions. We have restructured the paper, and this section has now been moved to page 8, with a more detailed explanation on lines 294-312.

Comments 2: In lines 96-97, the author mentioned: "Eventually, each pipeline categorized these samples into five classes." Please provide a detailed description of these five categories and the reasoning behind the classification.

Response 2: Thank you for pointing this out. In fact, we do not have a strict criterion for selecting the number of clusters; instead, it is done according to the requirements of our project partners. They are mentioned in the acknowledgments and are also the ones who provided us with anonymous scores afterward.

Comments 3: In lines 198-201, the author mentioned: "However, due to the fact that our dataset contains many samples that are far from the central urban area and have low building density, we achieved unsatisfactory results." Since your model is better than the one mentioned in line 194, please provide evidence in the manuscript, such as comparison data, to demonstrate the superiority of the new model.

Response 3: Thank you for pointing this out. The model used in literature [12] is the unmodified SimCLR framework, which corresponds to SimCLR1 in this paper. In the comparative experiments, we have described the characteristics of SimCLR1, such as lower ratings in Table 1, excessive focus on overall visual features, and so on.

Comments 4: In lines 477-479, the author mentions: "This is because Shanghai has a high degree of urbanization, which does not perfectly align with our assumptions about the dataset. In contrast, Beijing and Chongqing have lower urbanization rates." Is it true that Beijing and Chongqing have lower urbanization rates than Shanghai? How was this determined? Additionally, this may be related to the spatial distribution of the data samples.

Response 4: Thank you for pointing this out. We agree with your point. What we intended to express is the proportion of built-up area. Shanghai, Beijing, and Chongqing have similar populations and built-up areas, but the total area of Shanghai is only 6340 square kilometers, whereas Beijing covers 16410 square kilometers and Chongqing 82400 square kilometers. We have revised the statement accordingly, replacing "urbanization rate" with "proportion of built-up area". This modification can be found on lines 486 and 488.

Comments 5: In lines 484-485, the author mentions: "Extracting the 10 samples closest to the cluster centers for each cluster." Why were only 10 samples selected? Are there other similar studies that used this algorithm?

Response 5: Thank you for pointing this out. In fact, this is not a standard practice. In Figures 7, A.1, and A.2, we selected the 10 samples closest to the cluster centers for visualization. Therefore, we chose 10 samples to maintain consistency with the visualized results of clustering.
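The selection step described in this response can be illustrated with a minimal sketch. It assumes each sample is represented by an embedding vector, that cluster labels and cluster centers are already available, and that Euclidean distance is used; the function name `closest_to_centers` and the toy data are hypothetical and not taken from the manuscript.

```python
import math

def closest_to_centers(embeddings, labels, centers, k=10):
    """For each cluster, return the indices of the k member samples
    nearest to that cluster's center (Euclidean distance assumed)."""
    result = {}
    for cluster_id, center in enumerate(centers):
        # Collect the indices of samples assigned to this cluster.
        members = [i for i, lab in enumerate(labels) if lab == cluster_id]
        # Sort members by distance to the center, nearest first.
        members.sort(key=lambda i: math.dist(embeddings[i], center))
        result[cluster_id] = members[:k]
    return result

# Toy 2-D embeddings, invented purely for illustration.
embeddings = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0),
              (5.0, 5.0), (5.2, 5.0), (9.0, 9.0)]
labels = [0, 0, 0, 1, 1, 1]
centers = [(0.0, 0.0), (5.0, 5.0)]

nearest = closest_to_centers(embeddings, labels, centers, k=2)
print(nearest)
```

With `k=10` the same call would return the ten representatives per cluster that the response refers to, provided each cluster has at least ten members.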

Comments 6: In lines 490-491, the author mentions: "We repeated the above experimental process ten times. The statistic is shown in Table 2. It can be seen that D3 is almost always less than D2." Table 2 only shows averages, so it cannot be concluded that D3 is always less than D2. Additionally, in the "Frequency of getting a smaller value" section, D3 of Shanghai is greater than D2. Please revise this sentence.

Response 6: Thank you for pointing this out. We apologize for any confusion caused by our wording. In the "Frequency of getting a smaller value" section, the value for D3 in Shanghai is 0.9. This means that D3 was smaller than D2 in 90% of the experiments, so we believe that D3 is almost always less than D2. We have now modified the table, changing "Frequency of getting a smaller value" to "Frequency of D2 > D3". This should be clearer.
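The "Frequency of D2 > D3" statistic can be read as a win rate over repeated runs. The sketch below shows one way to compute it; the function name `frequency_d2_greater` is hypothetical, and the per-run values are invented for illustration only and are not the values behind Table 2.

```python
def frequency_d2_greater(d2_runs, d3_runs):
    """Fraction of paired runs in which D2 is strictly greater than D3."""
    if len(d2_runs) != len(d3_runs) or not d2_runs:
        raise ValueError("need two non-empty lists of equal length")
    wins = sum(1 for d2, d3 in zip(d2_runs, d3_runs) if d2 > d3)
    return wins / len(d2_runs)

# Ten made-up runs (illustrative numbers, NOT results from the manuscript).
d2_runs = [0.52, 0.48, 0.50, 0.55, 0.47, 0.51, 0.53, 0.49, 0.50, 0.54]
d3_runs = [0.45, 0.44, 0.46, 0.50, 0.48, 0.47, 0.49, 0.45, 0.46, 0.50]

freq = frequency_d2_greater(d2_runs, d3_runs)
print(freq)
```

A frequency of 0.9 therefore means D3 was the smaller value in nine of ten runs, which is a different statement from comparing the two averages.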

Comments 7: Regarding Figure 8 (Page 15), there is a lack of samples from the urban center, which raises concerns about the model's validity. This seems more like a study of the morphology of suburban settlements. This also raises concerns about the sample selection in Beijing and Chongqing.

Response 7: Thank you for pointing this out. We understand your concerns about the data defect. This issue was due to a defect in our original dataset. To respond to this comment, we re-collected the data and ultimately filled in most of the missing samples. The current study area is shown in the Word document below.

We performed urban morphology discovery on this dataset again, and the results are shown in the Word document below.

It can be seen that this result has a high similarity to the results in the manuscript. Therefore, we believe that the effectiveness of the model and the conclusions in the manuscript were not affected by the defects in the dataset. For the data of Beijing and Chongqing, such missing data did not occur. However, in order to maintain the sample size of Chongqing at the same order of magnitude as Beijing and Shanghai, we randomly deleted some samples from Chongqing.

Author Response File: Author Response.docx

Reviewer 2 Report (Previous Reviewer 4)

Comments and Suggestions for Authors

The revisions are in line with the previous recommendations.

Author Response

Thank you for your comments!

Reviewer 3 Report (Previous Reviewer 1)

Comments and Suggestions for Authors

We thank the authors for carefully considering the preliminary review opinion and revising the manuscript. After the review, I think the authors have made effective improvements to the problems raised before, as follows:
1. The authors added a specific description of the image deformation pipeline, gave the algorithm pseudocode, and showed the specific effect of the deformation strategy.
2. The paper adds the limitations of the model and further explores future directions.

At present, the quality of the manuscript has been significantly improved, and it is recommended to accept it for publication.

Author Response

Thank you for your comments!

Reviewer 4 Report (New Reviewer)

Comments and Suggestions for Authors

This study intends to develop a method for bringing city images to more useful morphological indicators.

General comments

This manuscript is not easy to follow. The purposes of the study are not presented with sufficient clarity. In its current state, it is obvious that the study has not been finished yet.

In my opinion, some aspects of the Introduction belong to a later section of the manuscript. I suggest making an effort to improve the organization of the manuscript.

The methodology section mentions some key aspects of the study’s approach. I think presenting a more specific description of your method will improve the readability of the paper. For example, explain in more detail the optimization function used to minimize the loss function. Never mind: the loss function is in Eq. (3).

Comments about Form:

The manuscript repeatedly uses the term “deformation”. Deformation implies some degree of distortion, which does not get along with precision and scientific procedures. Would it not be better to use the word “transformation”?

NOlli maps or Nolli maps ?

Final comments:

This manuscript is not finished. Many improvements need to be implemented. Most of them, however, have to do with form. The core of the subject is valid and deserves further consideration. In order to be published, the authors should first improve their exposition of the objectives of the study and the methods they plan to use to reach those objectives.

Author Response

Comments 1: In my opinion, some aspects of the Introduction belong to a later section of the manuscript. I suggest making an effort to improve the organization of the manuscript.

Response 1: Thank you for pointing out this deficiency. We have moved some intricate details from the introduction to page eight, where they now form a new section subordinate to "Study Area." As an alternative, we have added content on lines 52 to 63 to explain the motivations that drove our innovations.

Comments 2: The methodology section mentions some key aspects of the study’s approach. I think presenting a more specific description of your method will improve the readability of the paper. For example, explain in more detail the optimization function used to minimize the loss function. Never mind: the loss function is in Eq. (3).

Response 2: Thank you for pointing this out. The optimizer we used is Adam. The detailed parameters for optimization have been added in lines 443-449, and the content is as follows:

“They all have the same parameters with a weight decay of 10⁻⁶, a minibatch size N of 4, a learning rate of 10⁻² and a temperature parameter 10⁻³. The optimizer used was Adam, which is a widely used optimization algorithm in deep learning. It combines the ideas of momentum methods (Momentum) and RMSProp, aiming to adjust the learning rate for each parameter by computing the first-order and second-order moment estimates of the gradients, thereby achieving more efficient network training. The training process spans 30 epochs.”

Following the convention of other papers, we did not provide the specific optimization function but only gave the optimization parameters. The detailed optimization function can be found in reference [34].
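For orientation, the quantity being minimized can be sketched as follows. This is the standard NT-Xent loss from the original SimCLR framework, not the refined variant proposed in the manuscript (which is given in its Eq. (3) and is not reproduced in this record). The hyperparameter values are copied from the passage quoted above; the names `HYPERPARAMS`, `cosine`, and `nt_xent`, the pairing convention, and the toy embeddings are assumptions made for illustration. The Adam update itself is not implemented here.

```python
import math

# Values quoted in the response; the dictionary itself is illustrative.
HYPERPARAMS = {
    "optimizer": "Adam",
    "weight_decay": 1e-6,
    "minibatch_size": 4,
    "learning_rate": 1e-2,
    "temperature": 1e-3,
    "epochs": 30,
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(z, temperature):
    """Standard NT-Xent loss. Assumes z[2k] and z[2k+1] are the embeddings
    of two augmented views of the same sample k."""
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i + 1 if i % 2 == 0 else i - 1  # index of the positive partner
        logits = [cosine(z[i], z[k]) / temperature for k in range(n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        # Log-sum-exp with max subtraction, needed for small temperatures.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - positive
    return total / n

# Toy embeddings: views paired correctly vs. paired with the wrong sample.
z_aligned = [(1.0, 0.0), (1.0, 0.01), (0.0, 1.0), (0.01, 1.0)]
z_misaligned = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.01), (0.01, 1.0)]

loss_aligned = nt_xent(z_aligned, HYPERPARAMS["temperature"])
loss_misaligned = nt_xent(z_misaligned, HYPERPARAMS["temperature"])
print(loss_aligned, loss_misaligned)
```

The loss is near zero when the two views of each sample are close in embedding space and grows when they are not; with a temperature as small as 10⁻³ the similarity scores are scaled by a factor of 1000, which is why the log-sum-exp is stabilized.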

Comments 3: The manuscript repeatedly uses the term “deformation”. Deformation implies some degree of distortion, which does not get along with precision and scientific procedures. Would it not be better to use the word “transformation”?

Response 3: Thank you for pointing this out. We have carefully considered your comment and believe it is reasonable, but this would require a change to the title of our manuscript, so we are in communication with the editor to ask if this is feasible.

Comments 4: NOlli maps or Nolli maps ?

Response 4: Thank you for pointing this out. We have corrected these typos and standardized them to Nolli.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The author has made careful revisions. I recommend accepting the manuscript.

Reviewer 4 Report (New Reviewer)

Comments and Suggestions for Authors

In this version, the manuscript has improved. A clearer structure now allows for an easier reading as compared to the previous version. I think it now deserves publishing.

Just one question: What does 7. Patents stand for at the end of the text?

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper extracts and analyzes urban morphology features by combining image deformation techniques with an improved NT-Xent loss function to solve the problems of fragmentation and center bias in traditional grid cell-based urban morphology studies. The results show that this method can effectively capture the architectural complex characteristics of representative cities and provide a new technical reference for urban planning and social research. However, the following issues should be revised:

1. A new image deformation pipeline and an improved NT-Xent loss function are presented, but the specific description and theoretical support for the technical implementation are lacking. It is suggested to add detailed formula derivation, algorithm pseudocode, and specific effect analysis of the deformation strategy and loss function improvement to improve the transparency and reproducibility of the method.

2. Although the current study demonstrates the effectiveness of the proposed method, it lacks a systematic comparison with other mainstream contrastive learning frameworks (SimCLR, BYOL). It is suggested to add more comparative experiments and richer evaluation indicators (such as generalization ability, computing efficiency, and adaptability to marginal cities), in order to comprehensively evaluate the advantages and limitations of the proposed method.

3. The paper does not fully discuss the limitations of the method in practical applications, such as specific application scenarios in urban planning and social research, or possible errors under certain special building layouts. It is suggested that the discussion section be further extended, emphasizing the potential application value of the research and the future direction of optimization.

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript attempts to study urban morphology using the refined NT-Xent loss and contrastive learning. However, there are several issues with the manuscript.

(1) Regarding Figure 1 (line 64), why did you choose an image of Changzhou instead of research areas? Additionally, please supplement with remote sensing images to help readers better understand.

(2) In lines 59-60, the author mentioned: "These differences result in significant visual distinctions." Furthermore, are the significant visual differences due to differences in scale?

(3) In line 85, the author mentioned: "We then used three representative Chinese cities: Shanghai, Beijing, and Chongqing, as experimental materials for separate experiments." Chongqing is a mountainous city, and it is quite different from Shanghai and Beijing. Why was Chongqing selected?

(4) In line 89, the author mentioned: "Eventually, each pipeline categorized these samples into five classes." Please provide a detailed description of these five categories and the reasoning behind the classification.

(5) The manuscript lacks author names for cited references. For example, lines 129 (For example, [16]), 131 ([17] proposed), 133 ([18] assessed), 135 ([19] formed), 137 ([20] designed), 161 ([26] learning), 162 (Additionally, [27]), 170 ([29] summarizes), 173 ([12] uses), 178 (Our paper was inspired by [12]).

(6) In lines 182-184, the author mentioned: "However, due to the fact that our dataset contains many samples that are far from the central urban area and have low building density, we achieved unsatisfactory results." Since your model is better than the one mentioned in line 178 ([12]), please provide evidence in the manuscript, such as comparison data, to demonstrate the superiority of the new model.

[12] Wang, J.; Huang, W.; Biljecki, F. Learning visual features from figure-ground maps for urban morphology discovery. Computers, Environment and Urban Systems 2024, 109, 102076.

(7) In line 222, the author mentioned: "students and faculty at the School of Architecture for scoring," and in line 386, it mentions "some teachers and students from the School of Architecture." If these are teachers and students from the architecture or urban planning departments, please specify their majors and numbers.

(8) In line 389, the author mentioned: "Six people participated in the scoring." Please provide detailed information on the six participants, specifying their major and whether they are teachers or students.

(9) The table titles are positioned below the tables, including Table 1 (line 391), Table 2 (line 397), and Table A.1 (line 512).

Please refer to the format provided in https://www.mdpi.com/journal/ijgi/instructions.

(10) In lines 394-396, the author mentions: "This is because Shanghai has a high degree of urbanization, which does not perfectly align with our assumptions about the dataset. In contrast, Beijing and Chongqing have lower urbanization rates." Is it true that Beijing and Chongqing have lower urbanization rates than Shanghai? How was this determined? Additionally, this may be related to the spatial distribution of the data samples.

(11) In lines 401-402, the author mentions: "Extracting the 10 samples closest to the cluster centers for each cluster." Why were only 10 samples selected? Are there other similar studies that used this algorithm?

(12) In lines 407-408, the author mentions: "We repeated the above experimental process ten times. The statistic is shown in Table 2. It can be seen that D3 is almost always less than D2." Table 2 only shows averages, so it cannot be concluded that D3 is always less than D2. Additionally, in the "Frequency of getting a smaller value" section, D3 of Shanghai is greater than D2. Please revise this sentence.

(13) Regarding Figure 7 (Page 13), there is a lack of samples from the urban center, which raises concerns about the model's validity. This seems more like a study of the morphology of suburban settlements. This also raises concerns about the sample selection in Beijing and Chongqing.

(14) In lines 429-432, the author mentioned: "This form consists of sparse buildings with small floor areas, typically found in rural areas. Sometimes it exhibits a linear distribution pattern, which is because rural buildings are often arranged along the road network." The author refers to rural buildings, and the description of Figure 6c also seems typical of rural areas. However, the study title included "Urban Morphology Discovery." Please provide a reasonable explanation.

(15) In line 483, the author mentioned "administrative village units," which does not fit the title of "Urban Morphology Discovery."

(16) In the references, lines 519, 521-522, and 541 contained "https://doi.org/https://doi.org/." Please revise them.

Reviewer 3 Report

Comments and Suggestions for Authors

This article combines contrastive learning with image deformation and a refined NT-Xent loss for the analysis and research of urban morphology. The research methods are innovative. However, there are still many research doubts that need to be clarified when introducing the research method of visual neural networks (primarily CNNs) into Nolli maps to examine changes in urban form. The research doubts of this article are as follows:

1. Nolli maps are crucial in traditional urban forms, but the biggest problem is that Nolli maps are two-dimensional images. In traditional settlement research, there is not much difference in building height between buildings. A two-dimensional image cannot show the large differences in building height that arise from land use and urban development. This article should address whether the selected analysis target has the problem of building height difference, since this problem will also directly affect the results and findings of the research.

2. In addition, the cities selected for this article are first-tier cities in China, and their cultural backgrounds, geographical locations and climate conditions are very different. Urban form also changes with latitude, climate, and topography. However, these factors are not described in this article. Why choose three cities with very different types and geographical conditions? How can the influence of these cultural backgrounds and geographical conditions be eliminated? Or how can it be proved that the geographical background of these cities will not affect the final results of this research?

3. In short, this article proposes a research model that integrates new methods with traditional Nolli maps, which is worthy of recognition. But how to verify and eliminate the influencing factors of the background environment and prove that the research has validity and reference value. Therefore, the credibility of this article's contributions and conclusions will be greatly limited. The author should add information such as the urban geographical background and urban morphological spatial structure and conduct in-depth analysis to prove that the methods and content adopted in this article are appropriate and feasible.

4. In addition, for the numbers and calculation formulas in this article, in addition to presenting the numbers themselves, these quantitative studies should also be supplemented with some corresponding visual charts to display the research results. As this article is a design-oriented research, the method and level of image presentation should also be paid attention to. Although the research motivation of this article is correct and good, it still needs a lot of improvement in terms of research rigor, data reliability, graphics and text quality to meet the requirements of the journal.

Reviewer 4 Report

Comments and Suggestions for Authors

The obtained results can be better correlated with the research's main goals and, mainly, with the overall defined morphological framework. In addition, the discussion section can make clearer how the obtained results will make morphological readings, and the interrelated layers they deal with, more robust. The conclusion section can also highlight a more critical perspective concerning the obtained results, providing a more articulated narrative when closing the paper.

Reviewer 5 Report

Comments and Suggestions for Authors

Abstract

The abstract introduces the context of urban morphology discovery but lacks clarity and precision.

• The statement “Previous studies on urban morphology discovery have always been based on grid cells and have been limited to the central city area” lacks evidence or citation. Additionally, “grid cells” is unclear—if it refers to raster analysis, this should be specified.

• While the critique of grid cells leading to fragmented forms and focusing only on central city areas is valid, it remains vague. The claim that previous studies predominantly focused on downtown areas requires more elaboration and contextualization.

• The decision to test only three Chinese cities limits the broader applicability of the methodology. Including cases that align with previous studies or comparing the results with prior research would make the study more compelling.

1. Introduction

The introduction outlines key concepts but lacks coherence in framing the study’s goals and methodology. It also omits crucial literature, making it harder to understand the research’s novelty and contribution.

• Lines 25–30: The distinction between “figure” and “ground” is not clearly defined, which is essential for understanding the methodology.

Figure: Typically refers to “solid space,” such as buildings or blocks, and does not include parks.

Ground: Refers to “void space,” which includes open spaces, plazas, or roads. Parks, as open spaces, are often considered part of the “ground,” but this categorization should be explicitly stated. The study needs to clarify whether parks are treated as part of the “ground” category in this research.

This ambiguity may confuse readers unfamiliar with figure-ground

• Established references, such as Trancik’s Finding Lost Space, should be cited to provide a more robust definition. Furthermore, the inclusion of parks as “ground” in figure-ground maps should be clarified, as parks are typically considered open spaces. The introduction introduces both the “Nolli map” and “figure-ground map” concepts but does not address their relative prevalence or typical usage in urban morphology studies.

• The study should clarify which mapping technique is more typical for the analysis and why it was chosen. This will help align the methodology with the study’s objectives and clarify its contribution to urban morphology research.

• The inclusion of case study images (Figure 1) before establishing the research gap and objectives is premature. The abrupt transition detracts from the introduction’s flow. Research goals remain unclear, making it difficult to discern whether the study seeks to test updated methods or argue for their applicability.

2. Related Work

This section, while informative, is insufficiently developed to provide a comprehensive review of relevant literature.

• Previous studies on figure-ground maps and their methodologies are not adequately addressed. There is little discussion of the strengths and weaknesses of existing approaches or how the study’s proposed methodology addresses these limitations.

• The section emphasizes emerging methods but fails to provide enough detail on why these methods are innovative or how they improve upon traditional approaches. A critical review of related work, detailing how this study advances the field, is needed to establish its significance.

3. Methodology

The methodology section provides some details but lacks justification for key decisions and clarity in its description.

• The exclusive focus on Chinese cities is insufficiently justified. For a study on urban morphology, it is crucial to explain why these cities were chosen and how they contribute to addressing the research gap. If the methodology is universally applicable, this should be explicitly stated; otherwise, the unique relevance of Chinese cities should be clearly demonstrated.

• Section 3.3 (SimCLR and Improvements) is difficult to follow without prior knowledge of SimCLR. The methodology should begin with a concise summary of SimCLR and the specific improvements made in this study. A conceptual framework diagram illustrating the methodological workflow would significantly enhance clarity.

• The methodology lacks a discussion of expected outcomes, which would help orient readers and provide context for the subsequent results.

4. Results

The results section presents findings but suffers from poor structure and inadequate explanation.

• The clustering results are presented without sufficient interpretation, leaving readers uncertain about their implications for urban morphology.

• Lines 386–391 (Table 1): It is unclear whether the survey results are part of the methodology or an auxiliary analysis. This ambiguity undermines the study’s coherence.

• Figure 7: Only results for Shanghai are shown, excluding the other two case studies. Additionally, the figure lacks a legend, making the color scheme incomprehensible.

• Section 4.3: Comparing figure-ground maps with satellite images is useful, but the inclusion of all (a) figures seems redundant as they do not directly add to the analysis.

To improve clarity, the results section should end with a summary paragraph synthesizing key findings. This would provide a smoother transition to the discussion.

5. Discussion

The discussion attempts to contextualize the findings but lacks depth and critical engagement with the literature.

• There is little reference to whether the findings confirm, diverge from, or challenge previous studies. Without this context, it is difficult to assess the study’s contribution to the field.

• The discussion does not reflect on the methodology’s effectiveness. For instance, were there challenges in applying SimCLR or any limitations in the analysis? Addressing these points would improve transparency and credibility.

• A dedicated section on research limitations is missing. This section should discuss constraints related to data availability, case study selection, or methodological applicability, providing a more balanced evaluation of the study.

6. Conclusion

The conclusion is repetitive and overlaps with the discussion, reducing its impact.

• The conclusion summarizes the study but fails to provide actionable insights or future research directions. Including suggestions for future research—such as testing the methodology in different geographic regions or exploring its application in other urban morphology contexts—would enhance the conclusion’s depth.

Comments on the Quality of English Language

No further comment
