How People Recognize a Street: Enhancing Perceived Identity for Socio-Environmental Sustainability
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This study takes Huaihai Road as an example to explore the impact of low-level visual features on the rapid recognition of urban streets. The topic is interesting and the paper is well structured. However, considering that Land is a high-quality journal, some issues still need to be addressed. I recommend a major revision before the manuscript can be considered for publication. Below are my comments:
- The theoretical foundation of the "1. Introduction" section is somewhat weak. It is suggested to supplement literature on the connection between low-level visual features (spatial frequencies, color) and urban street perception to strengthen the theoretical framework.
- Please check the index number of each title in the article, such as "3.1. Participants" in the "2. Method" section.
- 22/45 participants were Shanghai residents, likely familiar with Huaihai Road. Therefore, it is necessary to analyze its impact on results and discuss limitations.
- Ensure all the figures (Figures 2 and 4) and text are visually clear and visible.
- In the "4 Discussions" section, the suggestions proposed (such as "prioritize color and form") are too vague for urban designers. Which elements matter most? How should policies balance detail vs. simplicity? Suggest providing actionable examples.
- Finally, a more critical perspective on the results is required, such as a more sophisticated account of the implications of the findings and a further development of the conclusions.
Author Response
Please see the attachment.
|
Comments 1: The theoretical foundation of the "1. Introduction" section is somewhat weak. It is suggested to supplement literature on the connection between low-level visual features (spatial frequencies, color) and urban street perception to strengthen the theoretical framework. |
|
Response 1: Thank you very much for this insightful comment. We agree that the theoretical grounding in the Introduction was previously insufficient, particularly regarding the linkage between low-level visual features and urban street perception. In response, we have expanded the relevant discussion in the revised manuscript. Corresponding references have been added to ensure a more rigorous and comprehensive theoretical foundation. The newly added paragraph is as follows: (see page 3, line 105) “In addition to their fundamental role in visual cognition, low-level visual features also shape how people perceive and evaluate urban streets. Studies show that images containing naturalistic spatial frequency patterns are generally preferred and can enhance perceptions of naturalness within urban environments [1, 2]. Urban settings that incorporate low-level visual properties resembling natural patterns—such as façade textures or vegetation—are often perceived more positively, with natural elements like trees and plants being associated with enhanced feelings of safety, liveliness, and beauty [3]. Color, another key low-level feature, also significantly influences people’s emotional and cognitive impressions of urban streets. Bright colors can enhance perceived vibrancy and playfulness, while excessively low brightness may reduce attractiveness and safety [4]. At the same time, high color saturation may increase the perception of boredom, and very low brightness can diminish the perceived vitality of urban environments [5]. In addition, color similarity and balance contribute positively to street evaluation, even though color compatibility does not necessarily influence perceptions of wealth or safety [6].
1. Valtchanov, D. and C.G. Ellard, Cognitive and affective responses to natural scenes: Effects of low level visual properties on preference, cognitive load and eye-movements. Journal of Environmental Psychology, 2015. 43: p. 184-195.
2. Akcelik, G.N., K.E. Schertz, and M.G. Berman, The influence of low- and mid-level visual features on the perception of streetscape qualities, in Human Perception of Visual Information: Psychological and Computational Perspectives. 2021, Springer. p. 241-262.
3. Li, Y., et al., Exploring Urban Spatial Quality Through Street View Imagery and Human Perception Analysis. Buildings, 2025. 15(17): p. 3116.
4. Hu, K., et al., Research on street color environment perception based on CEP-KASS framework. Buildings, 2023. 13(10): p. 2649.
5. Chen, N., et al., Perception of urban street visual color environment based on the CEP-KASS framework. Landscape and Urban Planning, 2025. 259: p. 105359.
6. Song, M. and Y. Xiao, Does streetscape color matter for urban perceptions? A deep learning approach to street view images. Land Use Policy, 2025. 155: p. 107581.” |
|
Comments 2: Please check the index number of each title in the article, such as "3.1. Participants" in the "2. Method" section. |
|
Response 2: Thank you very much for your careful reading of our manuscript and for pointing out the editing error. We sincerely apologize for this oversight. We fully understand that such issues may affect the clarity and professionalism of the manuscript, and we are truly sorry for any inconvenience it may have caused during your review. We have now corrected the numbering problem and carefully rechecked the entire manuscript to ensure that no similar inconsistencies remain. |
|
Comments 3: 22/45 participants were Shanghai residents, likely familiar with Huaihai Road. Therefore, it is necessary to analyze its impact on results and discuss limitations. |
|
Response 3: Thank you very much for raising this important point. We sincerely appreciate your careful consideration of the potential influence of participants’ familiarity with Huaihai Road on the experimental results. In response, we have added a dedicated analysis to assess whether familiarity affected recognition accuracy. Specifically, we performed a one-way ANOVA with recognition accuracy as the dependent variable and participant group (Shanghai residents vs. non-residents) as the independent factor. The results confirmed a significant effect of group on recognition performance (F(1, 223) = 5.24, p = .023, η² = .023), indicating that Shanghai residents achieved higher accuracy than non-residents. This finding supports the view that greater familiarity enhances correct recognition of the street. We have integrated this analysis and conclusion into the revised manuscript. We acknowledge that this result introduces a limitation to our study, as recognition performance may have been partially influenced by participants’ prior experience rather than solely by visual features. However, this influence was reduced through our recruitment procedure, in which we intentionally balanced Shanghai residents and non-residents at a 1:1 ratio to limit the effect of street familiarity. We have also added this issue to the limitations section to ensure full transparency in the interpretation of our findings. The following paragraph has been added to the manuscript: (see page 10, line 374) “Meanwhile, we also found that participants who were more familiar with Huaihai Road demonstrated higher recognition accuracy. To examine whether familiarity influenced participants’ ability to recognize the street based on its visual features, we conducted a one-way ANOVA with recognition accuracy as the dependent variable and participant group (Shanghai residents vs. non-residents) as the independent factor.
The results showed a significant effect of group on recognition performance, indicating that Shanghai residents achieved higher accuracy than non-residents (F(1,223) = 5.24, p = 0.023, η²=0.023).” (see page 17, line 644) “Lastly, although our 1:1 recruitment of Shanghai residents and non-residents helps reduce familiarity-related bias, recognition performance may still reflect prior experience in addition to low-level visual cues.” |
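For readers who want to check the reported statistics, the sketch below shows how a one-way ANOVA and its η² can be computed, and how η² can be recovered from F and its degrees of freedom. It is only an illustration under stated assumptions: the accuracy scores are made-up placeholder values, the function names are hypothetical, and nothing here reproduces the authors' actual analysis pipeline or software.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_sq


def eta_squared_from_f(f_stat, df_between, df_within):
    """Recover eta squared of a one-way design from F and its degrees of freedom."""
    return (f_stat * df_between) / (f_stat * df_between + df_within)


# Hypothetical accuracy scores, NOT data from the study.
residents = [0.90, 0.80, 0.85, 0.95]
non_residents = [0.70, 0.75, 0.80, 0.65]

f_stat, df_b, df_w, eta_sq = one_way_anova([residents, non_residents])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}, eta squared = {eta_sq:.3f}")

# Consistency check on the values quoted in the response, F(1, 223) = 5.24.
print(round(eta_squared_from_f(5.24, 1, 223), 3))
```

Applied to the values quoted in the response, the second function gives 5.24 / (5.24 + 223), which rounds to .023 and so agrees with the η² the authors report.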
|
Comments 4: Ensure all the figures (Figures 2 and 4) and text are visually clear and visible. |
|
Response 4: We sincerely appreciate your comment regarding the clarity of Figures 2 and 4. In response, we have uniformly improved the resolution of all figures in the manuscript and increased the size of the text within them to ensure better readability. The updated figures have now been incorporated into the revised manuscript. For your convenience, we have also provided the figures as separate attachments so that you may view them at full resolution. Additionally, we would like to note that the first-row image in Figure 2 appears blurred because it is a deliberately processed version with high-frequency information removed for methodological demonstration. As a result, the source image is inherently blurry. |
|
Comments 5: In the "4 Discussions" section, the suggestions proposed (such as "prioritize color and form") are too vague for urban designers. Which elements matter most? How should policies balance detail vs. simplicity? Suggest providing actionable examples. |
|
Response 5: Thank you very much for this valuable suggestion. We carefully considered your comment and have now added more specific and actionable design implications to strengthen the practical relevance of our findings. The newly added paragraph is as follows: (see page 16, line 615) “In addition to the above points, our findings on the diagnostic roles of spatial frequency and color also provide more concrete design directions for practitioners. First, emphasizing clear street silhouettes and coherent building forms can improve global recognizability, especially when supported by deliberate color application. Prior studies show that color not only shapes the emotional perception of urban spaces but also enhances the distinctiveness of local architectural character [52]. Cities such as Brașov demonstrate how carefully coordinated building colors can strengthen visual harmony and reinforce place identity within historic contexts [53], while color systems developed for Tibetan-inhabited areas show how culturally rooted palettes can preserve local distinctiveness in contemporary planning [54]. Second, design attention at the ground-floor level is particularly crucial for street identity. Transparent façades, active frontages, and pedestrian-oriented layouts have been shown to enhance street vitality and improve the legibility of the urban environment [55, 56]. These strategies create visually engaging edges that complement the global structural cues highlighted in our findings. Finally, from a policy perspective, balancing detail and simplicity may be achieved through clear yet flexible design codes. Frameworks similar to the National Planning Policy Framework (NPPF) in the UK illustrate how overarching design principles can guide color, form, and ground-floor standards while still allowing local authorities to adapt guidelines to cultural and contextual needs [57]. Such an approach can help translate perceptual insights into actionable urban design practices.” |
|
Comments 6: Finally, a more critical perspective on the results is required, such as a more sophisticated account of the implications of the findings and a further development of the conclusions. |
|
Response 6: Thank you very much for your valuable suggestion. We have carefully considered your comment and added two new paragraphs in the Discussion section to provide a more nuanced and multi-layered interpretation of our findings. The added paragraphs are as follows: (see page 15, line 550) “Our findings also provide a clearer perspective on how low-level visual information contributes to the rapid recognition of urban streets. The improved recognition performance after removing mid-to-high spatial frequencies suggests that some commercial streets may contain excessive fine-grained details—such as dense signage, textures, and decorative elements—that interfere with the extraction of global visual structure. In this context, our results indicate that clear overall forms and coherent color patterns help observers grasp the essential structure of a street within a very short time window, whereas overly intricate surface details may offer limited contribution to rapid identification. These insights highlight how the arrangement of broad visual components can shape the perceptual distinctiveness of streets in early visual processing.” (see page 15, line 587) “Regarding the non-significant effect of trees, our interpretation is more cautious. Although trees did not improve rapid street recognition in our task, this does not mean that natural elements are unimportant in urban environments. It is possible that trees influence how people experience and remember a street over longer periods rather than how quickly they identify it in a brief visual exposure. This perspective emphasizes the need to consider different temporal and cognitive processes when evaluating urban elements and to distinguish between features that support instant perceptual identification and those that shape broader experiential qualities.” |
Author Response File:
Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript presents an interesting topic with potential contributions to urban visual perception and place identity; however, I recommend major revisions before reconsideration. Significant revisions are needed to clarify the novelty, strengthen methodological rigor, improve statistical reporting, and establish clearer relevance to planning research.
1- The introduction is well written and covers the main components that an introduction should have. I have a minor issue in the introduction regarding the terms associated with 'identity'. In some places, the authors use 'identity', 'place identity', and 'urban identity'. All of these are correct, but they have different meanings, and they are used interchangeably in the manuscript. I recommend that the authors either provide a clear definition of each term or clarify the term "place identity" used in the title and ensure its clear distinction in the introduction. I would encourage the authors to review the following manuscripts, which might help differentiate these terms: "Semantic similarities between personality, identity, character, and singularity within the context of the city or urban, neighbourhood, and place in urban planning and design" and "Branding Cities Through Architecture: Identify, Formulate, and Communicate the City Image of Amman, Jordan".
2- Methodologically, the study requires clearer justification and greater transparency. The rationale provided for the sample size appears loosely connected to the actual experiment, and the effect sizes cited from previous literature may not be fully appropriate for a study of complex urban scenes. The extremely small number of images (only eight street scenes in total) limits the generalizability of the results and raises concerns about sampling adequacy. Additionally, the approach to constructing the “no trees” condition by using winter images introduces confounding differences—such as lighting, sky color, and façade visibility—that are not adequately controlled or acknowledged. More detail is also needed regarding the quality of eye-tracking calibration and how issues like drift and participant variability were handled.
3- The statistical reporting requires improvement. Several ANOVA results are presented without complete information, including effect sizes and confidence intervals. The interpretation of the findings sometimes overgeneralizes from modest differences, particularly in relation to the claim that removing mid- to high-frequency spatial components improves street recognition. Similarly, the conclusion that trees are not important for street recognition is overstated, given the limited dataset and the presence of unaddressed seasonal confounds. The discussion section would also benefit from stronger integration between the reported results and the theoretical mechanisms proposed, as well as a more nuanced consideration of how these findings translate into practical insights for urban designers and planners.
4- Several minor issues warrant attention. The section numbering in the methods appears inconsistent, and the writing contains repeated ideas, overly long sentences, and grammatical errors that require careful proofreading. Some figures, particularly those depicting the HMM heatmaps, lack clear labeling and are challenging to interpret without explicit region-of-interest markers. The ethics approval identifiers are incomplete, and the data availability statement should specify the conditions for accessing the data more clearly. Several references appear incomplete or inconsistently formatted.
Author Response
Please see the attachment.
|
Comments 1: The introduction is well written and covers the main components that an introduction should have. I have a minor issue in the introduction regarding the terms associated with 'identity'. In some places, the authors use 'identity', 'place identity', and 'urban identity'. All of these are correct, but they have different meanings, and they are used interchangeably in the manuscript. I recommend that the authors either provide a clear definition of each term or clarify the term "place identity" used in the title and ensure its clear distinction in the introduction. I would encourage the authors to review the following manuscripts, which might help differentiate these terms: "Semantic similarities between personality, identity, character, and singularity within the context of the city or urban, neighbourhood, and place in urban planning and design" and "Branding Cities Through Architecture: Identify, Formulate, and Communicate the City Image of Amman, Jordan". |
|
Response 1: Thank you very much for raising this issue. We did overlook the subtle distinctions in terminology, and we sincerely apologize for that. We are also grateful for the references you provided, which allowed us to better differentiate between the relevant concepts. Based on the conceptual distinctions discussed in the works you suggested, we refined the terminology used in our manuscript. In this study, we adopt the term urban identity. We emphasize the visible and spatial characteristics of the built environment that create distinctiveness and recognizability at the street level, which aligns more closely with the definition of urban identity understood as the perceptible features that differentiate one place from another. At the same time, we would like to note that our analysis concentrates on the recognizability of visual features rather than on the emotional, mnemonic, or socio-psychological dimensions often associated with broader discussions of identity. Below is the revised paragraph. (see page 1, line 30) “What visual features make cities more distinctive and recognizable? This question is central to sustaining urban identity and therefore ensuring the long-term socio-environmental sustainability of urban life. Cities are significant to urban identity and the sustainability of the social environment because they act as major catalysts in shaping cultural narratives, lived experiences, and communal spaces. Urban characteristics—ranging from architectural elements and historic landmarks to the design of public spaces and participatory planning practices—collectively influence community attachment, social cohesion, and inclusive urban governance [1-3]. However, increasing globalization introduces tensions between standardized, efficiency-driven urban design and the preservation of culturally specific architectural forms. This trend has led to growing urban homogenization, where modern developments often neglect regional identities and local traditions, weakening both urban identity and socio-environmental resilience [4, 5].” |
|
Comments 2: Methodologically, the study requires clearer justification and greater transparency. The rationale provided for the sample size appears loosely connected to the actual experiment, and the effect sizes cited from previous literature may not be fully appropriate for a study of complex urban scenes. The extremely small number of images (only eight street scenes in total) limits the generalizability of the results and raises concerns about sampling adequacy. Additionally, the approach to constructing the “no trees” condition by using winter images introduces confounding differences—such as lighting, sky color, and façade visibility—that are not adequately controlled or acknowledged. More detail is also needed regarding the quality of eye-tracking calibration and how issues like drift and participant variability were handled. |
|
Response 2: We greatly appreciate your thoughtful and detailed comments. We have addressed each of your concerns. (1) Regarding the concern about the number of images, we truly appreciate your careful observation and apologize for any confusion this may have caused. In our experiment, each image underwent four different types of processing in order to examine how different visual conditions affect street recognizability. Including the original images, each participant was required to evaluate forty images in total. Under such circumstances, increasing the number of stimuli may lead to participant fatigue, which could in turn reduce the reliability of the results. At the same time, related research—such as OSIEshort: A small stimulus set can reliably estimate individual differences in semantic salience—has shown that a set of 40 images can reliably capture individual differences in visual exploration measures [1]. This suggests that our current stimulus size can still provide reasonably accurate findings for the scope of our study. In our design, we prioritized comparing differences across visual conditions, which required some compromise on the total number of images. To ensure that the selected street photographs remained representative, we also paid close attention to the selection process: we chose images that reflected key street characteristics, maintained a consistent pedestrian-eye viewpoint, and ensured comparable vanishing points across conditions. We acknowledge that our current approach remains limited, and in future studies where visual dimensions are not the primary focus, we plan to increase the number of images to further validate the conclusions drawn in this work. 1. Linka, M. and B. de Haas, OSIEshort: A small stimulus set can reliably estimate individual differences in semantic salience. Journal of vision, 2020. 20(9): p. 13-13. (2) Thank you very much for your thoughtful comment regarding the winter photographs. 
We did take several steps to control the relevant visual conditions when selecting and preparing these images. Specifically, the lighting conditions and sky color were kept consistent with those of the original images to ensure overall comparability. Regarding façade visibility, we maintained the same viewing angle across all photographs. Although the absence of trees naturally increases façade visibility compared with the original images, the proportion of visible façades remained comparable within the tree-absent condition. In our experiment, the primary focus was on participants’ ability to recognize whether a given image depicted Huaihai Road under the same visual manipulation, rather than comparing visibility across different seasons. Therefore, we believe that maintaining consistent controls within each condition is the most important factor for ensuring valid comparisons. To address your concern more clearly, we have added an explanation in the revised manuscript. (see page 6, line 256) “To minimize potential confounding variables, the images in both conditions were carefully matched in terms of lighting, perspective, and sky color, and façade visibility was kept consistent across all images within the tree-absent condition.” (3) In the revised manuscript, we have clarified our calibration procedure to provide greater methodological transparency. With respect to drift, participants viewed each image only briefly (approximately 2–3 seconds), which helped reduce the potential influence of slow positional drift on fixation patterns. (4) Regarding individual differences among participants, we considered the diversity of participant backgrounds during recruitment. Specifically, we included both university students and community residents from diverse occupational backgrounds, while also balancing long-term residents and recent migrants to control for potential bias stemming from varying degrees of familiarity with Huaihai Road. 
To further examine the role of individual differences, we conducted supplementary analyses on the relationship between participant type and recognition accuracy. The results indicated that greater familiarity with Huaihai Road was associated with higher recognition accuracy. We have incorporated this analysis into the Results section and discussed its implications in the Limitations section. Below is the newly added paragraph: (see page 10, line 374) “Meanwhile, we also found that participants who were more familiar with Huaihai Road demonstrated higher recognition accuracy. To examine whether familiarity influenced participants’ ability to recognize the street based on its visual features, we conducted a one-way ANOVA with recognition accuracy as the dependent variable and participant group (Shanghai residents vs. non-residents) as the independent factor. The results showed a significant effect of group on recognition performance, indicating that Shanghai residents achieved higher accuracy than non-residents (F(1,223) = 5.24, p = 0.023, η²=0.023).” (see page 17, line 644) “Lastly, although our 1:1 recruitment of Shanghai residents and non-residents helps reduce familiarity-related bias, recognition performance may still reflect prior experience in addition to low-level visual cues.” |
|
Comments 3: The statistical reporting requires improvement. Several ANOVA results are presented without complete information, including effect sizes and confidence intervals. The interpretation of the findings sometimes overgeneralizes from modest differences, particularly in relation to the claim that removing mid- to high-frequency spatial components improves street recognition. Similarly, the conclusion that trees are not important for street recognition is overstated, given the limited dataset and the presence of unaddressed seasonal confounds. The discussion section would also benefit from stronger integration between the reported results and the theoretical mechanisms proposed, as well as a more nuanced consideration of how these findings translate into practical insights for urban designers and planners. |
|
Response 3: Thank you very much for raising these important points. We have carefully addressed each of your comments in the revised manuscript. (1) We have revised the presentation of the ANOVA results by adding effect sizes along with confidence intervals, as recommended. This adjustment offers greater transparency and facilitates a clearer interpretation of the observed effects. (2) Thank you very much for raising these important concerns about the interpretation of our findings. We fully agree that some statements in the earlier version risked overgeneralizing the observed effects, particularly regarding the improvement in recognition after removing mid- to high-frequency spatial components. In the revision, we have clarified that the effect size is modest and should be interpreted as preliminary evidence rather than a definitive conclusion. We now emphasize that these results indicate a possible tendency for excessive fine-grained detail to interfere with rapid recognition, but that further studies with broader image samples are required to substantiate this interpretation. Regarding the interpretation of trees, we appreciate your thoughtful comment. Our dataset did not show a positive contribution of trees to rapid street recognition, and we acknowledge that our initial wording overstated this finding. In the revision, we now explain that this null result may relate to the way participants allocated attention: during rapid identification, observers tended to focus more on ground-floor elements—such as storefronts and building edges—rather than on vegetation, which may have reduced the perceptual contribution of trees in our specific task. We also highlight that this interpretation is limited by the scope of our sample and that future work should test this effect using a wider range of streetscapes and seasonal conditions. (3) Thank you for this thoughtful and valuable suggestion. 
We sincerely appreciate your guidance on strengthening the Discussion section, particularly regarding the integration of results with theory and their practical relevance for design and planning. In direct response to your feedback, we have carefully revised and expanded this section. We have added paragraphs to better connect our experimental findings to the underlying perceptual and cognitive theories. For instance, we have further elaborated on the counterintuitive result concerning mid-to-high spatial frequencies by framing it within concepts of perceptual load and informational balance, suggesting that visual "noise" from excessive detail may sometimes hinder rapid recognition. We also offer a more nuanced interpretation of the non-significant finding related to trees. Furthermore, to address the need for more actionable insights, we have introduced a dedicated part focusing on design and policy implications. Here, we strive to translate our core findings on spatial frequency and color into more concrete, hierarchical guidance—touching on strategic priorities at the global, street-level, and policy-making scales—to bridge the gap between perceptual research and practical application. Below is the newly added paragraph: (see page 15, line 550) “Our findings also provide a clearer perspective on how low-level visual information contributes to the rapid recognition of urban streets. The improved recognition performance after removing mid-to-high spatial frequencies suggests that some commercial streets may contain excessive fine-grained details—such as dense signage, textures, and decorative elements—that interfere with the extraction of global visual structure. In this context, our results indicate that clear overall forms and coherent color patterns help observers grasp the essential structure of a street within a very short time window, whereas overly intricate surface details may offer limited contribution to rapid identification. 
These insights highlight how the arrangement of broad visual components can shape the perceptual distinctiveness of streets in early visual processing.” (see page 15, line 587) “Regarding the non-significant effect of trees, our interpretation is more cautious. Although trees did not improve rapid street recognition in our task, this does not mean that natural elements are unimportant in urban environments. It is possible that trees influence how people experience and remember a street over longer periods rather than how quickly they identify it in a brief visual exposure. This perspective emphasizes the need to consider different temporal and cognitive processes when evaluating urban elements and to distinguish between features that support instant perceptual identification and those that shape broader experiential qualities.” (see page 16, line 615) “In addition to the above points, our findings on the diagnostic roles of spatial frequency and color also provide more concrete design directions for practitioners. First, emphasizing clear street silhouettes and coherent building forms can improve global recognizability, especially when supported by deliberate color application. Prior studies show that color not only shapes the emotional perception of urban spaces but also enhances the distinctiveness of local architectural character [52]. Cities such as Brașov demonstrate how carefully coordinated building colors can strengthen visual harmony and reinforce place identity within historic contexts [53], while color systems developed for Tibetan-inhabited areas show how culturally rooted palettes can preserve local distinctiveness in contemporary planning [54]. Second, design attention at the ground-floor level is particularly crucial for street identity. Transparent façades, active frontages, and pedestrian-oriented layouts have been shown to enhance street vitality and improve the legibility of the urban environment [55, 56].
These strategies create visually engaging edges that complement the global structural cues highlighted in our findings. Finally, from a policy perspective, balancing detail and simplicity may be achieved through clear yet flexible design codes. Frameworks similar to the National Planning Policy Framework (NPPF) in the UK illustrate how overarching design principles can guide color, form, and ground-floor standards while still allowing local authorities to adapt guidelines to cultural and contextual needs [57]. Such an approach can help translate perceptual insights into actionable urban design practices.” |
|
Comments 4: Several minor issues warrant attention. The section numbering in the methods appears inconsistent, and the writing contains repeated ideas, overly long sentences, and grammatical errors that require careful proofreading. Some figures, particularly those depicting the HMM heatmaps, lack clear labeling and are challenging to interpret without explicit region-of-interest markers. The ethics approval identifiers are incomplete, and the data availability statement should specify the conditions for accessing the data more clearly. Several references appear incomplete or inconsistently formatted. |
|
Response 4: Thank you very much for your careful reading of our manuscript and for raising these important issues. We sincerely appreciate your detailed feedback. We address each point as follows: (1) We apologize for the oversight regarding section numbering in the Methods. We have corrected the inconsistency and thoroughly rechecked the manuscript to ensure that similar issues do not remain. (2) We have carefully proofread the entire manuscript to improve its clarity and readability, aiming to reduce repeated ideas, overly long sentences, and grammatical errors. (3) Regarding figure labeling, we have added clarifications where needed. For the HMM figures specifically, we have included explanations of what each part represents. (4) We have supplemented the complete ethics approval identifiers to ensure full compliance with institutional requirements. (5) We have clarified in the data availability statement that the dataset can be accessed upon reasonable request by contacting the corresponding author. We have also carefully reviewed and manually corrected the formatting of all references to ensure completeness and consistency throughout the manuscript. |
Author Response File:
Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
The topic of the paper is very interesting and the paper can be improved.
In P.2. "In parallel, the rapid recognition of scenes (scene gist) relies heavily on low-level visual features, particularly spatial frequencies and color, to quickly convey the overall ... etc." Are there other elements apart from colour? If so, cite them.
In P.03, Line 100, the concept "socio-environmental sustainability" needs to be elaborated further as part of the literature review.
In addition, more explanation and clarification are needed for the concept "low-level visual features".
In P.04. In a country like China, is a sample of 45 participants appropriate? The authors need to justify this number.
In P.5. "The original images were obtained in the summer months (July–October)": of which year?
In P.06, in Figure 01, the pictures are very small and thus unclear.
In P.7. Section 3.4 (Analysis). The paragraphs are too short, leading to fragmentation.
In P.8. Section 03 (Results). 3.1. Subsection? Is it a typo?
In P.9. Add a summary of the diagrams before going to Section 3.2.
In P.13. Are the "open public spaces" part of the people patterns?
Author Response
Please see the attachment.
|
Comments 1: In P.2. "In parallel, the rapid recognition of scenes (scene gist) relies heavily on low-level visual features, particularly spatial frequencies and color, to quickly convey the overall ... etc." Are there other elements apart from colour? If so, cite them. |
|
Response 1: Thank you for your valuable comment regarding the elements involved in rapid scene recognition. We appreciate you highlighting this point for clarification. You are correct that other key low-level visual features contribute significantly to gist perception alongside color. In response to your suggestion, we have expanded the relevant paragraph on P.2 to explicitly include spatial frequency – which is fundamental to extracting scene layout – as well as luminance contrast, which enhances object detection and recognition within scenes. These additions are supported by relevant citations from the literature on scene perception. The updated paragraph now reads as follows: (see page 3, line 131) “In parallel, the rapid recognition of scenes (scene gist) relies heavily on low-level visual features, particularly spatial frequencies and color, to quickly convey the overall meaning and context of an environment [29, 30]. Studies have demonstrated that humans can accurately extract gist information within milliseconds, enabling efficient categorization and recognition of environments [31, 32]. Color specifically enhances rapid recognition and memory retention of scenes, though its impact varies depending on the context, being especially beneficial in natural environments [33, 34]. Among these low-level features, spatial frequencies are fundamental, as the visual system relies on processing patterns of orientation and frequency across the scene to rapidly extract its global structure and layout [35]. Beyond color and spatial frequency, luminance contrast is another critical element, where higher contrast between objects and the background significantly improves detection and recognition performance [36].” |
|
Comments 2: In P.03, Line 100, the concept "socio-environmental sustainability" needs to be elaborated further as part of the literature review. In addition, more explanation and clarification are needed for the concept "low-level visual features". |
|
Response 2: Thank you very much for raising these concept-related questions. First, regarding the term “socio-environmental sustainability”, we have added a clearer definition at its first appearance in the manuscript. The following explanation has now been incorporated into the opening paragraph: (see page 1, line 34) “In this context, socio-environmental sustainability refers to the capacity of urban environments to support not only ecological stability but also socially resilient communities. It encompasses elements such as cultural continuity, social cohesion, equitable access to shared spaces, and residents’ long-term sense of belonging—factors that are directly influenced by the distinctiveness and recognizability of the urban landscape [1, 2].” Second, for the concept of “low-level visual features”, we have added an explicit clarification where the term first appears. The following description has been inserted: (see page 2, line 89) “In the context of this study, low-level visual features are perceptual attributes processed at early stages of visual analysis, before object-level or semantic interpretation. They typically include spatial frequency, luminance and contrast gradients, edges and contours, texture patterns, and color [23, 24]. These features provide the foundational information that allows rapid extraction of a scene’s overall layout or “gist,” even when fine details or semantic cues are absent [25]. In urban street images, low spatial frequencies (LSF) capture broad structural layout, such as the arrangement of buildings, street width and depth, skyline silhouettes, and major massing relationships. High spatial frequencies (HSF) encode fine architectural details, including façades, signage, window frames, and decorative elements [24]. 
Texture or orientation distributions reflect surface patterns on buildings or pavements, contributing to the material character of the street, while color highlights contrasting façades, shopfronts, street furniture, sky, or greenery, aiding rapid differentiation of streets before conscious identification of landmarks [23]. In this paper, we focus primarily on LSF, HSF, and color as the key low-level visual features, representing complementary dimensions of perceptual information: global structure versus local detail, and chromatic information.” |
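As an illustration of the LSF/HSF distinction described above, the following is a minimal sketch of how a grayscale street image could be split into low- and high-spatial-frequency components with a Gaussian filter in the Fourier domain. It is not the filtering procedure used in the manuscript: the function name `split_spatial_frequencies`, the Gaussian filter shape, and the cutoff value `sigma` (in cycles per pixel) are all assumptions made for illustration.

```python
import numpy as np


def split_spatial_frequencies(image, sigma=0.05):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    `sigma` is a hypothetical cutoff (standard deviation of the Gaussian
    low-pass filter, in cycles per pixel); it is not taken from the paper.
    Returns (low, high) with low + high equal to the input image.
    """
    image = np.asarray(image, dtype=float)

    # Frequency coordinates (cycles per pixel) along each image axis.
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    radius_sq = fx ** 2 + fy ** 2

    # Gaussian low-pass: gain 1 at zero frequency, falling off with radius.
    lowpass = np.exp(-radius_sq / (2.0 * sigma ** 2))

    # Filter in the Fourier domain and transform back to image space.
    spectrum = np.fft.fft2(image)
    low = np.real(np.fft.ifft2(spectrum * lowpass))

    # Whatever the low-pass filter removed is the high-frequency residual.
    high = image - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.random((64, 96))  # stand-in for a grayscale street photo
    lsf, hsf = split_spatial_frequencies(demo)
    print(lsf.shape, hsf.shape)
```

In this sketch the low-frequency image keeps the mean brightness and coarse layout, while the residual holds edges and fine texture, which mirrors the "global structure versus local detail" contrast drawn in the response.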
|
Comments 3: In P.04. In a country like China, is a sample of 45 participants appropriate? The authors need to justify this number. |
|
Response 3: Thank you very much for raising this important point about the appropriateness of our sample size. We carefully considered this issue during the study design phase and conducted a G*Power analysis to determine the minimum number of participants required. Specifically, prior research on the effect of low spatial frequency information on rapid scene recognition reported a large effect size of 1.47, corresponding to a minimum required sample of 9 participants. We also referred to studies using eye-tracking to examine fixation behavior under low-frequency conditions, which reported an effect size of 1.13 and a minimum required sample of 14 participants. Based on these benchmarks, we selected a substantially larger sample of 51 participants to ensure adequate statistical power for both behavioral and eye-tracking analyses and to reduce variability commonly associated with eye-movement data. We appreciate your suggestion, and we hope this explanation clarifies the rationale behind our sampling decision. |
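To make the link between effect size and minimum sample size concrete, here is a rough sketch using the standard normal approximation for a two-sided, one-sample (or paired) comparison. It is not the G*Power computation referred to in the response: the test family, the significance level `alpha`, and the target `power` are not stated there, so the defaults below are assumptions, and the function name `min_sample_size` is hypothetical. Exact t-based calculations typically return somewhat larger numbers for small samples, so this sketch should not be expected to reproduce the reported minimums of 9 and 14.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(effect_size, alpha=0.05, power=0.95):
    """Approximate minimum n for a two-sided one-sample or paired test.

    Uses the normal approximation n = ((z_{1-alpha/2} + z_{power}) / d)^2.
    `alpha` and `power` are assumed defaults, not values from the response.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Effect sizes quoted in the response; larger effects need fewer people.
    for d in (1.47, 1.13):
        print(d, min_sample_size(d))
```

The sketch shows the qualitative point made in the response: with effect sizes this large, the required minimum is small, so the recruited sample exceeds it by a wide margin.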
|
Comments 4: In P.5. "The original images were obtained in the summer months (July–October)" of which year? |
|
Response 4: Thank you very much for raising this point. We have now added the missing information to the manuscript. Specifically, all summer street-view images were collected between July and October of 2021, ensuring consistent lighting conditions and the presence of tree canopies. We have also clarified in the revised text that all images used in the experiment were captured during 2021. |
|
Comments 5: In P. 06. in Figure 01, the pictures are very small and thus unclear. |
|
Response 5: We sincerely appreciate your comment regarding the clarity of figures. In response, we have uniformly improved the resolution of all figures in the manuscript and increased the size of the text within them to ensure better readability. The updated figures have now been incorporated into the revised manuscript. For your convenience, we have also provided the figures as separate attachments so that you may view them at full resolution. |
|
Comments 6: In P.7. Section 3.4 (Analysis). The paragraphs are too small leading to fragmentation. |
|
Response 6: Thank you for raising the issue of the fragmented paragraph structure in Section 3.4 (Analysis). We agree that the original version was somewhat disjointed, and your comment prompted us to review and restructure this section for better logical flow. As suggested, we have consolidated the paragraphs to enhance the logical flow. Specifically, we merged the paragraphs discussing the ANOVA methodology (originally describing accuracy analysis and eye-tracking metrics separately) into a unified passage. Additionally, the paragraphs detailing the eye-tracking data processing tools and visualization methods were combined into another single, coherent paragraph. The revised structure now consists of three clear paragraphs: 1) an overview of the multi-step analytical approach, 2) a consolidated description of the statistical analysis (ANOVA), and 3) a unified explanation of the eye-tracking data processing and modeling techniques. We believe this reorganization addresses the issue of fragmentation and presents the methodology in a more logical and readable manner. We are grateful for your constructive feedback, which has helped us improve the clarity of the manuscript. |
|
Comments 7: In P.8. Section 03 (Results). 3.1. Subsection? Is it a typo? |
|
Response 7: Thank you very much for your careful reading of our manuscript and for pointing out the editing error. We sincerely apologize for this oversight. We fully understand that such issues may affect the clarity and professionalism of the manuscript, and we are truly sorry for any inconvenience it may have caused during your review. We have now corrected the numbering problem and carefully rechecked the entire manuscript to ensure that no similar inconsistencies remain. |
|
Comments 8: In P.9. Add a summary of the diagrams before going to Section 3.2. |
|
Response 8: We sincerely thank you for raising this point. We agree that adding a summary would help clarify the flow of the manuscript. Following your suggestion, we have added a paragraph in the corresponding section of the manuscript, which mainly provides a summary of the findings. The added paragraph is as follows: (see page 10, line 382) “Overall, Figure 3 shows that low-spatial-frequency images enhanced recognition accuracy, whereas black-and-white images reduced it, and tree removal had little effect. Furthermore, participants familiar with Huaihai Road generally performed better. These findings set the stage for examining how such image manipulations affected participants’ viewing strategies in the next section.” |
|
Comments 9: In P.13. Are the "open public spaces" part of the people patterns? |
|
Response 9: Thank you very much for raising this question. We are not entirely certain about the specific meaning of “open public spaces” as intended in your comment, as it is unclear whether you are referring to their categorization within people-related patterns or to another aspect of our analysis. We would be grateful if you could provide a bit more detail so that we can revise this part accurately. We sincerely appreciate your careful reading, and we will make sure to address this point thoroughly once clarified. |
Author Response File:
Author Response.docx
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The author has addressed all of my comments. I recommend acceptance of the paper in its present form.
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for submitting a revised version and addressing all my previous comments.

