Article

Understanding the Influence of Environmental Elements on Spatial Attractiveness in a Jiangnan Water Town Through Computer Vision Techniques

1
School of Engineering, The University of Tokyo, 7-Chōme-3-1 Hongō, Tokyo 113-8654, Japan
2
School of Art, Huzhou University, No. 1 Xueyuan Road, Wuxing District, Huzhou 313000, China
3
School of Architecture, Soochow University, No. 199 Ren-Ai Road, Suzhou Industrial Park, Suzhou 215123, China
4
China-Portugal Belt and Road Joint Laboratory on Cultural Heritage Conservation Science, Suzhou 215123, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Buildings 2025, 15(12), 2091; https://doi.org/10.3390/buildings15122091
Submission received: 28 April 2025 / Revised: 11 June 2025 / Accepted: 15 June 2025 / Published: 17 June 2025
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

Abstract

Traditional Jiangnan water towns in China serve as important cultural heritage sites and tourist destinations. Existing studies have revealed a potential connection between environmental elements and spatial perception in these towns. However, there remains a lack of research systematically investigating whether and how these environmental elements influence subjective evaluation indicators, such as spatial attractiveness, and the mechanisms underlying the interactions between these elements. To further understand these mechanisms, we used Nanxun Old Town as our study site, employed computer vision techniques to perform semantic segmentation on street-view images, extracted the visual proportions of environmental elements, and conducted quantitative correlation analysis with subjective attractiveness evaluations. The findings indicate that different environmental elements in water towns shape spatial imagery in diverse ways, thereby influencing perceived attractiveness. Firstly, though space-defining elements such as buildings and water generally contribute positively to perceived attractiveness, their proportions should be controlled within a reasonable range to maintain a spatial scale that aligns with the traditional imagery of water towns. Secondly, foreground elements like boats and lanterns, although occupying a smaller proportion, can effectively enhance the space when properly combined. Finally, the influence of elements such as bridges and buildings depends on the specific viewing distance and angle. These findings, based on an interpretable analytical framework, reveal that the effects of environmental elements on spatial attractiveness are context-dependent and nonlinear, varying with their proportions, combinations, and perspectives. 
This approach offers a more comprehensive understanding of the mechanisms by which environmental elements shape spatial attractiveness, providing a scientific foundation for regulating key visual components and optimizing spatial composition for sustainable traditional water town environment management.

1. Introduction

Old towns and villages serve as vital preservers of China’s outstanding historical culture, sustaining its long-standing lineage [1]. The spatial configurations of small bridges over rivers, along with white walls and black-tiled roofs, reflect the architectural achievements of the Jiangnan region and help shape the quintessential imagery of traditional water towns. This distinctive landscape has endowed these old towns with abundant, regionally characteristic cultural and tourism resources [2], making them unique tourist destinations. Previous research has highlighted the significance of attractiveness as a key evaluation metric in scenic areas, serving as a fundamental motivator of tourism [3]. Exploring the attractiveness of such destinations from a tourist perspective is an effective means of assessing their spatial conditions [4] that helps in evaluating both the likelihood of initial visitations and the intention to return [5,6]. Recently, some studies have found that environmental elements play a critical role in how visitors perceive water towns [7]. The visibility of these elements is central to assessing the quality of spatial perception, with their impact stemming from attributes such as type and characteristics [8,9]. However, prior investigations into the role of spatial elements in image-based perception have largely remained at the static, descriptive level, overlooking the potential nonlinear and context-dependent variability in visual perception across differing scene characteristics. Unraveling this intricate and dynamic relationship between space and perception thus calls for research at the underlying mechanistic level. At the same time, the rapid development of computer vision technologies in recent years has enabled the efficient processing of environmental element visibility, making connections between visual elements and spatial perception possible [10]. 
Thus, within the context of traditional Jiangnan water towns, applying computer vision techniques to explore the ways in which environmental elements influence the perception of spatial attractiveness is not only feasible but also highly significant. At the theoretical level, this study reveals the nonlinear and context-dependent mechanisms by which environmental elements influence spatial attractiveness. It marks a shift from the mere presentation of outcomes to an explanation of underlying mechanisms, thereby enriching theoretical understandings of how spatial imagery is formed and perceived in traditional water towns. At the practical level, this study provides a more refined and applicable framework for spatial optimization, offering quantitative support for the adjustment and combination of environmental elements during the preservation and renewal of water towns. Furthermore, it demonstrates the potential application value of data-driven approaches in protecting and revitalizing historic districts, thus expanding their technological applications.
This study aims to uncover the mechanisms by which environmental elements influence tourists’ perceptions of spatial attractiveness in traditional Jiangnan water towns, specifically addressing the following three questions: (1) Which environmental elements are most strongly associated with perceived spatial attractiveness? (2) How do the visual proportions and characteristics of these elements shape spatial perception? (3) Do interactions among different environmental elements—including mutually reinforcing or conflicting effects—collectively influence overall attractiveness evaluations? To answer these questions, we selected Nanxun Old Town in Zhejiang Province, China, as a research site. Using computer vision techniques, we calculated the proportional presence of various spatial elements in street-view images, simulating human eye-level perspectives. Tourist ratings of spatial attractiveness were collected through questionnaires, and regression analysis was applied to establish a quantitative relationship between element proportions and attractiveness ratings. Additionally, the Shapley additive explanations (SHAP) approach was adopted to further explain the nonlinear relationships between different environmental element proportions and perceived attractiveness.

2. Literature Review

2.1. Research on Spatial Perception and Environmental Elements in Old Towns

Since the publication of The Image of the City by renowned American architect and urban planning theorist Kevin Lynch, the concept of the “environmental image” has become foundational in understanding space as a key medium for human cognition and emotional projection. Lynch proposed that five urban elements—paths, edges, nodes, landmarks, and districts—constitute people’s mental maps of cities, helping them form a sense of orientation and identity [11]. Italian architect Aldo Rossi further argued that the city is not merely a material form but a complex spatial structure imbued with memory and cultural depth. By emphasizing the temporal continuity of historic districts and iconic buildings, Rossi expanded urban form to include a historical–cultural dimension, suggesting that spatial imagery often arises from long-term lived experience and cultural identification [12]. In addition, Robert Venturi and his collaborators, in Learning from Las Vegas: The Forgotten Symbolism of Architectural Form, explored the formation of urban spatial attraction from the perspective of consumer culture and visual symbolism. They argued that the legibility of urban space derives not only from internal coherence but also from spatial narratives constructed through symbolic and visual elements [13].
With the continued development of environmental psychology, scholars have increasingly adopted human-centered perspectives to investigate how individuals perceive spatial imagery. For example, the environmental preference theory proposed by Rachel and Stephen Kaplan posits that people’s preferences for specific environments are primarily shaped by the informational characteristics of the environment itself. The theory suggests that humans are inclined to favor environments that are both easily interpretable and rich in potential for exploration [14]. Building on this, Hildebrand proposed a tripartite structure of spatial perception, consisting of the preference subject, the perceptual medium, and the preference object [15]. Edward Relph also emphasized that individual spatial perception is closely tied to a sense of belonging and identity—what he termed “sense of place”. This sense arises not only from the physical characteristics of the environment but also from emotional interaction and the construction of meaning between people and place [16]. Amos Rapoport echoed similar views, noting that cultural symbols (e.g., porches suggesting welcome) and behavioral cues (e.g., benches implying rest) interact with personal experience to activate deep-seated memories. Through long-term residence or repeated experience in a place, individuals develop a sense of belonging grounded in familiarity, giving rise to a perception of rootedness [17]. Harold Proshansky and colleagues further discussed the concept of “place identity”, arguing that the emotional bonds formed through prolonged interaction with specific environments become integral to one’s self-concept, significantly influencing environmental preferences and choices [18]. Additionally, in the field of emotional geography, scholars such as David Davidson and John Urry emphasized the interplay between environment and personal experience. 
They argued that spatial perception is inevitably infused with personal memories and emotions, which, in turn, shape diverse forms of spatial attachment and meaning [19].
Inspired by these theories, Chinese scholars have adopted a human-centered perspective and employed a variety of research methods to conduct numerous studies on the spatial imagery of traditional towns [20,21,22]. For instance, through web-based textual analysis and natural-language-processing techniques, researchers have identified attraction points [23] that reflect how visitors perceive old towns [24]. Questionnaires [25], field interviews [26], and other methods have also been widely employed to study spatial perceptions and imagery in old towns. In order to obtain more authentic and effective first-hand data, Huang et al. referenced Kevin Lynch’s classification of the five elements of urban imagery and conducted on-site interviews to explore perceptual differences across various spatial settings in traditional towns [27]. Through qualitative analysis of interview transcripts, Chen et al. further constructed a consensus map of spatial imagery [28]. Some scholars also invited tourists to freely photograph scenes in ancient towns, before conducting a spatial element analysis of the images to investigate preferences for distinctive environments [29]. In addition, comparative studies have shown that while sensory inputs such as smell and sound play certain roles in environmental perception [30], people tend to be more sensitive to and interested in visual elements [31,32]. This is reflected in various dimensions, such as the appreciation of natural scenery and cultural depth [33]. Through building a bag-of-words model, Xiao et al. classified visual perceptions of old town scenes and summarized tourists’ perceptual patterns [34]. These perceptual differences appear not only at the macro level of viewpoints but also in the recognition of individual spatial elements [8]. For example, based on visual perception, Zhao et al. discovered that features such as color and element age lead to differentiated perceptual responses [35]. 
These features, along with the degree of spatial openness, collectively influence visual appeal [9]. Understanding these perceptual differences allows for the development of more targeted renewal strategies [36], which play an important role in creating attractive tourist destinations [37].
Focusing on specific scenes in Jiangnan water town spaces, scholars have found that tourists are especially drawn to elements such as “water”, “bridges”, and “boats” [38] and particularly fond of the classic imagery of “small bridges, flowing water, and residential homes” [39]. This preference stems not only from the vibrancy and quality of the space [7] but also from a deep-rooted cultural pride in these symbolic elements [40]. These unique features of Jiangnan water towns, along with their spatial combinations, are products of the region’s long-standing waterside way of life, serving as both diachronic and synchronic place narratives [41]. They stimulate a sense of cultural identity and place attachment [42]. In particular, static elements like buildings both preserve historical character and generate new “spirituality” [43]. Their texture and spatial presence significantly influence the character of historical districts [44].
Notably, the creation of high-quality old town spaces also relies on other spatial elements. For instance, the inclusion of an appropriate amount of greenery enlivens street spaces, adds layers to spatial experiences [45], and enhances the visual appeal and overall evaluation of plazas and other featured nodes [46,47]. Unique scenic elements such as lanterns and traditional shop signs also play key roles. These features convey the enduring significance of everyday life, indirectly improving the quality of visual perception [48]. A strong visual experience often implies a sense of authenticity [49], which can trigger positive emotions and a sense of local identity among visitors [50]. Furthermore, scholars have found that these distinctive landscapes not only influence visual assessments but also impact tourist behaviors—such as pausing, gathering, and lingering—through visual attraction [51]. Table 1 summarizes the key environmental elements of ancient towns discussed in the literature.

2.2. Research on Environmental Elements Conducted Using Street-View Images

In urban contexts, research involving the use of street-view images (SVIs) to study street environments is becoming increasingly widespread, covering aspects such as spatial form, urban vitality, and the driving environment [52]. Recent years have seen a notable rise in studies employing semantic image segmentation based on deep learning [10], largely due to its ability to produce more intuitive and quantifiable results [53,54]. In such research, scholars commonly use Google Street View (GSV) to acquire SVIs because of its comprehensive coverage, even allowing sampling in urban areas [55]. However, in some cases, researchers may opt to collect images themselves using panoramic cameras or other equipment, either to compensate for GSV’s limitations in certain regions or to fulfill specific research needs [56]. After the images are preprocessed (e.g., panoramic distortions are rectified [57]), environmental elements can be accurately extracted.
Semantic image segmentation requires constructing image datasets and training convolutional neural networks (CNNs). In studies involving SVIs, researchers enjoy considerable flexibility in model selection, with commonly used CNN architectures including DeepLabv3 [58], FCN-8s [59], SegNet [60], and PSPNet [61]. These models require large, pre-labeled datasets such as Cityscapes and ADE20K, which are widely used in architectural, urban, and landscape research. Some researchers also build and train their own models using custom datasets [62,63] or manually annotate a small number of images to calculate spatial element proportions [64].
CNNs are now capable of extracting a vast array of labels from images, including vegetation, road, person, sky, and building labels. The Green View Index (GVI) and Sky View Factor (SVF) are two of the most widely studied spatial features [65,66,67]. The GVI has consistently been shown to positively influence urban vitality [68], relaxation [69], and recreational interest [66], while the SVF also effectively enhances positive emotional responses [60]. Moreover, some researchers have noted that spatial perception is not solely determined by individual elements but also shaped by their interplay. For example, the pedestrian experience is strongly influenced by the visual ratio of buildings to the ground surface [70]. Studies on such visual factors [71] have enabled the quantification of environmental perceptions such as walkability [72].
These studies often employ regression analysis to link objective features or indicators with subjective evaluations; this has become a widely accepted and mature methodological approach. In recent research on walkability, for example, scholars have used advanced regression techniques and machine learning tools to establish reliable correlations between objective spatial indicators (e.g., green view, sky visibility, etc.) and walking willingness or satisfaction. These methods have led to the development of walkability assessment models [57,73] with greater applicability and precision than those based solely on objective indices such as Walkscore. Similar research approaches have also been applied to study bikeability [74] and street-level interest [75]. Through such data-driven methodologies, researchers have successfully quantified the relationship between objective spatial features and human perception, thereby making the impact mechanisms more intuitive while reducing research costs and improving the efficiency and scientific validity of large-scale spatial evaluations.

2.3. Research Gap

It is evident that visual perception research on the spatial imagery of traditional Jiangnan water towns in China has been extensively carried out in recent years. However, existing studies tend to remain at the level of static, descriptive representations—typically relying on averaged results derived from global analyses—while overlooking the nonlinear and context-dependent complexity of visual perception across different spatial settings [76,77]. This has led to an oversimplification of the relationship between environmental elements and visual perception. Such intricate and dynamic interactions between space and perception call for more in-depth, mechanism-oriented investigations.
To address this research gap, this study takes Nanxun Old Town as a case site, constructing a custom image dataset and training a convolutional neural network tailored to the traditional water town context. By integrating machine learning regression techniques, we explore the underlying mechanisms linking environmental elements to subjective perception. This approach enables a transition from mere outcome presentation to mechanism-based explanation, from generalized construction to context-specific differentiation, and from static summarization to the dynamic modeling of perceptual processes.

3. Methodology

3.1. Research Framework

To establish an evaluation model suitable for spatial experiences in Chinese Jiangnan water towns, we divided this study into four stages (Figure 1). Firstly, we captured panoramic street-view images on-site in the main scenic area of Nanxun Old Town, Huzhou, China. Secondly, following existing research [78,79,80] on the key environmental elements of Jiangnan ancient towns, we selected appropriate labels, performed semantic segmentation on the images, and calculated the proportion of each environmental element within them. Thirdly, we obtained subjective attractiveness ratings by asking participants to score the images according to the perceived attractiveness of each scene. Finally, we analyzed the element proportions and the subjective evaluation results using the Random Forest algorithm, SHAP, and Spearman correlation analysis.

3.2. Study Area

Nanxun Old Town, located in the Nanxun District of Huzhou City in northern Zhejiang Province near the Jiangsu border, is widely acclaimed as the foremost of the “Six Great Water Towns of Jiangnan”. Long honored as the “Land of Culture” and the “Home of Poetry and Books” [81], Nanxun preserves the quintessential Jiangnan imagery of flowing water, arched stone bridges, and white-washed walls capped with black-tiled roofs [82]. Thanks to this well-kept water–land pattern and spatial character, the town stands as a prototypical, highly representative example among the many Jiangnan water towns that share similar landscapes and urban form [83]. In 2023, Nanxun Old Town received a total of 12.35 million visitors, ranking among the “Top Ten Popular Scenic Spots in China” and setting a new record for Chinese historical town tourism [84].
The study area for this research was the main scenic area of Nanxun Old Town, featuring an elongated north–south layout (Figure 2). The elements of the site exhibit a distinct land–water pattern, with clusters of buildings and important spatial nodes (such as squares and piers) arranged along both banks, offering a rich and dynamic spatial form. Additionally, there are dozens of large and small bridges and numerous heritage-preserved residences.

3.3. Data Collection

3.3.1. Acquiring Street-View Images

Since Google Maps does not provide sufficient panoramic street-view images for Nanxun Old Town, we established an image library by manually capturing images on-site. In urban streets, some scholars are accustomed to using a fixed distance, such as 10 m, to capture SVIs. However, this method is not suitable for the rich and diverse environmental elements found in the spaces of Jiangnan’s historical towns. Meanwhile, through on-site investigation, we found that the publicly accessible sightseeing spaces in Nanxun Old Town are primarily distributed along the river, while other areas, although within the boundaries of the scenic zone, are in fact not accessible. Therefore, based on existing research and practical considerations, we finalized the following criteria for selecting shooting locations: (1) spots along tourist-flow lines; (2) intersections of two or more tourist routes; (3) special node spaces (such as boat piers, squares, and the tops of bridges); and (4) the midpoints of long, linear spaces (such as streets and corridors). After filtering, we selected 95 points (Figure 3). According to the spatial characteristics and locations of the observation points, they can be categorized into five types of waterfront spaces: lane, lane interaction, bridge, square, and waterfront platform. This classification of waterfront spaces is, to some extent, informed by the spatial stratification of water surface, riverbank, and land proposed by Margherita Vanore and Massimo Triches [85]. Compared to traditional street-view photography based on fixed distances, this method effectively reduces the redundancy of highly similar scenes. Additionally, to minimize the impact of weather, sunlight, and other factors on the panoramic image results, all images were captured between 9:00 AM and 11:00 AM on September 30, 2024, Beijing time. The panoramic image capture device used was an Insta360 X4, with an original image resolution of 11,904×5952 pixels. 
Field research revealed that Chinese tourists make up the vast majority of visitors, so the camera height was set with reference to the average height of Chinese adults [86] (164 cm), with the line of sight set at 152 cm.
The advantage of panoramic images is that they provide viewpoints in all directions; however, they suffer from distortion [87], which may lead to incorrect estimates of street landscape proportions after semantic segmentation. Therefore, using an equirectangular-to-perspective conversion (code source: https://github.com/timy90022/Perspective-and-Equirectangular, accessed on 10 November 2024), we extracted four static 640 × 640 images from each panoramic image, facing left, front, right, and back. The feasibility of this method for correcting panoramic images has been confirmed in existing research [88,89]. After processing, a total of 380 images (95 panoramas × 4 directions) were obtained.
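The geometry behind this conversion can be sketched as follows. This is a minimal illustration rather than the code from the cited repository: the function name, the 90° field of view, and the yaw angles assigned to the four directions are assumptions made for the example, since the text does not state them.

```python
import math

# Assumed yaw angles (degrees) for the four viewing directions.
VIEW_YAWS = {"left": -90.0, "front": 0.0, "right": 90.0, "back": 180.0}

def perspective_to_equirect(u, v, size, yaw_deg, pano_w, pano_h, fov_deg=90.0):
    """Map pixel (u, v) of a square perspective view to panorama coordinates.

    The returned (px, py) is the location in the equirectangular panorama
    that should be sampled to fill pixel (u, v) of the rectified view.
    """
    # Focal length in pixels for the assumed field of view.
    f = (size / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    # Ray through the pixel centre (x right, y down, z forward).
    x = u + 0.5 - size / 2.0
    y = v + 0.5 - size / 2.0
    z = f
    # Rotate the ray about the vertical axis by the viewing direction.
    yaw = math.radians(yaw_deg)
    xr = x * math.cos(yaw) + z * math.sin(yaw)
    zr = -x * math.sin(yaw) + z * math.cos(yaw)
    # Ray direction as longitude/latitude, then as panorama pixels.
    lon = math.atan2(xr, zr)
    lat = math.atan2(y, math.hypot(xr, zr))
    px = (lon / (2.0 * math.pi) + 0.5) * pano_w
    py = (lat / math.pi + 0.5) * pano_h
    return px % pano_w, py
```

Evaluating this mapping for every pixel of a 640 × 640 grid and interpolating the panorama at the returned coordinates yields one rectified view; repeating it for the four yaw angles yields the four images per scene.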

3.3.2. Semantic Segmentation

Semantic segmentation is a deep-learning-based computer vision technique used to extract micro-features from images and classify image components at the pixel level (Figure 4). Although comprehensive semantic segmentation datasets have been developed for urban environments and some convolutional models have been localized for China [90], none has yet been adapted to the Jiangnan water town environment. The DeepLabv3+ model (Python code for implementing DeepLabv3+: https://github.com/lexfridman/mit-deep-learning, accessed on 10 November 2024) trained on the Cityscapes dataset performs very poorly in this environment (Figure 5). Therefore, in this study, we addressed this issue by retraining DeepLabv3+, an architecture that has shown good accuracy and efficiency in semantic segmentation [91]. Through training and optimization, we obtained a DeepLabv3+ model suited to the water town environment; it achieved good overall accuracy (0.880) and outperformed the other models (Table 2). In practical applications, it showed excellent recognition capabilities for elements such as boats, bridges, and water.
In previous studies, many scholars have used the famous urban design quality concepts proposed by Ewing and Handy [92] to determine element labels for recognition by convolutional models. For example, labels such as road, building, and vegetation are the most commonly used classification tags [53,93,94], and are closely related to walkability, stress reduction, and other factors. However, whether traditional urban research methods and conclusions are applicable to historical towns has not been fully discussed. Therefore, in this study, we made some adjustments to the label selection process in order to conduct more targeted research based on more appropriate classifications and achieve better results [70].
In The Aesthetic Townscape, Ashihara introduced the concepts of “primary profile” and “secondary profile” to describe, respectively, the original form of a building and the shape created by protrusions from the exterior wall or temporary additions [95]. Drawing on this framework, environmental elements in a historic town can be categorized into space-defining elements and foreground elements. The former are stable and provide a sense of spatial enclosure, acting as the “skeleton” of the visual structure; they include ground, water, and buildings. The latter are decorative or temporary elements that are attached to the former and enrich the spatial image, such as pots, shop signs, signboards, and boats.
Considering the spatial characteristics of Nanxun Old Town, we introduced less-discussed labels such as “water” [96]. Elements such as utility poles—which are common in modern urban environments but rarely seen in this historical town setting—were grouped under the category “other”. Additionally, as there were no significant differences in elements such as roads and sidewalks within the pedestrian-restricted areas of the scenic spot, these elements were merged into a single label, “ground”. It is important to note that the “bridge” classification label may vary depending on the observer’s relative position, being categorized as a space-defining element when viewed from the slope or top, and as a foreground element when observed from a distant side angle. Additionally, considering the richly varied morphology of waterfronts and the distinctive spatial structures between water and land [85], riverbank has also been included as a separate label. We ultimately arrived at the following labels by considering the important positive role of lanterns in Jiangnan water towns with respect to spiritual connection and cultural confidence [97], as well as highly local and distinctive old town elements such as shop signs [48], boats, and arch bridges (Table 3); more specifically, these were building, boat, sky, person, pot, lantern, bridge, water, shop sign, vegetation, signboard, riverbank, ground, fence, and other. Figure 6 shows an example of the segmentation results for the convolutional model trained using street-view images of Nanxun Old Town.
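Once a segmentation mask is available, the visual proportion of each element is simply its share of the pixels. The sketch below illustrates this calculation under stated assumptions: the integer encoding of the 15 labels and both function names are hypothetical, and averaging the four views of a scene is an assumption, since the text does not specify how per-view proportions were combined.

```python
from collections import Counter

# The 15 labels listed above; the index order is an assumption.
LABELS = ["building", "boat", "sky", "person", "pot", "lantern", "bridge",
          "water", "shop sign", "vegetation", "signboard", "riverbank",
          "ground", "fence", "other"]

def element_proportions(mask):
    """Share of pixels per label for one mask (2-D list of label indices)."""
    counts = Counter(idx for row in mask for idx in row)
    total = sum(counts.values())
    return {label: counts.get(i, 0) / total for i, label in enumerate(LABELS)}

def scene_proportions(masks):
    """Average the proportions over equally sized views of one scene."""
    per_view = [element_proportions(m) for m in masks]
    return {label: sum(p[label] for p in per_view) / len(per_view)
            for label in LABELS}
```

Because the views have identical dimensions, averaging the per-view proportions is equivalent to pooling all pixels of the scene before counting.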

3.3.3. Acquiring Attractiveness Score

To improve the efficiency of the participants’ scene evaluations, the panoramic images used in the questionnaire were randomly selected from clustered folders. To create these folders, we first performed hierarchical clustering of the panoramic images from the 95 scenes based on their proportions of environmental elements. The feasibility of using this method to group similar images was confirmed in Liu et al.’s research [98]. In contrast to K-means clustering and two-step clustering, hierarchical clustering can present its results visually as a tree diagram [99], facilitating subsequent operations.
Considering factors such as the average time the participants took to complete the questionnaires, we adopted a method involving randomly selecting images from within the same cluster to obtain subjective evaluations [100]. Ultimately, we selected a truncation height of 4, resulting in 13 image clusters (Figure 7). To ensure the participants adequately comprehended the scenes, we employed a mobile-based QR-code-scanning method to score the panoramic images. The application randomly selected one image from each of the 13 folders obtained through clustering for questioning, ensuring that each space category was evaluated in each round of scoring. In addition, this study adopted a mean-based modeling strategy to emphasize collective trends over individual variability and examined spatial attractiveness from the perspective of ancient towns as distinctive tourist destinations. Previous research has shown that tourists, compared to local residents, tend to exhibit stronger emotional responses to spatial imagery, and that cultural background can significantly influence spatial perception [2,101]. To minimize the influence of environmental familiarity, the questionnaire was administered exclusively to visiting tourists, excluding local residents. Furthermore, only domestic tourists were surveyed to reduce potential variability stemming from cross-cultural differences. Subsequently, subjective scoring was conducted on-site at the main entrance of the historical town scenic area. A total of 276 valid electronic questionnaires were obtained.
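The sampling and aggregation logic described here can be illustrated with a short sketch. It is not the survey application itself: the function names and the data layout (a dictionary from cluster identifier to image identifiers, and a list of image-score pairs) are assumptions made for the example.

```python
import random
from collections import defaultdict
from statistics import mean

def draw_round(clusters, rng):
    """Pick one image at random from every cluster for one questionnaire."""
    return {cid: rng.choice(images) for cid, images in clusters.items()}

def mean_scores(ratings):
    """Average all ratings per image; ratings is a list of (image_id, score)."""
    by_image = defaultdict(list)
    for image_id, score in ratings:
        by_image[image_id].append(score)
    return {image_id: mean(scores) for image_id, scores in by_image.items()}
```

With 13 clusters, each call to `draw_round` returns 13 images, one per cluster, so every space category is rated in every round; `mean_scores` then implements the mean-based aggregation over all questionnaires in which a given image appeared.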

3.4. Establishing the Evaluation Model

The first step in establishing the evaluation model was to determine the weight of each environmental element’s impact on the attractiveness rating. We adopted a data-driven approach and used the Random Forest algorithm to calculate the weights, which avoids subjective weighting and the lower precision and efficiency of manual judgment methods such as the analytic hierarchy process (AHP) [92]. Random Forest is an efficient machine learning algorithm that estimates the importance, or weight, of each variable from its contribution to the model’s predictions [102]. One of its inherent advantages is its strong resistance to overfitting, which can be further improved through hyperparameter tuning. Moreover, it maintains stable predictive and generalization performance even with relatively limited sample sizes. In this study, the dependent variable was the attractiveness score for different spaces in the historical town, and 15 independent variables (the environmental element proportions) were analyzed. The attractiveness score for each scene was the mean of all valid ratings obtained for that scene.
However, the importance values obtained through the Random Forest algorithm and a scatter plot analysis of the attractiveness of each element only allowed us to judge the correlation between a particular element’s proportion and spatial attractiveness on a macro level, without explaining the contribution of each element in individual scenes. To facilitate the analysis and discussion of each variable’s specific performance and contribution in different scenes, we introduced SHAP to explain the contribution of each element to the prediction result after excluding interference from other elements [103]. This method has been widely applied in studies of urban environments [104].
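To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a single scene. It is an illustration of the underlying attribution principle, not the SHAP library or the study's pipeline; the toy predictor, its coefficients, and the baseline scene are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi

# Hypothetical toy predictor over three proportions: water, building, lantern.
def toy_model(v):
    water, building, lantern = v
    return 3.0 + 4.0 * water - 1.0 * building + 2.0 * water * lantern

x = [0.15, 0.40, 0.05]         # the scene being explained
baseline = [0.05, 0.50, 0.00]  # reference scene, e.g., dataset averages
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for the scene and the prediction for the baseline, which is what allows each element's contribution to be read separately from the others. The interaction term is split between water and lanterns, so an element's attribution can depend on what else is in the scene.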

4. Results

4.1. Validation of the Attractiveness Model

The Intraclass Correlation Coefficient (ICC) is a statistic used to measure the consistency of ratings applied by multiple raters to the same target. Through variance decomposition, it quantifies the proportion of the total variance that is attributable to differences between the rated targets rather than to differences between raters or to random error. The ICC is commonly applied in reliability assessment, particularly in multi-rater studies, in measurement tool validation, and in experimental reproducibility analyses. Additionally, the ICC is useful for evaluating questionnaire reliability while accounting for individual differences in rating tendencies. As shown in Table 4, the ICC(1,k) (one-way random effects model), ICC(2,k) (two-way random effects model), and ICC(3,k) (two-way mixed effects model) values are 0.740, 0.754, and 0.799, respectively. These values indicate good reliability, demonstrating that the participants’ ratings of scene attractiveness exhibit a high level of consistency. Furthermore, this reliability is statistically significant.
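For reference, the three average-measure coefficients can be derived from ANOVA mean squares as sketched below. The sketch assumes a complete targets-by-raters matrix with no missing ratings, which may differ from how the survey data were actually arranged; the example matrix is illustrative and is not the study's data.

```python
import numpy as np

def icc_average_measures(ratings):
    """Return ICC(1,k), ICC(2,k), ICC(3,k) for an (n targets x k raters) matrix."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()  # between targets
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_err = ss_total - ss_rows - ss_cols                      # residual

    msr = ss_rows / (n - 1)                   # mean square between targets
    msc = ss_cols / (k - 1)                   # mean square between raters
    mse = ss_err / ((n - 1) * (k - 1))        # residual mean square
    msw = (ss_cols + ss_err) / (n * (k - 1))  # mean square within targets

    icc1k = (msr - msw) / msr
    icc2k = (msr - mse) / (msr + (msc - mse) / n)
    icc3k = (msr - mse) / msr
    return icc1k, icc2k, icc3k

# Small illustrative matrix: 6 targets rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc1k, icc2k, icc3k = icc_average_measures(example)
```

The three forms differ only in how rater variance is treated: ICC(1,k) folds it into the within-target error, ICC(2,k) counts it against agreement, and ICC(3,k) removes it. This is why the values rise from the first to the third form whenever raters differ systematically in leniency.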
As shown in Table 5, the attractiveness model reveals that the most important variable is water, whose importance value was rescaled to 1 so that the weights of the other variables could be expressed relative to it. The importance value of a variable indicates how strongly it contributes to the model’s prediction of visual attractiveness.
Table 6 presents the performance results for the Random Forest regression model for both the training and testing datasets. R² is a key indicator for assessing the goodness of fit; the closer the value is to 1, the greater the explanatory power of the model with respect to the data. In this study, the R² value of the regression model with respect to the training set was 0.8358. Although it decreased for the testing set (0.7487), the model still performed well. Other auxiliary performance metrics, such as MSE (Mean Squared Error) and MAE (Mean Absolute Error), also provide insight into a model’s performance. For instance, the MSE, which represents the average squared difference between predicted and actual values, exhibited relatively low values, indicating that the model fits the training data well and has small prediction errors. Moreover, although the testing set has higher RMSE (Root Mean Squared Error), MAE, and MAPE (Mean Absolute Percentage Error) values than the training set, indicating a slight decline in model performance, the overall performance remains good, demonstrating strong reliability and stability for practical applications.
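The metrics named above follow standard definitions, sketched here with NumPy. The four-scene input is a made-up example used only to show the arithmetic; it does not reproduce the values in Table 6.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute R2, MSE, RMSE, MAE, and MAPE (in percent) for paired arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(residuals))),
        "MAPE": float(np.mean(np.abs(residuals / y_true)) * 100.0),
    }

# Hypothetical observed and predicted attractiveness scores for four scenes.
metrics = regression_metrics([3, 4, 5, 6], [2.5, 4.5, 5, 7])
```

Computing the same dictionary separately on the training and testing splits gives the kind of side-by-side comparison reported in the table; a gap between the two R² values is the usual sign of how much the model's fit depends on the data it was trained on.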

4.2. Spatial Distribution

In this study, panoramic image-capture points were established based on real-world conditions and categorized into five types according to their spatial locations: lane intersections, squares, lanes, bridges, and waterfront platforms. Overall, viewpoints located on waterfront platforms and bridges exhibited significantly greater spatial attractiveness compared to those at lane intersections and squares, while those along lanes showed a wider range of attractiveness scores (Figure 8).
Figure 9 illustrates the distribution of the proportions of environmental elements at different locations. Viewpoints on waterfront platforms and bridges typically featured a higher proportion of visible water and broader panoramas, both of which increase spatial attractiveness for visitors. However, when comparing the two types of viewpoints (Figure 10a,b), it becomes apparent that although the proportions of buildings are similar, viewpoints from bridges primarily showcase rooftops. Additionally, both types of viewpoints present higher proportions of ground and bridge elements.
Lane viewpoints may have had relatively higher ratings than those at lane intersections and squares, because several of these points are located along the riverbank, where water is visible (Figure 10c). Moreover, many of these highly rated lane viewpoints include significant symbolic elements of Nanxun Old Town, such as lanterns and Meirenkao (a traditional leaning fence). In contrast, lower-rated viewpoints were typically found in enclosed lanes flanked by buildings on both sides (Figure 10d), where negative elements such as an overabundance of shop signs are more prevalent.
The distribution of environmental elements at lane intersections and square viewpoints showed notable similarities, namely a high proportion of buildings and ground surfaces and an almost negligible presence of water surfaces. This composition resulted in consistently lower attractiveness ratings for both types of locations. However, the viewpoints at intersections displayed a relatively higher proportion of bridges, a fact more clearly illustrated in the photo comparisons between the two categories.

4.3. Importance and Correlation Analysis

The elements’ importance can be demonstrated from a global perspective in terms of their influence on attractiveness ratings. Overall, space-defining elements tend to occupy a larger visual proportion and exhibit higher importance in influencing attractiveness evaluations, whereas foreground elements show the opposite pattern. Among the space-defining elements, buildings, ground, sky, and water rank among the top five in terms of visual proportion (Figure 9), and their importance values stand out clearly from the rest. By contrast, the remaining elements, primarily foreground elements, generally account for less than 2% of the visual field, with relatively minor differences among them. Notably, water and riverbank achieve importance values exceeding 10, making them the most significant contributors to perceived attractiveness. Ground and sky also demonstrate a certain level of significance. In contrast, foreground elements such as bridges and boats show relatively lower importance values (all below 5). However, some exceptions are worth noting. First, although buildings have the highest proportion among space-defining elements, their importance in predicting spatial attractiveness is relatively low. Second, two foreground elements, vegetation and people, deviate from the general trend. Vegetation, like buildings, occupies a relatively high proportion but shows limited importance. In contrast, people, despite accounting for less than 2% of the visual field on average, demonstrate a higher importance than both buildings and sky.
The results of the correlation analysis further reveal potential relationships between the proportion of visual elements and spatial attractiveness from a global perspective (Figure 11). Firstly, elements such as water, lanterns, sky, people, and riverbanks display a positive correlation with attractiveness. Among them, lanterns and people show the strongest correlations, indicating that once these elements are present in a certain proportion, they exert a pronounced positive effect on perceived attractiveness. Conversely, an increase in the proportion of buildings and shop signs is associated with a consistent decline in spatial attractiveness. Additionally, water, boats, and ground exhibit an inverted U-shaped relationship with attractiveness: at low proportions, they have a clearly positive effect, but as their proportions increase, the marginal gains in attractiveness diminish and may even become negative. Specifically, water and ground reach peak attractiveness at around 15% and 10%, respectively. This indicates the existence of optimal proportion ranges, where higher proportions do not necessarily lead to greater attractiveness. Moreover, scatterplots depicting the relationship between bridges, fences, and other elements and spatial attractiveness reveal fluctuations. Taking bridges as an example, the attractiveness scores show reversals at around 5% and 20%, following a pattern of declining, rising, and subsequently declining, indicating multiple shifts in the influence of the element across different proportion levels. Lastly, for elements such as vegetation, signboards, and pots, the scatterplots reveal widely dispersed data points with no evident upward or downward trend, suggesting that these elements do not exert a significant overall influence on perceived spatial attractiveness.
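One simple way to check for an inverted U-shaped relationship and to locate its peak is to fit a second-order polynomial, as sketched below. This is an assumption made for illustration: the text reads the peaks from scatterplots and does not state that a quadratic was fitted. The data are synthetic, constructed to peak at 15%.

```python
import numpy as np

def quadratic_peak(proportion, score):
    """Fit score = a*p^2 + b*p + c and return (a, proportion at the vertex)."""
    a, b, c = np.polyfit(proportion, score, deg=2)
    return a, -b / (2 * a)

# Synthetic attractiveness scores peaking at a 15% proportion (illustrative only).
p = np.linspace(0, 40, 41)
s = 3.5 - 0.002 * (p - 15) ** 2
a, peak = quadratic_peak(p, s)
```

A negative leading coefficient `a` indicates an inverted U, and the vertex gives the proportion at which the fitted score is highest. Multi-phase patterns such as the one described for bridges would not be captured by a single quadratic.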

4.4. SHAP Values of Environmental Elements

The SHAP summary plot in Figure 12 illustrates each element’s contribution to the model’s predictions, along with the distribution of their values. The confidence intervals (indicated by the red areas in the figure) confirm that, compared with the correlation analysis, the SHAP values derived from the machine learning model exhibit greater robustness and stability, offering a more reliable, mechanism-level interpretation. From the SHAP values’ horizontal spread, it is evident that different visual elements vary significantly in their importance in regard to spatial attractiveness. The top five most influential elements are water, riverbanks, ground, people, and sky. From the perspective of element value distribution, higher proportions of water, riverbanks, and sky (represented by red dots) are primarily located on the right side of the plot, where the SHAP values are positive. This indicates that greater proportions of these elements contribute positively to attractiveness ratings. In contrast, higher ground and bridge values are concentrated on the left side, showing a negative contribution. Additionally, elements such as ground, people, and shop signs exhibit a more dispersed, bidirectional distribution, suggesting that their impact on spatial attractiveness fluctuates and is dependent on both their proportion and their interaction with other elements.
More specifically, the relationship between element proportion and SHAP value is nonlinear, as shown in Figure 13. Firstly, the SHAP values of water, riverbanks, lanterns, and people increase with their respective proportions, indicating that higher proportions enhance their contribution to attractiveness. However, for water and riverbank, the SHAP values eventually plateau, suggesting diminishing marginal returns when these elements are overly abundant. Notably, water, lanterns, and people exhibit negative SHAP values at very low proportions, indicating that their presence does not always enhance spatial attractiveness, especially when minimal or poorly integrated. Unlike the gradual increases observed in the previous elements, the SHAP values for boats, fences, and sky rise sharply and then stabilize as their proportions increase. In other words, once these elements are present in sufficient quantities, they contribute consistently and positively to attractiveness. This implies that for these particular elements, the presence or absence (“whether or not”) is more impactful than the exact quantity (“how much”). Furthermore, the SHAP values for bridges continuously decline with increasing proportion, shifting from a positive to a negative contribution. This trend suggests that as the area occupied by bridge elements grows, their positive effect on attractiveness diminishes and eventually becomes negative, implying that the impact of bridges may vary depending on the specific visual context in which they appear. Lastly, the SHAP values for ground, vegetation, signboards, and other elements follow a fluctuating pattern (rising, then falling, and then rising again), whereas the SHAP values for shop signs and buildings show the opposite trend. These fluctuations reveal that the contribution of such elements to spatial attractiveness varies across different proportion levels. For instance, ground surfaces reach their maximum positive contribution at around 10% coverage, but their SHAP value reverses to its most negative point at approximately 35% coverage.

5. Discussion

5.1. Variations in Space-Defining Element Proportions Leading to Different Spatial Imagery

Elements such as water, riverbank, ground, and building account for a large proportion and exert significant influence, functioning as fundamental components in defining space within traditional water towns. These space-defining elements generally contribute positively to spatial attractiveness; however, their proportions must be carefully controlled within a certain range to avoid disrupting the spatial scale and undermining the spatial imagery of the water town.
As reported in previous qualitative studies [105], water was confirmed to be a key element of traditional water towns. The correlation between water proportion and SHAP values closely mirrors that for spatial attractiveness ratings. A strong association was also observed between water and riverbanks, likely because of their frequent co-occurrence within scenes, leading to their synchronized influence on attractiveness assessments. Nevertheless, the relationship between water proportion and attractiveness is not strictly linear. When the water proportion is low, it may exert a negative influence, whereas within the range of 5–15%, its positive contribution increases with its proportion. Furthermore, both the SHAP value graph (Figure 13) and the water proportion–attractiveness graph (Figure 11) reveal an inverted U-shaped pattern for water, suggesting that more expansive water surfaces are not necessarily more attractive. The street-view image (Figure 14a) further illustrates that in scenes with a low water proportion, reflections dominate and the color appears darker; these conditions deviate from the ideal imagery of bright and open waters [106]. Conversely, in streetscapes featuring waterside platforms (Figure 14b), an excessively high water proportion may cause spatial imbalance and lead to a lack of spatial layering. Additionally, an overabundance of water may displace other key spatial elements, thereby undermining the integrated aesthetic and identity of the traditional water town. To explain this phenomenon, we refer to the theories proposed by the Kaplans and by Nasar, which emphasize that environments with a moderate level of visual complexity tend to be more appealing to observers [107,108]. In this case, an overly high water proportion may result in visual monotony and a lack of spatial enclosure, whereas too little water may weaken the spatial identity and symbolic cultural character of water towns. The observed water proportion of around 15% appears to represent an optimal balance, one that sustains spatial imagery, evokes cultural associations, and preserves structural coherence.
Similar characteristics can be observed in the analysis of the ground. Notably, the ground shows a positive impact even at lower proportions, with its contribution increasing steadily and peaking at around 10%. The streetscape imagery at this stage (Figure 14c) reveals that the spatial scale of the street is relatively appropriate, aligning with the narrow alleyways typical of Jiangnan water towns [109]. However, beyond this point, the contribution declines, eventually exerting a significantly negative influence. At this stage, the spatial scale of the streets becomes excessively large, eroding the distinctive features of traditional water towns (Figure 14d). Similarly, when the proportion of the sky element reaches around 10%, its SHAP value also crosses a noticeable threshold, after which it begins to exert a stable positive effect. Meanwhile, as they jointly shape the streetscape space, buildings are closely linked with the ground. This relationship is reflected in their correlated functional mechanisms. At lower building proportions, the spatial scale of the street appears well balanced. However, when the building proportion exceeds approximately 70%, the street becomes overly congested [60].

5.2. Proper Organization of Foreground Elements to Enhance Water Town Atmosphere

Although foreground elements like lanterns, people, boats, fences, and shop signs account for relatively small proportions of the spatial compositions, they still exert a notable influence and serve as key symbols within traditional water towns. These elements often contribute to the overall atmosphere of the environment and, when appropriately combined with space-defining elements, enhance spatial attractiveness.
Lanterns and people exhibit the most significant impact on spatial attractiveness and show similar trends in the correlation diagrams. Lanterns were confirmed to have a substantial positive effect on attractiveness once their proportion surpassed a certain threshold, a finding that is consistent with expectations [110]. As shown in Figure 13, when the proportion of lanterns is low, they may exert a slight negative effect; however, for scenes where their presence is more prominent, the attractiveness ratings were generally higher. It can be reasonably inferred that a small number of lanterns may not significantly enhance appeal, while a more systematic and scaled arrangement of lanterns is far more compelling than isolated placements. Moreover, the observation of the scene images (Figure 14e) reveals that in highly rated scenarios, lanterns often contrast sharply with white building façades, further accentuating their vibrant red coloration. In contrast to some previous studies that raise concerns about crowded environments [111], in this study, the presence of people is shown to have a clearly positive influence on perceived attractiveness. This may be because lanes interspersed with pedestrians evoke a sense of everyday vitality and liveliness, something which is lacking in empty or deserted streetscapes (Figure 14f,g).
Secondly, although the SHAP values for boats and fences vary significantly with their proportion, their correlation with attractiveness scores is less direct. Both elements exhibit a pronounced threshold effect: when their proportions are low, they either contribute minimally or even negatively to spatial attractiveness. However, once their proportions increase slightly, their SHAP values rise sharply and then level off into a plateau. This phenomenon can be interpreted through the Weber–Fechner law [77], which quantifies the relationship between the physical intensity of a stimulus and its perceived magnitude. When the visual proportion of an environmental element is extremely small, its stimulus intensity tends to fall below the just-noticeable difference (JND) and is therefore imperceptible. Once this threshold is exceeded, the perceived effect rises rapidly but logarithmically, so that additional increases yield progressively smaller marginal gains. This finding suggests that for such symbolic elements in Jiangnan water towns, the mere presence of the element is more important than its quantity, underscoring their symbolic value in evoking imagery such as “boats navigating water”, reinforcing a traditional water town feel [39]. However, it is also worth noting that beyond a certain point, the attractiveness score associated with an increasing boat presence begins to decline. This trend may be attributed to the visual obstruction caused by the presence of too many boats, which diminishes the visibility and openness of the water surface, thereby disrupting the optimal balance.
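The threshold-then-plateau behavior attributed to the Weber–Fechner law can be illustrated numerically. The sketch below uses the law's logarithmic form with a hard cutoff at the threshold; the threshold value, the constant `k`, and the stimulus levels are hypothetical and are not estimated from the study's data.

```python
import math

def perceived_magnitude(stimulus, threshold, k=1.0):
    """Weber-Fechner response: zero at or below the threshold, k * ln(S / S0) above it."""
    if stimulus <= threshold:
        return 0.0
    return k * math.log(stimulus / threshold)

threshold = 0.5  # hypothetical detection threshold, in percent of the image
levels = [0.25, 0.5, 1.0, 2.0, 4.0]
responses = [perceived_magnitude(s, threshold) for s in levels]

# Gain in perceived magnitude per added percentage point between successive levels.
marginal = [
    (responses[i + 1] - responses[i]) / (levels[i + 1] - levels[i])
    for i in range(2, len(levels) - 1)
]
```

Each doubling of the proportion above the threshold adds the same perceived increment, so the gain per added percentage point keeps shrinking. This matches the sharp rise followed by a plateau described for boats and fences.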
In particular, the positive effect of fences in traditional towns contrasts with findings from urban studies, where fences are often perceived as barriers to walkability [112]. Panoramic images indicate that fences in water towns typically manifest as traditional architectural features known as Meirenkao (Figure 14h). As traditional symbols, these fences are decorative and are generally well-received by visitors for their cultural connotations. By contrast, shop signs can contribute positively to street atmosphere when present in moderation, harmonizing with the surrounding architecture to create a vibrant, everyday ambiance. However, an excessive number of shop signs may raise concerns among visitors about over-commercialization [113]. This observation helps to explain the overall decline in SHAP values for shop signs as their proportional presence increases.

5.3. Viewing Distance and Angle Influencing the Perceived Attractiveness of Environmental Elements

Elements such as bridges, buildings, and vegetation demonstrate multiple performance characteristics; thus, their impact on spatial attractiveness cannot be evaluated solely based on their proportional presence. Environmental elements only contribute positively to spatial perception when observed from appropriate angles or presented with specific attributes that align with traditional water town imagery.
Firstly, based on prior research, bridges are widely recognized as iconic elements of water towns [39]. While the small proportion of bridges in this study contributed positively to spatial attractiveness, their increased presence resulted in a markedly negative effect. The street-view images show that when bridges occupy a smaller proportion of an image (below 1%) as a foreground element, visitors can perceive them as symbolic representations of water-town identity (Figure 14i). However, when the proportion increases (over 1%), the primary visual component encountered is the sloped surface of the bridge, which becomes a space-defining element (Figure 14j). Further analysis reveals that at intersections or nodal points, bridges are perceived as stepped pathways. This perception aligns with urban studies that have shown that slopes and stairs may cause discomfort for pedestrians, significantly reducing the attractiveness ratings of such locations [74]. From viewpoints situated on the bridge itself, the element is merely perceived as ground and therefore does not contribute to enhanced spatial attractiveness.
Secondly, in addition to contributing to the spatial scale of streetscapes, buildings may display different visual characteristics at varying proportions, potentially explaining the gradual decline in their SHAP values. For example, in the scenes captured from bridge-level perspectives (Figure 14k), when the building proportion is low, the viewpoint is generally farther away, and architectural elements are simplified into stylized “white walls and dark-tiled roofs”, an impression consistent with earlier studies [114]. However, when the building proportion is high, visitors are exposed to detailed and close-range architectural components, whose varied visual characteristics elicit different perceptual responses [115]. Moreover, widespread instances of modern-style buildings constructed in the past two decades—often the result of commercial development and modernization—can compromise the authenticity of the traditional architectural landscape. Such elements may interfere with visitor perceptions [116] and ultimately reduce the overall attractiveness of a scene (Figure 14l).
Additionally, the influence of vegetation on spatial attractiveness also varies considerably depending on the observer’s position. Unlike in urban environments [74], vegetation in traditional towns is often perceived as individual plants when observed from a distance, thus enriching the environment. When one is surrounded by plants, however, vegetation tends to form continuous green masses that lack depth and layering—a finding consistent with previous studies [117].

5.4. Policy Recommendations

Research on the impact of various elements on spatial attractiveness can provide valuable guidance for the management and design of public spaces within water towns. First, spatial attractiveness can be enhanced by ensuring that space-defining elements are present in appropriate proportions, thereby creating an ancient town environment that balances openness and enclosure. This study quantitatively confirms the importance of visible water in shaping the spatial perception of the water town. Accordingly, increasing the number and accessibility of waterfront platforms and improving water visibility within street corridors can be effective tools for planners to enhance spatial attractiveness [85]. International cases within historic districts also reflect this approach. For example, at the 2025 Venice Biennale, Guendalina Salimei and Carlo Ratti created interactive spaces between people and water to enhance spatial appeal in historical contexts. However, the proportion of visible water must be carefully controlled. Both insufficient and excessive water exposure can cause spatial discomfort. Furthermore, ground and buildings jointly shape visitors’ perceptions of street space. It is essential to regulate street scale appropriately, as an overly large proportion of ground relative to buildings may lead to a loss of spatial enclosure and undermine the narrow-street imagery typical of water towns, while the opposite may result in an overly oppressive environment. Additionally, there are potential conflicts between maximizing water visibility and preserving the spatial imagery of historic streets. Streets that are overly open to water may disrupt the perceptual coherence and structure of water town space. Therefore, decision-makers must strike a balance: waterfront platforms should be selectively integrated to enhance the water experience without compromising spatial imagery, while preserving appropriate street proportions and enclosures.
Second, spatial atmosphere can be enhanced through the careful combination of foreground elements to collectively create a characteristic water town image. We found that boats and fences are significant symbolic elements that exhibit threshold effects. These elements should be selectively placed as points of visual focus and coordinated with water and architectural features, respectively, to avoid visual obstruction due to overuse. Lanterns serve as important cultural symbols, and our research shows that a strategic arrangement, particularly when contrasted with white walls, has a strong visual impact. Similarly, shop signs can enhance the street atmosphere, but only when used with restraint. However, our findings also reveal that lanterns, which have demonstrated clear positive effects, appear in no more than 6% of the image area. As a result, it is crucial not to overlook the risk that excessive use may also lead to perceptions of over-commercialization. Therefore, to maintain the authenticity of water towns, the proportion and stylistic harmony of lanterns and shop signs with surrounding architecture must be carefully managed. We recommend collaborating with local urban planners to develop design guidelines, such as setting limits on lantern density and shop sign quantity, and encouraging the use of traditional styles and materials. Moreover, an analysis of human presence provides useful implications for promotional materials: street scenes including people are more appealing to tourists than empty streets, underscoring the importance of conveying vibrancy and life.
Third, spatial attractiveness can be further improved by managing the visual characteristics of elements and highlighting them from optimal viewing angles. Bridges and buildings are key cultural symbols of ancient towns, but their visual impact often falls short of expectations because of limited or suboptimal viewing perspectives. Hence, designated viewing spots should be incorporated in tourism route planning to guide visitors toward the best vantage points for appreciating bridges and architectural features. Additionally, we found that vegetation is more effective when perceived as individual plants as opposed to large clusters. Therefore, vegetation should be used primarily as a decorative element to avoid detracting from other key elements.

5.5. Limitations

Our study has some limitations that should be addressed in future research. First and foremost, the number and density of selected observation points in this study remain insufficient. As a result, some environmental elements suffered from a lack of representative samples and were more susceptible to noise interference, which led to suboptimal model fitting. Future research should consider expanding the sample size to enhance the robustness and validity of the analysis. In addition, considering the significant visual differences between day- and nighttime street scenes, future studies should also extend the image dataset to include night views and develop corresponding research methods to investigate the relationship between lighting conditions and perceived spatial attractiveness at night.
Secondly, there is room for a more detailed classification of certain features. For example, this study overlooked the architectural diversity within such towns, including components such as roofs, walls, and doors/windows. Moreover, the building category also encompasses modern structures surrounding the old town, raising concerns regarding the authenticity of visual representation. In addition, the potential differences in the influence of ordinary buildings versus landmark structures were not analyzed. All these detailed classifications may have a significant impact on spatial attractiveness for tourists. However, due to the macro-level perspective of this study and the limited sample size of special building types, a thorough analysis of these factors was beyond the current scope. Therefore, future research should focus on different building types and individual architectural components, further subdivide architectural elements, and specifically examine their respective influences on spatial attractiveness.
Lastly, the attractiveness ratings of panoramic images at each observation point were obtained through mobile-based questionnaire surveys. However, this method is inevitably influenced by the limitations of how street views are presented. Incorporating VR devices to dynamically simulate the street-view experience, along with the application of the sensoryscape paradigm, could significantly enhance the realism of tourist perception [118,119]. Nonetheless, due to current constraints in experimental equipment and personnel, we are not yet able to conduct such advanced experiments. In future studies, we plan to first employ eye-tracking technology to objectively capture the visual attractiveness of different environmental elements. Additionally, we aim to model open street areas in Nanxun Old Town to explore tourists’ dynamic experience. Further research will also incorporate multisensory influences to provide a more holistic understanding of spatial perception.

6. Conclusions

In this study, we constructed a model to evaluate the spatial attractiveness of traditional Jiangnan water towns by integrating street-view images, image semantic segmentation, and data analysis. This model establishes a macro-level correlation between the proportions of various environmental elements and the overall attractiveness ratings. Furthermore, by introducing SHAP values, this study reveals that the influence of environmental elements on spatial attractiveness is nonlinear with respect to their visual proportions.
The findings demonstrate that different environmental elements contribute in distinct ways to the shaping of ancient town imagery, thereby influencing spatial attractiveness. Firstly, space-defining elements such as water, ground, buildings, and riverbanks need to be maintained within optimal proportions to create a spatial scale that aligns with the imagery of traditional towns. Specifically, water is the most positive and significant element in the context of water towns, while an excessive proportion may result in a sense of emptiness and lack of spatial layering. Similarly, SHAP values of the ground and building elements exhibit nonlinear relationships: moderate proportions help form a pleasant street scale, while an excessive presence can lead to a feeling of crowding or emptiness. Secondly, foreground elements such as lanterns, people, boats, fences, and shop signs, though accounting for relatively low visual proportions, can enhance attractiveness when appropriately combined. For instance, the contrast between lanterns and white walls creates a striking visual impact, while the presence of people adds vibrancy and a sense of life to street scenes. Boats and fences help to evoke the imagery of traditional water towns through their interaction with water and architecture, respectively; however, an overabundance may obstruct views or create negative visual impressions. Additionally, a moderate presence of shop signs can convey a lively and authentic commercial atmosphere, but an excessive quantity of shop signs may clash with architectural aesthetics and foster a sense of over-commercialization. Finally, the attractiveness of elements such as bridges, buildings, and vegetation depends heavily on specific viewing distance and angle. Bridges can effectively represent the water town image when viewed from optimal angles, whereas steep perspectives may diminish their appeal. Buildings exhibit varied appearances at different viewing distances, and a mixture of old and new structures can affect the overall coherence of the visual experience. Vegetation similarly influences attractiveness depending on an observer’s position and visual presentation.
This study proposes a new analytical framework for understanding how environmental elements influence tourists’ spatial perception in traditional water towns. By incorporating interpretable modeling techniques, the research reveals that the effects of the elements are context-dependent and nonlinear, varying with their proportions, combinations, and viewing perspectives. Compared to previous studies that focused primarily on the descriptive representations of spatial elements, this approach offers a more comprehensive explanation of the mechanisms through which these elements shape spatial attractiveness. These findings highlight the importance of interpretability in spatial perception research and provide a credible and actionable foundation for managing historic town environments in a sustainable manner.

Author Contributions

Conceptualization, C.X. and H.C.; methodology, C.X. and H.C.; software, H.C. and X.Y.; validation, C.X., H.C. and Z.X.; formal analysis, C.X., H.C. and Z.X.; resources, C.X. and Z.X.; data curation, H.C., X.Y. and Z.W.; writing—original draft preparation, C.X., H.C. and X.Y.; writing—reviewing and editing, C.X. and Z.X.; visualization, H.C. and Z.W.; supervision, C.X. and Z.X.; project administration, Z.X.; funding acquisition, C.X. and Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Jiangsu Province (BK20211315), the Humanities and Social Science Project of the Ministry of Education (18YJCZH195), the Scientific Research Fund of Zhejiang Provincial Education Department (Y202454182), and the Science and Technology Bureau Foundation of Huzhou (2021GZ12).

Institutional Review Board Statement

This study was reviewed and approved by Huzhou University’s Institutional Review Board (IRB) (code: 202509-01, 4 September 2024). Our research was conducted in accordance with the ethical standards outlined in the Declaration of Helsinki, and all participants provided informed consent prior to participation.

Informed Consent Statement

Informed consent was obtained from all participants involved in this study. Participants were fully informed of the nature and purpose of the research, the procedures they would undergo, any potential risks or discomforts, and their right to withdraw from this study at any time without penalty. The confidentiality and anonymity of participants were ensured, and data were collected and stored in a manner that protects privacy.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, Y.; Zhang, J.; Lu, S.; Liu, Z. Further study on the evaluation index system for historic villages and townships taking the second group of Chinese historic townships (villages) as example. Archit. J. 2008, 55, 64–69. (In Chinese) [Google Scholar] [CrossRef]
  2. Zhang, L.; Yang, K.; Liu, B.; Liu, S. A study on the regional characteristics perception of ancient towns in the south of the Yangtze river based on different perspectives of tourists and residents: A case study of Tongli Ancient Town. Chin. Landsc. Archit. 2019, 35, 10–16. (In Chinese) [Google Scholar] [CrossRef]
  3. Kim, J.-H.; Ritchie, J.R.B. Cross-cultural validation of a memorable tourism experience scale (MTES). J. Travel Res. 2014, 53, 323–335. [Google Scholar] [CrossRef]
  4. Lew, A.A. A framework of tourist attraction research. Ann. Tour. Res. 1987, 14, 553–575. [Google Scholar] [CrossRef]
  5. Shi, Z.; Qi, W. Research on tourists’ real perception, satisfaction, and willingness to revisit in imitation ancient town tourist attraction. Tour. Res. 2024, 16, 45–55. (In Chinese) [Google Scholar] [CrossRef]
  6. Zhao, S.; Marzuki, A.; Rong, W.; Ran, X. An empirical application of the consumer-based authenticity model in heritage tourism of the George Town historic district, Penang, Malaysia. Heliyon 2024, 10, e38254. [Google Scholar] [CrossRef]
  7. Zhang, F.; Ye, T.; Liu, Q. Vitality construction of traditional waterfront blocks from the perspective of “space behavior”, taking the traditional waterfront blocks in the ancient city of Suzhou as an example. Chin. Anc. City 2020, 34, 45–51. (In Chinese) [Google Scholar] [CrossRef]
  8. Zhou, X.; Cui, L.; Chen, S.; Tan, Z. The experience evaluation of urban landscape of historical & cultural blocks based on eye-tracking: A case study of Yongqingfang, Guangzhou. Chin. Landsc. Archit. 2023, 39, 54–57. (In Chinese) [Google Scholar] [CrossRef]
  9. Song, Z.; Liu, J.; Ge, G. Evaluation of spatial visual attraction of Huayang Ancient Town landscape based on analytic hierarchy process. Urban. Archit. 2019, 16, 100–101, 113. (In Chinese) [Google Scholar] [CrossRef]
  10. Liu, L.; Sevtsuk, A. Clarity or confusion: A review of computer vision street attributes in urban studies and planning. Cities 2024, 150, 105022. [Google Scholar] [CrossRef]
  11. Lynch, K. The Image of the City; MIT Press: Cambridge, MA, USA, 1960. [Google Scholar]
  12. Rossi, A. L’architettura Della Città; Il Saggiatore: Milan, Italy, 2018. (In Italian) [Google Scholar]
  13. Venturi, R.; Brown, D.S.; Izenour, S. Learning from Las Vegas: The Forgotten Symbolism of Architectural Form; MIT Press: Cambridge, MA, USA, 1977. [Google Scholar]
  14. Kaplan, S.; Kaplan, R. Cognition and Environment: Functioning in an Uncertain World; Praeger: New York, NY, USA, 1982. [Google Scholar]
  15. Hildebrand, G. Origins of Architectural Pleasure; University of California Press: Berkeley, CA, USA, 1999. [Google Scholar]
  16. Relph, E. Place and Placelessness; Pion: London, UK, 1976. [Google Scholar]
  17. Rapoport, A. The Meaning of the Built Environment: A Nonverbal Communication Approach, 2nd ed.; University of Arizona Press: Tucson, AZ, USA, 1990. [Google Scholar]
  18. Proshansky, H.M.; Fabian, A.K.; Kaminoff, R. Place-identity: Physical world socialization of the self. J. Environ. Psychol. 1983, 3, 57–83. [Google Scholar] [CrossRef]
  19. Davidson, J.; Milligan, C. Embodying emotion, sensing space: Introducing emotional geographies. Soc. Cult. Geogr. 2004, 5, 523–532. [Google Scholar] [CrossRef]
  20. Ma, T.; Li, X.; Xie, Y. Old wine in new bottle? The myth of tourist satisfaction measurement. Tour. Trib. 2017, 32, 53–63. (In Chinese) [Google Scholar] [CrossRef]
  21. Hyunju, J.; Harumi, I.; Yoshifumi, M. Physiological and psychological benefits of viewing an autumn foliage mountain landscape image among young women. Forests 2022, 13, 1492. [Google Scholar] [CrossRef]
  22. Sun, D. (Ed.) A Study on the Landscape Imagery of Linpan Settlements in Western Sichuan; Sichuan University Press: Chengdu, China, 2019; p. 1. (In Chinese) [Google Scholar]
  23. Wang, K.; Meng, C.Y.; Lin, Q. Landscape image perception of ancient towns along the Grand Canal: Analysis based on network text data. Archit. Creat. 2024, 36, 184–191. (In Chinese) [Google Scholar]
  24. Li, Y.; Chen, X.; Liu, P.; Huang, G. Research on image perception of heritage tourist destinations from the three-dimensional perspective of “cognition-emotion-integrity”: Taking Xiangjiang Ancient Town group as an example. Hum. Geogr. 2021, 36, 167–176. (In Chinese) [Google Scholar] [CrossRef]
  25. Wu, C.; Shao, X. Analysis of tourism image perception of ancient villages based on UGC and questionnaire data—A case of Qikou Ancient Town. J. Arid Land Resour. Environ. 2020, 34, 195–200. (In Chinese) [Google Scholar] [CrossRef]
  26. Wu, L.; Huang, Z.; Tan, Z.; Hou, B.; Chen, X. Tourism impact perception of residents in jiangnan ancient town and its formation mechanism: A case study of Tongli. Hum. Geogr. 2015, 30, 143–148. (In Chinese) [Google Scholar] [CrossRef]
  27. Huang, H.; Zhang, W.; Zhao, M.; Zou, Y.; Zhang, Q. Exploration of image cognitive differentiation and social association in semi-urbanization areas: A case study based on the ancient town of Shawan in Guangzhou. South Archit. 2023, 43, 24–34. (In Chinese) [Google Scholar] [CrossRef]
  28. Chen, Q.; Han, G. Typical tourism image elements of Sanhe Ancient Town analyzed from the tourist gaze perspective: Based on the zaltman metaphor elicitation technique. J. Southwest Jiaotong Univ. (Soc. Sci.) 2023, 24, 89–103. (In Chinese) [Google Scholar] [CrossRef]
  29. Xu, X.; Xu, C.; Gao, J. Using VEP method to understand tourists’ perception on ancient water towns in Southern Yangtze River: A case study of Shanghai Fengjing Ancient Town. Areal Res. Dev. 2017, 36, 121–126. (In Chinese) [Google Scholar] [CrossRef]
  30. Dong, X.; Sun, H.; Liang, J. The Non-Visual perceptions: Foundation and reconstruction of the genius loci—A case of Yunshuiyao Old Town in Fujian. Hum. Geogr. 2022, 19, 32–35. (In Chinese) [Google Scholar] [CrossRef]
  31. Shen, H.; Aziz, N.F.; Lv, X. Using 360-degree panoramic technology to explore the mechanisms underlying the influence of landscape features on visual landscape quality in traditional villages. Ecol. Inform. 2025, 86, 103036. [Google Scholar] [CrossRef]
  32. Shen, H.; Aziz, N.F.; Huang, M.; Yu, L.; Liu, Z. Tourist perceptions of landscape in Chinese traditional villages: Analysis based on online data. J. Tour. Cult. 2024, 22, 232–251. [Google Scholar] [CrossRef]
  33. Shen, H.; Aziz, N.F.; Omar, S.I.; Huang, M.; Zhang, X.; Yu, L. Tourism’s impact on visual landscape: Residents’ perceptions from a traditional Chinese village. Pol. J. Environ. Stud. 2024, 33, 4707–4719. [Google Scholar] [CrossRef]
  34. Xiao, J.; Zhang, R.; Liu, S.; Liu, S.; Jia, K. The landscape perception study and flowing scenes setting of historic towns based on the ‘Bag of Photographic Words’ Model: A case study of Qingyan Historic Town. Dev. Small Cities 2023, 41, 22–32, 40. (In Chinese) [Google Scholar] [CrossRef]
  35. Zhao, Y.; Lin, J.; Liu, Y. Research on visual evaluation of tourism scenery based on eye movement experiment: A case of Tangjia Ancient Town in Zhuhai. Hum. Geogr. 2020, 35, 130–140. (In Chinese) [Google Scholar] [CrossRef]
  36. Li, M.; Yan, Y.; Ying, Z.; Zhou, L. Measuring villagers’ perceptions of changes in the landscape values of traditional villages. ISPRS Int. J. Geo-Inf. 2024, 13, 60. [Google Scholar] [CrossRef]
  37. Bi, F.; Liu, J.; Lin, B. Research on the construction strategy of cultural tourism space based on landscape visual attraction—A case study of Guanzhong historical and cultural Block. Chin. Landsc. Archit. 2023, 39, 84–89. (In Chinese) [Google Scholar] [CrossRef]
  38. Wang, X.; Jia, D. The analysis of contemporary on constructing strategy in traditional settlements and architectural space: The thought of water system of landscape and architecture space changes in Huizhou Hongcun. Huazhong Archit. 2011, 29, 83–85. (In Chinese) [Google Scholar] [CrossRef]
  39. Zhang, L.; Zhang, B.; Kou, H. Tourist perception of ancient town landscape in the area south of Yangtze River based on NLP of online comments Data. Chin. Urban For. 2022, 20, 125–132. (In Chinese) [Google Scholar] [CrossRef]
  40. Tang, S.; Zhou, S. Roles of text in placeness construction: Analysis on core literature of cultural geography in recent years. Sci. Geogr. Sin. 2011, 31, 1159–1165. (In Chinese) [Google Scholar] [CrossRef]
  41. Yu, G.; Chen, T.; Huang, L.; Xiao, J. The landscape characteristics and renovation strategies of historical neighborhoods in the ancient City of Shaoxing: A case of the typical historical neighborhoods. Urban. Archit. 2024, 21, 26–29. (In Chinese) [Google Scholar] [CrossRef]
  42. Han, F.; Ma, Y.; Huang, S.; Wang, S. The double construction of human settlements and regional culture: A review of domestic waterfront spaces research from the perspective of historical inheritance. Trop. Geogr. 2023, 43, 2369–2380. (In Chinese) [Google Scholar] [CrossRef]
  43. Zhao, H.; Sun, Y. Landscape and its topological regeneration of Nancheng Historical Blocks in Wuchang from the perspective of the “Embankment-Street” evolution. South Archit. 2024, 44, 69–77. (In Chinese) [Google Scholar] [CrossRef]
  44. Zhu, F.; Teng, Y. Research on Place-Making Approaches in Urban Waterfront Historical Districts: A Case Study of Qiaoxi, Dadou Road, and Xiaohezhi Street in Hangzhou. New Arts 2020, 41, 93–98. (In Chinese) [Google Scholar]
  45. Yang, C.; Zhou, Y.; Lu, D. Evaluation of the street space atmosphere of historic towns by semantic differential method: The case of Luocheng. New Archit. 2018, 36, 102–107. (In Chinese) [Google Scholar] [CrossRef]
  46. Zhou, J.; Hua, C.; Xia, X.; Wu, S. Perception of traditional settlement culture space based on network text content analysis: A case of Qiantong Traditional Town, Zhejing. Huazhong Archit. 2022, 40, 108–112. (In Chinese) [Google Scholar] [CrossRef]
  47. Meng, S.; Liu, C.; Zeng, Y.; Xu, R.; Zhang, C.; Chen, Y.; Wang, K.; Zhang, Y. Quality evaluation of public spaces in traditional villages: A study using deep learning and panoramic images. Land 2024, 13, 1584. [Google Scholar] [CrossRef]
  48. Ren, Y.; Leng, H.; Wang, Y.; Chen, C.; Zhou, J. Research on the reconstruction design of historical town’s street elevation with the unified synchronicity and diachronicity: Taking Sanjiang Avenue, Sanjiang Town, Nanchang County as an example. Decoration 2022, 65, 142–144. (In Chinese) [Google Scholar] [CrossRef]
  49. Jiang, X.; Zhang, X.; Qian, X.; Lin, Q.; Wang, X. Research on “authenticity” perceptual evaluation and renewal strategy of canal historical and cultural blocks (towns) based on multi-source data. Mod. Urban Res. 2021, 20–27, 37. (In Chinese) [Google Scholar] [CrossRef]
  50. Wang, Z.; Zeng, Z.; Yang, J.; Lin, R.; Xie, Y.; Li, X. Research on the perception of homesickness in ancient town landscape based on analysis of online texts and the IPA Model. J. Southwest Univ. (Nat. Sci. Ed.) 2023, 45, 210–218. (In Chinese) [Google Scholar] [CrossRef]
  51. Xu, M.; Lan, J. The study on influence factors of visitors dynamic in tourism town based on Geo-Detector Model: A case study in Wuzhen Xizha. Mod. Urban Res. 2023, 62–66, 73. (In Chinese) [Google Scholar] [CrossRef]
  52. Gao, W.; Jia, M.; Zhao, M.; Gao, Y.; Meng, H. Review of progress and quantitative measurement methods of research on street space. City Plan. Rev. 2022, 46, 106–114. (In Chinese) [Google Scholar]
  53. Ma, Z. Deep exploration of street view features for identifying urban vitality: A case study of Qingdao city. Int. J. Appl. Earth Obs. Geoinf. 2023, 123, 103476. [Google Scholar] [CrossRef]
  54. Jiang, Y.; Han, Y.; Liu, M.; Ye, Y. Street vitality and built environment features: A data-informed approach from fourteen Chinese cities. Sustain. Cities Soc. 2022, 79, 103724. [Google Scholar] [CrossRef]
  55. Alvarez Leon, L.; Quinn, S. The value of crowdsourced street-level imagery: Examining the shifting property regimes of OpenStreetCam and Mapillary. GeoJournal 2019, 84, 395–414. [Google Scholar] [CrossRef]
  56. Dai, M.; Ward, W.; Meyers, G.; Densley Tingley, D.; Mayfield, M. Residential building facade segmentation in the urban environment. Build. Environ. 2021, 199, 107921. [Google Scholar] [CrossRef]
  57. Ki, D.; Chen, Z.; Lee, S.; Lieu, S. A novel walkability index using Google Street View and deep learning. Sustain. Cities Soc. 2023, 99, 104896. [Google Scholar] [CrossRef]
  58. Wang, M.; Vermeulen, F. Life between buildings from a street view image: What do big data analytics reveal about neighbourhood organisational vitality? Urban Stud. 2021, 58, 3118–3139. [Google Scholar] [CrossRef]
  59. Gao, F.; Li, S.; Tan, Z.; Zhang, X.; Lai, Z.; Tan, Z. How is urban greenness spatially associated with dockless bike sharing usage on weekdays, weekends, and holidays? ISPRS Int. J. Geo-Inf. 2021, 10, 238. [Google Scholar] [CrossRef]
  60. Chen, C.; Li, H.; Luo, W.; Xie, J.; Yao, J.; Wu, L.; Xia, Y. Predicting the effect of street environment on residents’ mood states in large urban areas using machine learning and street view images. Sci. Total Environ. 2022, 816, 151605. [Google Scholar] [CrossRef]
  61. Yoo, E.-H.; Roberts, J.E.; Eum, Y.; Li, X.; Konty, K. Exposure to urban green space may both promote and harm mental health in socially vulnerable neighborhoods: A neighborhood-scale analysis in New York City. Environ. Res. 2022, 204, 112292. [Google Scholar] [CrossRef] [PubMed]
  62. Ito, K.; Biljecki, F. Assessing bikeability with street view imagery and computer vision. Transp. Res. Part C Emerg. Technol. 2021, 132, 103371. [Google Scholar] [CrossRef]
  63. Hoffmann, E.J.; Wang, Y.; Werner, M.; Kang, J.; Zhu, X.X. Model fusion for building type classification from aerial and street view images. Remote Sens. 2019, 11, 1259. [Google Scholar] [CrossRef]
  64. Schinasi, L.H.; Kanungo, C.; Christman, Z.; Barber, S.; Tabb, L.; Headen, I. Associations between historical redlining and present-day heat vulnerability, housing, and land cover characteristics in Philadelphia, PA. J. Urban Health 2022, 99, 134–145. [Google Scholar] [CrossRef]
  65. Huang, Z.; Qi, H.; Kang, C.; Su, Y.; Liu, Y. An ensemble learning approach for urban land use mapping based on remote sensing imagery and social sensing data. Remote Sens. 2020, 12, 3254. [Google Scholar] [CrossRef]
  66. Lu, Y. Using Google Street View to investigate the association between street greenery and physical activity. Landsc. Urban Plan. 2019, 191, 103435. [Google Scholar] [CrossRef]
  67. Sun, Q.; Macleod, T.; Both, A.; Hurley, J.; Butt, A.; Amati, M. A human-centred assessment framework to prioritise heat mitigation efforts for active travel at city scale. Sci. Total Environ. 2021, 763, 143033. [Google Scholar] [CrossRef]
  68. Wu, W.; Ma, Z.; Guo, J.; Niu, X.; Zhao, K. Evaluating the effects of built environment on street vitality at the city level: An empirical research based on spatial panel Durbin model. Res. Public. Health 2022, 19, 1664. [Google Scholar] [CrossRef]
  69. Li, D.; Deal, B.; Zhou, X.; Slavenas, M.; Sullivan, W.C. Moving beyond the neighborhood: Daily exposure to nature and adolescents’ mood. Landsc. Urban Plan. 2018, 173, 33–43. [Google Scholar] [CrossRef]
  70. Koo, B.; Guhathakurta, S.; Botchwey, N. How are neighborhood and street-level walkability factors associated with walking behaviors? A big data approach using street view images. Environ. Behav. 2022, 54, 211–241. [Google Scholar] [CrossRef]
  71. Gong, F.-Y.; Zeng, Z.-C.; Zhang, F.; Li, X.; Ng, E.; Norford, L.K. Mapping sky, tree, and building view factors of street canyons in a high-density urban environment. Build. Environ. 2018, 134, 155–167. [Google Scholar] [CrossRef]
  72. Nagata, S.; Nakaya, T.; Hanibuchi, T.; Amagasa, S.; Kikuchi, H.; Inoue, S. Objective scoring of streetscape walkability related to leisure walking: Statistical modeling approach with semantic segmentation of Google Street View images. Health Place 2020, 66, 102428. [Google Scholar] [CrossRef] [PubMed]
  73. Lu, Y.; Chen, H.-M. Using Google Street View to reveal environmental justice: Assessing public perceived walkability in macroscale city. Landsc. Urban Plan. 2024, 244, 104995. [Google Scholar] [CrossRef]
  74. Ito, K.; Bansal, P.; Biljecki, F. Examining the causal impacts of the built environment on cycling activities using time-series street view imagery. Res. Part Policy Pract. 2024, 190, 104286. [Google Scholar] [CrossRef]
  75. Kruse, J.; Kang, Y.; Liu, Y.-N.; Zhang, F.; Gao, S. Places for play: Understanding human perception of playability in cities using street view images and deep learning. Comput. Environ. Urban Syst. 2021, 90, 101693. [Google Scholar] [CrossRef]
  76. Treisman, A.; Gelade, G. A feature-integration theory of attention. Cogn. Psychol. 1980, 12, 97–136. [Google Scholar] [CrossRef]
  77. Fechner, G. Elemente der Psychophysik; Breitkopf & Härtel: Leipzig, Germany, 1860. [Google Scholar] [CrossRef]
  78. Zhao, Y.; Zhang, B.; Li, J. A study on the water environment spatial paradigm of Zhangqiu Ancient Town based on ecological resilience. J. West. Hum. Settl. 2024, 39, 163–170. (In Chinese) [Google Scholar] [CrossRef]
  79. Ge, D.; Xu, W.; Gao, N. Research on regeneration strategies and mechanisms of ancient towns based on the authenticity of subject and object: A case study of Qianyuan Ancient Town, Zhejiang. J. Zhejiang Univ. (Sci. Ed.) 2018, 45, 251–260. (In Chinese) [Google Scholar] [CrossRef]
  80. Guo, W.; Huang, Z. A field theory to the production of multi-dimensional space of cultural heritage tourism destination—A case study of Zhouzhuang Ancient Town. Hum. Geogr. 2013, 28, 117–124. (In Chinese) [Google Scholar] [CrossRef]
  81. Wang, Y. Tourism development models comparative study and sustainable development countermeasures of ancient towns in South China. J. Cent. China Norm. Univ. (Nat. Sci.) 2006, 52, 104–109. (In Chinese) [Google Scholar] [CrossRef]
  82. Mao, Q. Construction of water town residential culture system from the perspective of urbanization development: A case study of Nanxun Town, Huzhou. People’s Trib. 2015, 24, 96–97. (In Chinese) [Google Scholar] [CrossRef]
  83. Ruan, Y.; Shao, Y.; Lin, L. The characteristics, values and the preservation planning of the towns in Jiangnan water region. Urban Plann. Forum 2002, 1, 79–84. (In Chinese) [Google Scholar] [CrossRef]
  84. China News Zhejiang. Zhejiang Nanxun Ancient Town Welcomes Its 10 Millionth Visitor Three Months Ahead of Schedule. Available online: https://www.zj.chinanews.com.cn/jzkzj/2024-07-01/detail-ihecwwfa4571526.shtml (accessed on 29 November 2024).
  85. Vanore, M.; Triches, M. (Eds.) #CURACITTÀ VENEZIA: Vs Marghera e la città-paesaggio; Quodlibet: Macerata, Italy, 2021. [Google Scholar]
  86. Liu, Y. China National Nutrition and Chronic Disease Status Report (2020). Acta Nutr. Sin. 2020, 42, 521. (In Chinese) [Google Scholar] [CrossRef]
  87. Tsai, V.J.D.; Chang, C.T. Three-dimensional positioning from Google street view panoramas. IET Image Process. 2013, 7, 229–239. [Google Scholar] [CrossRef]
  88. Ogawa, Y.; Oki, T.; Zhao, C.; Sekimoto, Y.; Shimizu, C. Evaluating the subjective perceptions of streetscapes using street-view images. Landsc. Urban Plan. 2024, 247, 105073. [Google Scholar] [CrossRef]
  89. Chen, L.; Lu, Y.; Ye, Y.; Xiao, Y.; Yang, L. Examining the association between the built environment and pedestrian volume using street view images. Cities 2022, 127, 103734. [Google Scholar] [CrossRef]
  90. Yao, Y.; Liang, Z.; Yuan, Z.; Liu, P.; Bie, Y.; Zhang, J.; Wang, R.; Wang, J.; Guan, Q. A human-machine adversarial scoring framework for urban perception assessment using street-view images. Int. J. Geogr. Inf. Sci. 2019, 33, 2363–2384. [Google Scholar] [CrossRef]
  91. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; IEEE: New York, NY, USA, 2015; pp. 3431–3440. [Google Scholar] [CrossRef]
  92. Ewing, R.; Handy, S. Measuring the unmeasurable: Urban design qualities related to walkability. J. Urban Des. 2009, 14, 65–84. [Google Scholar] [CrossRef]
  93. Tao, Y.; Wang, Y.; Wang, X.; Tian, G.; Zhang, S. Measuring the correlation between human activity density and streetscape perceptions: An analysis based on Baidu street view images in Zhengzhou, China. Land 2022, 11, 400. [Google Scholar] [CrossRef]
  94. Yang, Y.; Wang, Q.; Wu, D.; Hang, T.; Ding, H.; Wu, Y.; Liu, Q. Constructing child-friendly cities: Comprehensive evaluation of street-level child-friendliness using the method of empathy-based stories, street view images, and deep learning. Cities 2024, 154, 105385. [Google Scholar] [CrossRef]
  95. Ashihara, Y. The Aesthetic Townscape; MIT Press: Cambridge, MA, USA, 1983; pp. 73–81. [Google Scholar] [CrossRef]
  96. Liu, Y.; Xiao, T.; Liu, Y.; Yao, Y.; Wang, R. Natural outdoor environments and subjective well-being in Guangzhou, China: Comparing different measures of access. Urban For. Urban Green. 2021, 59, 127027. [Google Scholar] [CrossRef]
  97. Yang, L. A Study on the emotional expression of lighting culture in Jiangnan water towns. Mod. Decorat. (Theory) 2015, 31, 185. (In Chinese) [Google Scholar]
  98. Liu, W.; Hu, X.; Song, Z.; Yuan, X. Identifying the integrated visual characteristics of greenway landscape: A focus on human perception. Sustain. Cities Soc. 2023, 99, 104937. [Google Scholar] [CrossRef]
  99. Mittal, H.; Pandey, A.C.; Saraswat, M.; Kumar, S.; Pal, R.; Modwel, G. A comprehensive survey of image segmentation: Clustering methods, performance parameters, and benchmark datasets. Multimed. Tools Appl. 2022, 81, 35001–35026. [Google Scholar] [CrossRef]
  100. Shao, Y.; Yin, Y.; Xue, Z. Evaluation and comparison of streetscape comfort in Beijing and Shanghai based on a big data approach with street images. Landsc. Archit. 2021, 28, 53–59. (In Chinese) [Google Scholar] [CrossRef]
  101. Nasar, J.L. Visual Preferences in Urban Street Scenes: A Cross-Cultural Comparison between Japan and the United States. In Environmental Aesthetics; Nasar, J.L., Ed.; Cambridge University Press: Cambridge, UK, 1988; pp. 260–274. ISBN 978-0-521-34124-0. [Google Scholar] [CrossRef]
  102. Liaw, A.; Wiener, M. Classification and regression by random forest. R News 2002, 2, 18–22. [Google Scholar] [CrossRef]
  103. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  104. Li, Y.; Song, X.; Sun, L.; Zhuang, C.C.; Liu, J.; Yang, M. Exploring urbanization strategies by dissecting aggregate crowd behaviors: A case study in China. Systems 2024, 12, 459. [Google Scholar] [CrossRef]
  105. Xu, Y.; Lu, L. Probing the long-term evolution of traditional village tourism destinations from a glocalisation perspective: A case study of Wuzhen in Zhejiang province, China. Habitat Int. 2024, 148, 103073. [Google Scholar] [CrossRef]
  106. Wu, Y.; Yan, C.; Li, F.; Hong, Q.; Liu, J. Effects of waterfront spatial visual elements on the perception of waterscape attraction. South Hortic. 2022, 33, 60–65. (In Chinese) [Google Scholar] [CrossRef]
  107. Kaplan, R.; Kaplan, S. The Experience of Nature: A Psychological Perspective; Cambridge University Press: New York, NY, USA, 1989. [Google Scholar]
  108. Nasar, J.L. Urban design aesthetics: The evaluative qualities of building exteriors. Environ. Behav. 1994, 26, 377–401. [Google Scholar] [CrossRef]
  109. Wang, R.; Niu, Q.; Gao, H.; Huang, S.; Xu, Z. Quantitative research and evaluation on street interface of historic district—A case of Sanhe Ancient Town in Hefei. Urban. Archit. 2020, 17, 15–19,56. (In Chinese) [Google Scholar] [CrossRef]
  110. Zuo, H.; Li, Z.; Yu, X.; Wang, D.; Cui, W. A visual quantitative study on “second contour” of the historic blocks: Taking the Tunxi Ancient Street, Anhui for example. Mod. Urban Res. 2019, 1, 88–93. (In Chinese) [Google Scholar] [CrossRef]
  111. Shen, H.; Aziz, N.F.; Liu, J.; Huang, M.; Yu, L.; Yang, R. From text to insights: Leveraging NLP to assess how landscape features shape tourist perceptions and emotions toward traditional villages. Environ. Res. Commun. 2024, 6, 115006. [Google Scholar] [CrossRef]
  112. Ye, Y.; Zhong, C.; Suel, E. Unpacking the perceived cycling safety of road environment using street view imagery and cycle accident data. Accid. Anal. Prev. 2024, 205, 107677. [Google Scholar] [CrossRef] [PubMed]
  113. Ren, L.; Wang, Y. A study on placeness evaluation of the Suzhou Pingjiang Historical and Cultural Block: From the perspective of tourist perceptions. South Archit. 2024, 44, 20–31. (In Chinese) [Google Scholar] [CrossRef]
  114. Wei, K. The node of ancient towns: A reflection of the gathering of folk cultural context—A perspective on the architectural style of Huanglongxi Ancient Town. J. Southwest Univ. Natl. (Humanit. Soc. Sci.) 2008, 30, 208–211. (In Chinese) [Google Scholar]
  115. Hsieh, C.-M. A multiscale walkability assessment approach creating walkable streets: A case study of high-density city, Macau. Res. Transp. Bus. Manag. 2024, 57, 101217. [Google Scholar] [CrossRef]
  116. Li, Z.; Ma, Y.; Weng, S. The post-modern authentic tourist experience and its generation mechanism in thematic historic town: A case study of Wuzhen west scenic zone. Tour. Trib. 2023, 38, 42–52. (In Chinese) [Google Scholar] [CrossRef]
  117. Chen, Z.; Zhu, Z.; Chen, Y.; Xu, Y.; Lan, Y.; Fu, W.; Ding, G.; Dong, J. Quantitative analysis on the impact of the street greening on landscapes in ancient towns—Taking Dayan Ancient Town and Shuhe Ancient Town in Lijiang as cases. J. Shandong Agric. Univ. (Nat. Sci. Ed.) 2016, 47, 911–916. (In Chinese) [Google Scholar] [CrossRef]
  118. Dong, W.; Dai, D.; Shen, P.; Zhang, R.; Liu, M. How Public Urban Space Enhance Restoration Benefits through Combined Multisensory Effects: A Systematic Review. Land 2024, 13, 2018. [Google Scholar] [CrossRef]
  119. Chen, Z.M.; Huang, R.; Huang, Y.; Chen, Z.; Ye, Y. The measurements of fine-scale street walkability and precise design control: An evidence-based approach based on virtual reality and wearable bio-sensors. Chin. Landsc. Archit. 2022, 38, 70–75. (In Chinese) [Google Scholar] [CrossRef]
Figure 1. Research framework.
Figure 2. Location of the study area in Huzhou, China.
Figure 3. Acquiring street-view images.
Figure 4. Structure of the encoder–decoder networks in the DeepLabv3+ model.
Figure 5. The performance of the original DeepLabv3+ model, trained on the Cityscapes dataset, in the context of Nanxun Old Town.
Figure 6. Example of semantic segmentation and proportions of environmental elements.
Figure 7. Hierarchical clustering results of the street-view images.
Figure 8. Spatial distribution of the attractiveness scores (* p < 0.10; *** p < 0.01).
Figure 9. The proportions of environmental elements at different locations.
Figure 10. Views at different points ((a) shows a viewpoint on a waterfront platform; (b) shows a viewpoint on a bridge; (c) shows a viewpoint along the riverbank; (d) shows a lane viewpoint with buildings on both sides).
Figure 11. The correlation between environmental elements and attractiveness scores. The red area represents the 95% confidence interval.
Figure 12. SHAP summary plot.
Figure 13. The nonlinear effects of environmental elements on attractiveness scores. The red area represents the 95% confidence interval.
Figure 14. Views at different points ((a) shows a street-view image with low water proportion; (b) shows a street-view image with high water proportion on a waterfront platform; (c) shows a well-scaled lane space; (d) shows an oversized space; (e) shows a highly rated scenario characterized by the presence of lanterns; (f) and (g) show lifeless spatial scenarios; (h) shows a riverside space featuring Meirenkao; (i) shows a side view of a bridge; (j) shows a sloped surface of a bridge; (k) shows the landscape from the top of the bridge; (l) shows a scene with modern architecture).
Table 1. A review of selected studies on the environmental elements of old towns.
Author | Research Content | Related Environmental Elements
Shen, Aziz, Lv, 2025 [31] | Panoramic technology was used to identify the key landscape features affecting traditional village esthetics and propose sustainable development strategies. | Water; Vegetation; Ground; Building
Shen, Aziz, Omar et al., 2024 [32] | The authors investigated tourism’s impact on residents’ visual landscape perceptions in Huangdu Dong Village via online data. | Water; Building; Person
Meng, Liu, Zeng et al., 2024 [47] | Panoramic images and deep learning were used to quantify certain indicators of public space quality in villages in Beijing’s Fangshan District. | Vegetation; Building; Sky
Xu, Lan, 2023 [51] | The authors analyzed environmental influences (spatial layout, visual perception, etc.) on tourist activity in Wuzhen Xizha using spatial models and Baidu data. | Shop sign
Ren, Leng, 2022 [48] | Diachronic and synchronic approaches were integrated to renovate the historic streetscape of Sanjiang Avenue in Nanchang. | Building; Lantern; Shop sign
Jiang, Zhang et al., 2021 [49] | The authors evaluated authenticity in Grand Canal cultural blocks with respect to four aspects by using multi-source data across different renewal modes. | Water; Building; Lantern; Boat; Vegetation
Zhang, Yang, 2019 [2] | The authors examined tourist and resident perceptions of Tongli Old Town’s landscape through surveys, photo analysis, field interviews, and tour route tracking. | Bridge; Water; Boat
Table 2. Performance comparison for the different models.
Model | FCN | PSPNET | Newly Trained Deeplabv3+ (Ours)
Overall Accuracy (%) | 86.00 | 86.60 | 88.00
Table 3. Semantic segmentation labels.
Label Classification | Label
Space-defining elements | Sky; Water; Building; Riverbank; Ground
Foreground and space-defining elements | Bridge
Foreground elements | Vegetation; Fence; Pot; Lantern; Boat; Shop sign; Signboard; Person; Other
Table 4. Consistency test results for the questionnaire ratings.
Type | ICC | p-Value
ICC(1,k) | 0.740 | 0.022 **
ICC(2,k) | 0.754 | 0.007 ***
ICC(3,k) | 0.799 | 0.007 ***
Note: ** p < 0.05; *** p < 0.01.
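The three average-measures coefficients in Table 4 are conventionally derived from the mean squares of a two-way ANOVA on a targets-by-raters table (the Shrout and Fleiss forms). The sketch below shows that calculation under the assumption that every rater scored every image. The function name and the toy ratings are illustrative and are not the study's data or code, and p-values are not computed.

```python
def icc_average_measures(ratings):
    """Compute ICC(1,k), ICC(2,k) and ICC(3,k) for an n-targets x k-raters table."""
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 targets and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for the two-way decomposition.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_within = ss_total - ss_rows
    ss_error = ss_within - ss_cols

    bms = ss_rows / (n - 1)                 # between-targets mean square
    wms = ss_within / (n * (k - 1))         # within-targets mean square
    jms = ss_cols / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    return {
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }


# Toy table: 6 rated items (rows) scored by 4 raters (columns); illustrative only.
toy_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_average_measures(toy_ratings)
```

ICC(1,k) treats rater differences as noise, ICC(2,k) treats raters as a random sample, and ICC(3,k) treats them as fixed, which is why the three values in Table 4 increase in that order.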
Table 5. Variable importance.
Variable | Importance | Weight (Rescaled)
water | 20.5028 | 1.0000
boat | 2.8582 | 0.1394
fence | 4.4219 | 0.2157
ground | 10.8069 | 0.5271
sky | 4.7482 | 0.2316
lantern | 5.5967 | 0.2730
vegetation | 4.6768 | 0.2280
shop sign | 5.7841 | 0.2821
person | 7.7541 | 0.3782
pot | 2.4232 | 0.1182
bridge | 3.9639 | 0.1777
building | 4.0574 | 0.1934
signboard | 3.8896 | 0.1979
riverbank | 12.2144 | 0.5957
other | 6.5483 | 0.3193
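The rescaled weights in Table 5 appear to be each importance divided by the largest importance (water). This is an inference that matches rows such as boat, ground, and riverbank, not a rule stated in the table. A minimal sketch of that assumed rescaling, using a subset of the tabulated values and a hypothetical helper name:

```python
def rescale_by_max(importance, digits=4):
    """Divide every importance by the largest one so the top variable gets 1.0."""
    top = max(importance.values())
    if top <= 0:
        raise ValueError("largest importance must be positive")
    return {name: round(value / top, digits) for name, value in importance.items()}


# Subset of the importance values listed in Table 5.
importance = {
    "water": 20.5028,
    "riverbank": 12.2144,
    "ground": 10.8069,
    "boat": 2.8582,
}
weights = rescale_by_max(importance)
```

Rescaling leaves the ranking of variables unchanged and only makes each value readable as a fraction of the most important variable.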
Table 6. Regression model performance.
Metric | Train | Test
MSE | 0.0303 | 0.0423
RMSE | 0.1740 | 0.2056
MAE | 0.1370 | 0.1745
MAPE | 3.6978 | 4.6245
R² | 0.8358 | 0.7487
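For reference, the metrics in Table 6 follow standard definitions, and the tabulated RMSE values agree with the square roots of the MSE values up to rounding. The sketch below computes all five from observed and predicted scores. It assumes MAPE is expressed in percent (the table does not state the unit), and the toy scores are illustrative rather than the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (in percent) and R^2 for paired values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Assumption: MAPE reported as a percentage.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    if ss_total == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1 - sum(e * e for e in errors) / ss_total

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Toy attractiveness scores (illustrative only).
observed = [3.0, 4.0, 5.0, 4.0]
predicted = [2.5, 4.0, 5.5, 4.0]
metrics = regression_metrics(observed, predicted)
```

Comparing the train and test columns with these definitions in mind shows a moderate drop in R² on the held-out data, which is the usual sign of some overfitting rather than a failed model.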
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Xu, C.; Cao, H.; Xia, Z.; You, X.; Wang, Z. Understanding the Influence of Environmental Elements on Spatial Attractiveness in a Jiangnan Water Town Through Computer Vision Techniques. Buildings 2025, 15, 2091. https://doi.org/10.3390/buildings15122091
