Effects of Street-Level Visual Perception on Different Types of Leisure Activity Intensity in Waterfront Spaces: A Case Study of the Core Section of the Pearl River, Guangzhou

Pan, Yudan; Chen, Yang; Cao, Jin

doi:10.3390/land15050849


by Yudan Pan ¹,²,*, Yang Chen ¹ and Jin Cao ¹

¹ School of Architecture, South China University of Technology, Guangzhou 510641, China

² Architectural Design Research Institute of South China University of Technology, Guangzhou 510641, China

* Author to whom correspondence should be addressed.

Land 2026, 15(5), 849; https://doi.org/10.3390/land15050849 (registering DOI)

Submission received: 12 April 2026 / Revised: 6 May 2026 / Accepted: 8 May 2026 / Published: 15 May 2026

(This article belongs to the Special Issue Computational Design and Planning for Socio-Environmental Sustainability of Landscapes and Communities: 2nd Edition)


Abstract

Although urban waterfront public spaces have increasingly become important settings for residents’ daily leisure activities, there remains a lack of empirical evidence based on objective image data regarding how street-level visual environments influence different types of leisure activities. Existing studies have largely relied on macro-scale built environment indicators and paid limited attention to micro-scale visual perception from the pedestrian perspective. To address this gap, this study focuses on the core waterfront section of the Pearl River in Guangzhou. Behavioral observations were conducted across nine spatial units during different time periods on weekdays and weekends, yielding 54 samples of passive, active, and social activity intensity. Meanwhile, 109 street-view sampling points were established, generating 436 pedestrian-view images. Using Mask2Former with an ADE20K pre-trained model, visual environmental indicators, including the Green View Index (GVI), Sky View Index (SVI), built environment proportion, road proportion, and visual diversity (Entropy), were extracted. Spearman correlation and multiple linear regression were applied to examine their effects on activity intensity. The results show that leisure activity is generally more intense in the evening and on weekends, with social activities exhibiting the strongest temporal variation. Active activities remain relatively stable, passive activities show temporal dependence, and social activities display localized high-intensity clustering. Regression results reveal differentiated environmental responses: visual diversity has a stable positive effect on passive activities, active activities show weak associations with visual variables, and social activities are the most sensitive, with GVI, SVI, and built proportion showing significant negative effects, while visual diversity shows a significant positive effect. The social activity model also demonstrates the highest explanatory power (Adj. R² = 0.488). Overall, this study develops a street-view semantic segmentation-based method for quantifying waterfront visual environments, demonstrates the critical role of visual environmental composition in shaping activity patterns, and provides empirical support for the fine-grained and activity-oriented optimization of waterfront public spaces.

Keywords: waterfront public space; street-level visual environment; street-view semantic segmentation; leisure activity types; activity intensity; Mask2Former; built environment perception

1. Introduction

1.1. Research Background

Waterfront public spaces not only serve as ecological transition zones between water and land but also create important interfaces between natural environments and urban populations [1]. As places where ecological systems and everyday social life intersect, they play a vital role in promoting public health and enhancing human well-being [2]. The existing studies have shown that waterfront public spaces offer significant benefits for psychological restoration, particularly by improving feelings of pleasure, relaxation, and vitality [3]. Meanwhile, access to waterfront public spaces has been found to be positively associated with physical activity [4], indicating their potential to encourage exercise participation and support healthier lifestyles [5]. In contemporary cities, waterfront public spaces are also crucial to the revitalization and development of urban vitality [1,6]. Due to their unique spatial and environmental characteristics, these spaces often exhibit higher levels of spatial vitality and provide greater opportunities for diverse public activities and activity types [7,8].

In environmental behavior research, activities occurring in public spaces are far from homogeneous; different activity types exhibit distinct spatial needs and environmental preferences [9]. For instance, leisure activities tend to depend more on the presence of tall shrubs and supportive seating facilities, whereas physical activities are more closely associated with the diversity of paving conditions and street lighting arrangements [10]. Empirical studies have further shown that when waterfront public spaces are classified into different functional categories, such as commercial spaces and art exhibition spaces, the frequency distribution of activity types differs significantly across spatial types [11]. Gehl also emphasized that the quality of space affects different forms of behavior to varying degrees, and that urban design and spatial organization can shape both the types of activities that emerge and the duration for which they are sustained [12].

As urban planning has evolved, there has been an increasing emphasis on creating more human-centered urban spaces [13]. Against this backdrop, traditional approaches to evaluating the built environment through objective, aggregate indicators have become increasingly limited, as they are unable to capture pedestrians’ lived perceptual experiences at the human scale. The existing studies suggest that street-level perception is one of the most effective ways to examine visual perception in urban street spaces [14]. Paying attention to street-level perception is important for enhancing urban livability and understanding its close relationship with residents’ psychological health and social well-being [15]. By offering a realistic human perspective on urban landscapes, street-view imagery can capture the essential qualities of the urban environment and more accurately reflect how people perceive their surroundings in everyday life [16,17]. This enables researchers to better identify how different visual elements influence human activities.

Recent research on visual perception, urban interfaces, and public space suggests that pedestrian-level environmental experience is shaped not only by individual visual elements, but also by the configuration of spatial edges, interface permeability, public–private boundaries, and the relationship between infrastructure and public space. Al Mushayt et al. proposed a morphological–visual perception approach for decoding the public/private edge of arterial streets, emphasizing that street interfaces play an important role in structuring perceived urban experience [18]. Similarly, Gehl highlighted the importance of ground-floor interfaces and “soft edges” in supporting public life and close-range urban experience [19]. For waterfront areas, this perspective is particularly relevant because waterfront spaces are often shaped by the interaction of infrastructure, open water, promenades, vegetation, building frontages, and public activity spaces. Studies on waterfront transformation further indicate that waterfront regeneration involves not only physical renewal, but also the reconfiguration of infrastructure, public space, and spatial experience [20]. Therefore, understanding activity patterns in waterfront public spaces requires examining not only isolated visual elements, but also how visual composition and spatial interfaces jointly shape pedestrian perception and public-space use.

1.2. Research Gaps

Although previous studies have demonstrated the health, recreational, and vitality benefits of waterfront public spaces, the mechanisms through which fine-grained street-level visual environments influence specific types of leisure activities remain insufficiently understood. In particular, limited empirical evidence explains how perceived greenness, openness, built interfaces, circulation space, and visual diversity differently affect passive, active, and social activities. The existing studies on visual perception and urban interfaces provide important theoretical insights, but these perspectives have rarely been integrated with objective street-view semantic segmentation and on-site behavioral observation in waterfront settings. This gap makes it difficult for planners and designers to move beyond general waterfront quality improvement and develop activity-oriented spatial interventions based on specific environment–behavior relationships.

First, many existing studies have concentrated on measuring overall activity levels, typically employing manual behavior coding methods [10,21], mobile phone signal data [17,22], and heat map big data [23,24,25,26] to evaluate the spatial and temporal distribution of regional vitality. However, these studies have largely overlooked the different mechanisms through which environmental factors affect various activities. Cellular signal data provided by large mobile operators are highly ubiquitous and representative, and can reflect the travel behavior of most people [27]. Nevertheless, research based on such data has been limited to aspects such as movement trajectories, pedestrian flow density, and duration of stay, and it may under-represent groups that do not use smartphones. Heat maps can estimate the density of human activities and reflect users’ geographic locations and numbers during a target time period [23]. They are considered an effective representation of dynamic population distribution [28]. However, they cannot distinguish specific user activities.

Second, many existing studies focus on specific activities, such as jogging [29,30] and walking [31,32,33]. While some studies have started to pay attention to activity classification, research in the context of waterfront environments is still very limited, with activities often categorized based on intensity [34]. Understanding the relationship between pedestrians’ outdoor activities and the physical environment in urban public spaces is crucial for improving the precision of public space design. However, current studies generally lack systematic comparison and quantitative analysis of how different activity types respond to environmental factors [10,11,35]. For instance, Chen et al. investigated the impact of visual accessibility on TN activities, but did not examine the varying dependence of individual TN activities (e.g., walking, viewing the environment, and social interaction) on physical environmental characteristics [35].

Currently, the use of street-view images to study activities in urban public spaces is expanding rapidly. Many studies use semantic segmentation techniques to identify urban environmental features in street-view images and link them to emotional perception [11,15,23], the spatial quality of street blocks [16,36], and the assessment of vitality [17,37]. However, research on waterfront public spaces, particularly empirical explorations of the critical chain from ‘semantic segmentation indicators’ to ‘activity types’, still exhibits a significant gap. For instance, Yang et al. employed semantic segmentation to extract urban environmental elements from street-view images and linked them to block vitality assessment, yet did not directly connect these elements to specific behavioral types [17]. Similarly, Zhao et al. used street-view images to evaluate the visual quality of the built environment, establishing a relationship between objective visual elements and subjective perception. However, they did not map perception outcomes to specific behavioral choices [38].

1.3. Research Questions and Innovation

Based on the above research gaps, this study integrates street-view image data and in situ activity observation data from the core section of the Pearl River in Guangzhou. It examines how perceived street-level visual environments influence different types of waterfront leisure activities. Specifically, this study addresses the following three questions: (1) Does the street-level visual environment influence leisure activities in waterfront areas? (2) Do different types of activities respond differently to visual environmental elements? (3) Which street-level visual indicators are more important for specific activity types?

The main contributions of this study are threefold. First, it links three types of leisure activities with street-view-derived perceptual indicators, thereby moving beyond the limitation of using overall activity volume alone. Second, it uses pedestrian-perspective street-view data to capture micro-scale perceived environmental quality that is difficult to represent through conventional aggregate built-environment indicators. Third, it establishes an analytical link between semantic segmentation indicators and activity types, providing empirical support for the fine-grained improvement of public space quality along the core waterfront section of the Pearl River.

2. Study Area and Data

2.1. Study Area Overview

Guangzhou is located in the Pearl River Delta in southern China and is one of the key cities in the Guangdong–Hong Kong–Macao Greater Bay Area. The Pearl River runs through the city center and forms an important natural landscape axis as well as a structural backbone of Guangzhou’s urban development. As a carrier of the city’s historical evolution, the Pearl River waterfront has long supported multiple functions, including shipping, transportation, commerce, trade, and public urban life. Over time, it has evolved into an important public waterfront space that integrates ecological landscapes, public activities, and urban image-making.

In recent years, China’s urban development has gradually shifted from ‘incremental expansion’ to ‘stock-based renewal’. As an important stock spatial resource of the city, waterfront spaces have become a key direction of urban renewal in terms of quality enhancement and vitality revitalization. Against this backdrop, Guangzhou has continuously promoted the integrated management and regeneration of public spaces along the Pearl River waterfront. In 2017, the Guangzhou Municipal People’s Government proposed the construction of the ‘Pearl River Waterfront High-Quality Development Belt’, emphasizing the integration of waterfront spaces, enhancement of public spaces, and optimization of industrial functions, with the aim of promoting the formation of an internationally influential urban waterfront development axis along both banks of the Pearl River. Subsequently, the Guangzhou Territorial Spatial Master Plan (2021–2035) proposed creating a continuous and accessible waterfront public space by improving the greenway network, enhancing public space quality, and strengthening urban landscape corridors, with the broader goal of developing a world-class waterfront vitality zone.

Driven by the above planning and renewal policies, a continuous waterfront public space system has gradually taken shape along the Pearl River, making the riverfront one of Guangzhou’s most active public life corridors. This study selects the core waterfront space along the north bank of the Pearl River as the research area, as shown in Figure 1. Spanning several central districts, including Liwan, Yuexiu, Tianhe, and Haizhu, this area is one of the waterfront spaces in Guangzhou with the highest concentration of urban functions and the most active public activities. The study area follows an east–west linear pattern along the river, with a total length of approximately 13 km, extending from the vicinity of Shamian Island in Liwan District in the west to the vicinity of Pazhou Bridge in the east. Based on spatial landscape characteristics and differences in urban functions, the study area can be broadly divided into three typical sections:

(1): Traditional Landscape Section (Western Section): This section extends approximately 3.5 km from Shamian Island in Liwan District to Haizhu Square in Yuexiu District. It retains a concentration of historic and cultural landscapes, as well as traditional waterfront areas. The spatial scale is relatively compact, with pedestrian activities and viewing-related stays frequently observed.
(2): Modern Urban Section (Middle Section): This section extends approximately 4.5 km from the eastern side of Haizhu Square to Guangzhou Bridge. Adjacent to the Zhujiang New Town central business district, it is characterized by modern urban landscapes, dense built interfaces, and a high concentration of commercial, tourism, and leisure activities.
(3): Daily Leisure Section (Eastern Section): This section extends approximately 5 km from the eastern side of Guangzhou Bridge to Pazhou Bridge. It is dominated by riverside greenways and open public spaces. With a large number of residential communities and public service facilities nearby, this section serves as an important setting for residents’ daily recreation and slow-paced activities.

From a spatial perspective, the northern bank of the core Pearl River waterfront contains a diverse range of waterfront public spaces. Based on their spatial structure and functional characteristics, these spaces can be classified into six categories.

(1): Waterfront promenade space. This type of space is characterized by continuous riverside promenades or greenway systems. Usually equipped with stone benches, tree pits, and guardrails, it forms a continuous corridor for slow-mobility activities. It is the most widely distributed space type within the study area and is mainly used for daily leisure activities such as walking, jogging, and sightseeing.
(2): Open lawn space. This type of space is characterized by large lawns as the dominant landscape element and is usually connected to the waterfront promenade system. With a high degree of openness, it provides places for residents to stay, rest, and enjoy family activities. Open lawn spaces are highly flexible and can support various leisure activities, such as picnics and informal gatherings.
(3): Waterfront plaza space. This type of space is characterized by hard paving and is typically integrated with waterfront viewing platforms or urban public nodes. It provides open conditions in which people can gather, allowing it to accommodate larger public events. Such spaces often serve as important activity nodes and pedestrian gathering areas in waterfront districts.
(4): Shaded leisure space. This type of space is characterized by tree shade as the main environmental feature and is usually formed by combining shaded walkways, small resting facilities, and waterfront viewing nodes to create a comfortable space for staying. With favorable shade and environmental comfort, this space type is well-suited to passive leisure activities and short-term stays.
(5): Recreational sports space. This type of space is usually equipped with fitness facilities, sports grounds, or exercise equipment, such as public fitness zones and small sports courts. It provides residents with places for daily exercise and physical activity and serves as an important vitality node within the waterfront public space system.
(6): Under-bridge composite space. This type of space is located beneath cross-river bridges and is characterized by the semi-open environment created by the bridge structure. It is usually integrated with promenades, resting facilities, or sports amenities to create an area for various activities. Thanks to its distinctive spatial structure and sheltered conditions, the under-bridge space has become a recognizable venue for activities within the waterfront public space system.

Overall, the various types of waterfront space mentioned above differ significantly in terms of spatial scale, landscape conditions, built interfaces, and facility configurations. Collectively, they form a diverse pattern of spatial environments along the core waterfront segment of the Pearl River. Within this pattern, the study area demonstrates a high density of public activities, as well as clear variations in activity types and intensity levels. Different waterfront sections also vary in greenery, built-interface proportion, circulation structure, and visual complexity, creating a rich spatial context for examining the relationship between the built environment and leisure activities.

The core waterfront area of the Pearl River is thus both an important site for urban renewal and waterfront management in Guangzhou and one of the city’s major public spaces for recreational activities. Selecting this area as the study site allows this research to capture the spatial characteristics of typical central waterfront public spaces in Guangzhou and to examine how built-environment and visual-perception factors are associated with different types of leisure activity.

2.2. Division of Behavioral Segments and Behavioral Observation Data

2.2.1. Division of Behavioral Segments

Waterfront public spaces often exhibit distinct linear spatial characteristics. Compared to traditional methods that use administrative units or regular grid-based analysis units, behavior units that are defined based on the structure of pedestrian space better reflect the true relationship between pedestrian activities and environmental perception within waterfront public spaces. This study, therefore, uses the north bank waterside promenade of the Pearl River as a basis, dividing the study area along the river into several continuous behavior survey units (hereafter referred to as ‘behavior segments’).

First, field surveys and preliminary activity observations were used to identify different spatial types within the study area, such as waterfront promenades, plaza nodes, and under-bridge spaces. Representative spatial segments were selected as behavior survey units. Second, the selected behavior survey units were spatially verified against crowd activity heat maps for a weekday and a weekend day (Baidu heat map data from 23 and 28 September 2025, the dates of the behavioral observations) (Figure 2). This procedure helped ensure that the chosen samples covered typical spatial characteristics and did not simply represent spaces with limited public activity. Using this approach, three units were selected from each of the western, middle, and eastern sections, resulting in nine representative behavior survey units (A1–A3, B1–B3, and C1–C3) for subsequent behavioral observation and environmental analysis.

In terms of scale, each behavioral segment was defined with a length of approximately 200–300 m. This range was selected with reference to analytical units commonly used in studies of pedestrian spatial perception and street environments. From the perspective of environmental cognition, pedestrians form an overall impression of surrounding elements, such as vegetation, built interfaces, facilities, and crowd activities, while moving through a continuous walking environment. Previous studies suggest that a spatial range of approximately 200–300 m is suitable for capturing pedestrian-scale environmental perception and has been used in street-level analysis [39,40]. At this scale, local variations in environmental elements can be identified while reducing the excessive heterogeneity that may arise from larger spatial units.

Related empirical studies on street-level visual environments and pedestrian behavior have also adopted comparable spatial segmentation scales. For instance, Huang et al. (2024) examined street environments and walking preferences in Tokyo using street segments of approximately 250 m, which aligned with street-view image sampling and pedestrian perception ranges [39]. Similarly, Wang et al. (2023) analyzed the effects of street landscape features on perceived safety and aesthetic evaluation using street segments of 150–300 m [40]. Drawing on these studies, this research defined behavioral segments of approximately 200–300 m to capture local variations in the waterfront environment while remaining consistent with commonly used pedestrian-scale analytical units.

This segmentation approach helps capture variations in environmental conditions along the waterfront and provides a consistent analytical unit for linking behavioral observation data with street-level visual environmental indicators.

2.2.2. Collection of Behavioral Observation Data

Behavioral data were collected through on-site observations. A structured behavioral annotation approach was used to record leisure activities within each behavioral segment. Observations were conducted on one weekday and one weekend day (23 and 28 September 2025), and each observation day was divided into three time periods: morning (08:00–10:00), afternoon (15:00–17:00), and evening (19:00–21:00).

Within each time period, three independent instantaneous scans were conducted for each behavioral segment. During each scan, the number of people engaged in different activity states, such as walking, staying, socializing, taking photos, and running, was recorded. The scans were arranged at intervals within each observation period to capture short-term changes while reducing the possibility of double-counting. The three records collected within the same period were then averaged to construct the activity intensity indicator for the corresponding segment and time period. During fieldwork, a standardized recording form was used to define activity categories, counting rules, and recording items, ensuring comparability across time periods, survey units, and observers. Before the formal observations, observers were trained, and pilot observations were conducted to improve recording consistency. After data collection, the records were cross-checked to identify potential inconsistencies and improve data reliability.

Based on their behavioral characteristics, the observed leisure activities were classified into three categories: passive, active, and social activities. The number of people in each category was recorded separately during each scan. At the same time, the total number of people in each behavioral segment was also recorded to provide an overall activity scale reference indicator.

2.3. Street-Level Image and Built-Environment Data

2.3.1. Collection of Street-Level Images

To characterize the actual visual environment of the waterfront space at the pedestrian scale, this study collected street-level images on site. On the morning of 23 September 2025, data collectors took continuous photographs from west to east along the waterfront promenade in the study area. At each capture point, four cameras were used to obtain images in the 0°, 90°, 180°, and 270° directions. Each image was taken at an approximate pedestrian eye height of 1.6 m, with the camera orientations kept roughly parallel to the riverside path in order to approximate a pedestrian’s panoramic perception of the environment. Images were captured during a period of good weather and stable lighting, and all data were collected in a single session to minimize the influence of changing environmental conditions on image quality.

Street-level images were collected at 100 m intervals along the waterfront promenade. This sampling interval was selected to balance pedestrian-scale visual perception, spatial variation in the waterfront environment, and the feasibility of field data collection. Previous studies on street-level perception and pedestrian environments have commonly adopted similar sampling intervals, suggesting that this scale can capture meaningful changes in street interfaces while avoiding excessive data redundancy. Therefore, the 100 m interval was considered appropriate for representing the visual environment encountered by pedestrians along the linear waterfront space.

From the perspective of human visual perception, pedestrians perceive continuous urban environments through movement and repeated visual scanning. Previous research suggests that an approximate range of 80–120 m may allow pedestrians to integrate visual information from surrounding interfaces during walking [41].

At the level of street morphology, a 100 m interval has also been used to capture changes in street-interface characteristics. For example, previous research has shown that physical street features, such as building setbacks, greenery continuity, and interface permeability, may vary within a range of approximately 50–150 m [42]. A 100 m sampling interval can therefore provide a reasonable balance between recording spatial variation and avoiding unnecessary sampling redundancy. This conclusion has been corroborated in multiple urban case studies, including assessments of spatial quality in historic districts and analyses of the visual environment of commercial pedestrian streets [43].

Furthermore, multiple studies on street vitality and walking preferences provide additional support for the 100 m interval. In the Journal of Urban Planning and Development, Wang et al. compared the effects of different sampling densities on street perception scores, finding that a 100 m interval preserves statistical significance while minimizing computational burden [40]. Similarly, in the study of walking preferences in Tokyo, Huang et al. used a 100 m grid for street-level sampling and successfully revealed non-linear relationships between street-level elements and walking behavior [39].

Taken together, these studies support the use of a 100 m interval as a practical and theoretically informed sampling strategy. For linear public spaces such as waterfront promenades, this interval helps balance spatial representativeness, data manageability, and the need to approximate pedestrians’ sequential visual experience along the route.

Following this method, we obtained 109 street-level sampling points, which covered the entire study area. Images were captured in four directions at each point, collecting a total of 436 street-level images for subsequent semantic segmentation and calculation of visual environment indicators.
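As an illustration of the sampling logic described above, the following minimal Python sketch places points at a fixed spacing along a path and pairs each point with the four capture directions. It is not the procedure used in the field. The function name `sample_along_path`, the example path coordinates, and the assumption that the path is given in a projected coordinate system in metres are all hypothetical.

```python
import math

HEADINGS_DEG = (0, 90, 180, 270)  # the four capture directions named in the text


def sample_along_path(vertices, spacing_m=100.0):
    """Return points at a fixed spacing along a polyline (projected metres).

    The first vertex is always included; later points fall every
    `spacing_m` metres measured along the path.
    """
    points = [vertices[0]]
    next_at = spacing_m  # distance along the path of the next sample
    walked = 0.0         # path length covered before the current segment
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        # Emit every sample that falls on this segment (small float tolerance).
        while seg_len > 0 and next_at <= walked + seg_len + 1e-9:
            t = (next_at - walked) / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing_m
        walked += seg_len
    return points


# Hypothetical 400 m path with one bend (not the real promenade geometry).
path = [(0.0, 0.0), (250.0, 0.0), (250.0, 150.0)]
points = sample_along_path(path)

# One image per point and direction.
images = [(i, h) for i in range(len(points)) for h in HEADINGS_DEG]
print(len(points), len(images))
```

The same pairing of points and directions accounts for the image total reported in the text: 109 points with four directions each gives 436 images.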

2.3.2. The Built Environment and Spatial Data

The collected street-level images were first preprocessed. EXIF information was extracted from the photographs in Python (Version 3.9; Python Software Foundation, Wilmington, DE, USA) to obtain shooting locations and timestamps. The latitude and longitude information of each street-level image was then compiled into a data table and imported into ArcGIS Pro (Version 3.5.2; Esri, Redlands, CA, USA) to generate georeferenced sampling points.
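The preprocessing step can be sketched as follows. EXIF stores GPS positions as degree, minute, and second values with a hemisphere reference, so they need to be converted to decimal degrees before being tabulated for GIS import. The sketch covers only that conversion and the table writing. Reading the tags from the image files is left out, and the record contents, file name, and column names are hypothetical placeholders rather than the study's data.

```python
import csv
import io


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus N/S/E/W to decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative by convention.
    return -value if ref in ("S", "W") else value


# Hypothetical records, shaped like values read from image EXIF tags.
records = [
    {"file": "P001_000.jpg",
     "lat": (23, 6, 36.0), "lat_ref": "N",
     "lon": (113, 14, 24.0), "lon_ref": "E",
     "time": "2025:09:23 09:00:00"},
]

# Write a small table (in memory here) that a GIS package could import.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["file", "latitude", "longitude", "timestamp"])
for r in records:
    writer.writerow([
        r["file"],
        round(dms_to_decimal(r["lat"], r["lat_ref"]), 6),
        round(dms_to_decimal(r["lon"], r["lon_ref"]), 6),
        r["time"],
    ])
table = buffer.getvalue()
print(table)
```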

In addition to the street-level images, we collected spatial base data for the study area, including GIS layers for the waterfront promenade, the road network, and spatial boundaries. These datasets were managed in ArcGIS Pro under a unified coordinate system, forming the basis for subsequent spatial analyses and visualization of results.

3. Methods

3.1. Behavioral Observations and Construction of Activity Intensity Indicators (Dependent Variable)

Based on the behavioral observation data described in Section 2.2, this study constructed activity intensity indicators to represent the levels of different types of leisure activity. An instantaneous scan sampling method was used to record the number of people engaged in different activity states within each behavioral segment under varying temporal conditions.

Based on their behavioral characteristics, the observed leisure activities were classified into three categories:

(1): Passive activities, which are characterized mainly by staying in one place, such as sitting, leaning, sightseeing, and resting.
(2): Active activities, which are primarily characterized by bodily movement and displacement, such as walking, running, and fitness activities.
(3): Social activities, which are primarily characterized by interpersonal interaction, such as talking, gathering, and group interaction.

For each behavioral segment, the average number of participants in each activity type was calculated across the three scan records for a given day type (weekday or weekend) and time period (morning, afternoon, or evening). This average served as the activity intensity index (passive, active, or social intensity) of that behavioral segment for the corresponding day type and time period. The index represents the average activity level per scan and was used as a continuous variable in subsequent statistical analyses rather than as an exact estimate of the total number of participants.
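The averaging rule can be sketched as follows. The record layout and the counts are invented for illustration; only the grouping logic (mean of the scan counts per segment, day type, time period, and activity type) follows the description above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scan records: (segment, day type, period, activity, head count)
scans = [
    ("S1", "weekday", "evening", "social", 4),
    ("S1", "weekday", "evening", "social", 7),
    ("S1", "weekday", "evening", "social", 10),
    ("S1", "weekday", "evening", "passive", 12),
    ("S1", "weekday", "evening", "passive", 15),
    ("S1", "weekday", "evening", "passive", 9),
]

def activity_intensity(records):
    """Average the scan counts within each segment/day/period/activity cell."""
    grouped = defaultdict(list)
    for segment, day_type, period, activity, count in records:
        grouped[(segment, day_type, period, activity)].append(count)
    return {key: mean(counts) for key, counts in grouped.items()}

intensity = activity_intensity(scans)
```

Nine segments observed under two day types and three time periods give 9 × 2 × 3 = 54 intensity values per activity type, matching the sample size used in the analyses.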

3.2. Street-Level Image Processing and Extraction of Perceived Environmental Indicators (Independent Variables)

The street-level image data were processed in Python. First, the collected street-level images were converted to a unified format and standardized in terms of file naming. EXIF information was then extracted from each image to obtain shooting time and geographic coordinates. This established a correspondence between each image and its spatial location in order to provide a foundation for subsequent spatial matching and indicator aggregation.

In addition, a semantic segmentation approach based on deep learning was employed to perform pixel-level semantic parsing of the street-level images. The Mask2Former semantic segmentation model, pretrained on the ADE20K dataset, was used to process the images in this study. This model was applied to identify 150 semantic categories, such as vegetation, sky, buildings, roads, and other environmental features (Figure 3).

Based on the semantic segmentation results, Python scripts were written to count the number of pixels in each semantic category in every image, and to compute their pixel proportions.

p_{(i, c)} = n_{(i, c)} / N_{i}

In the above equation:

i indexes the street-level image (or sample), and c denotes a semantic segmentation category.

p_{(i, c)} is the pixel proportion of semantic category c in image i.

n_{(i, c)} is the pixel count of semantic category c in image i.

N_{i} is the total number of pixels in image i (or the number of pixels considered in the calculation).
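A minimal sketch of this proportion calculation is given below. It assumes the segmentation output for one image is available as a two-dimensional grid of integer class labels; the label values in the toy mask are arbitrary and do not correspond to actual ADE20K class indices.

```python
from collections import Counter

def pixel_proportions(label_mask):
    """Return p(i, c) = n(i, c) / N(i) for every class c present in one image.

    `label_mask` is a 2-D list of integer class ids, one per pixel.
    """
    counts = Counter(label for row in label_mask for label in row)
    total = sum(counts.values())   # N(i): all pixels in the image
    return {label: n / total for label, n in counts.items()}

# Toy 3 x 4 mask with four arbitrary class ids
mask = [
    [2, 2, 2, 2],
    [4, 4, 1, 1],
    [4, 4, 6, 6],
]
props = pixel_proportions(mask)
```

By construction the proportions of one image sum to 1, which is a convenient sanity check before aggregation.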

To construct environmental indicators with perceptual relevance, fine-grained ADE20K semantic categories were merged into broader environmental categories, including vegetation, sky, built environment, road space, and natural elements. The specific aggregation rules are shown in Table 1.

On this basis, a multidimensional street-level visual environment indicator system was constructed, as detailed in Table 2.

These indicators jointly form a multidimensional representation system of the waterfront street-level visual environment [39,41,43].
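The step from category proportions to indicators can be sketched as below, under two stated assumptions. First, the grouping dictionary is a hypothetical stand-in for the aggregation rules in Table 1. Second, Entropy is computed here as Shannon entropy with the natural logarithm over the category proportions; the exact definition used in Table 2 may differ (for example, in normalization or in which categories enter the sum).

```python
import math

# Hypothetical grouping of segmentation labels; Table 1 holds the actual rules
GROUPS = {
    "GVI": {"tree", "grass", "plant"},
    "SVI": {"sky"},
    "Built": {"building", "wall", "bridge"},
    "Road": {"road", "sidewalk", "path"},
}

def visual_indicators(props):
    """Aggregate per-category pixel proportions into image-level indicators."""
    out = {
        name: sum(props.get(label, 0.0) for label in labels)
        for name, labels in GROUPS.items()
    }
    # Assumed definition: Shannon entropy H = -sum(p * ln p) over visible categories
    out["Entropy"] = -sum(p * math.log(p) for p in props.values() if p > 0)
    return out

# Toy image whose pixels are split evenly across four categories
ind = visual_indicators({"tree": 0.25, "sky": 0.25, "building": 0.25, "road": 0.25})
```

Under this assumed definition, an image dominated by a single category yields an entropy near zero, while an even split across K categories yields the maximum value ln K.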

In this study, “visual perception” does not refer to subjective psychological evaluation collected through questionnaires, but to the environmental information that can be visually accessed from the pedestrian field of view. Therefore, the semantic-segmentation indicators are used as objective proxies for pedestrian-level visual experience rather than as direct measures of subjective perception. From this perspective, GVI reflects the degree to which greenery enters the pedestrian visual field and thus approximates perceived naturalness; SVI reflects the exposure of sky in the visual field and is associated with perceived openness; Built captures the visible presence of buildings, walls, bridges, and other hard interfaces, thereby indicating perceived enclosure, interface intensity, or built-up pressure; Road represents the visible pedestrian and circulation surface, which relates to perceived movement space and spatial accessibility; and Entropy reflects the diversity and balance of visible semantic elements, corresponding to the richness and complexity of visual information encountered by pedestrians. Together, these indicators translate the pixel-level outputs of semantic segmentation into interpretable dimensions of pedestrian-scale spatial experience.

The selection of GVI, SVI, Built, Road, and Entropy was therefore theory-driven rather than determined solely by statistical performance. These indicators correspond to five key dimensions of pedestrian-scale visual experience that have been widely discussed in environmental behavior, urban design, and street-interface studies: naturalness, openness, built-interface intensity, circulation support, and visual complexity. GVI was selected to represent the natural and restorative dimension of the visual environment; SVI was used to capture openness and exposure; Built was included to reflect the role of hard spatial interfaces, enclosure, and public–private edge conditions; Road was selected because visible pedestrian and circulation space is closely related to movement support and spatial accessibility; and Entropy was included to capture the compositional richness and heterogeneity of the overall visual scene. These dimensions are particularly relevant to waterfront public spaces, where vegetation, open water, promenades, built frontages, and public activity spaces jointly shape pedestrian perception and leisure behavior. Thus, the selected indicators provide a theoretically grounded framework for linking street-level visual environments with passive, active, and social activities.

Given the study’s aim of examining the relationship between the street-level visual environment and leisure activity intensity, and considering both theoretical relevance and the exploratory correlation results, GVI, SVI, Road, Built, and Entropy were selected as explanatory variables for the subsequent regression models.

3.3. Spatial Matching and Construction of Analysis Units

In order to establish spatial associations between behavioral activities and the street-level visual environment, this study employed behavioral segments as the primary analysis unit. In ArcGIS Pro, street-level sampling points and their corresponding visual indicators were managed together with behavioral segment spatial data.

As street-level images reflect visual perception across a certain spatial extent, while behavioral observation data represent aggregated activity characteristics at the segment scale, this study employed a buffer approach to reconcile these two spatial scales. For each behavioral segment, a fixed-distance buffer of 50 m was created around the segment. From a behavioral perspective, this buffer represents the immediate decision-making environment in which pedestrians can directly observe surrounding visual cues and make on-site behavioral choices, such as whether to stay, continue walking, approach a node, or engage in social interaction. In waterfront public spaces, near-field cues such as vegetation, seating facilities, pedestrian surfaces, building interfaces, and open spaces are more likely to influence immediate activity responses than distant environmental features. The visual indicators of the street-level sampling points within each buffer were then spatially overlaid and aggregated.

We derived street-level visual environment characteristics at the behavioral segment scale by computing the mean of street-level indicators within each buffer, thereby providing a consistent spatial analysis unit for subsequent statistical analysis.
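The buffer-based aggregation can be approximated with the following sketch. As a simplification, each behavioral segment is treated as a single straight line between two endpoints in projected coordinates (meters), whereas the study performed the overlay in ArcGIS Pro on the actual segment geometry. The function names and sample values are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its endpoints
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def buffer_mean(segment, points, radius=50.0):
    """Mean of each indicator over sampling points within `radius` of the segment."""
    a, b = segment
    inside = [ind for xy, ind in points if point_segment_distance(xy, a, b) <= radius]
    if not inside:
        return None
    return {k: sum(d[k] for d in inside) / len(inside) for k in inside[0]}

segment = ((0.0, 0.0), (200.0, 0.0))
points = [
    ((50.0, 10.0), {"GVI": 0.2}),    # 10 m from the segment
    ((150.0, -40.0), {"GVI": 0.4}),  # 40 m from the segment
    ((100.0, 80.0), {"GVI": 0.9}),   # 80 m from the segment
]
mean_50 = buffer_mean(segment, points)
mean_100 = buffer_mean(segment, points, radius=100.0)
```

In this toy case the third point is excluded at 50 m but included at 100 m, which shows how the buffer distance changes the segment-level indicator value.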

3.4. Statistical Analysis Methods

After constructing the dependent and independent variables, a series of statistical analyses was conducted to examine the relationship between the street-level visual environment and various types of leisure activity.

First, descriptive statistical analyses were performed to characterize the distribution of activity intensity across activity types, time periods, and day types, thereby identifying basic temporal differences and activity patterns.

Second, a Spearman correlation analysis was conducted to examine the bivariate relationships between indicators of the street-level visual environment and the intensity of different activity types. This provided an initial identification of potential influencing factors.
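For reference, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values receiving their average rank. The sketch below implements this definition with the standard library only; it omits the significance test, and the software actually used for the analysis is not specified in this section.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, a monotonic but non-linear relationship (for instance, y growing as the square of x) still produces a coefficient of 1.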

Building on this, separate multiple regression models were constructed using the logarithm-transformed intensities of passive, active, and social activities as dependent variables. Street-level visual environment indicators were included as the primary independent variables, while day type and time period were included as control variables. Modelling each activity type separately allowed us to compare the direction and relative magnitude of the effects of different street-level visual elements across various leisure activities.
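The model structure can be illustrated with a reduced sketch. It makes several assumptions that should not be read as the study's specification: the transformation is taken to be ln(1 + y), only one visual indicator (Entropy) is included for brevity, weekday and morning are chosen as reference categories, and all data values are invented. Coefficients are obtained from the normal equations with a small hand-written solver.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def design_row(entropy, day_type, period):
    """Intercept, one visual indicator, and dummy-coded control variables."""
    return [1.0, entropy, float(day_type == "weekend"),
            float(period == "afternoon"), float(period == "evening")]

# Invented records: (Entropy, day type, time period, social intensity)
records = [
    (1.0, "weekday", "morning", 2.0), (1.2, "weekday", "afternoon", 3.3),
    (0.8, "weekday", "evening", 4.0), (1.5, "weekend", "morning", 5.7),
    (0.9, "weekend", "afternoon", 3.0), (1.1, "weekend", "evening", 8.3),
    (1.4, "weekday", "morning", 3.7), (0.7, "weekend", "evening", 5.0),
]
X = [design_row(e, d, p) for e, d, p, _ in records]
y = [math.log1p(r[3]) for r in records]   # assumed transform: ln(1 + intensity)
coefs = ols(X, y)
```

Fitting the same design matrix three times, once per activity type, is what allows the coefficient signs and magnitudes to be compared across passive, active, and social activities.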

3.5. Summary of the Methods and Workflow

This study examined how the street-level visual environment of waterfront spaces affects different types of leisure activities through the following workflow: behavioral observation, street-level image collection, semantic segmentation, spatial matching, and statistical analysis (Figure 4). This workflow provides a consistent procedure for linking pedestrian-level visual indicators with observed activity intensity, thereby supporting an empirical analysis of activity-specific environmental responses in waterfront spaces.

3.6. Reliability and Robustness Analysis

To assess the reliability and robustness of the findings, the statistical models were further examined in terms of coefficient stability, sensitivity to spatial scale, and spatial autocorrelation.

3.6.1. Bootstrap Coefficient Stability Test

To further examine whether the multiple regression results were sensitive to sampling variability, a bootstrap resampling procedure was used to assess the stability of the estimated coefficients. Compared with conventional parametric tests, the bootstrap method does not rely on strong distributional assumptions and is therefore particularly suitable for evaluating parameter stability under relatively limited sample sizes. It thus serves as a supplementary robustness check for the regression results reported in this study. For each resampled dataset, the regression coefficients of all explanatory variables were recorded, and their bootstrap mean, standard deviation, 95% confidence interval, and the proportion of positive or negative coefficient signs were further calculated to evaluate the stability of both coefficient magnitude and direction. In this study, whether the 95% bootstrap confidence interval crossed zero was used as the primary criterion for coefficient stability, while the proportion of positive or negative signs was used as a supplementary indicator to assess the consistency of the estimated direction of effects.
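The procedure can be sketched as follows for a single-predictor regression. Several details are assumptions rather than reported settings: observations are resampled as (x, y) pairs, 2000 resamples are drawn, the interval is a percentile interval, and the toy data are invented. The study applied the procedure to the full multiple regression models.

```python
import random

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def bootstrap_slope(x, y, n_boot=2000, seed=0):
    """Pairs bootstrap of the slope: mean, SD, percentile CI, and sign share."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:   # slope is undefined if every x is identical
            continue
        draws.append(slope(xs, [y[i] for i in idx]))
    draws.sort()
    mean = sum(draws) / n_boot
    sd = (sum((d - mean) ** 2 for d in draws) / (n_boot - 1)) ** 0.5
    # Approximate 2.5th and 97.5th percentiles of the sorted draws
    lo = draws[int(0.025 * n_boot)]
    hi = draws[int(0.975 * n_boot) - 1]
    share_positive = sum(d > 0 for d in draws) / n_boot
    return {"mean": mean, "sd": sd, "ci": (lo, hi), "share_positive": share_positive}

# Invented data with a clear positive relationship
x = [0.1 * i for i in range(20)]
y = [3 * xi + ((i * 7) % 5 - 2) * 0.1 for i, xi in enumerate(x)]
result = bootstrap_slope(x, y)
```

A coefficient is then read as stable when the interval excludes zero and the share of positive (or negative) draws is close to 1, mirroring the two criteria described above.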

The test results are presented in Table 3. Overall, the coefficient stability varied across models for different types of leisure activities. In the passive activity model, only visual diversity (Entropy) exhibited a relatively stable positive effect, with a Bootstrap mean of 5.034, a 95% confidence interval of [1.204, 9.042], and a positive coefficient in 99.55% of the resampling iterations. This indicates that the promoting effect of visual diversity on passive activity intensity is robust. By contrast, the confidence intervals for the Green View Index, Sky View Index, built environment proportion, and road space proportion all crossed zero, suggesting that the estimated directions of their effects were more sensitive to sampling fluctuations.

For the active activity model, the 95% bootstrap confidence intervals of all explanatory variables crossed zero, indicating that the statistical associations between active activities and the street-level visual environment were generally weak and that the model estimates were relatively sensitive to sampling variability. This finding is consistent with the relatively low explanatory power of the active activity model in the regression analysis, suggesting that active activities are less responsive to the micro-scale visual environment.

For the social activity model, the core explanatory variables exhibited strong coefficient stability. The Bootstrap means of the Green View Index, Sky View Index, and built environment proportion were −33.236, −33.970, and −41.034, respectively; their 95% confidence intervals did not cross zero, and the coefficients remained negative in more than 99% of the resampling iterations. By contrast, visual diversity had a Bootstrap mean of 11.969, a 95% confidence interval of [6.526, 17.185], and remained positive in 99.90% of the resampling iterations. These results suggest that the core findings of the social activity model—the negative effects of GVI, SVI, and built environment proportion, and the positive effect of entropy—are robust. In comparison, the direction of the effect of road space proportion in the social activity model was less stable.

Overall, the bootstrap coefficient stability test further identifies the most robust environmental predictors across the three activity models. In the passive activity model, Entropy was the only variable that maintained a stable positive effect, indicating that visual diversity is the most reliable predictor of passive activity intensity. In the active activity model, none of the explanatory variables showed stable effects, as all 95% bootstrap confidence intervals crossed zero, confirming the relatively weak and unstable association between active activities and the street-level visual environment. In the social activity model, GVI, SVI, and Built remained consistently negative, while Entropy remained consistently positive across repeated resampling. These variables, therefore, represent the most stable and critical environmental predictors in this study. Taken together, the bootstrap results corroborate the primary findings that different types of leisure activities respond differently to the street-level visual environment, with the social activity model exhibiting the highest coefficient stability.

3.6.2. HC3 Robust Standard Error Test

To further examine whether the regression results were affected by heteroskedasticity and potential bias in standard error estimation under a relatively small sample size, this study re-estimated the standard errors of the benchmark ordinary least squares (OLS) regressions using the HC3 heteroskedasticity-consistent covariance estimator. Compared with conventional OLS standard errors, HC3 provides more conservative significance estimates when heteroskedasticity is present or the sample size is relatively limited, and it is therefore widely used in robustness analyses of regression models.

Specifically, while keeping the dependent variables, explanatory variables, and model specifications unchanged, HC3 robust standard errors were recalculated for the passive, active, and social activity models and then compared with the original OLS results. The main purpose was to assess whether the coefficient signs and significance levels of the explanatory variables changed substantially. If the coefficient directions and significance conclusions of the core variables remained generally consistent after applying HC3 robust standard errors, the main findings of this study could be considered robust to heteroskedasticity and potential standard error bias under a relatively small sample size.
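To make the comparison concrete, the sketch below computes the conventional and HC3 standard errors of the slope in a one-predictor regression. This is a simplified illustration with invented data, not the study's multi-predictor models. HC3 leaves the coefficient estimates unchanged and rescales each squared residual by 1 / (1 − h_i)², where h_i is the leverage of observation i.

```python
def ols_with_hc3(x, y):
    """Slope of y on x with its classical and HC3 standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    # Classical OLS standard error (assumes constant error variance)
    se_ols = (sum(e * e for e in resid) / (n - 2) / sxx) ** 0.5
    # Leverage of each observation in a one-predictor model with intercept
    lev = [1 / n + (a - mx) ** 2 / sxx for a in x]
    # HC3 sandwich term: squared residuals inflated by 1 / (1 - h_i)^2
    meat = sum((a - mx) ** 2 * (e / (1 - h)) ** 2 for a, e, h in zip(x, resid, lev))
    se_hc3 = meat ** 0.5 / sxx
    return b1, se_ols, se_hc3

# Invented data whose error spread grows with x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.1, 0.2, -0.2, 0.4, -0.4, 0.8, -0.8]
y = [2 * a + e for a, e in zip(x, noise)]
b1, se_ols, se_hc3 = ols_with_hc3(x, y)
```

Because only the standard errors change, the robustness check reduces to asking whether the significance conclusions survive the corrected errors, which is the comparison described in this subsection.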

The test results are presented in Table 4. After applying HC3 robust standard errors, the signs of the core variable coefficients remained unchanged across all models, and the main significance conclusions were consistent with those of the benchmark regressions. Specifically, in the passive activity model, only visual diversity (Entropy) remained significantly positively associated with activity intensity, whereas the Green View Index (GVI), Sky View Index (SVI), built environment proportion (Built), and road space proportion (Road) did not reach statistical significance. In the active activity model, none of the explanatory variables was significant after HC3 correction, indicating that the statistical association between active activities and the street-level visual environment was generally weak. In the social activity model, GVI, SVI, and Built remained significantly negatively associated with activity intensity, while Entropy continued to show a significant positive effect; Road still did not pass the significance test. These results indicate that even with more conservative robust standard errors, the main conclusion of this study, that different types of leisure activities respond differently to the street-level visual environment, remains largely unchanged. The key explanatory variables in the social activity model continue to show relatively high stability.

Overall, the robust standard error test using HC3 further supports the reliability of the benchmark regression results. This indicates that the main findings of the study are not driven by heteroskedasticity or underestimation of standard errors due to a small sample size. Therefore, the results show satisfactory statistical robustness.

3.6.3. Spatial Scale Sensitivity Analysis (Buffer)

To assess whether the relationship between street-level visual environmental indicators and leisure activity intensity was influenced by the choice of spatial analysis scale, a spatial scale sensitivity analysis was conducted. For each behavioral segment, fixed-distance buffers of 25 m, 50 m, and 100 m were created. Visual environmental indicators from street-view sampling points within each buffer were then spatially overlaid, statistically aggregated, and used to estimate multiple regression models under each spatial scale.

The model fitting results under different spatial scales are presented in Table 5. Overall, the changes in the coefficient of determination (R²) and adjusted R² across the different scales were limited, indicating that the explanatory power of the street-level visual environment for leisure activity intensity remained relatively stable across spatial scales. This suggests that the study’s conclusions do not depend on a single scale specification and therefore demonstrate good scale robustness.

Further comparison indicates that the 50 m buffer scale provided a relatively favorable and balanced model performance across the three activity types. As shown in Table 5, the models at this scale maintained relatively high adjusted R² values, while the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) remained at comparatively low levels, suggesting a reasonable balance between explanatory power and model complexity. Although some model-fit indicators varied slightly across the 25 m, 50 m, and 100 m scales, the 50 m buffer offered a more appropriate compromise by integrating statistical performance with behavioral interpretability. Compared with the 25 m scale, the 50 m buffer can capture a more complete range of surrounding environmental information that may be perceived by individuals during on-site activities. Compared with the 100 m scale, it reduces the inclusion of distant environmental information that may exceed the range of immediate behavioral perception and weaken the correspondence between local environmental characteristics and actual activity responses.
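The information criteria used in this comparison can be computed from the residual sum of squares of each fitted model. The sketch below uses the Gaussian log-likelihood of an OLS model and counts only the regression coefficients (intercept included) as parameters, which is one common convention and an assumption here. The RSS values and the parameter count of nine are invented for illustration and are not taken from Table 5.

```python
import math

def gaussian_aic_bic(rss, n, k):
    """AIC and BIC of an OLS model from its residual sum of squares.

    n is the sample size; k counts the estimated coefficients, intercept included.
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik
    return aic, bic

# Invented residual sums of squares for three buffer distances (metres)
rss_by_buffer = {25: 21.0, 50: 19.5, 100: 20.4}
scores = {d: gaussian_aic_bic(rss, n=54, k=9) for d, rss in rss_by_buffer.items()}
best_by_aic = min(scores, key=lambda d: scores[d][0])
```

When every candidate model has the same sample size and number of coefficients, as in a buffer comparison, both criteria rank the models in the same order as their residual sums of squares.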

Figure 5 shows the comparison of regression coefficients across different spatial scales. As can be seen, the effects of the street-level visual indicators largely remained consistent across the three scales, with no substantial sign reversals. Among these indicators, the Green View Index (GVI) and visual diversity (Entropy) consistently exhibited stable positive effects in most models, while built environment proportion (Built) showed a consistent negative effect. These results imply that the mechanisms by which the street-level visual environment influences leisure activities are consistent across different spatial scales.

From the perspective of behavioral perception, leisure activities in waterfront spaces are primarily generated through walking, staying, and social interaction, and the associated process of environmental perception typically occurs at the human scale. Previous studies have shown that pedestrians’ perception of urban space during movement is mainly derived from near-range visual information, and that the distance within which individuals can effectively recognize spatial form and social activities is usually concentrated within a range of several tens of meters [44]. According to isovist theory, environmental cognition depends on the extent of visible space that can be directly perceived from a given observation point, whereas environmental information beyond the immediate visual field plays a limited role in shaping behavioral responses [45]. On this basis, the 50 m buffer can be interpreted as a near-field behavioral response zone rather than merely a statistical aggregation distance. For passive activities, nearby seating facilities, shade, landscape elements, and viewing conditions may directly affect whether individuals stop, stay, rest, or watch the waterfront landscape. For active activities, visible pedestrian surfaces, path continuity, and the clarity of movement space within the near field may influence walking comfort, route choice, and the willingness to continue moving. For social activities, nearby interfaces, semi-enclosed spaces, public facilities, and activity nodes may affect whether people gather, communicate, or engage in group interaction. Therefore, the 50 m buffer captures the spatial range within which visual cues are most likely to be perceived, interpreted, and translated into immediate behavioral responses. 
Compared with the 25 m buffer, it includes a more complete set of environmental cues relevant to behavioral decision-making; compared with the 100 m buffer, it reduces the inclusion of distant visual information that may be visible in the broader spatial context but is less directly involved in on-site activity choices. This explains why the 50 m scale is not only statistically appropriate, but also behaviorally meaningful for matching street-level visual indicators with observed leisure activity intensity.

Overall, variations in spatial scale did not alter the study’s main conclusions. However, the 50 m scale demonstrated greater validity in statistical performance and behavioral–perceptual logic. Therefore, all subsequent analyses and discussions are based on the 50 m buffer scale.

3.6.4. Spatial Autocorrelation Test (Moran’s I)

To further examine whether the multiple regression models were affected by spatial dependence, a spatial autocorrelation test was conducted on the model residuals. Specifically, the global Moran’s I statistic was used to assess spatial autocorrelation in the residuals from regression models for passive, active, and social activity intensities.

Considering that the same observation point was repeatedly recorded under different day types (weekday and weekend) and time periods (morning, afternoon, and evening), multiple observational samples corresponded to a single spatial location. To ensure the uniqueness of spatial units in constructing the spatial weights matrix, this study used spatial location as the basic unit of analysis and aggregated the model residuals for each observation point by taking their mean, thereby obtaining a single residual value for each unique spatial coordinate.

On this basis, a spatial weights matrix was constructed using the geographic coordinates of the observation points. Spatial adjacency was defined using the K-nearest neighbors (KNN) method, and the weights were row-standardized. Given that the study area exhibits a linear, belt-like spatial pattern along the waterfront of the Pearl River and that the number of spatial samples is relatively limited, Moran’s I was further calculated repeatedly under different neighborhood scales (K = 2–7) to test the scale sensitivity of spatial dependence. The results for K = 4 are presented in Table 6, and the complete results across different neighborhood scales are reported in Table A1.
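The three steps described above (averaging residuals per location, building row-standardized KNN weights, and computing global Moran's I) can be sketched with the standard library. Coordinates are assumed to be projected, the toy values are invented, and the significance test is omitted; in practice it is usually obtained by random permutation or a normal approximation.

```python
import math
from collections import defaultdict

def mean_residual_by_location(records):
    """records: iterable of ((x, y), residual). One mean residual per location."""
    grouped = defaultdict(list)
    for xy, r in records:
        grouped[xy].append(r)
    coords = sorted(grouped)
    return coords, [sum(grouped[c]) / len(grouped[c]) for c in coords]

def knn_weights(coords, k):
    """Row-standardised KNN weights: each of the k nearest neighbours gets 1/k."""
    W = []
    for i, p in enumerate(coords):
        others = sorted((j for j in range(len(coords)) if j != i),
                        key=lambda j: math.dist(p, coords[j]))
        row = [0.0] * len(coords)
        for j in others[:k]:
            row[j] = 1.0 / k
        W.append(row)
    return W

def morans_i(values, W):
    """Global Moran's I for `values` under the spatial weights matrix W."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    s0 = sum(map(sum, W))   # equals n when rows are standardised
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)

# Six locations along a line, mimicking a belt-like waterfront layout
coords = [(float(i), 0.0) for i in range(6)]
W = knn_weights(coords, k=2)
i_gradient = morans_i([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], W)        # smooth trend
i_alternating = morans_i([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], W)  # checkerboard
```

A smooth trend along the line yields a positive statistic and an alternating pattern a negative one, while values near the expectation of −1/(n − 1) indicate no detectable spatial dependence. Repeating the calculation for K = 2 to 7 reproduces the neighborhood sensitivity check described above.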

Only under a few small neighborhood specifications (e.g., K = 2) did the residuals of some activity models exhibit weakly significant spatial autocorrelation. However, under most neighborhood settings (K ≥ 3), Moran’s I values of the residuals for all three activity models failed to reach statistical significance (p > 0.05). Overall, the spatial autocorrelation lacked scale stability and did not exhibit persistent spatial clustering.

These results indicate that the residuals of the multiple regression models do not exhibit systematic spatial dependence, which suggests that the included explanatory and control variables account for most of the spatial structure in activity intensity within the study area. Therefore, the conventional ordinary least squares (OLS) regression model is considered appropriate for the present study, and no spatial regression models were introduced for correction.

Taken together, the robustness checks indicate that the model specification has acceptable statistical stability and limited residual spatial dependence.

4. Results

4.1. Differences in Activity Intensity Across Time Periods

Figure 6 illustrates the intensity distributions of the three types of leisure activities—passive, active, and social activities—across three time periods: morning, afternoon, and evening. Each time period included 18 observational samples (n = 18).

Generally, activity intensity varied markedly across time periods. For passive activities, both the median and mean values showed a gradual upward trend from morning to evening. The evening period exhibited a higher upper quartile and greater dispersion, indicating that passive leisure activities, such as sitting, resting, and viewing, were more strongly concentrated at night.

For active activities, the distribution remained relatively stable in the morning and afternoon. However, median and mean values increased moderately in the evening and showed a certain degree of dispersion, suggesting that although the overall differences were limited, the intensity of active leisure activities was slightly higher during the evening.

Social activities exhibited the most pronounced temporal variation. Their median intensity remained relatively low in the morning and afternoon, whereas both the median and mean increased substantially in the evening, accompanied by several high-intensity observations. The larger upper quartile and longer upper whisker during the evening further indicate that social interactions were more frequent and intensive during this period.

Overall, the intensities of all three types of leisure activities were generally higher during the evening, with temporal differences being particularly pronounced for social activities.

4.2. Differences in Activity Intensity Between Weekdays and Weekends

Figure 7 compares the intensity of the three types of leisure activities between weekdays and weekends. Each group contains 27 observational samples (n = 27).

Across all activity types, leisure activity intensity was generally higher on weekends than on weekdays. For passive activities, the weekend distribution showed a higher median and a wider interquartile range, indicating a greater concentration of staying-oriented leisure behaviors on weekends.

For active activities, the distributions differed noticeably between weekdays and weekends. On weekdays, activity intensity was relatively stable and concentrated. However, on weekends, the observations were more widely dispersed, indicating greater variability in participation in this type of activity.

Social activities also showed differences between weekdays and weekends. Unlike passive activities, the increase in median intensity was relatively limited, but the degree of dispersion increased substantially. On weekends, the interquartile range of social activities became wider, and several high-intensity outliers emerged, leading to a clear increase in the mean. This suggests that social activities on weekends did not intensify uniformly across all spatial segments; rather, highly concentrated social interactions tended to occur at particular spatial nodes or during specific time periods.

By contrast, passive activities showed a more systematic increase in intensity on weekends, as reflected in both a higher median and a wider interquartile range, indicating that staying-oriented leisure behaviors were more widespread on weekends.

4.3. Overall Intensity Characteristics of Different Types of Leisure Activities

Figure 8 presents the overall intensity distributions of passive, active, and social activities. Each group contained 54 samples. Among the three activity types, active activities had the highest median intensity, followed by passive activities, whereas social activities showed the lowest median intensity. However, social activities exhibited the greatest dispersion, with a larger number of extremely high-intensity observations.

The distribution of passive activities was relatively concentrated, with a comparatively broad interquartile range around the median, suggesting that participation intensity was generally stable but still subject to some variation. Active activities, by contrast, were more tightly concentrated around the median, indicating a relatively stable intensity pattern.

Social activities exhibited the greatest dispersion, and several high-intensity outliers substantially increased the mean. This suggests that although social activities did not remain highly intensive across all observations, concentrated episodes of social interaction occurred at specific locations or during particular periods.

Overall, the three activity types exhibited distinct intensity structures: active activities were characterized by relatively stable and moderate intensity, passive activities showed a certain degree of temporal dependence, and social activities were marked by intermittent episodes of high-intensity behavior.

4.4. Mechanisms Through Which the Built Environment Influences the Intensity of Different Types of Leisure Activities

4.4.1. Correlation Analysis

To preliminarily examine the relationships between built environment factors and the intensity of different types of leisure activities, this study conducted Spearman correlation analysis between the three activity intensity indicators (Passive, Active, and Social) and the street-level visual environmental indicators (GVI, SVI, Road, Built, and Entropy). Spearman correlation analysis is well-suited to capturing monotonic relationships between variables and is less sensitive to the influence of outliers.

Figure 9 presents the correlation matrix between the intensities of the three types of leisure activities and the built environment variables (Figure A1). The results clearly show differentiated correlation patterns between the built environment factors and the three activity types.

First, regarding natural visual environmental indicators, the Green View Index (GVI) was significantly and positively correlated with both active and social activities, with the strongest bivariate association observed for social activities (ρ = 0.42, p < 0.01). This suggests that areas with greater visible greenness tended to show higher levels of social activity in the correlation analysis. However, this result should be interpreted as an initial bivariate relationship rather than an independent effect after controlling for other environmental variables. GVI also showed a moderate positive correlation with active activities (ρ = 0.29, p < 0.05), suggesting that greater visual greenness may be associated with movement-related activities such as walking, jogging, and cycling. In contrast, the correlation between GVI and passive activities was weaker (ρ = 0.10), indicating that staying-oriented behaviors showed a weaker bivariate association with visual greenness.

Second, the Sky View Index (SVI) was significantly negatively correlated with social activities (ρ = −0.31, p < 0.05), while its correlations with active and passive activities were not statistically significant. This pattern suggests that spaces with greater sky exposure may be less associated with social interaction, possibly because highly open spaces provide weaker spatial boundaries or fewer conditions for stable gathering. By contrast, social activities may be more likely to occur in places that offer a certain degree of enclosure and node-like spatial support.

In terms of spatial structure indicators, the proportion of road space (Road) was significantly positively correlated with passive activities (ρ = 0.35, p < 0.05). This suggests that areas with a higher visible proportion of pedestrian and circulation space tended to support more staying and resting behaviors, possibly because such spaces provide better accessibility or more usable open space along the waterfront promenade. By contrast, the relationships between the proportion of road space and both active and social activities were relatively weak.

The built environment proportion (Built) was significantly negatively correlated with both passive and social activities. Specifically, the correlation coefficients were ρ = −0.40 (p < 0.01) for social activities and ρ = −0.35 (p < 0.01) for passive activities. These results suggest that areas with a higher visible proportion of built interfaces tended to show lower levels of staying-oriented leisure behaviors and social interaction.

Additionally, visual diversity (Entropy) exhibited a significant positive correlation with active activities (ρ = 0.38, p < 0.01), while its correlations with passive and social activities were comparatively weak. This suggests that more diverse and visually complex environments tended to be associated with higher levels of movement-oriented activity in the bivariate analysis.

Overall, the correlation analysis shows that different street-level visual environmental indicators were associated with the intensities of the three types of leisure activities in different directions and to varying degrees. These exploratory findings provide a basis for the subsequent multiple regression models, which further examine the independent associations between visual environmental indicators and activity intensity.

4.4.2. Multiple Regression Results

Building on the correlation analysis, multiple linear regression models were developed to systematically examine the effects of the built environment on different types of leisure activities. These models used the intensities of the three types of leisure activities as dependent variables and street-level visual environmental indicators as independent variables. Figure 10 and Table 7 present the regression coefficients of the environmental variables and their corresponding 95% confidence intervals.
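The way a coefficient and its 95% confidence interval are read (an interval that does not cross zero indicates statistical significance) can be sketched with an ordinary least squares fit. This is an illustrative sketch under stated assumptions, not the authors' model specification: the data are synthetic, only two predictors are included, and the interval uses a large-sample normal quantile rather than the Student t quantile that statistical packages normally report, so it is slightly too narrow for small samples.

```python
from statistics import NormalDist

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def ols_with_ci(X, y, level=0.95):
    """Return (coef, lower, upper) per column of X; X must include an intercept column."""
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    xtx_inv = invert(matmul(Xt, X))
    beta = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    z = NormalDist().inv_cdf(0.5 + level / 2)     # assumption: normal approximation
    out = []
    for j in range(k):
        se = (sigma2 * xtx_inv[j][j]) ** 0.5
        out.append((beta[j], beta[j] - z * se, beta[j] + z * se))
    return out

# Synthetic data (hypothetical, not from the study).
entropy = [0.2, 0.5, 0.9, 1.1, 1.4, 1.6, 1.9, 2.2]
gvi = [0.30, 0.25, 0.10, 0.05, 0.40, 0.35, 0.15, 0.20]
noise = [0.05, -0.04, 0.03, -0.05, 0.04, -0.03, 0.02, -0.02]
X = [[1.0, e, g] for e, g in zip(entropy, gvi)]
y = [1.0 + 2.0 * e - 1.5 * g + u for e, g, u in zip(entropy, gvi, noise)]

results = ols_with_ci(X, y)
for name, (b, lo, hi) in zip(["Intercept", "Entropy", "GVI"], results):
    flag = "significant" if lo > 0 or hi < 0 else "not significant"
    print(f"{name}: {b:.2f} [{lo:.2f}, {hi:.2f}] {flag}")
```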

The regression results show that the directions and magnitudes of the effects of different built environment factors varied substantially across the three types of leisure activities.

For passive activities, visual diversity (Entropy) was the only environmental indicator that showed a statistically significant positive association, as indicated by its positive coefficient and confidence interval that did not cross zero. This result suggests that spaces with richer and more varied visual environments tended to support staying-oriented behaviors. By contrast, the coefficients of GVI, SVI, Built, and Road were not statistically significant in the passive activity model. Although Built showed a negative coefficient, this result should be interpreted only as a weak tendency rather than a robust effect. Overall, the passive activity model indicates that visual diversity was the most reliable visual environmental predictor of staying-oriented leisure behavior.

For active activities, none of the street-level visual environmental indicators reached statistical significance in the multiple regression model. Although GVI, Road, and Entropy showed positive coefficients, their confidence intervals crossed zero, indicating that these associations were not sufficiently stable after controlling for other visual environmental indicators and temporal factors. This suggests that active activities may be less directly explained by the micro-scale visual environment measured in this study. Instead, they may depend more on functional conditions such as path continuity, route connectivity, circulation efficiency, and the availability of exercise facilities, which were not fully captured by the selected visual indicators.

For social activities, visual diversity (Entropy) showed the strongest positive effect, indicating that spaces with richer visual elements and more heterogeneous environmental compositions were more likely to support social gathering. By contrast, GVI, SVI, and Built exhibited significant negative effects after controlling for other visual environmental indicators and temporal factors. This result should be interpreted as a conditional effect rather than as evidence that greenness or openness is inherently unfavorable to social interaction. It suggests that, when visual diversity and other spatial characteristics are held constant, areas with higher proportions of visible greenery or sky may function more as viewing, walking, or individual staying spaces, whereas social activities are more likely to cluster in visually complex spaces with interface support, activity nodes, facilities, and an appropriate degree of enclosure. Similarly, an excessively high proportion of built interfaces may compress public activity space and weaken spatial openness, thereby inhibiting social interaction.

Overall, the multiple regression results indicate that different types of leisure activities responded differently to street-level visual environmental indicators. Visual diversity showed the most stable positive association with passive and social activities, whereas active activities were less consistently explained by the selected visual variables. These findings suggest that the relationship between the waterfront visual environment and leisure activity intensity is activity-specific rather than uniform across all activity types.

4.4.3. Differences in Environmental Responses Across Activity Types

Taken together, the correlation analysis and multiple regression results indicate that different types of leisure activities varied in their environmental sensitivity and behavioral requirements, leading to distinct response patterns to the street-level visual environment.

Passive activities were most consistently associated with visual diversity. The positive effect of Entropy suggests that staying-oriented leisure behaviors were more likely to occur in spaces with richer visual information and a more varied environmental composition. Although Road and Built showed positive and negative tendencies, respectively, their effects were not statistically stable in the regression and robustness tests. Therefore, passive activities appear to depend primarily on the perceived richness of the immediate visual environment, while other spatial structural factors may play a secondary role.

Active activities showed weaker and less stable associations with the selected visual environmental indicators. Although GVI, Road, and Entropy displayed positive coefficients in some analyses, these effects did not remain statistically significant in the multiple regression and robustness tests. This suggests that active activities may be less directly driven by micro-scale visual composition and may depend more on functional spatial conditions, such as path continuity, route connectivity, circulation efficiency, and the availability of exercise facilities.

Social activities exhibited the strongest and most differentiated response to the street-level visual environment. Visual diversity had a significant positive association with social activity intensity, suggesting that spaces with richer visual information and more heterogeneous environmental compositions were more likely to support gathering and interaction. At the same time, GVI, SVI, and Built showed significant negative effects after controlling for other variables, indicating that social activities are not simply promoted by greater greenness, openness, or built-interface intensity. Instead, they appear to be related to a more balanced spatial configuration that combines visual richness, appropriate enclosure, interface support, and activity-supportive facilities.

Overall, the three types of leisure activities exhibited distinct response patterns to the street-level visual environment. Passive activities were most consistently related to visual diversity, active activities showed relatively weak and unstable associations with the selected visual indicators, and social activities were most sensitive to the combined effects of visual diversity, openness, greenness, and built interfaces. These findings suggest that the environmental–behavior relationship in waterfront public spaces is activity-specific and cannot be adequately explained by a single visual environmental indicator.

5. Discussion

5.1. Differential Response Mechanisms of Different Activity Types to the Perceived Environment

This study shows that the street-level visual environment does not affect all types of leisure activities in waterfront public spaces in a homogeneous manner; rather, it exerts clearly differentiated effects on passive, active, and social activities. This suggests that the environment–behavior relationship in waterfront public spaces is characterized by substantial activity-type heterogeneity, with different activities varying in their sensitivity to environmental factors and in the pathways through which these factors exert influence.

First, passive activities showed the most consistent response to visual diversity. The regression and robustness results indicate that Entropy had a stable positive association with passive activity intensity. This suggests that people are more likely to stay, rest, and enjoy the view in waterfront spaces with richer visual landscape elements and more varied environmental compositions. Passive activities are often closely tied to the immediate spatial experience, as staying behavior requires not only basic comfort but also sufficient visual attraction and reasons for remaining in place. Compared with spaces dominated by a single landscape element, composite environments that combine vegetation, water views, open spaces, facilities, and varied spatial interfaces may be more capable of forming attractive staying areas. This finding is consistent with environmental behavior research suggesting that staying behaviors depend on spatial attractiveness [44], and further indicates that passive leisure in waterfront spaces responds strongly to perceptible environmental richness.

Second, active activities showed weaker and less stable associations with the street-level visual environment than passive and social activities. Although GVI, Road, and Entropy displayed positive tendencies in some analyses, these associations did not remain consistently significant in the regression models or robustness checks. This suggests that active activities may be less directly shaped by micro-scale visual composition and more dependent on functional spatial conditions, such as the continuity of the waterfront slow-mobility system, circulation efficiency, route organization, and the availability of exercise facilities. In this sense, the environmental requirements of active activities may be better understood as use-supportive rather than primarily visually attractive.

In contrast, social activities exhibited the strongest and most differentiated response to the street-level visual environment. The results showed that visual diversity was positively associated with social activity intensity, whereas GVI, SVI, and Built showed significant negative associations. These findings may appear counterintuitive at first glance, especially the negative associations between greenness, sky visibility, and social activity. However, they should not be interpreted as evidence that greenness or openness is inherently unfavorable to social interaction. Rather, they suggest that, in the Guangzhou waterfront context, social activities are not simply attracted to greener or more open spaces, but depend more on whether the visual and spatial environment provides suitable conditions for gathering, staying, and interaction.

One possible explanation is that highly green or highly open waterfront areas may function primarily as spaces for viewing, walking, or individual staying rather than as social gathering nodes. In such areas, vegetation and open sky may enhance visual comfort and landscape quality, but they do not necessarily provide the spatial edges, seating facilities, semi-enclosed settings, or functional support required for social clustering. This may help explain why GVI showed a negative association with social activity after other visual environmental variables were controlled. Similarly, a high SVI may indicate an overly open spatial condition. Although openness can improve visibility and landscape exposure, excessive openness may weaken spatial boundaries, reduce the sense of enclosure, and make it more difficult for stable gathering or interaction spaces to form.

The climatic context of Guangzhou may further help explain this pattern. Guangzhou is a hot and humid subtropical city, so waterfront spaces with high sky exposure may experience stronger solar radiation and reduced daytime thermal comfort when sufficient shade is absent. Under these conditions, overly open spaces may be less attractive for prolonged social gatherings, even when they provide broad views. By contrast, socially active nodes may require a more balanced spatial condition that combines visual richness, partial enclosure, shade, seating, lighting, and other activity-supportive facilities. Evening social activities may also be shaped by nighttime lighting, perceived safety, and the availability of comfortable places to sit or gather, factors that are not fully captured by GVI or SVI.

The negative association between Built and social activity also requires careful interpretation. This result does not imply that built interfaces are unimportant for social interaction. Rather, it suggests that an excessively high proportion of hard-built interfaces may compress public activity space, reduce environmental intimacy, and weaken the openness needed for comfortable gathering [46]. Social activities may therefore depend on an appropriate configuration of built edges rather than on a high built proportion itself. In addition, unmeasured factors such as seating provision, shading quality, nighttime lighting, thermal comfort, commercial frontage, facility configuration, and programmed activities may mediate the relationship between visual indicators and social activities. Overall, these findings suggest that social interaction depends less on any single visual attribute and more on the combined configuration of visual diversity, spatial enclosure, interface support, climatic comfort, and activity-supportive facilities.

5.2. The Role of the Built Environment and Visual Diversity

Among the environmental variables examined in this study, visual diversity emerged as one of the most explanatory and stable indicators. The results show that visual diversity was positively associated with passive activities and remained a strong positive predictor in the social activity model. This suggests that the compositional structure of multiple visual elements in the street-level environment may be more informative for explaining leisure activity patterns than the increase or decrease in any single environmental element.

From the perspective of environmental perception, visual diversity reflects how different landscape and spatial elements are combined within street-level scenes. Spaces that include vegetation, water views, open areas, built interfaces, and public facilities can provide richer environmental information and a stronger sense of spatial legibility, stayability, and exploration. For passive activities, such visual richness may increase environmental attraction and encourage people to stay, observe, and rest. For social activities, visual diversity may strengthen the sense of place and interaction potential of a space, making it more likely to function as a node for communication and gathering.
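To make the idea of compositional richness concrete, the sketch below computes a Shannon entropy over the class proportions of a segmented scene. It assumes that the Entropy indicator is a Shannon-type measure over segmentation class shares; the exact formula, logarithm base, and class set used in this study are defined in the methods section and may differ. The two example scenes are hypothetical.

```python
from math import log

def shannon_entropy(proportions):
    """Shannon entropy (natural log) of non-negative class shares.

    Shares are normalised to sum to 1; classes with zero share contribute nothing.
    """
    total = sum(proportions)
    return -sum((p / total) * log(p / total) for p in proportions if p > 0)

# Hypothetical pixel shares for [vegetation, sky, built, road, other].
green_dominated = [0.80, 0.10, 0.04, 0.04, 0.02]
mixed_scene = [0.25, 0.20, 0.20, 0.20, 0.15]

print("Green-dominated scene:", round(shannon_entropy(green_dominated), 3))
print("Mixed scene:          ", round(shannon_entropy(mixed_scene), 3))
```

Under this assumption, a scene dominated by one element scores low even if that element is greenery, while a scene that mixes several elements in similar shares scores high. This is why a high GVI and a high Entropy value do not necessarily coincide.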

By contrast, the effects of individual environmental elements were less stable. For example, although GVI showed positive tendencies in some analyses, its independent association was not robust after other environmental variables were controlled, particularly in the active activity model. This suggests that leisure activities in waterfront public spaces are unlikely to be driven by a single landscape element alone. Rather, they appear to be shaped by the combined configuration of multiple visual environmental factors. In this sense, the compositional structure of the visual environment may be more informative than the optimization of any single indicator, which distinguishes this study from conventional macro-scale built environment analyses.

The significant negative association between Built and social activity further suggests that stronger built interfaces are not necessarily more supportive of social interaction in waterfront public spaces. A high visible proportion of built interfaces may reduce spatial openness, compress public activity space, and weaken environmental intimacy and the sense of place. However, this does not mean that built interfaces themselves are unimportant. Rather, their role depends on how they are configured with open spaces, landscape elements, pedestrian areas, and public facilities. Activity-supportive waterfront environments are therefore not simply those with the highest greenness or the strongest built presence, but those in which different elements form an appropriate hierarchy, rhythm, and interface support.

Therefore, the findings on visual diversity suggest that improving waterfront vitality should not rely solely on adding individual environmental elements. Greater attention should be paid to the compositional relationships and spatial hierarchy among vegetation, water views, built interfaces, facilities, and open spaces. Compared with single-element landscape optimization, strategies that enhance visual diversity and spatial organization may be more effective in improving the attractiveness and activity-supporting capacity of waterfront public spaces.

5.3. Planning and Design Implications

This study shows that different types of leisure activities respond differently to the visual environment of waterfront spaces. Accordingly, the optimization of waterfront public spaces should move beyond uniform design strategies and adopt more activity-specific, hierarchical, and context-sensitive interventions.

First, for passive activities, design interventions should focus on improving both stayability and visual appeal. The results indicate that visual diversity is consistently associated with passive activities. In waterfront renewal, this implies the need for a coordinated arrangement of vegetation, water views, paving, seating, shading structures, and viewing points to create staying environments with spatial layering and perceptible variation. Rather than simply increasing greenery, design efforts should aim to provide diverse place experiences that support observation, relaxation, and short-term staying.

For active activities, planning and design should place greater emphasis on circulation support and spatial continuity. Given the weak and unstable statistical association between active activities and the selected visual indicators, these activities appear to depend more on functional conditions, such as path-system integrity, route continuity, spatial accessibility, and the availability of exercise facilities. In waterfront slow-mobility spaces, design efforts should therefore prioritize continuous, safe, and legible walking and running routes, reduce spatial fragmentation and circulation conflicts, and improve route coherence by linking key activity nodes. Visual landscape improvements may enhance the overall experience, but they are unlikely to be sufficient on their own to support active activities.

For social activities, design should focus on creating spaces with an appropriate degree of enclosure, interface support, and visual complexity. The findings suggest that social activities are more likely to occur in environments with rich visual elements, clearly defined interface relationships, and spatial structures that are not overly open. Socially oriented nodes may therefore benefit from small-scale plazas, stay-supportive edges, semi-enclosed resting areas, and multifunctional facilities. These elements can strengthen the sense of place and create more favorable conditions for interaction. At the same time, overly decorative or excessively open designs should be avoided, as they may weaken the balance between openness and social aggregation.

Finally, from the perspective of overall spatial optimization, waterfront design should shift from simply increasing greenery toward improving spatial structure and environmental composition. The findings suggest that enhancing spatial vitality is not equivalent to maximizing any single indicator; rather, it depends on the coordinated organization of multiple environmental elements. Future waterfront design could therefore follow a logic of enhancing visual diversity, shaping spatial hierarchy, and matching environmental conditions with activity types. Such an approach can help create composite spatial environments that combine landscape attractiveness, staying opportunities, and potential for social interaction.

More broadly, the proposed framework can serve as a diagnostic and evaluative tool for waterfront planning and design practice. Before spatial renewal, street-view semantic segmentation can be used to map the distribution of key visual environmental attributes, such as greenness, openness, built interfaces, circulation space, and visual diversity, and to identify segments where visual composition does not align with observed activity demand. By linking these indicators with activity observation data, planners and designers can further diagnose whether low-activity areas are associated with insufficient stay-supportive visual richness, weak circulation support, a lack of social interfaces, excessive openness, or built-interface pressure. After design intervention, the same workflow can be repeated to compare changes in visual environmental indicators and activity intensity, thereby supporting post-occupancy evaluation and adaptive management. In this sense, the contribution of this study is not limited to design suggestions for the Pearl River waterfront; it also offers a replicable evidence-based workflow that connects pedestrian-level visual diagnosis, activity-type differentiation, and targeted spatial optimization.
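The diagnostic and post-intervention steps described above can be sketched as two small helper functions. This is a hypothetical illustration of the workflow logic only: the segment records, the field names `entropy` and `social`, and the threshold values are assumptions introduced here, and any real application would need locally calibrated thresholds.

```python
def flag_segments(segments, entropy_floor, activity_floor):
    """Return ids of segments where visual diversity and observed social
    activity both fall below the given (hypothetical) floors."""
    return [
        s["id"]
        for s in segments
        if s["entropy"] < entropy_floor and s["social"] < activity_floor
    ]

def change_report(before, after, keys=("entropy", "social")):
    """Per-segment differences (after minus before) for segments surveyed twice."""
    after_by_id = {s["id"]: s for s in after}
    report = {}
    for s in before:
        if s["id"] in after_by_id:
            report[s["id"]] = {
                k: round(after_by_id[s["id"]][k] - s[k], 3) for k in keys
            }
    return report

# Hypothetical records: one dictionary per waterfront segment.
before = [
    {"id": "A", "entropy": 0.9, "social": 0.2},
    {"id": "B", "entropy": 1.6, "social": 1.1},
    {"id": "C", "entropy": 0.8, "social": 1.4},
]
after = [
    {"id": "A", "entropy": 1.3, "social": 0.7},
    {"id": "B", "entropy": 1.6, "social": 1.0},
]

flagged = flag_segments(before, entropy_floor=1.0, activity_floor=0.5)
report = change_report(before, after)
print("Segments flagged before renewal:", flagged)
print("Change after renewal:", report)
```

A descriptive comparison of this kind supports monitoring rather than causal attribution, since changes in activity may also reflect season, weather, or programming.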

The framework may also be transferable to other waterfronts and linear public spaces, such as riverfront promenades, urban greenways, coastal walkways, and linear parks, because it relies on widely available street-view imagery, reproducible semantic segmentation procedures, and observable activity data. However, the empirical coefficients and design interpretations should not be generalized without contextual calibration. Differences in climate, cultural preferences, facility provision, management practices, and daily activity rhythms may affect how visual environmental attributes are translated into behavior. Future applications in other cities should therefore retain the same analytical logic while recalibrating activity classifications, observation periods, spatial buffer scales, and locally relevant environmental variables.

5.4. Limitations and Future Research

This study examined how the micro-scale street-level visual environment is associated with different types of leisure activities in waterfront public spaces. However, several limitations should be acknowledged and addressed in future research.

(1): The behavioral data used in this study were derived from on-site observations conducted during a limited number of time periods. Although these data captured activity differences between weekdays and weekends and across different times of day, they cannot fully reflect variations in activity patterns across different seasons, weather conditions, and longer temporal scales. Future research could integrate long-term behavioral observations, mobile signaling data, or spatiotemporal social media data to support more continuous and dynamic analyses of activity patterns in waterfront public spaces.
(2): A second limitation concerns the temporal mismatch between behavioral observations and street-view image acquisition. The behavioral observations covered morning, afternoon, and evening periods, whereas the street-view images used for semantic segmentation were collected during daytime. Therefore, the visual environmental indicators derived from these images should be interpreted as representations of relatively stable physical and structural visual characteristics, such as vegetation, sky visibility, built interfaces, and road space, rather than as real-time perceptual conditions for each observation period. This issue is particularly relevant to evening activities, especially social activities, because nighttime visual experience may differ substantially from daytime street-view conditions. Future research could incorporate nighttime street-view imagery, multi-period image collection, lighting measurements, and dynamic environmental monitoring to better capture time-specific visual experiences and their relationships with leisure activity intensity.
(3): This study primarily focused on visual environmental characteristics derived from street-view imagery and did not incorporate thermal comfort, noise, wind conditions, or other microclimatic factors into a unified analytical framework. In practice, public activities are shaped by multiple environmental dimensions. This issue is particularly relevant to waterfront spaces in Guangzhou, where hot and humid climatic conditions may interact with visual openness, shading, and spatial enclosure to influence activity occurrence. For example, areas with higher sky visibility may provide broader views, but without sufficient shade, they may also experience stronger solar exposure and reduced daytime thermal comfort, which may make them less suitable for prolonged social gatherings. In addition, several activity-supportive factors that may be particularly relevant to social activities, such as seating availability, shading quality, nighttime lighting, perceived safety, commercial frontage, facility configuration, and programmed activities, were not directly measured in this study. These factors may partly mediate the relationship between street-level visual indicators and social activity intensity. Future research could incorporate environmental sensor data, meteorological monitoring data, facility audits, and subjective perception surveys to develop a more comprehensive environment–behavior analytical framework.
(4): This study focused on the core waterfront section of the Pearl River in Guangzhou as a case study. Although this area is representative of central waterfront public spaces in Guangzhou, the applicability of the findings to waterfront spaces in other cities requires further validation. Spatial form, cultural preferences, facility provision, management practices, and public activity patterns may vary across cities. Future research could conduct comparative studies across different cities and types of waterfront spaces to test the transferability and broader generalizability of the findings.
(5): This study examined the relationship between the visual environment and activity types by combining street-view semantic segmentation with regression analysis. As such, the mechanisms revealed here should be understood as statistical associations rather than strictly causal relationships. Future research could further incorporate experimental designs, quasi-natural experiments, or behavioral data with higher spatiotemporal resolution to more rigorously identify the dynamic causal pathways linking environmental optimization and changes in activity patterns.

6. Conclusions

6.1. Main Findings

Focusing on the core waterfront section along the northern bank of the Pearl River in Guangzhou, this study recorded three types of leisure activities—passive, active, and social activities—through field observations, and extracted visual environmental indicators of the waterfront space using a street-view image semantic segmentation approach. Based on correlation analysis and multiple regression models, the study examined how different types of leisure activity intensity respond to the street-level visual environment in waterfront public spaces. The main findings are as follows:

(1): Leisure activities in waterfront public spaces exhibited clear temporal variations. Overall, all three activity types were more active in the evening and on weekends, with social activities showing the most pronounced temporal differences and a tendency toward nighttime clustering. Passive activities increased more substantially on weekends, whereas active activities remained relatively stable across different temporal conditions. These findings indicate that the Pearl River waterfront functions not only as an important setting for everyday leisure but also as a space shaped by distinct temporal clustering and daily-life rhythms. It should be noted that the street-view indicators used in this study mainly represent relatively stable structural visual characteristics and do not fully capture time-specific nighttime perceptual conditions.
(2): Different types of leisure activities exhibited distinct overall intensity structures. Active activities showed relatively stable intensity levels and a comparatively high median, indicating that physical activities such as walking and jogging have a sustained presence in waterfront spaces. Passive activities displayed stronger temporal dependence, with their intensity varying more substantially across different times of day and day types. Social activities, although characterized by a relatively low overall median intensity, exhibited high-intensity clustering at specific spatial nodes and during particular time periods, revealing pronounced node and place dependence.
(3): The multiple regression results indicate that different types of leisure activities responded differently to the street-level visual environment. Visual diversity (Entropy) showed a stable positive association with passive activities, suggesting that environmental richness is an important micro-scale condition for supporting staying behaviors. Active activities showed generally weak and unstable associations with the selected visual environmental variables, indicating that they may depend more on non-visual conditions, such as spatial continuity and functional support. Social activities were the most sensitive to the visual environment: GVI, SVI, and Built showed significant negative associations, whereas visual diversity showed a significant positive effect. This suggests that social activities are more likely to occur in environments characterized by rich visual elements, complex spatial structures, and an appropriate degree of enclosure.
(4): Visual diversity is a key micro-scale environmental variable for explaining differences in leisure activities within waterfront public spaces. Compared with any single environmental element, the compositional structure of the visual environment appears to play a more important role in shaping spatial attractiveness and activity organization. When waterfront spaces simultaneously incorporate multiple perceptible visual elements, such as vegetation, water, open space, built interfaces, and public facilities, they are more likely to form places that support staying, interaction, and spatial legibility, thereby enhancing both the vitality of public space use and the diversity of behaviors.

Overall, the findings of this study indicate that leisure activities in waterfront public spaces are associated not only with temporal conditions but also with the spatial configuration of the environment. Different activity types exhibit distinct response patterns to visual environmental characteristics. As an important representation of micro-scale spatial quality, the structure of the street-level visual environment plays an important role in shaping activity patterns in waterfront public spaces.

6.2. Research Contributions

From the perspective of street-level visual environments, this study examines how different types of leisure activities respond to micro-scale visual environmental conditions in waterfront public spaces. Its contributions can be summarized in methodological, empirical, and practical dimensions.

Methodologically, this study integrates street-view image semantic segmentation with behavioral observation data to construct a quantitative framework for assessing the micro-scale visual environment of waterfront public spaces. By extracting indicators such as the Green View Index (GVI), Sky View Index (SVI), built environment proportion (Built), road space proportion (Road), and visual diversity (Entropy), this framework identifies environmental elements from a pedestrian perspective and links them to different activity types. In doing so, it extends conventional public space research that has predominantly relied on macro-scale built environment indicators.

Empirically, this study shows that passive, active, and social activities respond differently to the visual environment and are not associated with the same environmental factors. Specifically, passive activities are positively associated mainly with visual diversity, active activities exhibit relatively weak and unstable associations with the visual environment, and social activities are the most sensitive to visual conditions. These findings enrich the research perspective on the “environment–behavior” relationship in waterfront public spaces and provide new empirical evidence for understanding the spatial preferences of different activity types.

In terms of planning practice, this study provides empirical support for shifting waterfront public space design from uniform greening toward activity-oriented spatial optimization. The findings indicate that enhancing the vitality of public spaces is not equivalent to increasing a single landscape element; rather, it depends on integrated environmental characteristics such as visual diversity, interface organization, and spatial complexity. Accordingly, the optimization of waterfront public spaces should give greater attention to the differentiated environmental needs of various activity types and improve both leisure activity support and overall spatial quality through hierarchical and scenario-based design strategies.

Overall, by integrating behavioral observations with street-level visual environment analysis, this study develops an analytical framework that links the micro-scale environment of waterfront public spaces with activity types. It provides a useful perspective for understanding how fine-grained visual environmental conditions are associated with waterfront activity patterns and offers a basis for future research on the refined design and evaluation of waterfront environments.

Author Contributions

Conceptualization, Y.P. and Y.C.; methodology, Y.P. and Y.C.; software, Y.C.; validation, Y.P. and Y.C.; formal analysis, Y.C.; investigation, Y.C.; resources, Y.C. and J.C.; data curation, Y.C. and J.C.; writing—original draft preparation, Y.P., Y.C. and J.C.; writing—review and editing, Y.P., Y.C. and J.C.; visualization, Y.C.; supervision, Y.P.; project administration, Y.P.; funding acquisition, Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Global Moran’s I test results of regression residuals under different neighborhood sizes.

Residual	K	Moran’s I	p-Value	Z-Score
Passive activity	2	−0.4239	0.021	−1.88092
Active activity	2	−0.42824	0.111	−1.0833
Social activity	2	−0.46022	0.046	−1.20892
Passive activity	3	−0.25758	0.136	−1.16034
Active activity	3	−0.26224	0.298	−0.67437
Social activity	3	−0.36173	0.06	−1.25238
Passive activity	4	−0.27149	0.025	−1.6478
Active activity	4	−0.14655	0.454	−0.15854
Social activity	4	−0.22336	0.231	−0.76701
Passive activity	5	−0.21356	0.067	−1.2247
Active activity	5	−0.0937	0.281	0.309455
Social activity	5	−0.22644	0.042	−1.07583
Passive activity	6	−0.16864	0.232	−0.79314
Active activity	6	−0.14041	0.441	−0.21793
Social activity	6	−0.16862	0.208	−0.72773
Passive activity	7	−0.14436	0.332	−0.52533
Active activity	7	−0.12	0.335	0.117069
Social activity	7	−0.14159	0.37	−0.45681
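The statistic reported in Table A1 can be illustrated with a minimal sketch of global Moran's I under k-nearest-neighbour weights. The article does not state how the weights were standardised or how the p-values were obtained, so the row-standardised binary weights and the permutation-based pseudo p-value below are assumptions, and the function names are hypothetical.

```python
import math
import random


def knn_weights(coords, k):
    """Row-standardised KNN weights: w[i][j] = 1/k if j is one of i's k nearest units."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        for j in others[:k]:
            w[i][j] = 1.0 / k
    return w


def morans_i(values, w):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(map(sum, w))
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_p(values, w, n_perm=999, seed=0):
    """One-sided pseudo p-value in the direction of the observed statistic (assumed approach)."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    expected = -1.0 / (len(values) - 1)  # expectation of I under no spatial autocorrelation
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        sim = morans_i(shuffled, w)
        if (observed >= expected and sim >= observed) or (observed < expected and sim <= observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)
```

Applied to regression residuals, a negative value indicates that neighbouring units tend to have residuals of opposite sign, which is the pattern seen across the neighbourhood sizes in Table A1.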

Figure A1. Full Spearman correlation matrix among activity intensity and environmental variables. Values indicate Spearman’s correlation coefficients; * p < 0.05; ** p < 0.01; *** p < 0.001.

References

Niu, Y.; Mi, X.; Wang, Z. Vitality Evaluation of the Waterfront Space in the Ancient City of Suzhou. Front. Archit. Res. 2021, 10, 729–740. [Google Scholar] [CrossRef]
Yu, P.; Zhang, Y. A Satisfaction Study of Waterfront Public Spaces in Winter Cities from a Demand Perspective: A KANO-IPA Model Analysis Based on Northeastern China. Land 2025, 14, 92. [Google Scholar] [CrossRef]
Luo, J.; Yuan, Z.; Xu, L.; Xu, W. Assessing the Impact of Waterfront Environments on Public Well-Being through Digital Twin Technology. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 4536–4553. [Google Scholar] [CrossRef]
Gascon, M.; Zijlema, W.; Vert, C.; White, M.P.; Nieuwenhuijsen, M.J. Outdoor Blue Spaces, Human Health and Well-Being: A Systematic Review of Quantitative Studies. Int. J. Hyg. Environ. Health 2017, 220, 1207–1221. [Google Scholar] [CrossRef]
Pasanen, T.P.; White, M.P.; Wheeler, B.W.; Garrett, J.K.; Elliott, L.R. Neighbourhood Blue Space, Health and Wellbeing: The Mediating Role of Different Types of Physical Activity. Environ. Int. 2019, 131, 105016. [Google Scholar] [CrossRef]
Cheung, D.M.; Tang, B.S. Social Order, Leisure, or Tourist Attraction? The Changing Planning Missions for Waterfront Space in Hong Kong. Habitat Int. 2015, 47, 231–240. [Google Scholar] [CrossRef]
Zhang, J.; Hu, X.; Wang, J. Spatial Vitality Variation in Community Parks and Their Influencing Factors. PLoS ONE 2025, 20, e0312941. [Google Scholar] [CrossRef] [PubMed]
Ye, Y.; Qiu, H. Environmental and Social Benefits, and Their Coupling Coordination in Urban Wetland Parks. Urban For. Urban Green. 2021, 60, 127043. [Google Scholar] [CrossRef]
Milias, V.; Teeuwen, R.; Bozzon, A.; Psyllidis, A. Crowdsourcing the Influence of Physical Features on the Likely Use of Public Open Spaces. Comput. Urban Sci. 2024, 4, 15. [Google Scholar] [CrossRef]
Chen, L.; Ma, Y. How Do Ecological and Recreational Features of Waterfront Space Affect Its Vitality? Developing Coupling Coordination and Enhancing Waterfront Vitality. Int. J. Environ. Res. Public Health 2023, 20, 1196. [Google Scholar] [CrossRef]
Huang, Y.; Zheng, B. Social Media Users’ Visual and Emotional Preferences of Internet-Famous Sites in Urban Riverfront Public Spaces: A Case Study in Changsha, China. Land 2024, 13, 930. [Google Scholar] [CrossRef]
Gehl, J. Life Between Buildings: Using Public Space, 1st ed.; Island Press: Washington, DC, USA, 2011. [Google Scholar]
Xia, Y.; Yabuki, N.; Fukuda, T. Development of a System for Assessing the Quality of Urban Street-Level Greenery Using Street View Images and Deep Learning. Urban For. Urban Green. 2021, 59, 126995. [Google Scholar] [CrossRef]
Cheng, L.; Chu, S.; Zong, W.; Li, S.; Wu, J.; Li, M. Use of Tencent Street View Imagery for Visual Perception of Streets. ISPRS Int. J. Geo-Inf. 2017, 6, 265. [Google Scholar] [CrossRef]
Lu, Z.; Lu, Y.; Chen, Y.; Chen, S. Effects of Streetscapes on Residents’ Sentiments during Heatwaves in Shanghai: Evidence from Multi-Source Data and Interpretable Machine Learning for Urban Sustainability. Sustainability 2025, 17, 10281. [Google Scholar] [CrossRef]
Li, K. Research on the factors influencing the spatial quality of high-density urban streets: A framework using deep learning, street scene images, and principal component analysis. Land 2024, 13, 1161. [Google Scholar] [CrossRef]
Yang, Y.; Ma, Y.; Jiao, H. Exploring the Correlation between Block Vitality and Block Environment Based on Multisource Big Data: Taking Wuhan City as an Example. Land 2021, 10, 984. [Google Scholar] [CrossRef]
Al Mushayt, N.S.; Dal Cin, F.; Barreiros Proenca, S. New Lens to Reveal the Street Interface. A Morphological-Visual Perception Methodological Contribution for Decoding the Public/Private Edge of Arterial Streets. Sustainability 2021, 13, 11442. [Google Scholar] [CrossRef]
Gehl, J.; Kaefer, L.J.; Reigstad, S. Close Encounters with Buildings. Urban Des. Int. 2006, 11, 29–47. [Google Scholar] [CrossRef]
Dal Cin, F.; Hooimeijer, F.; Matos Silva, M. Planning the Urban Waterfront Transformation, from Infrastructures to Public Space Design in a Sea-Level Rise Scenario: The European Union Prize for Contemporary Architecture Case. Water 2021, 13, 218. [Google Scholar] [CrossRef]
Mehta, V. Lively Streets: Determining Environmental Characteristics to Support Social Behavior. J. Plan. Educ. Res. 2007, 27, 165–187. [Google Scholar] [CrossRef]
Zhang, J.; Li, X.; Lian, H.; Li, H.; Zhang, J. Day–Night Synergy between Built Environment and Thermal Comfort and Its Impact on Pedestrian Street Vitality: Beijing–Chengdu Comparison. Buildings 2025, 15, 2118. [Google Scholar] [CrossRef]
Zhang, Y.; Lu, Y.; Li, H.; Zhai, G. Effects of Street-Level Thermal Comfort on Collective Behaviors within the Diurnal Cycle: The Moderating Effect of Streetscape Perception. Urban Clim. 2025, 61, 102485. [Google Scholar] [CrossRef]
Zhang, S.; Lu, J.; Guo, R.; Yang, Y. Exploring the Relationship between Visual Perception of the Urban Riverfront Core Landscape Area and the Vitality of Riverfront Road: A Case Study of Guangzhou. Land 2024, 13, 2142. [Google Scholar] [CrossRef]
Zhou, Z.; Yang, F.; Li, J.; Li, J.; Zou, Z. Identification of Critical Areas of Openness–Vitality Intensity Imbalance in Waterfront Spaces and Prioritization of Interventions: A Case Study of Xiangjiang River in Changsha, China. Land 2024, 13, 686. [Google Scholar] [CrossRef]
Cui, T.; Ye, Y.; Zhuang, Y.; Lin, Q.; Yan, M.; Zhang, L.; Zhu, L. A Study of the Changing Characteristics and Influencing Factors of Holiday Visitor Vitality in Urban Parks: The Case of Fuzhou, China. PLoS ONE 2024, 19, e0311546. [Google Scholar] [CrossRef]
Klimek, R. Towards Recognising Individual Behaviours from Pervasive Mobile Datasets in Urban Spaces. Sustainability 2019, 11, 1563. [Google Scholar] [CrossRef]
Fang, L.; Huang, J.; Zhang, Z.; Nitivattananon, V. Data-Driven Framework for Delineating Urban Population Dynamic Patterns: Case Study on Xiamen Island, China. Sustain. Cities Soc. 2020, 62, 102365. [Google Scholar] [CrossRef]
Shi, T.; Gao, F. Utilizing Multi-Source Geospatial Big Data to Examine How Environmental Factors Attract Outdoor Jogging Activities. Remote Sens. 2024, 16, 3056. [Google Scholar] [CrossRef]
Yang, W.; Hu, J.; Liu, Y.; Guo, W. Examining the Influence of Neighborhood and Street-Level Built Environment on Fitness Jogging in Chengdu, China: A Massive GPS Trajectory Data Analysis. J. Transp. Geogr. 2023, 108, 103575. [Google Scholar] [CrossRef]
Shi, H.; Zhang, L.; Ma, D.; Zhang, M.; Wang, M.; Wei, Z. Exploring Recreational Walking and Its Correlated Built Environment Factors in River Corridor Space through a Trajectory Sematic-Based Approach. Urban For. Urban Green. 2025, 107, 128767. [Google Scholar] [CrossRef]
Wei, Z.; Cao, K.; Kwan, M.-P.; Jiang, Y.; Feng, Q. Measuring the Age-Friendliness of Streets’ Walking Environment Using Multi-Source Big Data: A Case Study in Shanghai, China. Cities 2024, 148, 104829. [Google Scholar] [CrossRef]
Li, X.; Santi, P.; Courtney, T.K.; Verma, S.K.; Ratti, C. Investigating the Association between Streetscapes and Human Walking Activities Using Google Street View and Human Trajectory Data. Trans. GIS 2018, 22, 1029–1044. [Google Scholar] [CrossRef]
Liya, F.; Zhouni, H.; Wenhui, Z.; Tao, Z. Association between Public Space and Resident Outdoor Activity Behavior in Urban Areas Surrounding Lakes. Sci. Rep. 2025, 15, 44871. [Google Scholar] [CrossRef]
Li, T.; Huang, X.; Zhu, Y.; Wang, J. Human Behavior Patterns in Meso-Scale Waterfront Public Spaces from a Visual Accessibility Perspective—A Case Study of Xiaoqinhuai Historic District, Yangzhou (China). Buildings 2025, 15, 3247. [Google Scholar] [CrossRef]
Liu, Z.; Zhang, Y.; He, X.; Zhang, D.; Ai, S. Towards Sustainable Historic Waterfront Streets: Integrating Semantic Segmentation and sDNA for Visual Perception Evaluation and Optimization in Liaocheng City, China. Sustainability 2026, 18, 1099. [Google Scholar] [CrossRef]
Ewing, R.; Hajrasouliha, A.; Neckerman, K.M.; Purciel-Hill, M.; Greene, W. Streetscape Features Related to Pedestrian Activity. J. Plan. Educ. Res. 2016, 36, 5–15. [Google Scholar] [CrossRef]
Zhao, X.; Lu, Y.; Lin, G. An Integrated Deep Learning Approach for Assessing the Visual Qualities of Built Environments Utilizing Street View Images. Eng. Appl. Artif. Intell. 2024, 130, 107805. [Google Scholar] [CrossRef]
Huang, L.; Oki, T.; Muto, S.; Ogawa, Y. Unveiling the non-linear influence of eye-level streetscape factors on walking preference: Evidence from tokyo. ISPRS Int. J. Geo-Inf. 2024, 13, 131. [Google Scholar] [CrossRef]
Wang, R.; Lu, T.; Wan, C.; Sun, X.; Jiang, W. Measuring the Effects of Streetscape Characteristics on Perceived Safety and Aesthetic Appreciation of Pedestrians. J. Urban Plan. Dev. 2023, 149, 05023020. [Google Scholar] [CrossRef]
Wang, Y.; Xiu, C. Spatial quality evaluation of historical blocks based on street view image data: A case study of the fangcheng district. Buildings 2023, 13, 1612. [Google Scholar] [CrossRef]
Cortinovis, C.; Geneletti, D. A performance-based planning approach integrating supply and demand of urban ecosystem services. Landsc. Urban Plan. 2020, 201, 103842. [Google Scholar] [CrossRef]
Li, X.; Pang, C. A spatial visual quality evaluation method for an urban commercial pedestrian street based on streetscape images-taking tianjin binjiang road as an example. Sustainability 2024, 16, 1139. [Google Scholar] [CrossRef]
Gehl, J. Cities for People; Island Press: Washington, DC, USA, 2013. [Google Scholar]
Benedikt, M. To Take Hold of Space: Isovists and Isovist Fields. Environ. Plan. B Plan. Des. 1979, 6, 47–65. [Google Scholar] [CrossRef]
Whyte, W.H. The Social Life of Small Urban Spaces; Conservation Foundation: Washington, DC, USA, 1980. [Google Scholar]

Figure 1. Study area.

Figure 2. (a) Observation units and street view points; (b1,b2) evidence for unit selection (weekday vs. weekend heatmap overlay); (c) photos of representative units.

Figure 3. Semantic segmentation pipeline (Mask2Former + ADE20K).

Figure 4. Overall methodological workflow.

Figure 5. Comparison of regression coefficients under different spatial buffer scales. (Note: The dashed vertical line indicates the reference line at x = 0, representing a regression coefficient of zero.)

Figure 6. Distribution of passive, active, and social activity intensity across morning, afternoon, and evening periods.

Figure 7. Distribution of passive, active, and social activity intensity on weekdays and weekends.

Figure 8. Overall intensity characteristics of different leisure activity types.

Figure 9. Spearman Correlations Between Activity Intensity and Environmental Indicators. Values indicate Spearman’s correlation coefficients; * p < 0.05; ** p < 0.01.

Figure 10. OLS regression coefficients and 95% confidence intervals for environmental predictors. Points indicate estimated regression coefficients, horizontal lines indicate 95% confidence intervals, and the vertical grey line represents zero. Confidence intervals that do not cross zero indicate statistically significant associations.

Table 1. Semantic Category Merging Rules.

Environmental Indicator Categories	Expression	Included ADE20K Semantic Categories
Vegetation	G	tree, grass, plant, field, flower, bush, forest, lawn
Sky	Sky	sky
Built environment	B	building, wall, bridge, tower
Road space	R	road, sidewalk
Natural elements	N	tree, grass, plant, river, lake, water
Entropy categories	K	All semantic categories in the image (categories with a pixel proportion greater than 0)

Table 2. Definition and Calculation Formulas of Environmental Indicators.

Indicator	Calculation Formula	Indicator Meaning
GVI	$\mathrm{GVI}_i = \sum_{c \in G} p_{i,c}$	Represents the visible proportion of vegetation in the pedestrian field of view; proxy for perceived greenness and naturalness.
SVI	$\mathrm{SVI}_i = p_{i,\mathrm{sky}}$	Represents the visible proportion of sky in the pedestrian field of view; proxy for perceived openness and exposure.
Road ratio	$\mathrm{Road}_i = \sum_{c \in R} p_{i,c}$	Represents the visible proportion of road and pedestrian surfaces; proxy for circulation space and movement accessibility.
Built ratio	$\mathrm{Built}_i = \sum_{c \in B} p_{i,c}$	Represents the visible proportion of built interfaces; proxy for built intensity, visual enclosure, and interface pressure.
Entropy	$\mathrm{Entropy}_i = -\sum_{k \in K} p_{i,k} \ln(p_{i,k} + \varepsilon)$, where $K$ is the set of all semantic categories present in the image and $\varepsilon$ is a very small constant introduced to avoid instability in logarithmic calculations.	Represents the diversity and balance of visible semantic elements; proxy for visual richness, complexity, and compositional heterogeneity.
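
The formulas in Table 2 can be sketched in a few lines, assuming that the segmentation output of each image has already been reduced to per-class pixel counts. The category groupings follow Table 1; the function name, the input format, and the value of ε are assumptions made for illustration only.

```python
import math

# Category groupings copied from Table 1 (label names as listed there).
VEGETATION = {"tree", "grass", "plant", "field", "flower", "bush", "forest", "lawn"}
BUILT = {"building", "wall", "bridge", "tower"}
ROAD = {"road", "sidewalk"}
EPS = 1e-12  # assumed value; the article only describes "a very small constant"


def visual_indicators(pixel_counts):
    """Compute GVI, SVI, Built, Road and Entropy for one image.

    pixel_counts: dict mapping a class name to its pixel count (hypothetical input format).
    """
    total = sum(pixel_counts.values())
    # p_{i,c}: share of the image occupied by class c; zero-pixel classes are dropped,
    # matching the rule that entropy uses only categories with a proportion above 0.
    p = {c: n / total for c, n in pixel_counts.items() if n > 0}

    def share(group):
        return sum(v for c, v in p.items() if c in group)

    return {
        "GVI": share(VEGETATION),
        "SVI": p.get("sky", 0.0),
        "Built": share(BUILT),
        "Road": share(ROAD),
        "Entropy": -sum(v * math.log(v + EPS) for v in p.values()),
    }
```

For an image split evenly among four visible classes, the entropy equals ln 4, the maximum for four categories; an image dominated by a single class has entropy close to zero.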

Table 3. Bootstrap Coefficient Stability Results for Models of Different Types of Leisure Activities.

Model	Variable	boot_mean	ci_low_2.5%	ci_high_97.5%	pos_ratio	neg_ratio	cross_zero
Passive	const	−5.803	−11.088	−0.449	0.016	0.985	FALSE
	GVI	0.446	−8.426	9.075	0.544	0.457	TRUE
	SVI	2.312	−5.581	9.621	0.746	0.255	TRUE
	Built	−5.117	−13.140	2.419	0.098	0.902	TRUE
	Road	1.394	−1.850	4.760	0.797	0.203	TRUE
	Entropy	5.034	1.204	9.042	0.996	0.005	FALSE
Active	const	−2.623	−5.454	0.564	0.047	0.954	TRUE
	GVI	4.880	−3.969	14.411	0.854	0.146	TRUE
	SVI	1.735	−5.667	9.936	0.662	0.338	TRUE
	Built	1.260	−6.261	9.348	0.620	0.380	TRUE
	Road	0.646	−1.880	3.081	0.698	0.303	TRUE
	Entropy	2.111	−1.010	5.010	0.924	0.077	TRUE
Social	const	3.025	−3.045	9.570	0.835	0.166	TRUE
	GVI	−33.236	−48.090	−16.136	0.004	0.997	FALSE
	SVI	−33.970	−46.718	−19.035	0.003	0.998	FALSE
	Built	−41.034	−54.923	−25.384	0.002	0.998	FALSE
	Road	1.479	−2.823	6.128	0.740	0.261	TRUE
	Entropy	11.969	6.526	17.185	0.999	0.001	FALSE
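
The columns of Table 3 can be illustrated with a minimal bootstrap sketch. To stay self-contained it uses a single predictor and case resampling, whereas the article's models include several predictors; the number of resamples, the seed, and the function names are assumptions rather than the authors' settings.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def bootstrap_slope(x, y, n_boot=2000, seed=0):
    """Resample observations with replacement and summarise the refitted slopes."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # degenerate resample: the slope is undefined
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    slopes.sort()
    low, high = percentile(slopes, 2.5), percentile(slopes, 97.5)
    return {
        "boot_mean": statistics.fmean(slopes),
        "ci_low_2.5%": low,
        "ci_high_97.5%": high,
        "pos_ratio": sum(s > 0 for s in slopes) / n_boot,  # share of positive estimates
        "neg_ratio": sum(s < 0 for s in slopes) / n_boot,  # share of negative estimates
        "cross_zero": low <= 0.0 <= high,                  # True if the interval contains 0
    }
```

Read this way, a coefficient such as Entropy in the social model (pos_ratio 0.999, cross_zero FALSE) keeps the same sign in nearly every resample, while a coefficient flagged TRUE changes sign often enough that its direction is not stable.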

Table 4. HC3 Robust Standard Error Results for Models of Different Leisure Activity Types.

Model	Variable	coef	se_ols	p_ols	se_hc3	p_hc3
Passive	const	−5.77537	2.536395	0.027283	2.949018	0.056008
	GVI	0.706994	6.421382	0.912789	4.628351	0.879233
	SVI	2.506285	5.423813	0.646105	3.996	0.533501
	Built	−4.89628	5.961915	0.415559	4.114623	0.23991
	Road	1.377955	1.58836	0.389966	1.806352	0.449292
	Entropy	4.94347	2.09747	0.022557	2.132069	0.02472
Active	const	−2.6393	1.620315	0.109884	1.575491	0.100394
	GVI	4.819994	4.102145	0.245793	4.798645	0.320199
	SVI	1.743976	3.464872	0.617034	4.037244	0.667696
	Built	1.14215	3.808625	0.76556	4.097309	0.781629
	Road	0.623579	1.014686	0.541752	1.262959	0.623739
	Entropy	2.150572	1.339918	0.115053	1.527683	0.165653
Social	const	3.170311	3.425124	0.359282	3.370598	0.351631
	GVI	−33.5291	8.671374	0.000331	8.273921	0.000185
	SVI	−34.2928	7.324267	2.36 × 10⁻⁵	6.99646	1.13 × 10⁻⁵
	Built	−41.3287	8.050915	5.11 × 10⁻⁶	7.658111	2.06 × 10⁻⁶
	Road	1.452505	2.144907	0.501538	2.347761	0.539054
	Entropy	11.99983	2.832404	0.000102	2.867176	0.000121
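
The difference between the se_ols and se_hc3 columns of Table 4 can be shown with a small worked sketch. It implements the HC3 heteroskedasticity-consistent estimator as it is commonly defined, for a model with an intercept and one predictor only; the article's models have several predictors and were presumably fitted with a statistics package, so this is a simplified illustration with a hypothetical function name.

```python
import math


def simple_ols_se(x, y):
    """Return slope, classical SE and HC3 SE for y = a + b*x fitted by OLS."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    dx = [a - mx for a in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (b - my) for d, b in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Classical SE: one common error variance, estimated with n - 2 degrees of freedom.
    se_ols = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # Leverage h_i of each observation in a model with an intercept and one predictor.
    lev = [1.0 / n + d * d / sxx for d in dx]
    # HC3: each squared residual is inflated by 1 / (1 - h_i)^2 inside the sandwich estimator.
    meat = sum(d * d * (e / (1.0 - h)) ** 2 for d, e, h in zip(dx, resid, lev))
    se_hc3 = math.sqrt(meat) / sxx
    return slope, se_ols, se_hc3
```

Because HC3 weights high-leverage observations more heavily, it tends to be conservative in small samples. In Table 4 the significant terms of the social model remain significant under either standard error.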

Table 5. Model performance under different spatial buffer scales.

Scale	Model	N	R²	Adj_R²	AIC	BIC
25 m	Passive_mean_log	54	0.295934	0.170766	148.5998	166.5007
	Active_mean_log	54	0.446451	0.348042	88.42639	106.3272
	Social_mean_log	54	0.530782	0.447366	177.189	195.0899
50 m	Passive_mean_log	54	0.435528	0.335178	136.6668	154.5676
	Active_mean_log	54	0.446327	0.347896	88.43851	106.3394
	Social_mean_log	54	0.581501	0.507101	171.0118	188.9127
100 m	Passive_mean_log	54	0.395205	0.287685	140.3928	158.2936
	Active_mean_log	54	0.467362	0.372671	86.34696	104.2478
	Social_mean_log	54	0.597434	0.525866	168.9158	186.8167
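
The fit statistics in Table 5 are linked by standard formulas, which the following sketch checks for the first row. It assumes the common definitions AIC = 2p − 2 ln L and BIC = p ln(n) − 2 ln L, with p counting the intercept. The parameter count it recovers is an inference from the reported numbers, not a statement made in the article.

```python
import math


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def params_from_ic_gap(aic, bic, n):
    """Number of estimated parameters implied by BIC - AIC = p * (ln n - 2).

    Assumes both criteria were computed from the same log-likelihood.
    """
    return (bic - aic) / (math.log(n) - 2.0)


# First row of Table 5 (25 m buffer, passive model).
n, r2, aic, bic = 54, 0.295934, 148.5998, 166.5007
p = params_from_ic_gap(aic, bic, n)      # implied number of parameters, intercept included
adj = adjusted_r2(r2, n, round(p) - 1)   # predictors = parameters minus the intercept
```

Under these assumptions the recovered adjusted R² agrees with the tabulated 0.170766 to within rounding, which suggests that the R², adjusted R², AIC, and BIC columns were computed from the same fitted model.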

Table 6. Spatial autocorrelation test of regression residuals.

Activity Type	K	Moran’s I	Z-Score	p-Value
Passive activity	4	−0.271487889	−1.6478	0.025
Active activity	4	−0.14654682	−0.15854	0.454
Social activity	4	−0.223357696	−0.76701	0.231

Table 7. OLS regression results for environmental predictors.

Model	Term	Coef	SE	p	Sig	CI_low	CI_high
Passive activity	const	−5.77537	2.536395	0.027283	*	−10.8751	−0.6756
	GVI	0.706994	6.421382	0.912789		−12.2041	13.61805
	SVI	2.506285	5.423813	0.646105		−8.39902	13.41159
	Built	−4.89628	5.961915	0.415559		−16.8835	7.090951
	Road	1.377955	1.58836	0.389966		−1.81566	4.571568
	Entropy	4.94347	2.09747	0.022557	*	0.726223	9.160716
Active activity	const	−2.6393	1.620315	0.109884		−5.89716	0.618558
	GVI	4.819994	4.102145	0.245793		−3.42792	13.06791
	SVI	1.743976	3.464872	0.617034		−5.22262	8.710568
	Built	1.14215	3.808625	0.76556		−6.5156	8.799905
	Road	0.623579	1.014686	0.541752		−1.41658	2.663741
	Entropy	2.150572	1.339918	0.115053		−0.54351	4.844658
Social activity	const	3.170311	3.425124	0.359282		−3.71636	10.05698
	GVI	−33.5291	8.671374	0.000331	***	−50.9641	−16.0941
	SVI	−34.2928	7.324267	2.36 × 10⁻⁵	***	−49.0193	−19.5664
	Built	−41.3287	8.050915	5.11 × 10⁻⁶	***	−57.5161	−25.1412
	Road	1.452505	2.144907	0.501538		−2.86012	5.765129
	Entropy	11.99983	2.832404	0.000102	***	6.304895	17.69476

Note: Coef. = regression coefficient; SE = standard error; CI = confidence interval. * p < 0.05; *** p < 0.001.
