2. Study Area and Data
2.1. Study Area Overview
Guangzhou is located in the Pearl River Delta in southern China and is one of the key cities in the Guangdong–Hong Kong–Macao Greater Bay Area. The Pearl River runs through the city center and forms an important natural landscape axis as well as a structural backbone of Guangzhou’s urban development. As a carrier of the city’s historical evolution, the Pearl River waterfront has long supported multiple functions, including shipping, transportation, commerce, trade, and public urban life. Over time, it has evolved into an important public waterfront space that integrates ecological landscapes, public activities, and urban image-making.
In recent years, China’s urban development has gradually shifted from ‘incremental expansion’ to ‘stock-based renewal’. As an important stock spatial resource of the city, waterfront spaces have become a key direction of urban renewal in terms of quality enhancement and vitality revitalization. Against this backdrop, Guangzhou has continuously promoted the integrated management and regeneration of public spaces along the Pearl River waterfront. In 2017, the Guangzhou Municipal People’s Government proposed the construction of the ‘Pearl River Waterfront High-Quality Development Belt’, emphasizing the integration of waterfront spaces, enhancement of public spaces, and optimization of industrial functions, with the aim of promoting the formation of an internationally influential urban waterfront development axis along both banks of the Pearl River. Subsequently, the Guangzhou Territorial Spatial Master Plan (2021–2035) proposed creating a continuous and accessible waterfront public space by improving the greenway network, enhancing public space quality, and strengthening urban landscape corridors, with the broader goal of developing a world-class waterfront vitality zone.
Driven by the above planning and renewal policies, a continuous waterfront public space system has gradually taken shape along the Pearl River, making the riverfront one of Guangzhou’s most active public life corridors. This study selects the core waterfront space along the north bank of the Pearl River as the research area, as shown in Figure 1. Spanning several central districts, including Liwan, Yuexiu, Tianhe, and Haizhu, this area is one of the waterfront spaces in Guangzhou with the highest concentration of urban functions and the most active public activities. The study area follows an east–west linear pattern along the river, with a total length of approximately 13 km, extending from the vicinity of Shamian Island in Liwan District in the west to the vicinity of Pazhou Bridge in the east. Based on spatial landscape characteristics and differences in urban functions, the study area can be broadly divided into three typical sections:
- (1) Traditional Landscape Section (Western Section): This section extends approximately 3.5 km from Shamian Island in Liwan District to Haizhu Square in Yuexiu District. It retains a concentration of historic and cultural landscapes, as well as traditional waterfront areas. The spatial scale is relatively compact, with pedestrian activities and viewing-related stays frequently observed.
- (2) Modern Urban Section (Middle Section): This section extends approximately 4.5 km from the eastern side of Haizhu Square to Guangzhou Bridge. Adjacent to the Zhujiang New Town central business district, it is characterized by modern urban landscapes, dense built interfaces, and a high concentration of commercial, tourism, and leisure activities.
- (3) Daily Leisure Section (Eastern Section): This section extends approximately 5 km from the eastern side of Guangzhou Bridge to Pazhou Bridge. It is dominated by riverside greenways and open public spaces. With a large number of residential communities and public service facilities nearby, this section serves as an important setting for residents’ daily recreation and slow-paced activities.
From a spatial perspective, the northern bank of the core Pearl River waterfront contains a diverse range of waterfront public spaces. Based on their spatial structure and functional characteristics, these spaces can be classified into six categories.
- (1) Waterfront promenade space. This type of space is characterized by continuous riverside promenades or greenway systems. Usually equipped with stone benches, tree pits, and guardrails, it forms a continuous corridor for slow-mobility activities. It is the most widely distributed space type within the study area and is mainly used for daily leisure activities such as walking, jogging, and sightseeing.
- (2) Open lawn space. This type of space is characterized by large lawns as the dominant landscape element and is usually connected to the waterfront promenade system. With a high degree of openness, it provides places for residents to stay, rest, and enjoy family activities. Open lawn spaces are highly flexible and can support various leisure activities, such as picnics and informal gatherings.
- (3) Waterfront plaza space. This type of space is characterized by hard paving and is typically integrated with waterfront viewing platforms or urban public nodes. Its open layout is conducive to gathering, allowing it to accommodate larger public events. Such spaces often serve as important activity nodes and pedestrian gathering areas in waterfront districts.
- (4) Shaded leisure space. This type of space is characterized by tree shade as the main environmental feature and is usually formed by combining shaded walkways, small resting facilities, and waterfront viewing nodes to create a comfortable space for staying. With favorable shade and environmental comfort, this space type is well-suited to passive leisure activities and short-term stays.
- (5) Recreational sports space. This type of space is usually equipped with fitness facilities, sports grounds, or exercise equipment, such as public fitness zones and small sports courts. It provides residents with places for daily exercise and physical activity and serves as an important vitality node within the waterfront public space system.
- (6) Under-bridge composite space. This type of space is located beneath cross-river bridges and is characterized by the semi-open environment created by the bridge structure. It is usually integrated with promenades, resting facilities, or sports amenities to create an area for various activities. Thanks to its distinctive spatial structure and sheltered conditions, the under-bridge space has become a recognizable venue for activities within the waterfront public space system.
Overall, the various types of waterfront space mentioned above differ significantly in terms of spatial scale, landscape conditions, built interfaces, and facility configurations. Collectively, they form a diverse pattern of spatial environments along the core waterfront segment of the Pearl River. Within this pattern, the study area demonstrates a high density of public activities, as well as clear variations in activity types and intensity levels. Different waterfront sections also vary in greenery, built-interface proportion, circulation structure, and visual complexity, creating a rich spatial context for examining the relationship between the built environment and leisure activities.
Therefore, the core waterfront area of the Pearl River is an important site for urban renewal and waterfront management in Guangzhou, as well as being one of the city’s major public spaces for recreational activities. Selecting this area as the study site allows this research to capture the spatial characteristics of typical central waterfront public spaces in Guangzhou and to examine how built-environment and visual-perception factors are associated with different types of leisure activity.
2.2. Division of Behavioral Segments and Behavioral Observation Data
2.2.1. Division of Behavioral Segments
Waterfront public spaces often exhibit distinct linear spatial characteristics. Compared to traditional methods that use administrative units or regular grid-based analysis units, behavior units that are defined based on the structure of pedestrian space better reflect the true relationship between pedestrian activities and environmental perception within waterfront public spaces. This study, therefore, uses the north bank waterside promenade of the Pearl River as a basis, dividing the study area along the river into several continuous behavior survey units (hereafter referred to as ‘behavior segments’).
First, field surveys and preliminary activity observations were used to identify different spatial types within the study area, such as waterfront promenades, plaza nodes, and under-bridge spaces. Representative spatial segments were selected as behavior survey units. Second, the selected behavior survey units were spatially verified against crowd activity heat maps for a weekday and a weekend day (Baidu heat map data from 23 and 28 September 2025, the dates of the behavioral observations; Figure 2). This procedure helped ensure that the chosen samples covered typical spatial characteristics and did not simply represent inactive spaces with limited public activity. Using this approach, three units were selected from each of the western, middle, and eastern sections, resulting in nine representative behavior survey units (A1–A3, B1–B3, and C1–C3) for subsequent behavioral observation and environmental analysis.
In terms of scale, each behavioral segment was defined with a length of approximately 200–300 m. This range was selected with reference to analytical units commonly used in studies of pedestrian spatial perception and street environments. From the perspective of environmental cognition, pedestrians form an overall impression of surrounding elements, such as vegetation, built interfaces, facilities, and crowd activities, while moving through a continuous walking environment. Previous studies suggest that a spatial range of approximately 200–300 m is suitable for capturing pedestrian-scale environmental perception and has been used in street-level analysis [39,40]. At this scale, local variations in environmental elements can be identified while reducing the excessive heterogeneity that may arise from larger spatial units.
Related empirical studies on street-level visual environments and pedestrian behavior have also adopted comparable spatial segmentation scales. For instance, Huang et al. (2024) examined street environments and walking preferences in Tokyo using street segments of approximately 250 m, which aligned with street-view image sampling and pedestrian perception ranges [39]. Similarly, Wang et al. (2023) analyzed the effects of street landscape features on perceived safety and aesthetic evaluation using street segments of 150–300 m [40]. Drawing on these studies, this research defined behavioral segments of approximately 200–300 m to capture local variations in the waterfront environment while remaining consistent with commonly used pedestrian-scale analytical units.
This segmentation approach helps capture variations in environmental conditions along the waterfront and provides a consistent analytical unit for linking behavioral observation data with street-level visual environmental indicators.
2.2.2. Collection of Behavioral Observation Data
Behavioral data were collected through on-site observations. A structured behavioral annotation approach was used to record leisure activities within each behavioral segment. Observations were conducted on one weekday and one weekend day (23 and 28 September 2025), and each observation day was divided into three time periods: morning (08:00–10:00), afternoon (15:00–17:00), and evening (19:00–21:00).
Within each time period, three independent instantaneous scans were conducted for each behavioral segment. During each scan, the number of people engaged in different activity states, such as walking, staying, socializing, taking photos, and running, was recorded. The scans were arranged at intervals within each observation period to capture short-term changes while reducing the possibility of double-counting. The three records collected within the same period were then averaged to construct the activity intensity indicator for the corresponding segment and time period. During fieldwork, a standardized recording form was used to define activity categories, counting rules, and recording items, ensuring comparability across time periods, survey units, and observers. Before the formal observations, observers were trained, and pilot observations were conducted to improve recording consistency. After data collection, the records were cross-checked to identify potential inconsistencies and improve data reliability.
Based on their behavioral characteristics, the observed leisure activities were classified into three categories: passive, active, and social activities. The number of people in each category was recorded separately during each scan. At the same time, the total number of people in each behavioral segment was also recorded to provide an overall activity scale reference indicator.
2.3. Street-Level Image and Built-Environment Data
2.3.1. Collection of Street-Level Images
To characterize the actual visual environment of the waterfront space at the pedestrian scale, this study collected street-level images on site. On the morning of 23 September 2025, data collectors took continuous photographs from west to east along the waterfront promenade in the study area. At each capture point, four cameras were used to obtain images in the 0°, 90°, 180°, and 270° directions. Each image was taken at an approximate pedestrian eye height of 1.6 m, with the camera orientations kept roughly parallel to the riverside path in order to approximate a pedestrian’s panoramic perception of the environment. Images were captured during a period of good weather and stable lighting, and all data were collected in a single session to minimize the influence of changing environmental conditions on image quality.
Street-level images were collected at 100 m intervals along the waterfront promenade. This sampling interval was selected to balance pedestrian-scale visual perception, spatial variation in the waterfront environment, and the feasibility of field data collection. Previous studies on street-level perception and pedestrian environments have commonly adopted similar sampling intervals, suggesting that this scale can capture meaningful changes in street interfaces while avoiding excessive data redundancy. Therefore, the 100 m interval was considered appropriate for representing the visual environment encountered by pedestrians along the linear waterfront space.
From the perspective of human visual perception, pedestrians perceive continuous urban environments through movement and repeated visual scanning. Previous research suggests that an approximate range of 80–120 m may allow pedestrians to integrate visual information from surrounding interfaces during walking [41].
At the level of street morphology, a 100 m interval has also been used to capture changes in street-interface characteristics. For example, previous research has shown that physical street features, such as building setbacks, greenery continuity, and interface permeability, may vary within a range of approximately 50–150 m [42]. A 100 m sampling interval can therefore provide a reasonable balance between recording spatial variation and avoiding unnecessary sampling redundancy. This conclusion has been corroborated in multiple urban case studies, including assessments of spatial quality in historic districts and analyses of the visual environment of commercial pedestrian streets [43].
Furthermore, multiple studies on street vitality and walking preferences provide additional support for the 100 m interval. In the Journal of Urban Planning and Development, Wang et al. compared the effects of different sampling densities on street perception scores, finding that a 100 m interval preserves statistical significance while minimizing computational burden [40]. Similarly, in the study of walking preferences in Tokyo, Huang et al. used a 100 m grid for street-level sampling and successfully revealed non-linear relationships between street-level elements and walking behavior [39].
Taken together, these studies support the use of a 100 m interval as a practical and theoretically informed sampling strategy. For linear public spaces such as waterfront promenades, this interval helps balance spatial representativeness, data manageability, and the need to approximate pedestrians’ sequential visual experience along the route.
Following this method, we obtained 109 street-level sampling points, which covered the entire study area. Images were captured in four directions at each point, collecting a total of 436 street-level images for subsequent semantic segmentation and calculation of visual environment indicators.
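As an illustration only, fixed-interval sampling along a linear route can be sketched as follows. The sketch assumes the promenade centerline is available as a polyline in projected (metre) coordinates; the function name `points_along` and the toy coordinates are hypothetical and do not reproduce the field procedure used in this study.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def points_along(line: List[Point], interval: float) -> List[Point]:
    """Return points every `interval` metres along a polyline, starting at its first vertex."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    points = [line[0]]
    next_at = interval  # distance along the line of the next point to emit
    walked = 0.0        # length of the segments processed so far
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        while seg_len > 0 and next_at <= walked + seg_len + 1e-9:
            t = (next_at - walked) / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += interval
        walked += seg_len
    return points


# Toy L-shaped route of 350 m (hypothetical coordinates, not the study area).
route = [(0.0, 0.0), (250.0, 0.0), (250.0, 100.0)]
capture_points = points_along(route, 100.0)
print(len(capture_points))
```

Measuring distance along the route rather than in a straight line keeps the spacing constant where the path bends.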
2.3.2. The Built Environment and Spatial Data
The collected street-level images were first preprocessed. EXIF information was extracted from the photographs in Python (Version 3.9; Python Software Foundation, Wilmington, DE, USA) to obtain shooting locations and timestamps. The latitude and longitude information of each street-level image was then compiled into a data table and imported into ArcGIS Pro (Version 3.5.2; Esri, Redlands, CA, USA) to generate georeferenced sampling points.
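EXIF GPS tags typically store latitude and longitude as degrees, minutes, and seconds together with a hemisphere reference, so a conversion to signed decimal degrees is usually needed before the points can be mapped. The following is a minimal sketch of that conversion only. Reading the raw tags requires an EXIF-capable library, which is not shown here, and the function name `dms_to_decimal` and the sample values are hypothetical.

```python
from typing import Sequence


def dms_to_decimal(dms: Sequence[float], ref: str) -> float:
    """Convert EXIF-style (degrees, minutes, seconds) to signed decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative by convention.
    return -value if ref.upper() in ("S", "W") else value


# Hypothetical GPS values for one image (not taken from the study data).
lat = dms_to_decimal((23, 6, 36.0), "N")
lon = dms_to_decimal((113, 19, 12.0), "E")
print(round(lat, 5), round(lon, 5))
```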
In addition to the street-level images, we collected spatial base data for the study area, including GIS layers for the waterfront promenade, the road network, and spatial boundaries. These datasets were managed in ArcGIS Pro under a unified coordinate system, forming the basis for subsequent spatial analyses and visualization of results.
3. Methods
3.1. Behavioral Observations and Construction of Activity Intensity Indicators (Dependent Variable)
Based on the behavioral observation data described in Section 2.2, this study constructed activity intensity indicators to represent the levels of different types of leisure activity. An instantaneous scan sampling method was used to record the number of people engaged in different activity states within each behavioral segment under varying temporal conditions.
Based on their behavioral characteristics, the observed leisure activities were classified into three categories:
- (1) Passive activities, which are characterized mainly by staying in one place, such as sitting, leaning, sightseeing, and resting.
- (2) Active activities, which are primarily characterized by bodily movement and displacement, such as walking, running, and fitness activities.
- (3) Social activities, which are primarily characterized by interpersonal interaction, such as talking, gathering, and group interaction.
For each behavioral segment, the average number of participants in each activity type was calculated across the three scan records for a given day type (weekday or weekend) and time period (morning, afternoon, or evening). This average then served as the activity intensity index (passive intensity, active intensity, and social intensity) for the corresponding activity type during that period for the behavior segment. This index represents the average activity level per scan and was used as a continuous variable in subsequent statistical analyses rather than as an exact estimate of the total number of participants.
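The averaging step described above can be sketched as follows. The record layout, the function name `activity_intensity`, and the head counts are hypothetical and serve only to show how three scan records per segment, day type, and time period collapse into one intensity value.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scan records: (segment, day_type, period, activity_type, head count).
scans = [
    ("A1", "weekday", "morning", "passive", 12),
    ("A1", "weekday", "morning", "passive", 15),
    ("A1", "weekday", "morning", "passive", 9),
    ("A1", "weekday", "morning", "social", 4),
    ("A1", "weekday", "morning", "social", 6),
    ("A1", "weekday", "morning", "social", 5),
]


def activity_intensity(records):
    """Average the scan counts for each segment, day type, period, and activity type."""
    grouped = defaultdict(list)
    for segment, day_type, period, activity, count in records:
        grouped[(segment, day_type, period, activity)].append(count)
    return {key: mean(counts) for key, counts in grouped.items()}


intensity = activity_intensity(scans)
print(intensity[("A1", "weekday", "morning", "passive")])
```

Keeping the key as a (segment, day type, period, activity) tuple makes it straightforward to join the result with segment-level environmental indicators later.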
3.2. Street-Level Image Processing and Extraction of Perceived Environmental Indicators (Independent Variables)
The street-level image data were processed in Python. First, the collected street-level images were converted to a unified format and standardized in terms of file naming. EXIF information was then extracted from each image to obtain shooting time and geographic coordinates. This established a correspondence between each image and its spatial location in order to provide a foundation for subsequent spatial matching and indicator aggregation.
In addition, a semantic segmentation approach based on deep learning was employed to perform pixel-level semantic parsing of the street-level images. The Mask2Former semantic segmentation model, pretrained on the ADE20K dataset, was used to process the images in this study. This model was applied to identify 150 semantic categories, such as vegetation, sky, buildings, roads, and other environmental features (Figure 3).
Based on the semantic segmentation results, Python scripts were written to count the number of pixels in each semantic category in every image and to compute their pixel proportions:

$$P_{i,c} = \frac{N_{i,c}}{N_{i}}$$

In the above equation:
$i$ denotes the $i$-th street-level image (or sample), and $c$ denotes a semantic segmentation category.
$P_{i,c}$ represents the pixel proportion of semantic category $c$ in image $i$.
$N_{i,c}$ represents the pixel count of semantic category $c$ in image $i$.
$N_{i}$ represents the total number of pixels in image $i$ (or the number of pixels considered in the calculation).
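The pixel proportion defined here (the pixel count of a category divided by the total number of pixels in the image) can be computed along the following lines. This is a minimal sketch that assumes the segmentation output is available as a two-dimensional grid of integer class IDs; the function name `pixel_proportions` and the toy label map are hypothetical and are not ADE20K output.

```python
from collections import Counter
from typing import Dict, List


def pixel_proportions(label_map: List[List[int]]) -> Dict[int, float]:
    """Return the share of pixels assigned to each class ID in one segmentation map."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("label map is empty")
    return {label: n / total for label, n in counts.items()}


# Toy 2 x 4 label map with hypothetical class IDs 0, 1, and 2.
toy_map = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
]
props = pixel_proportions(toy_map)
print(props)
```

For full-resolution images, an array-based count would be faster, but the arithmetic is the same.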
To construct environmental indicators with perceptual relevance, fine-grained ADE20K semantic categories were merged into broader environmental categories, including vegetation, sky, built environment, road space, and natural elements. The specific aggregation rules are shown in Table 1.
On this basis, a multidimensional street-level visual environment indicator system was constructed, as detailed in Table 2.
These indicators jointly form a multidimensional representation system of the waterfront street-level visual environment [39,41,43].
In this study, “visual perception” does not refer to subjective psychological evaluation collected through questionnaires, but to the environmental information that can be visually accessed from the pedestrian field of view. Therefore, the semantic-segmentation indicators are used as objective proxies for pedestrian-level visual experience rather than as direct measures of subjective perception. From this perspective, GVI reflects the degree to which greenery enters the pedestrian visual field and thus approximates perceived naturalness; SVI reflects the exposure of sky in the visual field and is associated with perceived openness; Built captures the visible presence of buildings, walls, bridges, and other hard interfaces, thereby indicating perceived enclosure, interface intensity, or built-up pressure; Road represents the visible pedestrian and circulation surface, which relates to perceived movement space and spatial accessibility; and Entropy reflects the diversity and balance of visible semantic elements, corresponding to the richness and complexity of visual information encountered by pedestrians. Together, these indicators translate the pixel-level outputs of semantic segmentation into interpretable dimensions of pedestrian-scale spatial experience.
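To make the step from category proportions to indicators concrete, the sketch below merges fine-grained class shares into broad categories and computes a diversity measure. Two points are assumptions rather than the study's specification: the grouping dictionary is hypothetical (the actual aggregation rules are those of Table 1), and Entropy is taken to be Shannon entropy over the merged shares, which the text does not state explicitly.

```python
import math
from typing import Dict

# Hypothetical grouping of class names into broad categories (assumption, not Table 1).
GROUPS = {
    "tree": "vegetation", "grass": "vegetation", "plant": "vegetation",
    "sky": "sky",
    "building": "built", "wall": "built", "bridge": "built",
    "road": "road", "sidewalk": "road",
    "water": "natural",
}


def aggregate(props: Dict[str, float]) -> Dict[str, float]:
    """Sum fine-grained class proportions into broad categories."""
    out: Dict[str, float] = {}
    for name, share in props.items():
        group = GROUPS.get(name, "other")
        out[group] = out.get(group, 0.0) + share
    return out


def shannon_entropy(shares: Dict[str, float]) -> float:
    """Shannon entropy (natural log) of a set of proportions; zero shares are skipped."""
    return -sum(p * math.log(p) for p in shares.values() if p > 0)


# Hypothetical per-image class proportions.
image_props = {"tree": 0.25, "grass": 0.25, "sky": 0.25, "building": 0.25}
grouped = aggregate(image_props)
gvi = grouped["vegetation"]          # green view share under this grouping
entropy = shannon_entropy(grouped)   # higher when shares are more evenly spread
print(gvi, round(entropy, 4))
```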
The selection of GVI, SVI, Built, Road, and Entropy was therefore theory-driven rather than determined solely by statistical performance. These indicators correspond to five key dimensions of pedestrian-scale visual experience that have been widely discussed in environmental behavior, urban design, and street-interface studies: naturalness, openness, built-interface intensity, circulation support, and visual complexity. GVI was selected to represent the natural and restorative dimension of the visual environment; SVI was used to capture openness and exposure; Built was included to reflect the role of hard spatial interfaces, enclosure, and public–private edge conditions; Road was selected because visible pedestrian and circulation space is closely related to movement support and spatial accessibility; and Entropy was included to capture the compositional richness and heterogeneity of the overall visual scene. These dimensions are particularly relevant to waterfront public spaces, where vegetation, open water, promenades, built frontages, and public activity spaces jointly shape pedestrian perception and leisure behavior. Thus, the selected indicators provide a theoretically grounded framework for linking street-level visual environments with passive, active, and social activities.
Given the study’s aim of examining the relationship between the street-level visual environment and leisure activity intensity, and considering both theoretical relevance and the exploratory correlation results, GVI, SVI, Road, Built, and Entropy were selected as explanatory variables for the subsequent regression models.
3.3. Spatial Matching and Construction of Analysis Units
In order to establish spatial associations between behavioral activities and the street-level visual environment, this study employed behavioral segments as the primary analysis unit. In ArcGIS Pro, street-level sampling points and their corresponding visual indicators were managed together with behavioral segment spatial data.
As street-level images reflect visual perception across a certain spatial extent, while behavioral observation data represent aggregated activity characteristics at the segment scale, this study employed a buffer approach to reconcile these two spatial scales. For each behavioral segment, a fixed-distance buffer of 50 m was created around the segment. From a behavioral perspective, this buffer represents the immediate decision-making environment in which pedestrians can directly observe surrounding visual cues and make on-site behavioral choices, such as whether to stay, continue walking, approach a node, or engage in social interaction. In waterfront public spaces, near-field cues such as vegetation, seating facilities, pedestrian surfaces, building interfaces, and open spaces are more likely to influence immediate activity responses than distant environmental features. The visual indicators of the street-level sampling points within each buffer were then spatially overlaid and aggregated.
We derived street-level visual environment characteristics at the behavioral segment scale by computing the mean of street-level indicators within each buffer, thereby providing a consistent spatial analysis unit for subsequent statistical analysis.
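The buffer-and-average step can be sketched without GIS software as follows. The sketch simplifies each behavioral segment to a single straight line in projected (metre) coordinates, whereas the study performed the overlay in ArcGIS Pro on the actual segment geometry; the function names and sample values are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the straight segment a-b (projected metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection so the nearest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def segment_mean(points: List[Tuple[Point, Dict[str, float]]],
                 a: Point, b: Point, radius: float = 50.0) -> Dict[str, float]:
    """Average each indicator over the sampling points within `radius` of segment a-b."""
    inside = [vals for xy, vals in points if distance_to_segment(xy, a, b) <= radius]
    if not inside:
        return {}
    return {key: mean([v[key] for v in inside]) for key in inside[0]}


# Hypothetical sampling points with one indicator each.
samples = [
    ((50.0, 10.0), {"GVI": 0.25}),
    ((150.0, -40.0), {"GVI": 0.5}),
    ((300.0, 60.0), {"GVI": 0.9}),   # about 78 m from the segment end, so excluded
]
segment_gvi = segment_mean(samples, (0.0, 0.0), (250.0, 0.0))
print(segment_gvi)
```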
3.4. Statistical Analysis Methods
After constructing the dependent and independent variables, a series of statistical analyses was conducted to examine the relationship between the street-level visual environment and various types of leisure activity.
First, descriptive statistical analyses were performed to characterize the distribution of activity intensity across activity types, time periods, and day types, thereby identifying basic temporal differences and activity patterns.
Second, a Spearman correlation analysis was conducted to examine the bivariate relationships between indicators of the street-level visual environment and the intensity of different activity types. This provided an initial identification of potential influencing factors.
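For readers unfamiliar with the statistic, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following self-contained sketch illustrates this. It is not the software routine used in this study, and the input values are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical indicator values and activity intensities for five segments.
entropy_values = [0.1, 0.2, 0.3, 0.4, 0.5]
social_intensity = [1.0, 4.0, 9.0, 16.0, 25.0]
rho = spearman_rho(entropy_values, social_intensity)
print(rho)
```

Because only ranks enter the calculation, a monotone but non-linear relationship such as the one in the toy data still yields a coefficient of 1.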
Building on this, separate multiple regression models were constructed using the log-transformed intensities of passive, active, and social activities as dependent variables. Street-level visual environment indicators were included as the primary independent variables, while day type and time period were included as control variables. Modeling each activity type separately allowed us to compare the direction and relative magnitude of the effects of different street-level visual elements across various leisure activities.
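One way to write the model described above is given below as a sketch. Here $k$ denotes the activity type (passive, active, or social) and $j$ indexes segment-by-day-by-period observations. The dummy coding of the controls (weekday and morning as reference categories) is an assumption made for illustration, and the text does not state whether a constant was added to the intensity before taking logarithms.

```latex
\ln Y^{(k)}_{j} = \beta_0
  + \beta_1\,\mathrm{GVI}_j + \beta_2\,\mathrm{SVI}_j + \beta_3\,\mathrm{Built}_j
  + \beta_4\,\mathrm{Road}_j + \beta_5\,\mathrm{Entropy}_j
  + \gamma\,\mathrm{Weekend}_j + \delta_1\,\mathrm{Afternoon}_j + \delta_2\,\mathrm{Evening}_j
  + \varepsilon_j
```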
3.5. Summary of the Methods and Workflow
This study examined how the street-level visual environment of waterfront spaces is associated with different types of leisure activities through the following workflow: behavioral observation, street-level image collection, semantic segmentation, spatial matching, and statistical analysis (Figure 4). This workflow provides a consistent procedure for linking pedestrian-level visual indicators with observed activity intensity, thereby supporting an empirical analysis of activity-specific environmental responses in waterfront spaces.
3.6. Reliability and Robustness Analysis
To assess the reliability and robustness of the findings, the statistical models were further examined in terms of coefficient stability, sensitivity to spatial scale, and spatial autocorrelation.
3.6.1. Bootstrap Coefficient Stability Test
To further examine whether the multiple regression results were sensitive to sampling variability, a bootstrap resampling procedure was used to assess the stability of the estimated coefficients. Compared with conventional parametric tests, the bootstrap method does not rely on strong distributional assumptions and is therefore particularly suitable for evaluating parameter stability under relatively limited sample sizes. It thus serves as a supplementary robustness check for the regression results reported in this study. For each resampled dataset, the regression coefficients of all explanatory variables were recorded, and their bootstrap mean, standard deviation, 95% confidence interval, and the proportion of positive or negative coefficient signs were further calculated to evaluate the stability of both coefficient magnitude and direction. In this study, whether the 95% bootstrap confidence interval crossed zero was used as the primary criterion for coefficient stability, while the proportion of positive or negative signs was used as a supplementary indicator to assess the consistency of the estimated direction of effects.
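The logic of the procedure can be illustrated with a deliberately simplified sketch. It resamples cases with replacement, refits a one-predictor least-squares slope (the study's models have several predictors and controls), and reports the bootstrap mean, a percentile-based 95% interval, and the share of positive coefficients. The function names, the number of replicates, the percentile method, and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope of a one-predictor least-squares fit."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def bootstrap_slope(x: Sequence[float], y: Sequence[float],
                    n_boot: int = 2000, seed: int = 0):
    """Return (bootstrap mean, 95% percentile interval, share of positive slopes)."""
    rng = random.Random(seed)
    n = len(x)
    slopes: List[float] = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if max(xs) == min(xs):
            continue  # degenerate resample: slope undefined, draw again
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    slopes.sort()
    lower = slopes[int(0.025 * (n_boot - 1))]
    upper = slopes[int(0.975 * (n_boot - 1))]
    positive_share = sum(s > 0 for s in slopes) / n_boot
    return sum(slopes) / n_boot, (lower, upper), positive_share


# Toy data with a clearly positive relationship (hypothetical values).
x_toy = [float(i) for i in range(10)]
y_toy = [2.0 * i + (0.25 if i % 2 else -0.25) for i in range(10)]
boot_mean, (ci_low, ci_high), share_pos = bootstrap_slope(x_toy, y_toy, n_boot=500, seed=1)
print(round(boot_mean, 3), round(ci_low, 3), round(ci_high, 3), share_pos)
```

An interval that excludes zero together with a sign share close to 100% corresponds to the stability criteria described above.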
The test results are presented in Table 3. Overall, the coefficient stability varied across models for different types of leisure activities. In the passive activity model, only visual diversity (Entropy) exhibited a relatively stable positive effect, with a bootstrap mean of 5.034, a 95% confidence interval of [1.204, 9.042], and a positive coefficient in 99.55% of the resampling iterations. This indicates that the promoting effect of visual diversity on passive activity intensity is robust. By contrast, the confidence intervals for the Green View Index, Sky View Index, built environment proportion, and road space proportion all crossed zero, suggesting that the estimated directions of their effects were more sensitive to sampling fluctuations.
For the active activity model, the 95% bootstrap confidence intervals of all explanatory variables crossed zero, indicating that the statistical associations between active activities and the street-level visual environment were generally weak and that the model estimates were relatively sensitive to sampling variability. This finding is consistent with the relatively low explanatory power of the active activity model in the regression analysis, suggesting that active activities are less responsive to the micro-scale visual environment.
For the social activity model, the core explanatory variables exhibited strong coefficient stability. The bootstrap means of the Green View Index, Sky View Index, and built environment proportion were −33.236, −33.970, and −41.034, respectively; their 95% confidence intervals did not cross zero, and the coefficients remained negative in more than 99% of the resampling iterations. By contrast, visual diversity had a bootstrap mean of 11.969, a 95% confidence interval of [6.526, 17.185], and remained positive in 99.90% of the resampling iterations. These results suggest that the core findings of the social activity model (the negative effects of GVI, SVI, and built environment proportion, and the positive effect of Entropy) are robust. In comparison, the direction of the effect of road space proportion in the social activity model was less stable.
Overall, the bootstrap coefficient stability test further identifies the most robust environmental predictors across the three activity models. In the passive activity model, Entropy was the only variable that maintained a stable positive effect, indicating that visual diversity is the most reliable predictor of passive activity intensity. In the active activity model, none of the explanatory variables showed stable effects, as all 95% bootstrap confidence intervals crossed zero, confirming the relatively weak and unstable association between active activities and the street-level visual environment. In the social activity model, GVI, SVI, and Built remained consistently negative, while Entropy remained consistently positive across repeated resampling. These variables, therefore, represent the most stable and critical environmental predictors in this study. Taken together, the bootstrap results corroborate the primary findings that different types of leisure activities respond differently to the street-level visual environment, with the social activity model exhibiting the highest coefficient stability.
3.6.2. HC3 Robust Standard Error Test
To further examine whether the regression results were affected by heteroskedasticity and potential bias in standard error estimation under a relatively small sample size, this study re-estimated the models using HC3 heteroskedasticity-consistent robust standard errors based on the benchmark ordinary least squares (OLS) regressions. Compared with conventional OLS standard errors, the HC3 estimator generally provides more conservative significance estimates when heteroskedasticity is present or the sample size is relatively limited. It is therefore widely used in robustness analysis of regression models.
Specifically, while keeping the dependent variables, explanatory variables, and model specifications unchanged, HC3 robust standard errors were recalculated for the passive, active, and social activity models and then compared with the original OLS results. The main purpose was to assess whether the coefficient signs and significance levels of the explanatory variables changed substantially. If the coefficient directions and significance conclusions of the core variables remained generally consistent after applying HC3 robust standard errors, the main findings of this study could be considered robust to heteroskedasticity and potential standard error bias under a relatively small sample size.
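As a rough illustration of this comparison, the sketch below computes conventional and HC3 standard errors for the same OLS fit. The HC3 covariance is the standard sandwich form in which each squared residual is divided by (1 − h_ii)², with h_ii the leverage of observation i. The function name and the synthetic heteroskedastic data are assumptions made for illustration only.

```python
import numpy as np

def ols_with_hc3(X, y):
    """Return OLS coefficients with conventional and HC3 standard errors."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    k = design.shape[1]
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta

    # Conventional standard errors assume a constant error variance.
    sigma2 = resid @ resid / (n - k)
    se_ols = np.sqrt(np.diag(sigma2 * xtx_inv))

    # HC3: inflate each squared residual by its leverage, then sandwich.
    leverage = np.einsum("ij,jk,ik->i", design, xtx_inv, design)  # h_ii
    omega = (resid / (1.0 - leverage)) ** 2
    meat = design.T @ (design * omega[:, None])
    se_hc3 = np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))
    return beta, se_ols, se_hc3

# Synthetic stand-in data (hypothetical) with error variance that grows with |x1|.
rng_demo = np.random.default_rng(0)
X_demo = rng_demo.normal(size=(54, 2))
noise = rng_demo.normal(size=54) * (0.5 + np.abs(X_demo[:, 0]))
y_demo = 1.0 + 2.0 * X_demo[:, 0] + noise

beta, se_ols, se_hc3 = ols_with_hc3(X_demo, y_demo)
for name, b, s0, s3 in zip(["const", "x1", "x2"], beta, se_ols, se_hc3):
    print(f"{name}: coef={b:.3f}, OLS SE={s0:.3f}, HC3 SE={s3:.3f}")
```

The coefficients are identical under both estimators; only the standard errors, and hence the significance conclusions, can change.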
The test results are presented in Table 4. After applying HC3 robust standard errors, the signs of the core variable coefficients remained unchanged across all models, and the main significance conclusions were consistent with those of the benchmark regressions. Specifically, in the passive activity model, only visual diversity (Entropy) remained significantly positively associated with activity intensity, whereas the Green View Index (GVI), Sky View Index (SVI), built environment proportion (Built), and road space proportion (Road) did not reach statistical significance. In the active activity model, none of the explanatory variables remained significant after HC3 correction, indicating that the statistical association between active activities and the street-level visual environment was generally weak. In the social activity model, the Green View Index, Sky View Index, and built environment proportion remained significantly negatively associated with activity intensity, while visual diversity continued to show a significant positive effect; road space proportion still did not pass the significance test. These results indicate that even after using more conservative robust standard errors, the main conclusion of this study, namely that different types of leisure activities respond differently to the street-level visual environment, remains largely unchanged. The key explanatory variables in the social activity model continue to show relatively high stability.
Overall, the robust standard error test using HC3 further supports the reliability of the benchmark regression results. This indicates that the main findings of the study are not driven by heteroskedasticity or underestimation of standard errors due to a small sample size. Therefore, the results show satisfactory statistical robustness.
3.6.3. Spatial Scale Sensitivity Analysis (Buffer)
To assess whether the relationship between street-level visual environmental indicators and leisure activity intensity was influenced by the choice of spatial analysis scale, a spatial scale sensitivity analysis was conducted. For each behavioral segment, fixed-distance buffers of 25 m, 50 m, and 100 m were created. Visual environmental indicators from street-view sampling points within each buffer were then spatially overlaid, statistically aggregated, and used to estimate multiple regression models under each spatial scale.
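The buffer-based aggregation can be sketched as follows. This is a simplified illustration, not the GIS workflow used in the study: it assumes projected coordinates in meters, represents each behavioral segment as a single straight line, and aggregates by the arithmetic mean (the text says only that indicators were "statistically aggregated"). All names and values are hypothetical.

```python
import math

def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b (all in meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def aggregate_within_buffer(segment, points, radius_m):
    """Mean of each indicator over sampling points inside the buffer, or None."""
    a, b = segment
    inside = [ind for xy, ind in points
              if point_to_segment_distance(xy, a, b) <= radius_m]
    if not inside:
        return None
    return {key: sum(ind[key] for ind in inside) / len(inside)
            for key in inside[0]}

# Hypothetical segment and street-view sampling points.
segment = ((0.0, 0.0), (100.0, 0.0))
points = [
    ((50.0, 10.0), {"GVI": 0.2}),
    ((50.0, 40.0), {"GVI": 0.4}),
    ((50.0, 80.0), {"GVI": 0.6}),
    ((50.0, 150.0), {"GVI": 0.9}),  # outside every buffer considered here
]
results = {r: aggregate_within_buffer(segment, points, r) for r in (25, 50, 100)}
for radius, values in results.items():
    print(radius, values)
```

Widening the buffer admits more distant sampling points, which is why the aggregated indicator, and potentially the regression estimates, can shift with scale.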
The model fitting results under different spatial scales are presented in Table 5. Overall, the changes in the coefficient of determination (R²) and adjusted R² across the different scales were limited, indicating that the explanatory power of the street-level visual environment for leisure activity intensity remained relatively stable across spatial scales. This suggests that the study’s conclusions do not depend on a single scale specification and therefore demonstrate good scale robustness.
Further comparison indicates that the 50 m buffer scale provided a relatively favorable and balanced model performance across the three activity types. As shown in Table 5, the models at this scale maintained relatively high adjusted R² values, while the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) remained at comparatively low levels, suggesting a reasonable balance between explanatory power and model complexity. Although some model-fit indicators varied slightly across the 25 m, 50 m, and 100 m scales, the 50 m buffer offered a more appropriate compromise by integrating statistical performance with behavioral interpretability. Compared with the 25 m scale, the 50 m buffer can capture a more complete range of surrounding environmental information that may be perceived by individuals during on-site activities. Compared with the 100 m scale, it reduces the inclusion of distant environmental information that may exceed the range of immediate behavioral perception and weaken the correspondence between local environmental characteristics and actual activity responses.
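For reference, the fit indicators compared across buffer scales can be computed from the residual and total sums of squares of each fitted model. The sketch below uses the Gaussian log-likelihood form of AIC and BIC and counts the intercept plus the slopes as parameters; software packages differ in additive constants, so only differences between models fitted to the same data are meaningful. The input numbers are hypothetical and are not taken from Table 5.

```python
import math

def fit_metrics(rss, tss, n, k):
    """R2, adjusted R2, AIC, and BIC for an OLS model.

    rss: residual sum of squares; tss: total sum of squares;
    n: number of observations; k: number of predictors (intercept excluded).
    """
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    n_params = k + 1  # slopes plus intercept
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)
    aic = 2.0 * n_params - 2.0 * log_lik
    bic = n_params * math.log(n) - 2.0 * log_lik
    return {"r2": r2, "adj_r2": adj_r2, "aic": aic, "bic": bic}

# Hypothetical inputs for a single model with 54 observations and 5 predictors.
metrics = fit_metrics(rss=40.0, tss=100.0, n=54, k=5)
print({name: round(value, 3) for name, value in metrics.items()})
```

Lower AIC and BIC indicate a better trade-off between fit and complexity; BIC penalizes each additional parameter more heavily than AIC once n exceeds about 7.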
Figure 5 shows the comparison of regression coefficients across different spatial scales. As can be seen, the effects of the street-level visual indicators largely remained consistent across the three scales, with no substantial sign reversals. Among these indicators, the Green View Index (GVI) and visual diversity (Entropy) consistently exhibited stable positive effects in most models, while built environment proportion (Built) showed a consistent negative effect. These results imply that the mechanisms by which the street-level visual environment influences leisure activities are consistent across different spatial scales.
From the perspective of behavioral perception, leisure activities in waterfront spaces are primarily generated through walking, staying, and social interaction, and the associated process of environmental perception typically occurs at the human scale. Previous studies have shown that pedestrians’ perception of urban space during movement is mainly derived from near-range visual information, and that the distance within which individuals can effectively recognize spatial form and social activities is usually concentrated within a range of several tens of meters [44]. According to isovist theory, environmental cognition depends on the extent of visible space that can be directly perceived from a given observation point, whereas environmental information beyond the immediate visual field plays a limited role in shaping behavioral responses [45]. On this basis, the 50 m buffer can be interpreted as a near-field behavioral response zone rather than merely a statistical aggregation distance. For passive activities, nearby seating facilities, shade, landscape elements, and viewing conditions may directly affect whether individuals stop, stay, rest, or watch the waterfront landscape. For active activities, visible pedestrian surfaces, path continuity, and the clarity of movement space within the near field may influence walking comfort, route choice, and the willingness to continue moving. For social activities, nearby interfaces, semi-enclosed spaces, public facilities, and activity nodes may affect whether people gather, communicate, or engage in group interaction. Therefore, the 50 m buffer captures the spatial range within which visual cues are most likely to be perceived, interpreted, and translated into immediate behavioral responses. Compared with the 25 m buffer, it includes a more complete set of environmental cues relevant to behavioral decision-making; compared with the 100 m buffer, it reduces the inclusion of distant visual information that may be visible in the broader spatial context but is less directly involved in on-site activity choices. This explains why the 50 m scale is not only statistically appropriate, but also behaviorally meaningful for matching street-level visual indicators with observed leisure activity intensity.
Overall, variations in spatial scale did not alter the study’s main conclusions. However, the 50 m scale demonstrated greater validity in statistical performance and behavioral–perceptual logic. Therefore, all subsequent analyses and discussions are based on the 50 m buffer scale.
3.6.4. Spatial Autocorrelation Test (Moran’s I)
To further examine whether the multiple regression models were affected by spatial dependence, a spatial autocorrelation test was conducted on the model residuals. Specifically, the global Moran’s I statistic was used to assess spatial autocorrelation in the residuals from regression models for passive, active, and social activity intensities.
Considering that the same observation point was repeatedly recorded under different day types (weekday and weekend) and time periods (morning, afternoon, and evening), multiple observational samples corresponded to a single spatial location. To ensure the uniqueness of spatial units in constructing the spatial weights matrix, this study used spatial location as the basic unit of analysis and aggregated the model residuals for each observation point by taking their mean, thereby obtaining a single residual value for each unique spatial coordinate.
On this basis, a spatial weights matrix was constructed using the geographic coordinates of the observation points. Spatial adjacency was defined using the K-nearest neighbors (KNN) method, and the weights were row-standardized. Given that the study area exhibits a linear, belt-like spatial pattern along the waterfront of the Pearl River and that the number of spatial samples is relatively limited, Moran’s I was further calculated repeatedly under different neighborhood scales (K = 2–7) to test the scale sensitivity of spatial dependence. The results for K = 4 are presented in Table 6, and the complete results across different neighborhood scales are reported in Table A1.
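The procedure described above (averaging residuals per location, building row-standardized KNN weights, and computing global Moran’s I for K = 2–7) can be sketched as follows. The text does not state how significance was assessed, so the permutation-based pseudo p-value here is an assumption, as are the function names and the synthetic layout of nine hypothetical locations observed six times each.

```python
import numpy as np

def mean_residual_by_location(coords, residuals):
    """Collapse repeated observations to one mean residual per unique coordinate."""
    sites, inverse = np.unique(coords, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    return sites, np.bincount(inverse, weights=residuals) / np.bincount(inverse)

def knn_weights(coords, k):
    """Row-standardized K-nearest-neighbor weights (ties broken by sort order)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a site is never its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros_like(dist)
    w[np.repeat(np.arange(len(coords)), k), neighbors.ravel()] = 1.0 / k
    return w

def morans_i(values, w):
    """Global Moran's I for values under spatial weights matrix w."""
    z = values - values.mean()
    return (len(values) / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_test(values, w, n_perm=999, seed=0):
    """Observed Moran's I and a two-sided pseudo p-value from random relabeling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    expected = -1.0 / (len(values) - 1)
    sims = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    extreme = np.sum(np.abs(sims - expected) >= abs(observed - expected))
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical layout: nine sites along a line, each observed six times.
site_xy = np.column_stack([np.arange(9) * 100.0, np.zeros(9)])
coords_demo = np.tile(site_xy, (6, 1))
residuals_demo = np.random.default_rng(1).normal(size=54)

sites, site_resid = mean_residual_by_location(coords_demo, residuals_demo)
for k in range(2, 8):
    i_obs, p_val = permutation_test(site_resid, knn_weights(sites, k))
    print(f"K={k}: Moran's I={i_obs:.3f}, pseudo p={p_val:.3f}")
```

With few spatial units, large K values make every site a neighbor of almost every other site, so the statistic loses discriminating power; this is one reason to inspect several neighborhood sizes rather than a single one.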
Only under a few small neighborhood specifications (e.g., K = 2) did the residuals of some activity models exhibit weakly significant spatial autocorrelation. However, under most neighborhood settings (K ≥ 3), Moran’s I values of the residuals for all three activity models failed to reach statistical significance (p > 0.05). Overall, the spatial autocorrelation lacked scale stability and did not exhibit persistent spatial clustering.
These results suggest that, after accounting for the explanatory variables, the model residuals do not exhibit systematic spatial dependence. The conventional ordinary least squares (OLS) regression model is therefore considered appropriate for the present study, and no spatial regression model was introduced for correction.
Taken together, the robustness checks indicate that the model specification has acceptable statistical stability and limited residual spatial dependence.
4. Results
4.1. Differences in Activity Intensity Across Time Periods
Figure 6 illustrates the intensity distributions of the three types of leisure activities (passive, active, and social) across three time periods: morning, afternoon, and evening. Each time period included 18 observational samples (n = 18).
Generally, activity intensity varied markedly across time periods. For passive activities, both the median and mean values showed a gradual upward trend from morning to evening. The evening period exhibited a higher upper quartile and greater dispersion, indicating that passive leisure activities, such as sitting, resting, and viewing, were more strongly concentrated in the evening.
For active activities, the distribution remained relatively stable in the morning and afternoon. However, median and mean values increased moderately in the evening and showed a certain degree of dispersion, suggesting that although the overall differences were limited, the intensity of active leisure activities was slightly higher during the evening.
Social activities exhibited the most pronounced temporal variation. Their median intensity remained relatively low in the morning and afternoon, whereas both the median and mean increased substantially in the evening, accompanied by several high-intensity observations. The larger upper quartile and longer upper whisker during the evening further indicate that social interactions were more frequent and intensive during this period.
Overall, the intensities of all three types of leisure activities were generally higher during the evening, with temporal differences being particularly pronounced for social activities.
4.2. Differences in Activity Intensity Between Weekdays and Weekends
Figure 7 compares the intensity differences in the three types of leisure activities between weekdays and weekends. Each group contains 27 observational samples (n = 27).
Across all activity types, leisure activity intensity was generally higher on weekends than on weekdays. For passive activities, the weekend distribution showed a higher median and a wider interquartile range, indicating a greater concentration of staying-oriented leisure behaviors on weekends.
For active activities, the distributions differed noticeably between weekdays and weekends. On weekdays, activity intensity was relatively stable and concentrated. However, on weekends, the observations were more widely dispersed, indicating greater variability in participation in this type of activity.
Social activities also showed differences between weekdays and weekends. Unlike passive activities, the increase in median intensity was relatively limited, but the degree of dispersion increased substantially. On weekends, the interquartile range of social activities became wider, and several high-intensity outliers emerged, leading to a clear increase in the mean. This suggests that social activities on weekends did not intensify uniformly across all spatial segments; rather, highly concentrated social interactions tended to occur at particular spatial nodes or during specific time periods.
By contrast, passive activities showed a more systematic increase in intensity on weekends, as reflected in both a higher median and a wider interquartile range, indicating that staying-oriented leisure behaviors were more widespread on weekends.
4.3. Overall Intensity Characteristics of Different Types of Leisure Activities
Figure 8 presents the overall intensity distributions of passive, active, and social activities. Each group contained 54 samples. Among the three activity types, active activities had the highest median intensity, followed by passive activities, whereas social activities showed the lowest median intensity. However, social activities exhibited the greatest dispersion, with a larger number of extremely high-intensity observations.
The distribution of passive activities was relatively concentrated, with a comparatively broad interquartile range around the median, suggesting that participation intensity was generally stable but still subject to some variation. Active activities, by contrast, were more tightly concentrated around the median, indicating a relatively stable intensity pattern.
Social activities exhibited the greatest dispersion, and several high-intensity outliers substantially increased the mean. This suggests that although social activities did not remain highly intensive across all observations, concentrated episodes of social interaction occurred at specific locations or during particular periods.
Overall, the three activity types exhibited distinct intensity structures: active activities were characterized by relatively stable and moderate intensity, passive activities showed a certain degree of temporal dependence, and social activities were marked by intermittent episodes of high-intensity behavior.
4.4. Mechanisms Through Which the Built Environment Influences the Intensity of Different Types of Leisure Activities
4.4.1. Correlation Analysis
To preliminarily examine the relationships between built environment factors and the intensity of different types of leisure activities, this study conducted Spearman correlation analysis between the three activity intensity indicators (Passive, Active, and Social) and the street-level visual environmental indicators (GVI, SVI, Road, Built, and Entropy). Spearman correlation analysis is well-suited to capturing monotonic relationships between variables and is less sensitive to the influence of outliers.
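A minimal sketch of the Spearman coefficient is given below: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. Significance testing is omitted, and the example values are made up to show why a monotonic but nonlinear relationship with an extreme value still yields a perfect rank correlation.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        shared = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Made-up values: y increases with x but nonlinearly, with one extreme value.
x_demo = [1, 2, 3, 4, 5]
y_demo = [1, 4, 9, 16, 100]
rho_up = spearman_rho(x_demo, y_demo)
rho_down = spearman_rho(x_demo, list(reversed(y_demo)))
print(rho_up, rho_down)
```

Because only the ordering of the values matters, the extreme observation has no more influence on the coefficient than any other point.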
Figure 9 presents the correlation matrix between the intensities of the three types of leisure activities and the built environment variables (Figure A1). The results clearly show differentiated correlation patterns between the built environment factors and the three activity types.
First, regarding natural visual environmental indicators, the Green View Index (GVI) was significantly and positively correlated with both active and social activities, with the strongest bivariate association observed for social activities (ρ = 0.42, p < 0.01). This suggests that areas with greater visible greenness tended to show higher levels of social activity in the correlation analysis. However, this result should be interpreted as an initial bivariate relationship rather than an independent effect after controlling for other environmental variables. GVI also showed a moderate positive correlation with active activities (ρ = 0.29, p < 0.05), suggesting that greater visual greenness may be associated with movement-related activities such as walking, jogging, and cycling. In contrast, the correlation between GVI and passive activities was weaker (ρ = 0.10), indicating that staying-oriented behaviors showed a weaker bivariate association with visual greenness.
Second, the Sky View Index (SVI) was significantly negatively correlated with social activities (ρ = −0.31, p < 0.05), while its correlations with active and passive activities were not statistically significant. This pattern suggests that spaces with greater sky exposure may be less associated with social interaction, possibly because highly open spaces provide weaker spatial boundaries or fewer conditions for stable gathering. By contrast, social activities may be more likely to occur in places that offer a certain degree of enclosure and node-like spatial support.
In terms of spatial structure indicators, the proportion of road space (Road) was significantly positively correlated with passive activities (ρ = 0.35, p < 0.05). This suggests that areas with a higher visible proportion of pedestrian and circulation space tended to support more staying and resting behaviors, possibly because such spaces provide better accessibility or more usable open space along the waterfront promenade. By contrast, the relationships between the proportion of road space and both active and social activities were relatively weak.
The built environment proportion (Built) was significantly negatively correlated with both passive and social activities. Specifically, the correlation coefficients were ρ = −0.40 (p < 0.01) for social activities and ρ = −0.35 (p < 0.01) for passive activities. These results suggest that areas with a higher visible proportion of built interfaces tended to show lower levels of staying-oriented leisure behaviors and social interaction.
Additionally, visual diversity (Entropy) exhibited a significant positive correlation with active activities (ρ = 0.38, p < 0.01), while its correlations with passive and social activities were comparatively weak. This suggests that more diverse and visually complex environments tended to be associated with higher levels of movement-oriented activity in the bivariate analysis.
Overall, the correlation analysis shows that different street-level visual environmental indicators were associated with the intensities of the three types of leisure activities in different directions and to varying degrees. These exploratory findings provide a basis for the subsequent multiple regression models, which further examine the independent associations between visual environmental indicators and activity intensity.
4.4.2. Multiple Regression Results
Building on the correlation analysis, multiple linear regression models were developed to systematically examine the effects of the built environment on different types of leisure activities. These models used the intensities of the three types of leisure activities as dependent variables and street-level visual environmental indicators as independent variables.
Figure 10 and Table 7 present the regression coefficients of the environmental variables and their corresponding 95% confidence intervals.
The regression results show that the directions and magnitudes of the effects of different built environment factors varied substantially across the three types of leisure activities.
For passive activities, visual diversity (Entropy) was the only environmental indicator that showed a statistically significant positive association, as indicated by its positive coefficient and confidence interval that did not cross zero. This result suggests that spaces with richer and more varied visual environments tended to support staying-oriented behaviors. By contrast, the coefficients of GVI, SVI, Built, and Road were not statistically significant in the passive activity model. Although Built showed a negative coefficient, this result should be interpreted only as a weak tendency rather than a robust effect. Overall, the passive activity model indicates that visual diversity was the most reliable visual environmental predictor of staying-oriented leisure behavior.
For active activities, none of the street-level visual environmental indicators reached statistical significance in the multiple regression model. Although GVI, Road, and Entropy showed positive coefficients, their confidence intervals crossed zero, indicating that these associations were not sufficiently stable after controlling for other visual environmental indicators and temporal factors. This suggests that active activities may be less directly explained by the micro-scale visual environment measured in this study. Instead, they may depend more on functional conditions such as path continuity, route connectivity, circulation efficiency, and the availability of exercise facilities, which were not fully captured by the selected visual indicators.
For social activities, visual diversity (Entropy) showed the strongest positive effect, indicating that spaces with richer visual elements and more heterogeneous environmental compositions were more likely to support social gathering. By contrast, GVI, SVI, and Built exhibited significant negative effects after controlling for other visual environmental indicators and temporal factors. This result should be interpreted as a conditional effect rather than as evidence that greenness or openness is inherently unfavorable to social interaction. It suggests that, when visual diversity and other spatial characteristics are held constant, areas with higher proportions of visible greenery or sky may function more as viewing, walking, or individual staying spaces, whereas social activities are more likely to cluster in visually complex spaces with interface support, activity nodes, facilities, and an appropriate degree of enclosure. Similarly, an excessively high proportion of built interfaces may compress public activity space and weaken spatial openness, thereby inhibiting social interaction.
Overall, the multiple regression results indicate that different types of leisure activities responded differently to street-level visual environmental indicators. Visual diversity showed the most stable positive association with passive and social activities, whereas active activities were less consistently explained by the selected visual variables. These findings suggest that the relationship between the waterfront visual environment and leisure activity intensity is activity-specific rather than uniform across all activity types.
4.4.3. Differences in Environmental Responses Across Activity Types
Taken together, the correlation analysis and multiple regression results indicate that different types of leisure activities varied in their environmental sensitivity and behavioral requirements, leading to distinct response patterns to the street-level visual environment.
Passive activities were most consistently associated with visual diversity. The positive effect of Entropy suggests that staying-oriented leisure behaviors were more likely to occur in spaces with richer visual information and a more varied environmental composition. Although Road and Built showed positive and negative tendencies, respectively, their effects were not statistically stable in the regression and robustness tests. Therefore, passive activities appear to depend primarily on the perceived richness of the immediate visual environment, while other spatial structural factors may play a secondary role.
Active activities showed weaker and less stable associations with the selected visual environmental indicators. Although GVI, Road, and Entropy displayed positive coefficients in some analyses, these effects did not remain statistically significant in the multiple regression and robustness tests. This suggests that active activities may be less directly driven by micro-scale visual composition and may depend more on functional spatial conditions, such as path continuity, route connectivity, circulation efficiency, and the availability of exercise facilities.
Social activities exhibited the strongest and most differentiated response to the street-level visual environment. Visual diversity had a significant positive association with social activity intensity, suggesting that spaces with richer visual information and more heterogeneous environmental compositions were more likely to support gathering and interaction. At the same time, GVI, SVI, and Built showed significant negative effects after controlling for other variables, indicating that social activities are not simply promoted by greater greenness, openness, or built-interface intensity. Instead, they appear to be related to a more balanced spatial configuration that combines visual richness, appropriate enclosure, interface support, and activity-supportive facilities.
Overall, the three types of leisure activities exhibited distinct response patterns to the street-level visual environment. Passive activities were most consistently related to visual diversity, active activities showed relatively weak and unstable associations with the selected visual indicators, and social activities were most sensitive to the combined effects of visual diversity, openness, greenness, and built interfaces. These findings suggest that the environment–behavior relationship in waterfront public spaces is activity-specific and cannot be adequately explained by a single visual environmental indicator.
5. Discussion
5.1. Differential Response Mechanisms of Different Activity Types to the Perceived Environment
This study shows that the street-level visual environment does not affect all types of leisure activities in waterfront public spaces in a homogeneous manner; rather, it exerts clearly differentiated effects on passive, active, and social activities. This suggests that the environment–behavior relationship in waterfront public spaces is characterized by substantial activity-type heterogeneity, with different activities varying in their sensitivity to environmental factors and in the pathways through which these factors exert influence.
First, passive activities showed the most consistent response to visual diversity. The regression and robustness results indicate that Entropy had a stable positive association with passive activity intensity. This suggests that people are more likely to stay, rest, and enjoy the view in waterfront spaces with richer visual landscape elements and more varied environmental compositions. Passive activities are often closely tied to the immediate spatial experience, as staying behavior requires not only basic comfort but also sufficient visual attraction and reasons for remaining in place. Compared with spaces dominated by a single landscape element, composite environments that combine vegetation, water views, open spaces, facilities, and varied spatial interfaces may be more capable of forming attractive staying areas. This finding is consistent with environmental behavior research suggesting that staying behaviors depend on spatial attractiveness [44], and further indicates that passive leisure in waterfront spaces responds strongly to perceptible environmental richness.
Second, active activities showed weaker and less stable associations with the street-level visual environment than passive and social activities. Although GVI, Road, and Entropy displayed positive tendencies in some analyses, these associations did not remain consistently significant in the regression models or robustness checks. This suggests that active activities may be less directly shaped by micro-scale visual composition and more dependent on functional spatial conditions, such as the continuity of the waterfront slow-mobility system, circulation efficiency, route organization, and the availability of exercise facilities. In this sense, the environmental requirements of active activities may be better understood as use-supportive rather than primarily visually attractive.
In contrast, social activities exhibited the strongest and most differentiated response to the street-level visual environment. The results showed that visual diversity was positively associated with social activity intensity, whereas GVI, SVI, and Built showed significant negative associations. These findings may appear counterintuitive at first glance, especially the negative associations between greenness, sky visibility, and social activity. However, they should not be interpreted as evidence that greenness or openness is inherently unfavorable to social interaction. Rather, they suggest that, in the Guangzhou waterfront context, social activities are not simply attracted to greener or more open spaces, but depend more on whether the visual and spatial environment provides suitable conditions for gathering, staying, and interaction.
One possible explanation is that highly green or highly open waterfront areas may function primarily as spaces for viewing, walking, or individual staying rather than as social gathering nodes. In such areas, vegetation and open sky may enhance visual comfort and landscape quality, but they do not necessarily provide the spatial edges, seating facilities, semi-enclosed settings, or functional support required for social clustering. This may help explain why GVI showed a negative association with social activity after other visual environmental variables were controlled. Similarly, a high SVI may indicate an overly open spatial condition. Although openness can improve visibility and landscape exposure, excessive openness may weaken spatial boundaries, reduce the sense of enclosure, and make it more difficult for stable gathering or interaction spaces to form.
The climatic context of Guangzhou may further help explain this pattern. As a hot and humid subtropical city, Guangzhou’s waterfront spaces with high sky exposure may experience stronger solar radiation and reduced daytime thermal comfort when sufficient shade is absent. Under these conditions, overly open spaces may be less attractive for prolonged social gatherings, even when they provide broad views. By contrast, socially active nodes may require a more balanced spatial condition that combines visual richness, partial enclosure, shade, seating, lighting, and other activity-supportive facilities. Evening social activities may also be shaped by nighttime lighting, perceived safety, and the availability of comfortable places to sit or gather, factors that are not fully captured by GVI or SVI.
The negative association between Built and social activity also requires careful interpretation. This result does not imply that built interfaces are unimportant for social interaction. Rather, it suggests that an excessively high proportion of hard-built interfaces may compress public activity space, reduce environmental intimacy, and weaken the openness needed for comfortable gathering [46]. Social activities may therefore depend on an appropriate configuration of built edges rather than on a high built proportion itself. In addition, unmeasured factors such as seating provision, shading quality, nighttime lighting, thermal comfort, commercial frontage, facility configuration, and programmed activities may mediate the relationship between visual indicators and social activities. Overall, these findings suggest that social interaction depends less on any single visual attribute and more on the combined configuration of visual diversity, spatial enclosure, interface support, climatic comfort, and activity-supportive facilities.
5.2. The Role of the Built Environment and Visual Diversity
Among the environmental variables examined in this study, visual diversity emerged as one of the most explanatory and stable indicators. The results show that visual diversity was positively associated with passive activities and remained a strong positive predictor in the social activity model. This suggests that the compositional structure of multiple visual elements in the street-level environment may be more informative for explaining leisure activity patterns than the increase or decrease in any single environmental element.
From the perspective of environmental perception, visual diversity reflects how different landscape and spatial elements are combined within street-level scenes. Spaces that include vegetation, water views, open areas, built interfaces, and public facilities can provide richer environmental information and a stronger sense of spatial legibility, stayability, and exploration. For passive activities, such visual richness may increase environmental attraction and encourage people to stay, observe, and rest. For social activities, visual diversity may strengthen the sense of place and interaction potential of a space, making it more likely to function as a node for communication and gathering.
By contrast, the effects of individual environmental elements were less stable. For example, although GVI showed positive tendencies in some analyses, its independent association was not robust after other environmental variables were controlled, particularly in the active activity model. This suggests that leisure activities in waterfront public spaces are unlikely to be driven by a single landscape element alone. Rather, they appear to be shaped by the combined configuration of multiple visual environmental factors. In this sense, the compositional structure of the visual environment may be more informative than the optimization of any single indicator, which distinguishes this study from conventional macro-scale built environment analyses.
The significant negative association between Built and social activity further suggests that stronger built interfaces are not necessarily more supportive of social interaction in waterfront public spaces. A high visible proportion of built interfaces may reduce spatial openness, compress public activity space, and weaken environmental intimacy and the sense of place. However, this does not mean that built interfaces themselves are unimportant. Rather, their role depends on how they are configured with open spaces, landscape elements, pedestrian areas, and public facilities. Activity-supportive waterfront environments are therefore not simply those with the highest greenness or the strongest built presence, but those in which different elements form an appropriate hierarchy, rhythm, and interface support.
Therefore, the findings on visual diversity suggest that improving waterfront vitality should not rely solely on adding individual environmental elements. Greater attention should be paid to the compositional relationships and spatial hierarchy among vegetation, water views, built interfaces, facilities, and open spaces. Compared with single-element landscape optimization, strategies that enhance visual diversity and spatial organization may be more effective in improving the attractiveness and activity-supporting capacity of waterfront public spaces.
5.3. Planning and Design Implications
This study shows that different types of leisure activities respond differently to the visual environment of waterfront spaces. Accordingly, the optimization of waterfront public spaces should move beyond uniform design strategies and adopt more activity-specific, hierarchical, and context-sensitive interventions.
First, for passive activities, design interventions should focus on improving both stayability and visual appeal. The results indicate that visual diversity is consistently associated with passive activities. In waterfront renewal, this implies the need for a coordinated arrangement of vegetation, water views, paving, seating, shading structures, and viewing points to create staying environments with spatial layering and perceptible variation. Rather than simply increasing greenery, design efforts should aim to provide diverse place experiences that support observation, relaxation, and short-term staying.
Second, for active activities, planning and design should place greater emphasis on circulation support and spatial continuity. Given the weak and unstable statistical association between active activities and the selected visual indicators, these activities appear to depend more on functional conditions, such as path-system integrity, route continuity, spatial accessibility, and the availability of exercise facilities. In waterfront slow-mobility spaces, design efforts should therefore prioritize continuous, safe, and legible walking and running routes, reduce spatial fragmentation and circulation conflicts, and improve route coherence by linking key activity nodes. Visual landscape improvements may enhance the overall experience, but they are unlikely to be sufficient on their own to support active activities.
Third, for social activities, design should focus on creating spaces with an appropriate degree of enclosure, interface support, and visual complexity. The findings suggest that social activities are more likely to occur in environments with rich visual elements, clearly defined interface relationships, and spatial structures that are not overly open. Socially oriented nodes may therefore benefit from small-scale plazas, stay-supportive edges, semi-enclosed resting areas, and multifunctional facilities. These elements can strengthen the sense of place and create more favorable conditions for interaction. At the same time, overly decorative or excessively open designs should be avoided, as they may weaken the balance between openness and social aggregation.
Finally, from the perspective of overall spatial optimization, waterfront design should shift from simply increasing greenery toward improving spatial structure and environmental composition. The findings suggest that enhancing spatial vitality is not equivalent to maximizing any single indicator; rather, it depends on the coordinated organization of multiple environmental elements. Future waterfront design could therefore follow a logic of enhancing visual diversity, shaping spatial hierarchy, and matching environmental conditions with activity types. Such an approach can help create composite spatial environments that combine landscape attractiveness, staying opportunities, and potential for social interaction.
More broadly, the proposed framework can serve as a diagnostic and evaluative tool for waterfront planning and design practice. Before spatial renewal, street-view semantic segmentation can be used to map the distribution of key visual environmental attributes, such as greenness, openness, built interfaces, circulation space, and visual diversity, and to identify segments where visual composition does not align with observed activity demand. By linking these indicators with activity observation data, planners and designers can further diagnose whether low-activity areas are associated with insufficient stay-supportive visual richness, weak circulation support, a lack of social interfaces, excessive openness, or built-interface pressure. After design intervention, the same workflow can be repeated to compare changes in visual environmental indicators and activity intensity, thereby supporting post-occupancy evaluation and adaptive management. In this sense, the contribution of this study is not limited to design suggestions for the Pearl River waterfront; it also offers a replicable evidence-based workflow that connects pedestrian-level visual diagnosis, activity-type differentiation, and targeted spatial optimization.
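The diagnostic and before/after logic described above can be sketched as follows. This is only an illustrative outline under stated assumptions, not the procedure used in this study: the `Segment` fields, the `diagnose` and `change` helpers, and all threshold values are hypothetical and would need local calibration against observed data.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One waterfront segment with visual indicators and observed activity.

    Field names are hypothetical; values are assumed to be per-segment means.
    """
    name: str
    entropy: float           # visual diversity
    svi: float               # sky view share, 0..1
    built: float             # built-interface share, 0..1
    social_intensity: float  # observed social activity intensity


def diagnose(seg, low_activity=1.0, low_entropy=1.0, high_svi=0.4, high_built=0.4):
    """Return candidate explanations for a low-activity segment.

    All thresholds are placeholder assumptions, not values from the study.
    Segments at or above the activity threshold are not flagged.
    """
    if seg.social_intensity >= low_activity:
        return []
    flags = []
    if seg.entropy < low_entropy:
        flags.append("low visual diversity")
    if seg.svi > high_svi:
        flags.append("excessive openness")
    if seg.built > high_built:
        flags.append("built-interface pressure")
    return flags


def change(before, after):
    """Difference (after minus before) for each indicator and for activity."""
    fields = ("entropy", "svi", "built", "social_intensity")
    return {f: getattr(after, f) - getattr(before, f) for f in fields}


if __name__ == "__main__":
    pre = Segment("A", entropy=0.6, svi=0.55, built=0.1, social_intensity=0.2)
    post = Segment("A", entropy=1.2, svi=0.35, built=0.1, social_intensity=1.5)
    print(diagnose(pre))
    print(change(pre, post))
```

The flags are candidate diagnoses only; because the underlying relationships are statistical associations, a flagged condition indicates where closer on-site assessment may be worthwhile rather than a confirmed cause.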
The framework may also be transferable to other waterfronts and linear public spaces, such as riverfront promenades, urban greenways, coastal walkways, and linear parks, because it relies on widely available street-view imagery, reproducible semantic segmentation procedures, and observable activity data. However, the empirical coefficients and design interpretations should not be generalized without contextual calibration. Differences in climate, cultural preferences, facility provision, management practices, and daily activity rhythms may affect how visual environmental attributes are translated into behavior. Future applications in other cities should therefore retain the same analytical logic while recalibrating activity classifications, observation periods, spatial buffer scales, and locally relevant environmental variables.
5.4. Limitations and Future Research
This study examined how the micro-scale street-level visual environment is associated with different types of leisure activities in waterfront public spaces. However, several limitations should be acknowledged and addressed in future research.
- (1)
The behavioral data used in this study were derived from on-site observations conducted during a limited number of time periods. Although these data captured activity differences between weekdays and weekends and across different times of day, they cannot fully reflect variations in activity patterns across different seasons, weather conditions, and longer temporal scales. Future research could integrate long-term behavioral observations, mobile signaling data, or spatiotemporal social media data to support more continuous and dynamic analyses of activity patterns in waterfront public spaces.
- (2)
A second limitation concerns the temporal mismatch between behavioral observations and street-view image acquisition. The behavioral observations covered morning, afternoon, and evening periods, whereas the street-view images used for semantic segmentation were collected during daytime. Therefore, the visual environmental indicators derived from these images should be interpreted as representations of relatively stable physical and structural visual characteristics, such as vegetation, sky visibility, built interfaces, and road space, rather than as real-time perceptual conditions for each observation period. This issue is particularly relevant to evening activities, especially social activities, because nighttime visual experience may differ substantially from daytime street-view conditions. Future research could incorporate nighttime street-view imagery, multi-period image collection, lighting measurements, and dynamic environmental monitoring to better capture time-specific visual experiences and their relationships with leisure activity intensity.
- (3)
This study primarily focused on visual environmental characteristics derived from street-view imagery and did not incorporate thermal comfort, noise, wind conditions, or other microclimatic factors into a unified analytical framework. In practice, public activities are shaped by multiple environmental dimensions. This issue is particularly relevant to waterfront spaces in Guangzhou, where hot and humid climatic conditions may interact with visual openness, shading, and spatial enclosure to influence activity occurrence. For example, areas with higher sky visibility may provide broader views, but without sufficient shade, they may also experience stronger solar exposure and reduced daytime thermal comfort, which may make them less suitable for prolonged social gatherings. In addition, several activity-supportive factors that may be particularly relevant to social activities, such as seating availability, shading quality, nighttime lighting, perceived safety, commercial frontage, facility configuration, and programmed activities, were not directly measured in this study. These factors may partly mediate the relationship between street-level visual indicators and social activity intensity. Future research could incorporate environmental sensor data, meteorological monitoring data, facility audits, and subjective perception surveys to develop a more comprehensive environment–behavior analytical framework.
- (4)
This study focused on the core waterfront section of the Pearl River in Guangzhou as a case study. Although this area is representative of central waterfront public spaces in Guangzhou, the applicability of the findings to waterfront spaces in other cities requires further validation. Spatial form, cultural preferences, facility provision, management practices, and public activity patterns may vary across cities. Future research could conduct comparative studies across different cities and types of waterfront spaces to test the transferability and broader generalizability of the findings.
- (5)
This study examined the relationship between the visual environment and activity types by combining street-view semantic segmentation with regression analysis. As such, the mechanisms revealed here should be understood as statistical associations rather than strictly causal relationships. Future research could further incorporate experimental designs, quasi-natural experiments, or behavioral data with higher spatiotemporal resolution to more rigorously identify the dynamic causal pathways linking environmental optimization and changes in activity patterns.
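To make concrete what the regression step estimates, the following is a minimal ordinary least squares sketch in plain Python. It is an illustration under assumptions, not the model specification of this study: the helper names `solve` and `ols` are hypothetical, the data in the usage example are synthetic, and no standard errors or significance tests are computed. The coefficients it returns describe conditional associations between predictors and activity intensity, not causal effects.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so the caller's inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ... by the normal equations.

    rows: one sequence of predictor values per observation.
    Returns [intercept, b1, b2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Synthetic example: intensity = 2 + 3*entropy - 1*svi, with no noise.
    predictors = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
    intensity = [2, 5, 1, 4, 7]
    print(ols(predictors, intensity))
```

In practice a statistics package would normally be used for estimation and inference; the point of the sketch is only that each coefficient summarizes covariation with the outcome while the other predictors are held constant.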
6. Conclusions
6.1. Main Findings
Focusing on the core waterfront section along the northern bank of the Pearl River in Guangzhou, this study recorded three types of leisure activities—passive, active, and social activities—through field observations, and extracted visual environmental indicators of the waterfront space using a street-view image semantic segmentation approach. Based on correlation analysis and multiple regression models, the study examined how different types of leisure activity intensity respond to the street-level visual environment in waterfront public spaces. The main findings are as follows:
- (1)
Leisure activities in waterfront public spaces exhibited clear temporal variations. Overall, all three activity types were more active in the evening and on weekends, with social activities showing the most pronounced temporal differences and a tendency toward nighttime clustering. Passive activities increased more substantially on weekends, whereas active activities remained relatively stable across different temporal conditions. These findings indicate that the Pearl River waterfront functions not only as an important setting for everyday leisure but also as a space shaped by distinct temporal clustering and daily-life rhythms. It should be noted that the street-view indicators used in this study mainly represent relatively stable structural visual characteristics and do not fully capture time-specific nighttime perceptual conditions.
- (2)
Different types of leisure activities exhibited distinct overall intensity structures. Active activities showed relatively stable intensity levels and a comparatively high median, indicating that physical activities such as walking and jogging have a sustained presence in waterfront spaces. Passive activities displayed stronger temporal dependence, with their intensity varying more substantially across different times of day and day types. Social activities, although characterized by a relatively low overall median intensity, exhibited high-intensity clustering at specific spatial nodes and during particular time periods, revealing pronounced node and place dependence.
- (3)
The multiple regression results indicate that different types of leisure activities responded differently to the street-level visual environment. Visual diversity (Entropy) showed a stable positive association with passive activities, suggesting that environmental richness is an important micro-scale condition for supporting staying behaviors. Active activities showed generally weak and unstable associations with the selected visual environmental variables, indicating that they may depend more on non-visual conditions, such as spatial continuity and functional support. Social activities were the most sensitive to the visual environment: GVI, SVI, and Built showed significant negative associations, whereas visual diversity showed a significant positive association. This suggests that social activities are more likely to occur in environments characterized by rich visual elements, complex spatial structures, and an appropriate degree of enclosure.
- (4)
Visual diversity is a key micro-scale environmental variable for explaining differences in leisure activities within waterfront public spaces. Compared with any single environmental element, the compositional structure of the visual environment appears to play a more important role in shaping spatial attractiveness and activity organization. When waterfront spaces simultaneously incorporate multiple perceptible visual elements, such as vegetation, water, open space, built interfaces, and public facilities, they are more likely to form places that support staying, interaction, and spatial legibility, thereby enhancing both the vitality of public space use and the diversity of behaviors.
Overall, the findings of this study indicate that leisure activities in waterfront public spaces are associated not only with temporal conditions but also with the spatial configuration of the environment. Different activity types exhibit distinct response patterns to visual environmental characteristics. As an important representation of micro-scale spatial quality, the structure of the street-level visual environment plays an important role in shaping activity patterns in waterfront public spaces.
6.2. Research Contributions
From the perspective of street-level visual environments, this study examines how different types of leisure activities respond to micro-scale visual environmental conditions in waterfront public spaces. Its contributions can be summarized in methodological, empirical, and practical dimensions.
Methodologically, this study integrates street-view image semantic segmentation with behavioral observation data to construct a quantitative framework for assessing the micro-scale visual environment of waterfront public spaces. By extracting indicators such as the Green View Index (GVI), Sky View Index (SVI), built environment proportion (Built), road space proportion (Road), and visual diversity (Entropy), this framework identifies environmental elements from a pedestrian perspective and links them to different activity types. In doing so, it extends conventional public space research that has predominantly relied on macro-scale built environment indicators.
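As an illustration of how such indicators can be derived from a segmented image, the sketch below computes class-share indicators and a diversity index from a flat list of per-pixel labels. Several elements are assumptions rather than details taken from this study: the label names, the `GROUPS` mapping of segmentation classes to GVI, SVI, Built, and Road, the function name `visual_indicators`, and the use of Shannon entropy (natural logarithm, over all classes present) as the diversity measure.

```python
import math
from collections import Counter

# Hypothetical mapping from segmentation labels to indicator groups.
# The actual class groupings depend on the segmentation model used.
GROUPS = {
    "GVI": {"tree", "grass", "plant"},
    "SVI": {"sky"},
    "Built": {"building", "wall", "fence"},
    "Road": {"road", "sidewalk"},
}


def visual_indicators(pixel_labels):
    """Compute indicator shares and Shannon entropy for one image.

    pixel_labels: iterable of class-label strings, one per pixel.
    Returns a dict with the share of pixels in each group (0..1) and
    'Entropy', the Shannon entropy of the class-share distribution.
    """
    counts = Counter(pixel_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty label map")
    shares = {label: n / total for label, n in counts.items()}
    result = {
        name: sum(shares.get(label, 0.0) for label in labels)
        for name, labels in GROUPS.items()
    }
    # Assumed diversity measure: H = -sum(p_i * ln p_i) over observed classes.
    result["Entropy"] = -sum(p * math.log(p) for p in shares.values() if p > 0)
    return result


if __name__ == "__main__":
    # Toy image: four classes in equal proportion.
    labels = ["tree"] * 25 + ["sky"] * 25 + ["building"] * 25 + ["road"] * 25
    print(visual_indicators(labels))
```

With four equally represented classes, each group share is 0.25 and the entropy equals ln 4, which is the maximum for four classes; an image dominated by a single class yields an entropy near zero.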
Empirically, this study shows that different types of leisure activities respond differently to the visual environment, indicating that passive, active, and social activities are not associated with the same environmental factors. Specifically, passive activities are most consistently associated with visual diversity, active activities exhibit relatively weak and unstable associations with the visual environment, and social activities are the most sensitive to visual conditions. These findings enrich the research perspective on the “environment–behavior” relationship in waterfront public spaces and provide new empirical evidence for understanding the spatial preferences of different activity types.
In terms of planning practice, this study provides empirical support for shifting waterfront public space design from uniform greening toward activity-oriented spatial optimization. The findings indicate that enhancing the vitality of public spaces is not equivalent to increasing a single landscape element; rather, it depends on integrated environmental characteristics such as visual diversity, interface organization, and spatial complexity. Accordingly, the optimization of waterfront public spaces should give greater attention to the differentiated environmental needs of various activity types and improve both leisure activity support and overall spatial quality through hierarchical and scenario-based design strategies.
Overall, by integrating behavioral observations with street-level visual environment analysis, this study develops an analytical framework that links the micro-scale environment of waterfront public spaces with activity types. It provides a useful perspective for understanding how fine-grained visual environmental conditions are associated with waterfront activity patterns and offers a basis for future research on the refined design and evaluation of waterfront environments.