1. Introduction
The pursuit of healthy and sustainable urban development constitutes a critical global challenge, underscored by the United Nations Sustainable Development Goals (SDGs), particularly SDG 3 (Good Health and Well-being) and SDG 11 (Sustainable Cities and Communities). The scope of sustainable urban design continues to evolve, shifting from merely alleviating environmental impacts to actively shaping built environments that augment human capacity and resilience in a socially equitable manner. While fundamental frameworks such as Ulrich’s Stress Recovery Theory [1] and Kaplan’s Attention Restoration Theory (ART) [2] have substantiated the mental health advantages of nature, their primary focus remains on passive stress reduction and parasympathetic recovery. However, the transition from these classical theories to operational design mechanisms requires critical scrutiny. As indicated by recent systematic reviews, the empirical evidence for ART often relies on heterogeneous cognitive measures, leaving the precise physiological mechanisms, particularly those linking visual perception to behavioral activation, underspecified [3]. This theoretical gap suggests that traditional restorative frameworks, while effective for analyzing “recovery”, may be insufficient for explaining “engagement”. Consequently, to achieve comprehensive sustainable health outcomes, a paradigm shift is necessary: moving from the design of passively restorative spaces to the configuration of environments that actively foster spontaneous health-related behaviors, a concept increasingly defined as “Active Health” [4]. Unlike purely restorative approaches that aim for relaxation (low arousal), Active Health emphasizes strengthening an individual’s physiological and psychological capacity through positive environmental stimuli [5,6]. It posits that a state of “moderate activation”, characterized by heightened alertness and physiological readiness and distinct from both deep relaxation and high stress, is the key mechanism driving the transition from sedentary behavior to active engagement.
This transition is especially compelling in high-density residential areas, where spatial constraints and sedentary lifestyles present significant public health risks. Crucially, this high-density condition must be understood as a structural reality of China’s territorial development rather than a transient planning outcome. As analyzed in recent structural studies, the historical “Heihe-Tengchong Line” (Hu Line) continues to dictate a persistent demographic imbalance, with over 90% of the population concentrated in the southeastern half of the country [7]. Despite state-led efforts toward demographic redistribution, this pattern of concentration remains resilient. Consequently, China’s urbanization strategy has shifted from “extensive spatial expansion” to “internal quality optimization”. In this macro-territorial context, where the capacity for large-scale spatial restructuring is constrained by rigid land-use boundaries, micro public spaces, such as pocket gardens and courtyard extensions, are indispensable components of the sustainable urban infrastructure. These visual environments, however, do not operate as mechanical stimuli in a vacuum; rather, they are socially and institutionally embedded configurations [8]. As noted in recent sociological studies on urban embeddedness, visual configurations function within routines, norms, and culturally learned interpretations that define what a space is for and how it should be used [9]. Therefore, these spaces should be conceptualized not merely as physical containers, but as “infrastructures of care” that participate in the everyday governance of health. By lowering barriers and offering non-coercive visual cues, the orchestration of elements such as green space and activity areas represents a processual dimension of governing the large metropolis [10]. The strategic design of these spaces thus offers a scalable solution for governing health-related behaviors through everyday spatial arrangements, contributing to a more inclusive and socially sustainable urban environment [11]. In this framework, sustainability is conceptualized not merely as a normative backdrop but as a social project rooted in equity: ensuring that the psychophysiological benefits of active health are not privileged commodities, but accessible assets that foster inclusion and everyday well-being in the high-density metropolis [12].
Despite the recognition of micro public spaces as vital health infrastructure, a critical knowledge gap remains regarding the specific visual mechanisms that induce the “moderate psychophysiological activation” required for active health. Existing research has predominantly focused on aesthetic preference or restorative perception (e.g., high vegetation coverage for stress recovery) [13], often leaving the mechanisms underlying activation-oriented outcomes underspecified. Although studies have documented the benefits of individual features like sky view factors or greenery [14], there is a lack of quantitative research exploring how specific combinations of measurable visual elements collectively induce psychophysiological states that support spontaneous activity intention [15]. Furthermore, current methodologies often rely on self-reported data or observational studies [16,17], lacking the rigorous integration of semantic visual quantification with real-time psychophysiological assessment needed to decode the complex pathway from visual perception to behavioral intention.
To address these deficiencies, this study employs a multimodal approach combining machine learning-based semantic segmentation, immersive Virtual Reality (VR), and psychophysiological measurement. While VR studies have simulated urban green spaces to assess stress recovery [18,19], few have addressed the intermediate psychophysiological activation states that predispose individuals to active behavior. Similarly, while advances in semantic segmentation algorithms (e.g., DeepLabv3+) enable the precise quantification of environmental constituents [20,21], these tools are rarely integrated with physiological data to explain how visual elements interact to yield coherent psychophysiological responses [22,23,24]. Comprehending the pathway from visual morphology to active health necessitates an analytical framework capable of integrating these multiple data dimensions into interpretable patterns [25,26].
Therefore, the primary objective of this study is to elucidate how the visual morphology of micro public spaces can be optimized to transform them from passive areas into active catalysts for health. To address the identified knowledge gaps, this study is guided by the following three research hypotheses:
Hypothesis 1 (H1): distinct visual morphological patterns (archetypes) in micro public spaces will elicit significantly differentiated psychophysiological responses, extending beyond simple restoration to include specific activation states;
Hypothesis 2 (H2): a state of “moderate psychophysiological activation” acts as the core mediating mechanism that maximizes spontaneous behavioral intention, distinguishing it from both passive relaxation and stress;
Hypothesis 3 (H3): specific quantifiable visual elements can be identified as key predictive drivers for this health-promoting activation state.
To test these hypotheses, the study addresses three specific research questions: 1. How do quantifiable visual morphological elements in micro public spaces systematically influence physiological arousal and perceptual evaluation? 2. What is the core psychophysiological mechanism that mediates the relationship between visual stimuli and spontaneous behavioral intention? 3. How can these mechanisms be translated into operational design archetypes to support active health governance in high-density urban settings?
To address these inquiries, a multimodal framework was developed that integrates computational visual analysis, controlled psychophysiological assessment in immersive VR, and data-driven analytical modeling. This integrated approach aims to elucidate how quantifiable visual features induce a state of moderate psychophysiological activation that supports health-promoting behaviors. The findings contribute to the theoretical foundation of sustainable healthy environments and offer evidence-based insights for the design of activation-oriented spaces and for urban policy.
3. Results
This section presents the quantitative findings derived from the integrated experimental–analytical framework. It reports outcomes across three interrelated dimensions: first, the objective quantification of visual morphology across the studied micro public spaces; second, the reliability and scene-sensitivity of the collected multimodal psychophysiological and perceptual responses; and third, the identification of latent response patterns and their key visual drivers. Together, these results elucidate how variations in the visual composition of micro public spaces systematically shape physiological states, perceptual evaluations, and behavioral intentions relevant to active health.
3.1. Visual Morphology of Micro Public Spaces
Semantic segmentation and quantitative analysis of the 360° panoramic images from the twenty selected sites enabled the quantification of proportional coverage for nine core visual elements. As illustrated in Figure 8, distinct spatial types exhibited significant variations in their visual element composition, highlighting the morphological diversity and complexity of micro public spaces within high-density urban districts. Based on their spatial functions, these elements were categorized into three groups: (1) foundational visual elements, such as vegetation coverage, activity area ratio, and sky view factor, which were universally present with wide proportional ranges and collectively established the fundamental visual tone of a space; (2) dynamic and distractive elements, including pedestrian density and physical barriers, which showed highly uneven distributions that directly reflected the functional vitality and potential disturbance level of a scene; and (3) constructed boundary elements, for example, fences and facilities, which displayed distribution patterns closely tied to spatial typology and primarily served to reinforce boundaries and provide functional cues.
Within this categorical framework, distinct morphological characteristics were observed across the four spatial types. The Fully Enclosed type was dominated by ecological components. In several scenes, vegetation coverage and green space ratio both exceeded 70%, resulting in visually saturated green scenes and a high degree of spatial enclosure. The Semi-Enclosed type exhibited greater internal variability in its visual composition. High vegetation coverage co-occurred with relatively high proportions of either pedestrian presence or physical barriers, leading to more heterogeneous visual patterns across scenes. The Block-Derived type was characterized by large proportions of sky view factor and activity areas. In many cases, these two components together accounted for more than 75% of the visual field, corresponding to a more open spatial condition and reduced representation of ecological elements. The Independent Attached type showed the most diverse combinations of visual components. Vegetation, open surfaces, and facilities appeared in varying proportions among scenes. Some cases (e.g., T4-2) presented relatively balanced distributions between ecological and functional elements, whereas others were largely composed of activity areas and sky view factor.
In summary, the morphological analysis confirms that the selected micro public spaces possess quantifiable and structurally distinct visual configurations. This objective heterogeneity provides the necessary independent variable variation to rigorously test how different visual combinations—ranging from nature-dominated enclosures to open, activity-centric spaces—differentially impact human psychophysiological responses in the subsequent experimental phases.
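The proportional coverage described in this subsection can be illustrated with a minimal sketch. It assumes that a segmentation model has already produced a per-pixel label map; the class indices, class names, and the tiny 4 × 5 map below are hypothetical and are not taken from the study's data or pipeline. The sketch also counts every pixel equally, whereas an equirectangular 360° panorama over-represents regions near the top and bottom of the image, so a real analysis might apply a latitude-dependent weighting.

```python
from collections import Counter


def element_proportions(label_map, class_names):
    """Return the share of pixels assigned to each named class in a 2-D label map."""
    counts = Counter(label for row in label_map for label in row)
    total = sum(counts.values())
    return {name: counts.get(idx, 0) / total for idx, name in class_names.items()}


# Hypothetical class indices; a real model defines its own label set.
CLASS_NAMES = {0: "sky", 1: "vegetation", 2: "activity_area"}

# Hypothetical 4 x 5 label map standing in for a segmented panorama.
label_map = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 2, 2, 1],
    [2, 2, 2, 2, 2],
]

proportions = element_proportions(label_map, CLASS_NAMES)
for name, share in proportions.items():
    print(f"{name}: {share:.2%}")
```

Each scene then reduces to one vector of proportions, which is the form of input assumed by the clustering and classification steps reported later.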
3.2. Reliability and Consistency of Multimodal Data
This section evaluates the psychometric properties of both subjective and physiological measures, establishing a data quality baseline for the modeling procedures that follow.
3.2.1. Psychometric Reliability of Subjective Assessments
The reliability and validity of the subjective evaluation framework, which encompassed visual perception, spatial experience, emotional valence, and behavioral intention, were assessed using the 600 observational samples. As shown in Table 1, Cronbach’s α for all subscales and the full 16-item scale exceeded the 0.70 acceptability threshold, with the full scale reaching an excellent α of 0.93. The closely matching standardized Cronbach’s α coefficients further confirmed strong internal consistency, indicating that participants’ perceptual and affective appraisals were stable across scenes and suitable for multivariate analysis.
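For readers unfamiliar with the statistic, the following is a minimal sketch of how Cronbach’s α can be computed from an item-score matrix using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The three-item, five-respondent `ratings` matrix is hypothetical and only illustrates the calculation; it is not the study's questionnaire data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 5 respondents x 3 items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]

alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.3f}")
```

Values above 0.70 are conventionally treated as acceptable, which is the threshold applied in Table 1.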
Construct validity was supported by a high KMO measure (0.959) and a significant Bartlett’s test of sphericity (p < 0.001), as detailed in Table 2, confirming the suitability of the data for factor analysis and the coherent structure of the evaluation framework.
As shown in Table 3, one-way ANOVA results revealed significant scene-based effects for all four subjective indicators (p < 0.001), with large effect sizes throughout (Cohen’s f = 0.436–0.584, all above the conventional 0.40 benchmark for a large effect). These findings demonstrate that visual variations among micro public spaces effectively elicit differentiated perceptual, affective, and intentional responses. Notably, behavioral intention showed the strongest effect, underscoring the critical role of the perceptual pathway in activating health-promoting behaviors.
3.2.2. Scene-Based Variability in Physiological Responses
Physiological indicators also demonstrated systematic sensitivity to visual environmental changes. One-way ANOVA confirmed highly significant scene effects (p < 0.001) for all four autonomic indicators (SC, BVP, TEMP, and RESP), as presented in Table 4, verifying that micro-scale visual conditions reliably modulate autonomic nervous system activity.
Analysis of effect sizes quantified the magnitude of these differences: SC (f = 0.585) and TEMP (f = 0.446) showed the greatest variability, BVP (f = 0.364) approached a large effect, and RESP (f = 0.262) exhibited a moderate effect. This pattern indicates that visual environments elicit graded physiological states, ranging from sympathetic arousal to parasympathetically influenced readiness for relaxation. Such context-dependent, moderate autonomic regulation constitutes the physiological basis for a “readiness state” that is conducive to health-promoting behavior, establishing a solid foundation for identifying coherent psychophysiological response patterns in subsequent analyses.
Collectively, the physiological and subjective analyses demonstrate that the experimental scenes elicited robust and statistically significant variations in user responses. The strong effect sizes, particularly for Skin Conductance (SC) and Behavioral Intention, confirm that the visual environment acts as a potent modulator of both autonomic arousal and conscious motivation. This validates the quality of the dataset for identifying specific psychophysiological patterns in the next analytical stage.
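As an aid to interpreting the effect sizes reported in this section, the sketch below shows one common way to derive Cohen’s f from a one-way layout: compute η² as the ratio of between-group to total sum of squares, then f = √(η² / (1 − η²)). Whether the study computed f in exactly this way is an assumption. The `groups` data are hypothetical, and `label_effect` is an illustrative helper that applies Cohen’s conventional benchmarks (0.10 small, 0.25 medium, 0.40 large).

```python
from math import sqrt


def cohens_f(groups):
    """Cohen's f for a one-way design, derived from eta squared."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    eta_sq = ss_between / ss_total
    return sqrt(eta_sq / (1 - eta_sq))


def label_effect(f):
    """Map f onto Cohen's conventional benchmarks."""
    if f >= 0.40:
        return "large"
    if f >= 0.25:
        return "medium"
    if f >= 0.10:
        return "small"
    return "negligible"


# Hypothetical responses for three scenes (one list per scene).
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_example = cohens_f(groups)
print(f"f = {f_example:.3f} ({label_effect(f_example)})")

# Applying the benchmarks to the physiological effect sizes reported above.
for name, f in [("SC", 0.585), ("TEMP", 0.446), ("BVP", 0.364), ("RESP", 0.262)]:
    print(name, label_effect(f))
```

Under these benchmarks SC and TEMP fall in the large range, while BVP and RESP fall in the medium range, in line with the description in the text.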
3.3. Pattern Identification and Visual Drivers of Active Health
Building on the established visual morphology and validated multimodal data, this section identifies the integrated psychophysiological patterns through which environmental features translate into health-related states and elucidates their key visual drivers.
3.3.1. Psychophysiological Response Patterns
Cluster analysis revealed three distinct psychophysiological response patterns, as shown in Table 5, defined as three functional archetypes: Restoration-Supporting, Activity-Promoting, and Stress-Inducing. The Restoration-Supporting archetype exhibited the highest mean values for BVP and SC, alongside near-neutral subjective scores, indicating a parasympathetic-dominant state associated with bodily recovery. The Activity-Promoting archetype demonstrated the optimal profile for health promotion: it achieved the highest positive ratings across all subjective dimensions (perception, experience, emotion, and intention) while sustaining a moderate level of physiological arousal. It thereby embodies the theorized state of moderate activation, characterized by an alert readiness that facilitates spontaneous activity without inducing stress. In contrast, the Stress-Inducing archetype displayed uniformly low values across both physiological and subjective measures, reflecting psychophysiological disengagement and discomfort in visually overloaded or socially dense environments.
The cluster structure demonstrated robust internal cohesion and clear separation (average silhouette coefficient = 0.71), confirming the reliability of the SOM-K-means hybrid approach. These findings empirically distinguish the “Activity-Promoting” state from traditional “Restorative” states. While restoration is characterized by maximum physiological relaxation, the activity-promoting state is defined by a unique synergy of high psychological positivity and moderate physiological arousal. This distinct profile supports the study’s central hypothesis regarding the existence of a specific activation mechanism for active health.
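The silhouette coefficient used above to judge cluster quality can be sketched as follows. For each observation, a is the mean distance to the other members of its own cluster and b is the lowest mean distance to the members of any other cluster; the score is (b − a) / max(a, b), averaged over all observations. This sketch evaluates only the metric on given labels and does not reproduce the SOM-K-means pipeline. The four two-dimensional `points` and both label assignments are hypothetical.

```python
from math import dist


def mean_silhouette(points, labels):
    """Average silhouette score for points under a given cluster labelling."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)
        # Lowest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical feature vectors: two tight, well-separated pairs.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels follow the true grouping
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the grouping
print(f"good = {good:.3f}, bad = {bad:.3f}")
```

Scores close to 1 indicate compact, well-separated clusters, while scores near or below 0 indicate overlapping or misassigned observations.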
3.3.2. Visual Drivers of Response Patterns
The Activity-Promoting archetype, distinguished by its moderate activation profile, was confirmed as the most effective in fostering spontaneous behavioral intention. To unravel the specific visual features underlying these differentiated patterns, a Random Forest (RF) classifier was constructed using the derived archetype labels as the outcome variable. The model achieved an out-of-bag classification accuracy of 87%, indicating a strong predictive link between visual morphology and psychophysiological state. Crucially, the stability of the identified visual predictors was verified through repeated k-fold cross-validation. The mean importance scores for the top-ranked features ranged from approximately 0.10 to 0.25 on a normalized scale with narrow confidence intervals, confirming that the feature hierarchy was robust to data partitioning. As shown in Figure 9, feature importance analysis identified six key visual predictors, ranked in descending order of contribution: pedestrian density, activity area, green space ratio, fences, vegetation coverage, and facilities.
Each archetype possessed distinct visual morphological characteristics, as summarized in Table 6. The Restoration-Supporting archetype was associated with the highest vegetation coverage, representing ecologically enriched and enclosed environments that support recovery. The Activity-Promoting archetype was characterized by the largest activity area, a moderate sky view factor, and minimal pedestrian density, forming a visually balanced setting conducive to active engagement. Conversely, the Stress-Inducing archetype exhibited the highest sky view factor and pedestrian density alongside the lowest vegetation coverage, corresponding to visually exposed and socially intense environments that induce stress.
4. Discussion
This study aimed to bridge the critical knowledge gap regarding the mechanisms by which micro public spaces influence active health. While previous research has extensively documented the restorative benefits of nature for stress recovery [38,39], our findings provide empirical evidence for a distinct “Active Health” paradigm. The results confirm that specific visual configurations do not merely induce passive relaxation but trigger a state of moderate psychophysiological activation that drives behavioral intention. The following sections discuss these findings through the lenses of environmental affordances, physiological regulation, and urban governance.
4.1. From Visual Stimuli to Social Affordances: The Embedded Nature of Perception
The results confirm that the visual morphology of micro public spaces significantly shapes participants’ physiological arousal and behavioral intentions. Specifically, the identified “Activity-Promoting” archetype, characterized by a high activity area ratio, a balanced sky view factor, and minimal physical barriers, was most effective in eliciting the target state of active health. However, it is crucial to recognize that these visual elements do not operate merely as mechanical stimuli acting on passive bodies. Rather, drawing on the sociological concept of embeddedness, we argue that these visual configurations function as socially and institutionally embedded signals. A clearly legible activity area or a permeable boundary does not just transmit visual information; it activates culturally learned interpretations of what a space is for, thereby defining the social logic of usage.
This interpretation aligns with Affordance Theory, which posits that the environment is perceived in terms of its “action possibilities”. In our study, the reduction of physical barriers and the clarity of functional zones act as positive affordances. As noted in recent sociological discourse on health governance, such spatial arrangements effectively “lower the thresholds” for interaction. They transform the space from a static physical container into an inviting stage that offers non-coercive cues for engagement, reducing the psychological cost of initiating spontaneous physical activity.
Physiologically, this mechanism supports a distinct paradigm shift from “restorative” to “activating” design. While restorative environments (e.g., the Restoration-Supporting archetype) promote parasympathetic dominance to aid stress recovery [1,2], the “Activity-Promoting” environments trigger a state of moderate psychophysiological activation. This finding provides empirical support for Optimal Arousal Theory in urban micro-spaces: the environment maintains users in a “zone of optimal functioning”, alert enough to act (indicated by moderate SC and BVP) but calm enough to feel safe.
Consequently, the visual environment serves as a precondition for active behavior not just through physiological arousal, but through cognitive appraisal. The subjective data indicated higher ratings for perceived controllability in Activity-Promoting scenes. Unlike traditional restorative models that focus on withdrawal from demand, our findings suggest a behavioral mechanism in which visual clarity reduces the psychological cost of engaging with a space. This supports the extension of VR-based environmental research from passive health outcomes to active behavioral interventions.
4.2. Physiological Signature of Moderate Arousal: The Neural Basis for Active Engagement
The physiological data further delineate the specific pathways through which visual environments modulate the autonomic nervous system to shape distinct psychophysiological states. The Restoration-Supporting archetype, characterized by high vegetation coverage and enclosure, elicited parasympathetic-dominant responses (e.g., decreased SC, increased TEMP). This is consistent with classical frameworks such as Stress Recovery Theory [1], representing the physiological endpoint of stress recovery where metabolic resources are conserved. However, for the purpose of Active Health, our findings suggest that maximum relaxation is not the optimal state. The Activity-Promoting archetype exhibited a state of adaptive physiological preparedness, characterized by balanced, moderate levels of sympathetic arousal (as evidenced by SC and BVP trends). This finding provides empirical support for Optimal Arousal Theory within the built environment context. According to this theory, a medium level of arousal, distinct from both the lethargy of deep relaxation and the disorganization of high stress, is required to maximize attention and behavioral performance.
We argue that this “moderate activation” serves as the necessary neurophysiological substrate for perceiving and acting upon environmental affordances. When the visual environment offers clear “action possibilities” (e.g., open activity areas), the body responds by shifting from a “rest-and-digest” mode to a “ready-to-act” mode. This physiological shift is not a stress response, but a functional mobilization of energy resources to facilitate engagement with the social and physical environment. Conversely, the Stress-Inducing archetype (high pedestrian density, visual clutter) triggered maladaptive over-arousal (e.g., sharp increases in SC and RESP). Unlike the controlled activation observed in Activity-Promoting spaces, this state represents a defensive reaction that likely inhibits exploratory behavior.
In summary, visual morphology does not simply act on a linear scale from “stress” to “recovery”. Instead, it modulates a complex physiological continuum. Active Health design aims for a specific “Goldilocks zone” of regulated physiological activation—sufficiently stimulating to support positive behavioral intentions, yet sufficiently controlled to avoid the fatigue associated with sensory overload.
4.3. Cognitive Appraisal and the Sense of Agency: From Readiness to Intention
While physiological preparedness provides the necessary capacity for behavioral engagement, it is the perceptual and cognitive processes that determine whether this capacity translates into actual behavioral intention. Our subjective evaluation data indicated that participants’ experiences were deeply influenced by the balance between challenge and support inherent in the environment. We interpret this through the lens of Cognitive Appraisal Theory. The Activity-Promoting environments received significantly higher ratings for Spatial Experience and Behavioral Intention. This suggests that when users encounter a space with high visual legibility and low pedestrian congestion (as identified in our Random Forest model), they perform a positive “secondary appraisal”: they judge their coping resources as sufficient to meet environmental demands. Critically, this positive appraisal fosters a heightened sense of agency. The combination of visual openness (moderate sky view) and comprehensible complexity enhances the user’s perception of environmental control. In the sociology of spatial practices, a space that is legible and permeable is perceived as “conquerable” or “accessible”, whereas a cluttered, stressful space acts as a barrier to agency. This perception of behavioral feasibility is the catalyst that transforms latent physiological arousal into concrete motivation. In contrast, Stress-Inducing environments (e.g., those with high physical barriers) tended to cause perceptual overload, taxing cognitive resources and diminishing the sense of control.
Therefore, the “Visual Activation” mechanism is not a simple reflex. It is a sophisticated cognitive-emotional process where positive environmental appraisal validates physiological readiness. The visual environment effectively communicates to the user: “You are capable, and this space is enabled,” thereby reducing the psychological threshold for initiating spontaneous physical activity.
4.4. Synthesizing the Pathways: The Visual Activation for Active Health Framework
Based on the empirical findings and the theoretical discussions above, this study proposes an integrative theoretical model: the “Visual Activation for Active Health” framework (Figure 10).
This framework systematically delineates the complete causal pathway from environmental visual input to the generation of behavioral intention. It incorporates the key analytical methods—semantic segmentation, response clustering, and Random Forest modeling—into a coherent explanatory model. The framework highlights the central mediating role of the moderate psychophysiological activation state and consists of four logically interconnected levels:
(1) Visual Design Inputs: Configurations of Quantified Affordances
This level serves as the starting point, corresponding to the nine core visual elements quantified via semantic segmentation. The analysis indicates that specific combinations of elements, such as a high activity area ratio coupled with low pedestrian density, do not merely exist as physical features but function as positive affordances. The Random Forest model confirmed that these quantifiable affordances (e.g., Pedestrian Density, Activity Area Ratio) are the primary drivers of the user’s initial interaction with the space.
(2) Core Mediating State: Moderate Psychophysiological Activation
This level constitutes the theoretical nexus. Supportive visual input concurrently elicits physiological regulation and subjective responses. Physiologically, this manifests as balanced autonomic nervous system activity; psychologically, it is characterized by a heightened yet moderate level of alertness. This integrated state provides the dynamic psychophysiological substrate for engagement, aligning with the Optimal Arousal Theory discussed in Section 4.2.
(3) Cognitive-Motivational Translation: Appraisal and Agency
Building upon the physiological substrate, a process of real-time cognitive appraisal is engaged. Within the moderate activation state, users form positive judgments about environmental control and feasibility. As discussed in Section 4.3, this enhances the user’s sense of agency, translating latent physiological readiness into a concrete intention for spontaneous physical activity.
(4) Integrated Outputs: The Emergence of Functional Archetypes
The convergence of these processes culminates in the emergence of stable behavioral archetypes (Restoration-Supporting, Activity-Promoting, Stress-Inducing). The Activity-Promoting archetype represents the optimal synergy where visual affordances successfully trigger the activation state and positive appraisal.
In summary, the framework reconceptualizes the visual characteristics of the built environment as an active, interventional medium. Its core mechanism lies in using specific visual configurations to elicit a “Goldilocks state” of regulated activation. Crucially, this framework extends beyond individual psychophysiology to the sociological dimension of urban governance. These micro-spaces should be viewed as “infrastructures of care”. Unlike coercive health interventions, the Activity-Promoting archetypes function through non-coercive governance: by optimizing visual affordances, they “lower the thresholds” for physical engagement. This mechanistic understanding shifts the design paradigm from creating static spaces for rest to shaping proactive environments capable of dynamically guiding health-promoting behaviors through embedded, everyday arrangements.
4.5. Limitations and Future Directions
Several limitations of this study warrant critical discussion. First, while the use of immersive VR provided rigorous experimental control over visual stimuli, it cannot fully replicate the multisensory richness (e.g., auditory, olfactory cues) or the physical sensation of movement inherent in real-world environments [40]. Looking ahead, the transition from “visual activation” theory to practice requires operational research avenues. We propose that future work should prioritize longitudinal pilot interventions in physical community spaces.
Implementation: Researchers can collaborate with urban planners to retrofit existing underused micro-spaces. Interventions should focus on implementing the “Activity-Promoting” features identified in this study, such as increasing the activity area ratio, optimizing the sky view factor through pruning or pergola removal, and replacing opaque walls with permeable fences to enhance affordance signaling.
Evaluation: To assess the durability of behavioral effects beyond the immediate laboratory response, effectiveness should be evaluated using a mixed-methods approach. This includes Post-Occupancy Evaluations (POE) over extended periods (e.g., 6–12 months) and the integration of mobile sensing technologies (e.g., portable EEG/EDA) to validate psychophysiological activation in situ.
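As one possible quantitative component of such an evaluation, the sketch below computes a paired before/after effect for a retrofitted micro-space. It is a minimal illustration under stated assumptions, not the protocol used in this study: the function name `paired_effect`, the choice of Cohen's d_z as the effect size, and the observation counts are all hypothetical.

```python
from statistics import mean, stdev


def paired_effect(before: list[float], after: list[float]) -> tuple[float, float]:
    """Return (mean change, Cohen's d_z) for paired before/after observations.

    Each index is assumed to be the same observation slot (e.g., the same
    weekday and hour) measured before and after the retrofit.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance; d_z is undefined")
    change = mean(diffs)
    return change, change / sd


# Hypothetical counts of physically active users per observation slot.
before_counts = [10, 12, 9, 11]
after_counts = [13, 14, 12, 15]
change, d_z = paired_effect(before_counts, after_counts)
print(f"mean change = {change:.1f} users per slot, d_z = {d_z:.2f}")
```

The same paired structure could in principle be applied to in situ EDA or EEG summaries, provided that each participant or observation slot is measured both before and after the intervention.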
Regarding sampling, the study employed a purposive strategy focusing on young, sedentary adults in China. While this was strategic for targeting the population most “at risk” of urban sedentarism, it introduces a bias that limits generalization to other demographic groups, such as children or the elderly, who may perceive spatial affordances differently. Furthermore, the visual semantics of “high-density” are culturally specific to the Chinese urban context. The interpretation of “crowding” or “enclosure” may differ in Western or low-density contexts, necessitating cross-cultural validation. To address this limitation and enhance the international relevance of the framework, we strongly advocate for comparative cross-cultural research. Future studies should replicate this multimodal protocol in diverse urban morphologies (e.g., European or North American cities). Such investigations are crucial to determine whether the “Activity-Promoting” visual archetypes identified here represent universal human responses to spatial form or adaptive mechanisms specific to high-density Asian urbanism. This cross-cultural validation will be essential for recalibrating the visual activation thresholds for global application.
Finally, implementing these findings requires a shift in urban governance mechanisms. As highlighted in recent discourse on regulation by incentives [41], scientific evidence alone is insufficient to drive change. To bridge the gap between research and implementation, urban policies must structure incentives, such as density bonuses or tax credits, to encourage developers to incorporate “Activity-Promoting” designs. This interdisciplinary collaboration between environmental psychology, urban design, and public policy is essential for moving towards a more reflexive, evidence-based urban design that actively shapes health outcomes in the sustainable city.
5. Design Implications
5.1. Modulate Visual Gradients to Regulate Psychophysiological Arousal
Design should consciously orchestrate a gradient of visual experiences to guide users along a continuum from restorative recovery to moderate activation, corresponding to the identified functional archetypes. This involves the deliberate spatial zoning of micro public spaces. Restorative Zones should prioritize features associated with the Restoration-Supporting archetype, such as high vegetation coverage, substantial green space ratio, and enclosed boundaries, to promote parasympathetic dominance and psychological recovery. In contrast, Activation Zones should embody the visual signature of the Activity-Promoting archetype, characterized by a prominent and legible activity area, a sky view factor that is intentionally higher than in restorative zones yet lower than in stressful exposures, and minimal physical barriers. This configuration induces the state of moderate physiological arousal and behavioral readiness. Transitions between these zones should be mediated by semi-permeable elements, such as layered planting or low fences, to create a coherent perceptual rhythm.
Application Scenario: In a linear pocket park, this zoning can be implemented by sequencing an open entrance plaza (Activation Zone) that transitions into a densely vegetated inner grove (Restorative Zone). In space-constrained contexts like street corners, vertical gradients can be created using overhead pergolas to modulate the sky view factor, while raised planting beds define experiential edges without consuming ground space.
Addressing Urban Conflicts (Safety): A critical implementation trade-off involves balancing the enclosure needed for restoration with urban safety requirements. High enclosure can obscure sightlines, reducing natural surveillance. This conflict can be mitigated by employing “soft enclosure” strategies, such as open-work screens, transparent fencing, or trimmed lower vegetation, which maintain the psychological sense of refuge while preserving visual permeability for safety.
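The zoning logic described in this subsection can be expressed as a simple rule-based screening step for candidate viewpoints. The sketch below is illustrative only: the `VisualMetrics` schema, the zone labels, and every numeric cut-off are hypothetical placeholders, since no threshold values are specified here. It encodes only the qualitative ordering stated above, namely that activation zones have a sky view factor higher than restorative zones yet lower than stressful exposures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualMetrics:
    """Share-of-view indicators in [0, 1] for one viewpoint (hypothetical schema)."""
    green_ratio: float     # green space ratio
    activity_ratio: float  # activity area ratio
    sky_view: float        # sky view factor
    barrier_ratio: float   # share of the view occupied by physical barriers


# Hypothetical cut-offs for illustration; they would need empirical calibration.
SVF_RESTORATIVE_MAX = 0.25  # at or below this, the view counts as enclosed
SVF_STRESS_MIN = 0.60       # at or above this, the view counts as over-exposed
GREEN_HIGH = 0.40
ACTIVITY_PROMINENT = 0.30
BARRIER_LOW = 0.10


def classify_zone(m: VisualMetrics) -> str:
    """Map a viewpoint to a zone label using the qualitative rules in the text."""
    if m.sky_view >= SVF_STRESS_MIN:
        return "stress-risk"
    if (m.sky_view > SVF_RESTORATIVE_MAX
            and m.activity_ratio >= ACTIVITY_PROMINENT
            and m.barrier_ratio <= BARRIER_LOW):
        return "activation"
    if m.green_ratio >= GREEN_HIGH and m.sky_view <= SVF_RESTORATIVE_MAX:
        return "restorative"
    return "transition"


# Two viewpoints from the linear pocket park scenario (values are invented).
plaza = VisualMetrics(green_ratio=0.15, activity_ratio=0.45, sky_view=0.40, barrier_ratio=0.05)
grove = VisualMetrics(green_ratio=0.60, activity_ratio=0.05, sky_view=0.15, barrier_ratio=0.20)
print(classify_zone(plaza), classify_zone(grove))  # activation restorative
```

Viewpoints that satisfy none of the rules are labelled "transition", which corresponds to the semi-permeable mediating areas between zones.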
5.2. Structure Visual Affordances to Guide Perception and Intention
The visual environment serves as a primary medium for everyday governance, guiding behavior not through rigid rules but through spatial suggestions. Design should focus on structuring visual affordances that act as non-coercive cues, effectively “lowering the thresholds” for physical engagement. This approach transforms the abstract intention to exercise into a tangible, low-cost behavioral choice, positioning the built environment as an active agent in health promotion.
Enhance Legibility for Perceived Control: To strengthen the user’s sense of agency, spaces must maintain high Spatial Legibility. This is achieved by ensuring a balanced sky view factor and unobstructed sightlines across activity areas, which allows users to easily assess environmental opportunities. Clear sightlines also resolve the critical urban conflict between “enclosure” and “safety”. By ensuring visual continuity (e.g., using transparent vegetation layers), the design reduces the cognitive load of navigation and increases natural surveillance. This is particularly critical for empowering vulnerable groups (e.g., the elderly) to use the space without fear, reinforcing the space’s function as a safe infrastructure of care.
Signal Feasibility to Lower Behavioral Barriers: Affordance Signaling should be employed to communicate “permission” and “feasibility” instantly. This includes using intuitive ground textures (e.g., distinct paving for movement zones) and intentionally aligned facilities (e.g., seating facing activity zones) to broadcast social norms of participation. These embedded cues lower the psychological barrier to entry, making the “active choice” the path of least resistance.
Balance Vitality with Cognitive Accessibility: A critical design challenge is managing pedestrian density to suggest social vitality (Social Proof) without triggering “crowding stress”. The visual composition must balance dynamic human elements with static buffers to avoid sensory saturation. Furthermore, minimizing chaotic physical barriers and visual clutter is essential for Cognitive Accessibility. An overly complex environment can exclude neurodivergent users or those with cognitive decline. By simplifying the visual field, the design preserves the user’s attentional resources for engagement, ensuring the space remains an inclusive infrastructure of care rather than a source of sensory overload.
5.3. Orchestrate Synergistic Compositions as Processual Governance
Finally, the creation of active health environments should be understood not as a static installation of elements, but as a dynamic process of orchestration aligned with the agenda of governing the large metropolis. Design practitioners must govern the visual environment by articulating multiple elements into a coherent whole. The goal is not merely to place objects, but to produce a “synergistic effect” where the sum of visual cues creates a resilient social atmosphere that sustains engagement over time.
Synergize Activation with Restoration: To sustain the “Moderate Activation” state, design must act as a regulator between opposing forces. The strategy involves coupling “Activation Triggers” (e.g., high activity area ratio) with “Restorative Buffers” (e.g., sufficient green space ratio). For instance, an open plaza (high activation potential) should be framed by canopy trees (stress reduction). This synergy prevents the space from becoming a Stress-Inducing archetype (over-stimulation) while ensuring it does not regress into a purely passive Restoration-Supporting space (under-stimulation).
Define Boundaries to Embed Social Norms: Visual boundaries such as building density and fences are not just physical barriers; they are embedded institutional configurations that define social territories. A critical trade-off exists between “Openness” and “Security”. To resolve this, designers should employ permeable boundaries (e.g., low hedges, transparent fencing, or tiered planting). This approach provides clear territorial definition (governing usage norms) without creating visual exclusion, thereby maintaining the sense of control required for active engagement while preserving necessary lines of sight.
Orchestrate Dynamics for Social Proof: The visual presence of others (pedestrian density) functions as a powerful driver of behavioral intention via “social proof”. However, density must be orchestrated to encourage social mixing rather than crowding. Designers should spatially distribute facilities to avoid “visual congestion” while ensuring that active users remain visible. A major challenge in high-density areas is the maintenance of these vibrant spaces, because high usage intensity can lead to rapid degradation. Therefore, sustainable implementation requires integrating durable materials and “defensive” landscape strategies (e.g., protective buffer zones around delicate vegetation) to ensure the space remains a functional infrastructure of care over the long term.
A synergistic composition can be realized in a residential courtyard by integrating a central multi-use lawn (activation), a ring of peripheral seating (observation), a perforated feature wall (boundary definition), and a framed view of the sky. In highly constrained sites, synergy demands multifunctional elements; for example, a tiered planting wall can simultaneously act as a boundary, an ecological patch, and a seating ledge, maximizing the utility of limited space.
6. Conclusions and Outlook
This study provides empirical evidence that the strategic visual design of micro public spaces can effectively promote Active Health, distinct from passive stress recovery. Through a novel multimodal framework, we identified moderate psychophysiological activation—rather than maximum relaxation—as the core mechanism that elicits spontaneous activity intention. The empirical derivation of three functional archetypes—Restoration-Supporting, Activity-Promoting, and Stress-Inducing—demonstrates that specific visual configurations (e.g., high activity area ratio coupled with moderate openness) serve as robust predictors of the “ready-to-act” physiological state.
Theoretically, this research advances the field by proposing the “Visual Activation for Active Health” framework. This model transcends traditional stimulus-response paradigms by reconceptualizing micro public spaces as “infrastructures of care” that participate in the everyday governance of urban health. We demonstrate that visual morphology functions as a form of non-coercive governance, regulating health behaviors by “lowering the thresholds” for physical engagement through embedded, socially encoded spatial cues.
Methodologically, the study establishes a rigorous template for reflexive, evidence-based urban design. By integrating machine learning-based quantification with controlled psychophysiological measurement, we offer a replicable approach to decoding the complex, non-linear interactions between built forms and human well-being, bridging the gap between architectural metrics and public health outcomes.
Ultimately, the broader implication of this work lies in its potential to inform urban policy and planning. As cities face increasing density, the realization of active health requires shifting from normative guidelines to incentive-based design strategies [41]. Scientific evidence must be translated into policy incentives, such as density bonuses or targeted retrofitting subsidies, that encourage the creation of “Activity-Promoting” spaces. This transition is critical for social sustainability, ensuring that high-quality, health-enabling environments are not privileges but accessible assets that actively shape equitable health outcomes in the high-density metropolis.