A Dual-Height AI Framework for Proxy Assessment of Children’s Spatial Perception in a Large Cultural Complex

Shen, Yingying; Zhu, Shuyan; Zhang, Fei

doi:10.3390/buildings16102030


by Yingying Shen ¹, Shuyan Zhu ¹,²,* and Fei Zhang ¹

¹ School of Architecture, South China University of Technology, Guangzhou 510641, China

² State Key Laboratory of Subtropical Building Science, South China University of Technology, Guangzhou 510641, China

* Author to whom correspondence should be addressed.

Buildings 2026, 16(10), 2030; https://doi.org/10.3390/buildings16102030

Submission received: 20 April 2026 / Revised: 15 May 2026 / Accepted: 18 May 2026 / Published: 21 May 2026

(This article belongs to the Special Issue Data-Driven Intelligence for Sustainable Urban Renewal)

Abstract

Large-scale cultural complexes serve significant numbers of child users, yet existing spatial assessment approaches are predominantly developed from adult perspectives and rarely consider environmental exposure conditions at children’s own eye level. To address this gap, this study introduces a dual-height proxy assessment framework that integrates semantic segmentation with explainable machine learning, enabling scalable proxy-based spatial diagnosis without requiring direct child participation. The framework combines dual-height street-view imagery (adult: 1.6 m; child: 1.2 m), semantic segmentation (DeepLabV3+ and PSPNet), GIS analysis, literature-informed proxy perceptual indices, and explainable machine learning (XGBoost with SHAP), and is applied across 480 sampling locations at the Longgang Cultural Centre, Shenzhen. The results reveal substantial differences in environmental exposure characteristics between adult-height and child-height viewpoints, with child-height imagery exhibiting 34% lower signage visibility and 30% higher spatial enclosure. Exploratory associations between environmental features and proxy perceptual indices yielded R² values ranging from 0.14 to 0.39, with walking distance, openness, and visual complexity emerging as the most influential variables within the proxy models. SHAP analysis identified non-linear relationships between environmental characteristics and proxy perception-related outcomes, and spatial mismatch mapping identified 120 locations warranting design attention. The resulting framework is scalable and data-driven, and supports child-friendly environmental screening and spatial diagnosis.
The proposed proxy indices are grounded in developmental psychology literature and are not intended to substitute for children’s direct perceptual responses; rather, they are intended to characterise comparative child-height environmental exposure patterns within large-scale cultural environments. Validation using child-reported perception data, behavioural observation, participatory methods, and experimental wayfinding studies remains an important direction for future research.

Keywords:

children’s spatial perception; proxy assessment; semantic segmentation; dual-height imagery; child-friendly design

1. Introduction

Due to the rapid urbanisation of Chinese cities, there has been a proliferation of large-scale cultural complexes designed to meet the diverse needs of communities. Despite their architectural scale and multifunctional programming, these environments frequently overlook child-relevant spatial conditions and environmental exposure characteristics. Children aged 6–12 years constitute a large proportion of the population in these public spaces, yet the available assessment frameworks are primarily based on adult design principles [1]. Conventional approaches to assessing spatial perception have relied on subjective surveys and observational studies, which offer only a limited evidence base for architectural interventions [2,3,4]. Although these methods provide valuable qualitative insights, they are often constrained by limited spatial coverage, dependence on participant cooperation, and difficulties in conducting systematic large-scale environmental evaluation. These limitations become particularly pronounced in studies involving children, whose responses may be shaped by developmental differences in attention, memory, cognitive processing, and verbal articulation. Consequently, questionnaire- and interview-based approaches may not fully capture children’s continuous and situational interactions with complex built environments. This makes it challenging to systematically examine how specific environmental characteristics are associated with child-relevant environmental exposure conditions across large and spatially heterogeneous cultural settings.

Developmental factors related to eye height, visual field, mobility patterns, and cognitive processing abilities contribute to substantial differences between child and adult spatial experience. Studies have shown that spatial abilities develop significantly throughout childhood and that younger children (6–8 years) and older children (9–12 years) exhibit different levels of intrinsic and extrinsic spatial abilities [5,6,7]. These developmental differences are reflected not only in how environmental visibility and enclosure conditions are encountered at child height, but also in how navigational and spatial cues are differentially represented across age-relevant environmental contexts. Nevertheless, contemporary approaches to child-friendly design assessment still lack objective and scalable methods capable of systematically characterising child-height environmental conditions across large cultural complexes [8,9]. Most existing assessment frameworks continue to rely on adult-height environmental observations or generalized planning indicators, which may overlook child-relevant environmental exposure patterns such as reduced signage visibility, altered enclosure conditions, and restricted lines of sight. Consequently, environmental disparities associated with visual-height differences remain insufficiently represented within current built-environment evaluation frameworks.

This study seeks to address this critical gap by proposing an AI-driven proxy assessment framework to characterise and analyse spatial conditions relevant to children at the Longgang Cultural Centre, Shenzhen (Figure 1). The framework combines semantic segmentation methods based on the DeepLabV3+ and PSPNet architectures to extract environmental characteristics from dual-height street-view imagery. These visual attributes are then analysed in relation to literature-informed proxy perceptual indices using machine learning models, principally XGBoost. This proxy-based approach offers several advantages over conventional methods: it enables systematic environmental characterisation at child-relevant viewing heights; supports the investigation of large spatial areas; identifies environmental variables associated with proxy perceptual indices; and generates spatially explicit guidance to inform child-sensitive design decisions. More specifically, this study aims to examine how visual-height disparities influence children’s spatial perception within large cultural complexes and to develop a scalable methodological framework for AI-assisted proxy-based spatial assessment capable of identifying potential spatial mismatches that are difficult to detect through traditional assessment approaches. Importantly, the objective of this study is not to reproduce or replace children’s actual perceptual responses, but rather to develop a proof-of-concept proxy assessment framework for identifying potentially problematic environmental conditions warranting further empirical investigation.

2. Literature Review

2.1. Child-Friendly Environments, Age-Sensitive Design, and Children’s Spatial Perception

Spatial perception reflects the cognitive processes by which people comprehend their location in space and the relationships among the objects surrounding them [10,11,12]. In children, these skills develop gradually during middle childhood and have important implications for how architectural environments are interpreted and navigated. Research adopting two-factor spatial cognition models has shown that intrinsic spatial abilities (e.g., mental rotation and visualisation) and extrinsic spatial abilities (e.g., spatial relations and navigation) develop at different rates among children aged 6–9 years [13]. By later childhood (approximately 12 years of age), these differences become less pronounced as spatial cognition becomes increasingly coordinated.

Previous studies have also identified developmental differences in spatial abilities across age groups and genders, with intrinsic spatial skills showing stronger development during ages 6–8 years and extrinsic spatial skills becoming more evident between ages 8–10 years [14]. These developmental trends suggest that younger children rely more heavily on proximal visual cues and immediate environmental references, whereas older children demonstrate more advanced capacities for large-scale spatial interpretation and environmental negotiation. Additionally, spatial perception abilities are closely associated with reading comprehension and cognitive reasoning, highlighting their broader significance in childhood cognitive development [13,15].

Children’s environmental exposure conditions within urban environments differ substantially from those of adults because of differences in physical development, eye height, mobility, and cognitive processing. Weir [16] emphasised that threshold and transitional spaces play an important role in children’s neighbourhood mobility, particularly among children aged 9–10 years, who prefer environments that maintain visual connections to familiar places while simultaneously supporting exploration. Similarly, research examining children’s school-route experiences has demonstrated that street-interface complexity, environmental diversity, and accessibility to child-friendly facilities are strongly associated with child-relevant environmental interpretation and orientation patterns [17].

From the perspective of environmental psychology, children’s engagement with urban environments is shaped not only by physical accessibility but also by how environmental affordances support exploration, orientation, safety perception, and independent navigation. However, existing child-friendly environment studies remain dominated by qualitative observations, participatory workshops, and questionnaire-based evaluations, which frequently prioritise reported experiences rather than continuous environmental exposure conditions. Although such approaches provide valuable experiential insights, they are often difficult to standardise across large and spatially heterogeneous cultural environments and may not systematically capture how specific spatial configurations are associated with child-height environmental conditions.

These limitations highlight a broader methodological challenge in child-friendly urban research. While environmental psychology and inclusive design studies emphasise children’s embodied experiences, navigation behaviours, and spatial affordances, existing assessment approaches often lack scalable analytical tools capable of systematically translating these dimensions into measurable environmental indicators. As a result, the relationship between child-level environmental exposure conditions and built environment characteristics remains insufficiently quantified, particularly within large-scale cultural complexes where spatial conditions are highly heterogeneous. This methodological gap has prompted increasing interest in AI-assisted image-based approaches capable of extracting fine-grained environmental characteristics at scale.

2.2. Street-View Imagery and Semantic Segmentation in Built Environment Evaluation

The rapid advancement of deep neural network (DNN) techniques in image processing has reshaped how urban environmental conditions are analysed using artificial intelligence [18,19,20]. In particular, semantic segmentation—where each pixel is assigned to a predefined category—has enabled researchers to move beyond coarse visual interpretation and extract fine-grained environmental information from street-view imagery. This analytical capability is especially valuable because it transforms complex urban scenes into structured and quantifiable environmental representations, thereby supporting systematic large-scale spatial analysis.

Such analytical capacity relies heavily on the performance of segmentation models. Among them, DeepLabV3+ and PSPNet have demonstrated strong adaptability in heterogeneous urban environments [21,22]. DeepLabV3+ integrates an encoder–decoder architecture with atrous spatial pyramid pooling (ASPP), allowing multi-scale contextual features to be captured without substantial computational overhead [22,23]. Its use of depth-wise separable convolutions further reduces model complexity while maintaining high accuracy, with mean Intersection over Union (mIoU) values exceeding 82% on benchmark datasets [23]. PSPNet approaches the problem from a different angle by employing a pyramid pooling module, through which global contextual information is aggregated across multiple spatial scales. This design proves particularly effective when interpreting broader spatial arrangements and scene structures [24,25]. In practice, these two models often complement one another: while DeepLabV3+ refines boundary delineation at the object level, PSPNet contributes to a more coherent understanding of large-scale spatial patterns [26,27].

Once reliable segmentation outputs are obtained, an important methodological question emerges: how can pixel-level classifications be translated into meaningful representations of the built environment? Recent studies have addressed this issue by transforming segmented elements into quantitative indicators. Metrics such as the green-view index, enclosure ratio, and visual complexity are generated through the aggregation of specific pixel categories, enabling urban environments to be characterised in measurable terms. Existing studies suggest that these indicators exhibit meaningful associations with environmental perception and behavioural tendencies, indicating their potential to approximate certain environmental exposure characteristics associated with spatial experience [23].

This transformation from visual data into interpretable environmental metrics has gradually evolved into a more integrated framework for built environment evaluation [3,28,29]. Instead of analysing individual images in isolation, researchers increasingly construct composite indicators—including greenness, openness, enclosure, and walkability—to capture multiple dimensions of spatial quality. Compared with traditional field surveys, which are frequently constrained by time, labour, and spatial coverage limitations, image-based approaches provide a more scalable and operationally consistent alternative [30]. This advantage becomes particularly important in large and spatially heterogeneous urban environments, where conventional manual observation would be prohibitively resource-intensive.

Yet environmental perception is not determined solely by physical descriptors. From the perspective of environmental psychology, spatial experience emerges through dynamic interactions between individuals and environmental affordances, including visibility, legibility, enclosure, and navigational cues. Consequently, image-based indicators must be interpreted not merely as visual measurements, but as potential proxies for how users perceive and engage with urban environments. Increasing attention has therefore been directed towards incorporating perceptual and behavioural dimensions into evaluation frameworks. By linking visual indicators with literature-informed perception-related constructs, such as proxy indices associated with safety, comfort, and legibility, previous studies have demonstrated that computational metrics may support exploratory proxy-based environmental assessment at scale. Nevertheless, such approaches do not directly capture children’s actual subjective responses, behavioural interaction, observational evidence, or experimental navigation performance, highlighting the importance of interpreting AI-derived indicators cautiously within broader environmental psychology frameworks.

At a broader spatial scale, the organisation of environmental indicators introduces an additional layer of complexity. Urban environments are inherently heterogeneous, and findings derived from isolated sampling points may fail to capture underlying spatial variability. Aggregating indicators across street segments and neighbourhoods allows gradients, clusters, and spatial disparities to emerge that might otherwise remain obscured. When integrated with GIS-based spatial analysis, these patterns can be situated within larger urban systems, thereby providing a more comprehensive analytical basis for urban planning and environmental evaluation.

Against this backdrop, an important limitation becomes apparent: most existing studies rely on a single adult-height visual perspective. Such an approach may insufficiently represent child-height environmental exposure conditions and developmental differences in environmental visibility. The integration of multi-height street-view imagery provides a potential methodological extension. By combining adult-level (1.6 m) and child-level (1.2 m) perspectives, researchers can compare how environmental characteristics vary across viewing heights and age-relevant environmental conditions. Rather than directly measuring children’s actual perception or behaviour, this perspective-sensitive approach enables comparative analysis of child-height environmental exposure patterns that may remain insufficiently represented within conventional adult-centred evaluation frameworks.
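The comparison across viewing heights described above can be illustrated with a minimal sketch. It assumes that each sampling point yields one adult-height and one child-height value for the same indicator; the function name `relative_difference` and the sample values are hypothetical and are not taken from the study data.

```python
from statistics import mean

def relative_difference(adult_values, child_values):
    """Relative change of the child-height mean versus the adult-height mean."""
    if len(adult_values) != len(child_values):
        raise ValueError("adult and child samples must be paired")
    adult_mean = mean(adult_values)
    child_mean = mean(child_values)
    if adult_mean == 0:
        raise ValueError("adult-height mean is zero; relative difference undefined")
    return (child_mean - adult_mean) / adult_mean

# Hypothetical signage-visibility ratios at three paired sampling points.
adult_signage = [0.040, 0.060, 0.050]
child_signage = [0.030, 0.040, 0.035]

change = relative_difference(adult_signage, child_signage)
print(f"child vs adult signage visibility: {change:+.1%}")
```

A negative value indicates that the indicator is lower at child height than at adult height. Whether the study's reported percentages were computed from means in exactly this way is not stated in the text.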

2.3. Explainable Machine Learning for Design-Support Analytics

Machine learning algorithms provide strong capabilities for modelling nonlinear associations between environmental characteristics and proxy perception-related outcomes. XGBoost (eXtreme Gradient Boosting) has become especially valuable for urban spatial analysis because it can handle high-dimensional datasets, complex feature interactions, and nonlinear relationships while maintaining relatively high computational efficiency [31].

XGBoost is particularly suitable for spatial quality assessment because it can identify multifaceted associations between built-environment characteristics and proxy perceptual indices [32]. Li et al. [33], for example, employed XGBoost to examine associations between street-environment quality and high-resolution spatial data, demonstrating strong exploratory explanatory performance and revealing environmental characteristics associated with perceived spatial quality. The capacity of XGBoost to model nonlinear relationships is especially important in built-environment studies, where environmental exposure conditions and perception-related indicators rarely exhibit simple linear patterns [34,35]. In addition, XGBoost demonstrates robustness to noise and incomplete data, which are common characteristics of real-world urban datasets.

Random Forest serves as an important reference model in proxy index association analysis. Computationally less complex than XGBoost, Random Forest constructs multiple independent decision trees through bootstrap aggregation, whereby subsets of training data are repeatedly sampled and the resulting trees are combined through ensemble voting procedures. Comparative studies indicate that both Random Forest and XGBoost are suitable for urban spatial analysis, although XGBoost often demonstrates stronger performance under uneven data conditions. Nevertheless, Random Forest maintains several practical advantages, including simplified hyperparameter tuning and relatively stable performance when applied to smaller datasets.

In recent years, machine learning applications in urban planning have received increasing attention, particularly in studies concerning child-friendly environments. XGBoost-based analyses examining child-friendliness within Xiamen street environments demonstrated high exploratory explanatory performance, with street-feature diversity, interface colour variation, and spatial complexity emerging as important environmental variables associated with proxy perception-related indicators [36]. These findings suggest that machine learning approaches may provide useful analytical tools for examining associations between environmental characteristics and literature-informed proxy assessment constructs within complex urban environments, rather than directly predicting children’s actual perceptual responses or behavioural outcomes.

Despite these advantages, machine learning models are frequently criticised as “black-box” systems because their internal decision processes are difficult to interpret directly [37]. SHAP (SHapley Additive Explanations) addresses this limitation by providing theoretically grounded explanations of model outputs based on cooperative game theory. SHAP values estimate the contribution of individual features to model predictions, thereby enabling both global feature-importance analysis and local interpretation of individual predictions.
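The game-theoretic idea behind SHAP can be made concrete with a brute-force sketch. It computes exact Shapley values for a single input by enumerating every feature coalition, with absent features replaced by a baseline value. This is an illustration of the underlying definition only: the toy model, feature names, and baseline are hypothetical, and the enumeration is not the algorithm the SHAP library uses for tree ensembles such as XGBoost.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for input x relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi

# Hypothetical toy model with an openness-signage interaction term.
def toy_model(features):
    openness, enclosure, signage = features
    return 2.0 * openness - 1.0 * enclosure + 3.0 * openness * signage

x = [0.6, 0.4, 0.1]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(dict(zip(["openness", "enclosure", "signage"], phi)))
```

The contributions sum to the difference between the model output at `x` and at the baseline, which is the additivity property that makes global rankings and local explanations consistent with each other.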

The application of SHAP within urban studies has revealed complex associations between environmental characteristics and perception-related indicators. Li [38], for example, demonstrated that building morphology, window-to-wall ratios, and commercial intensity exhibited important nonlinear associations with energy use and comfort-related conditions. Research has further shown that building typology and environmental configuration jointly influence how environmental characteristics are associated with indoor environmental outcomes [39], thereby supporting the rationale for user-stratified analytical frameworks such as the age-group-separated models adopted in this study. Similarly, SHAP has been applied in land-price modelling to identify nonlinear associations involving accessibility, road width, and parcel size across different land-use contexts.

The interpretability provided by SHAP is particularly relevant for child-friendly environmental analysis because it enables researchers to identify which environmental variables are most strongly associated with proxy indices of safety, comfort, enjoyment, and legibility across different age-group models [40]. Kee and Ho [41] used SHAP in urban-growth modelling and found that road-related variables, including travel time, elevation, and distance, were among the most influential factors associated with urban expansion patterns.

SHAP values provide both global and local explanations of model outputs. Global feature-importance rankings identify which environmental variables are most strongly associated with overall variation in proxy index values, whereas local explanations estimate how specific feature values contribute to individual model outputs. These local interpretations enable location-specific proxy-based environmental diagnosis rather than direct inference regarding children’s actual subjective perception or behavioural response. Visualisation tools such as force plots further support interpretation by illustrating how environmental variables shift model outputs above or below baseline conditions. This dual-level interpretability helps bridge the gap between exploratory explanatory modelling and practical proxy-based spatial assessment within built-environment research.

Although AI-based urban analysis has advanced substantially and increasing attention has been directed towards child-friendly environments, several important research gaps remain. First, many child-friendly environment studies continue to rely on qualitative participation or small-scale perceptual surveys, which provide valuable experiential insights but are difficult to generalise across large and spatially complex cultural settings. Second, research grounded in environmental psychology and affordance theory has highlighted the importance of navigation, visibility, and exploratory interaction in children’s spatial experience, yet these dimensions remain insufficiently operationalised within scalable built-environment evaluation frameworks. Third, although semantic segmentation and machine learning have been increasingly applied in urban studies, most approaches remain adult-centred and rarely consider child-height viewing perspectives or developmental differences in environmental exposure conditions. Finally, limited research integrates AI-extracted environmental characteristics with explainable analytical frameworks capable of identifying how environmental variables are associated with proxy perceptual outcomes across age groups.

This study addresses these gaps by proposing a proof-of-concept methodological framework integrating dual-height semantic segmentation, machine-learning-supported proxy index association analysis, and SHAP-based explainability for child-relevant environmental screening. Using the Longgang Cultural Centre in Shenzhen as a representative large-scale cultural complex, the study examines how environmental characteristics differ across adult-height and child-height perspectives within a spatial context where children constitute a major user group but design evaluation has historically remained adult-centred. Importantly, the proposed framework should be interpreted as a proxy-based analytical approach for characterising child-height environmental exposure conditions rather than a direct measurement or prediction of children’s actual perception, behavioural interaction, or navigation performance. The age-group-separated analytical structure adopted in this study therefore aims to explore developmental differences in proxy environment–outcome associations rather than replicate children’s embodied spatial experience directly. The overall analytical workflow is illustrated in Figure 2.

3. Methodology

3.1. Study Area

This study examines the Longgang Cultural Centre in Shenzhen, China, a 95,000 m² large-scale cultural complex designed by Mecanoo Architecten and opened in 2019. The complex is 400 m long and occupies a 3.8-hectare site in the eastern Longgang district. It comprises four components: an art museum (13,500 m²), a science centre (10,000 m²), a youth centre (8000 m²), and a book mall with cafes (35,000 m²). The distinctive red architectural surfaces and tilted structural forms create covered public passages that connect the adjacent commercial district with Longcheng Park.

The site was selected because children constitute an important user group within the science centre, youth centre, and surrounding public circulation spaces, making the complex a representative case for examining child-relevant environmental exposure conditions within large-scale Chinese cultural environments. The coexistence of diverse programmatic functions and expansive circulation networks generates a broad range of spatial typologies, including transitional spaces, open plazas, enclosed corridors, recreational interfaces, and visually complex circulation zones. This spatial heterogeneity provides an appropriate experimental setting for large-scale proxy-based environmental screening and comparative analysis of adult-height and child-height environmental characteristics.

3.2. Data Collection

Street View Imagery:

Street-view imagery was collected using a stabilised rig-mounted system equipped with a calibrated digital camera. The images were captured at two viewing heights: adult height (1.6 m) and child height (1.2 m). The 1.2 m viewpoint was adopted as a standardised proxy reference height representing general child-level environmental exposure conditions for children aged approximately 6–12 years, rather than a direct physiological replication of individual children’s visual experience or perceptual response. This dual-height protocol operationalises two distinct analytical categories (adult-height and child-height environmental viewpoints), each represented by an independent image set throughout the analysis.

The selection of 1.2 m as the single child-height capture point was justified on two grounds. First, anthropometric data for Chinese children aged 6–12 years (National Standard GB/T 26158-2010) indicate that mean standing eye height ranges from approximately 1.08 m (age 6) to 1.32 m (age 12), with a midpoint close to 1.20 m. This value therefore provides a pragmatic proxy reference height for comparative environmental analysis across the broader age range and has been adopted in comparable street-view-based environmental studies. Second, the primary objective of this study is to compare child-height and adult-height environmental exposure conditions at the building-complex scale. Accordingly, a single representative child-height viewpoint was considered sufficient for identifying systematic differences in environmental visibility, enclosure conditions, and sky-view characteristics within the proposed proxy assessment framework.
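The midpoint cited above follows directly from the two endpoint values reported for ages 6 and 12. Written out as a simple average of those endpoints (the symbol $h_{\text{child}}$ is introduced here only for illustration):

```latex
h_{\text{child}} \approx \frac{1.08\,\text{m} + 1.32\,\text{m}}{2} = 1.20\,\text{m}
```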

Image collection was conducted under stable sunny-weather conditions to minimise variations in lighting and environmental visibility. A total of 480 image pairs (960 images in total) were collected across the public circulation areas of the complex, including transitional interior corridors, connecting public pathways, and exterior plaza spaces. The sampling strategy was designed to support systematic large-scale environmental characterisation rather than behavioural observation or direct measurement of children’s spatial interaction. The 480 sampling points were distributed on a 10 m interval grid to ensure comprehensive spatial coverage. This resolution balances spatial precision and operational feasibility, consistent with urban micro-scale environmental analysis standards.
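A regular-grid layout of this kind can be sketched as follows. The example is a simplified illustration under stated assumptions: the site is treated as a rectangle in local metric coordinates, and `is_walkable` is a hypothetical stand-in for whatever rule excluded non-public areas. The footprint and block dimensions are invented and do not reproduce the actual site geometry.

```python
def grid_points(x_min, y_min, x_max, y_max, spacing, is_walkable):
    """Return grid nodes at the given spacing that fall on publicly accessible ground."""
    points = []
    nx = int((x_max - x_min) // spacing) + 1
    ny = int((y_max - y_min) // spacing) + 1
    for ix in range(nx):
        for iy in range(ny):
            x = x_min + ix * spacing
            y = y_min + iy * spacing
            if is_walkable(x, y):
                points.append((x, y))
    return points

# Hypothetical 60 m x 30 m footprint containing one non-public block.
def is_walkable(x, y):
    inside_block = 20 <= x <= 40 and 10 <= y <= 20
    return not inside_block

points = grid_points(0, 0, 60, 30, 10, is_walkable)
images = 2 * len(points)  # one adult-height and one child-height image per point
print(len(points), "sampling points,", images, "images")
```

Each retained node corresponds to one image pair, which is why the image count is twice the number of sampling points.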

GIS Accessibility Data:

ArcGIS 10.8 was used to perform geospatial analysis and derive accessibility-related environmental indicators for each sampling point. The analysed variables included walking distance to main entrances, spatial openness (defined as the ratio of open space to total space within a 20 m radius), and connectivity (measured as the number of accessible routes linked to each sampling location). Additional variables included distance to amenities, such as seating areas, water fountains, and restrooms, as well as distance to child-relevant activity spaces. Architectural layout data obtained from the complex management were georeferenced within the GIS environment to ensure accurate spatial positioning of all sampling points and associated environmental indicators [30,42,43].
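The openness and connectivity definitions above can be approximated outside a GIS package with a short sketch. It assumes a rasterised floor plan in which each cell is flagged as open or built, and a route graph stored as an adjacency mapping. Both data structures, the function names, and the sample values are hypothetical, and the study itself performed these calculations in ArcGIS.

```python
def openness_ratio(open_grid, cell_size, cx, cy, radius=20.0):
    """Share of open cells among all cells whose centre lies within radius of (cx, cy)."""
    open_cells = 0
    total_cells = 0
    for row, line in enumerate(open_grid):
        for col, is_open in enumerate(line):
            x = (col + 0.5) * cell_size
            y = (row + 0.5) * cell_size
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                total_cells += 1
                if is_open:
                    open_cells += 1
    return open_cells / total_cells if total_cells else 0.0

def connectivity(route_graph, node):
    """Number of accessible routes linked to a sampling location."""
    return len(route_graph.get(node, ()))

# Hypothetical 80 m x 80 m plan with 10 m cells: west half open, east half built.
open_grid = [[col < 4 for col in range(8)] for _ in range(8)]
ratio = openness_ratio(open_grid, 10.0, 40.0, 40.0)

# Hypothetical route graph: sampling point P1 links to two other points.
route_graph = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1"]}
links = connectivity(route_graph, "P1")
print(ratio, links)
```

A finer cell size gives a closer approximation of the circular 20 m buffer at the cost of more cells to scan.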

3.3. AI Feature Extraction

DeepLabV3+ (ResNet-101 backbone) and PSPNet (ResNet-50 backbone) models implemented in PyTorch were employed for semantic segmentation analysis. Both models were pre-trained on the Cityscapes dataset, which provides pixel-level annotations for 30 urban scene categories across 5000 high-resolution images collected from 50 cities [44]. The use of the Cityscapes training framework supported transferability across dense urban environments and enhanced segmentation robustness under varying spatial conditions within the study area. To evaluate segmentation reliability, the models were tested using a held-out set of manually annotated images collected from the study site. DeepLabV3+ and PSPNet achieved mean Intersection over Union (mIoU) values of 0.83 and 0.81, respectively, indicating satisfactory segmentation performance for subsequent environmental feature extraction.
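For readers unfamiliar with the metric, mean Intersection over Union can be computed from a predicted label mask and a manually annotated mask as sketched below. This is a generic illustration of the standard definition, using small hypothetical masks with integer class labels; it is not the evaluation script used in the study.

```python
def mean_iou(pred, truth, num_classes):
    """Mean IoU over classes that appear in the prediction or the annotation."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel belongs to the union of both classes.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no labelled pixels to evaluate")
    return sum(ious) / len(ious)

# Hypothetical 2 x 3 masks with three classes (0, 1, 2).
pred = [[0, 0, 1],
        [1, 2, 2]]
truth = [[0, 1, 1],
         [1, 2, 0]]

miou = mean_iou(pred, truth, num_classes=3)
print(round(miou, 3))
```

Classes absent from both masks are skipped so that they do not distort the average.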

The segmentation pipeline identified seven categories of environmental characteristics associated with child-relevant environmental exposure conditions and literature-informed proxy perceptual indices. Greenness was calculated as the proportion of pixels classified as vegetation, trees, or grass. Sky-visible pixels were quantified relative to the total image area and used as an indicator of openness-related environmental conditions. The enclosure ratio was calculated as the proportion of building façade and wall pixels relative to the total number of image pixels. Signage visibility was quantified as the proportion of pixels occupied by wayfinding-related elements, including text, symbols, and directional signage. These indicators were operationalised as proxy-based environmental descriptors intended to characterise comparative spatial conditions across adult-height and child-height viewpoints rather than direct measurements of children’s subjective environmental interpretation or navigation performance.
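The pixel-proportion indicators described above can be illustrated with a short sketch that operates on a segmentation label map. The class IDs, the function names, and the toy mask are hypothetical; the actual IDs depend on the label map of the trained model, and only four of the seven indicators are shown because the text defines only these four explicitly.

```python
import numpy as np

# Hypothetical class IDs; real IDs depend on the model's label map.
VEGETATION, SKY, BUILDING, WALL, SIGN = 0, 1, 2, 3, 4

def pixel_share(mask, class_ids):
    """Fraction of pixels in `mask` whose label is one of `class_ids`."""
    return float(np.isin(mask, class_ids).mean())

def extract_features(mask):
    """Proportion-based descriptors for one segmented image."""
    return {
        "greenness": pixel_share(mask, [VEGETATION]),
        "sky_view": pixel_share(mask, [SKY]),
        "enclosure": pixel_share(mask, [BUILDING, WALL]),
        "signage": pixel_share(mask, [SIGN]),
    }

# Toy 4 x 4 label map standing in for a full-resolution segmentation output
mask = np.array([
    [1, 1, 1, 1],
    [2, 2, 0, 0],
    [2, 3, 4, 0],
    [3, 3, 0, 0],
])
features = extract_features(mask)
print(features)
```

Applying the same function to the adult-height and child-height image of each pair would give the paired descriptors compared in the results.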

3.4. Proxy Index Construction and Synthetic Score Generation

To construct literature-informed proxy perceptual indices, four dimensions were defined: safety, comfort, enjoyment, and legibility. Each proxy index was computed as a weighted linear combination of the seven extracted environmental features, with weights derived from empirical findings in developmental psychology and built-environment perception research. The proposed framework is explicitly intended as a proof-of-concept proxy assessment methodology rather than a substitute for children’s self-reported experience, behavioural observation, or experimental navigation assessment. Accordingly, the proxy indices were designed to identify spatial conditions potentially associated with child-relevant environmental exposure patterns and to flag areas warranting further empirical investigation, rather than replicate or predict children’s actual subjective responses. The resulting proxy scores should therefore be interpreted as exploratory, theory-informed environmental indicators rather than formally validated psychological or behavioural measures. At the current stage, no child self-report dataset, observational dataset, behavioural trace data, or experimental wayfinding dataset was available for external calibration of the proxy indices.

The weighting schemes were derived from published research concerning developmental spatial cognition and child-friendly environmental assessment. The general form of each proxy index is expressed as: Index = Σ(w_i × f_i), where f_i denotes the normalised value of environmental feature i and w_i represents its age-group-specific weighting coefficient. Positive contributors (e.g., greenness, openness, signage visibility, activity-related elements) increase proxy index values, whereas negative contributors (e.g., enclosure ratio, visual complexity) reduce them. Younger children (6–8 years) were assumed to exhibit stronger dependence on proximal visual cues and immediate environmental references based on developmental cognition literature [2,3]. Accordingly, the proxy indices for this age group assigned comparatively greater weighting to child-height environmental indicators (e.g., child-height openness weight: 0.20; child-height greenness: 0.15) than to broader spatial organisation variables. Older children (9–12 years), who demonstrate more advanced extrinsic spatial abilities and environmental orientation capacities in previous developmental studies, were assigned comparatively greater weighting for connectivity, signage visibility, and spatial organisation indicators (e.g., walking distance weight: 0.22; signage visibility: 0.18).
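The general form Index = Σ(w_i × f_i) can be sketched as below. Only the child-height openness (0.20) and greenness (0.15) weights are quoted in the text for the younger age group; every other weight, the feature names, and the feature values are placeholders introduced purely for illustration.

```python
def proxy_index(features, weights):
    """Weighted linear combination: sum of w_i * f_i over the weighted features."""
    return sum(weights[name] * features[name] for name in weights)

# Weights for the 6-8 age-group model. Openness and greenness are the values
# quoted in the text; the other three are placeholders, with negative signs
# marking the features described as negative contributors.
weights_6_8 = {
    "openness_child": 0.20,
    "greenness_child": 0.15,
    "signage_child": 0.10,      # placeholder
    "enclosure_child": -0.10,   # placeholder
    "complexity_child": -0.10,  # placeholder
}

# Hypothetical normalised feature values for one sampling location
features = {
    "openness_child": 0.4,
    "greenness_child": 0.3,
    "signage_child": 0.1,
    "enclosure_child": 0.6,
    "complexity_child": 0.6,
}

score = proxy_index(features, weights_6_8)
print(round(score, 3))
```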

To reflect inherent variability within the proxy indices, Gaussian noise (μ = 0, σ = 0.1) was added to all computed proxy scores prior to Likert-scale mapping. This noise level was selected to represent plausible inter-individual variation in perceptual sensitivity (approximately ± 0.1 Likert units at one standard deviation) while preserving the signal from the weighted feature contributions; a sensitivity analysis confirmed that varying σ between 0.05 and 0.20 did not materially alter the relative ranking of sampling locations. The final proxy scores were linearly mapped to a 1–5 Likert-type scale using min–max normalization: Likert score = 1 + 4 × (score − score_min)/(score_max − score_min). The resulting scale was adopted to facilitate interpretability and comparison across sampling locations within the proxy assessment framework rather than to replicate formal psychometric survey measurements.
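The noise-injection and rescaling steps can be sketched as follows. The raw scores, the random seed, and the function name `to_likert` are assumptions made for illustration; the noise parameters (μ = 0, σ = 0.1) and the min–max formula follow the description above.

```python
import numpy as np

def to_likert(scores):
    """Min-max map raw scores onto the 1-5 range."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        raise ValueError("min-max mapping is undefined when all scores are equal")
    return 1.0 + 4.0 * (scores - lo) / (hi - lo)

rng = np.random.default_rng(seed=0)        # arbitrary seed for reproducibility
raw = np.array([0.10, 0.25, 0.40, 0.55])   # hypothetical raw proxy scores
noisy = raw + rng.normal(loc=0.0, scale=0.1, size=raw.shape)
likert = to_likert(noisy)
print(likert)
```

Because the noise is added before rescaling, its magnitude on the final 1–5 scale depends on the range of the raw scores being mapped.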

3.5. Associations Between Environmental Features and Proxy Perceptual Indices

To explore associations between environmental features and the proxy perceptual indices, XGBoost was employed as a supplementary analytical tool using the xgboost Python library (v1.7.0) in combination with scikit-learn. The model examined relationships between 14 environmental variables derived from semantic segmentation (7 visual features across 2 viewing heights) and 3 GIS-based accessibility indicators. Hyperparameter optimisation was conducted using 5-fold cross-validation combined with grid-search procedures across learning rates (0.01–0.30), maximum tree depths (3–10), and subsample ratios (0.60–1.00). The final model adopted a learning rate of 0.1, a maximum depth of 6, 300 estimators, and L2 regularisation (λ = 1.0) to balance model fit and overfitting risk. Cross-validation was incorporated to improve model robustness and reduce instability associated with the relatively heterogeneous spatial dataset. However, because sampling locations lie in close spatial proximity, the adopted random train–test partitioning strategy may still be affected by spatial autocorrelation and spatial leakage. Accordingly, the reported model performance should be interpreted as preliminary exploratory explanatory performance rather than fully spatially independent predictive accuracy. Future research should incorporate spatial block cross-validation or leave-zone-out validation strategies to further minimise spatial leakage and strengthen the robustness of spatial generalisation assessment.
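The spatial block cross-validation suggested for future work can be illustrated with a fold-assignment sketch. This is not part of the analysis reported here: the 30 m block size, the 48-point toy grid, the number of folds, and the function name `spatial_block_folds` are all assumptions. The idea is that all sampling points falling within one square block are kept in the same fold, so that neighbouring points are not split between training and test sets.

```python
import numpy as np

def spatial_block_folds(xy, block_size, n_folds, seed=0):
    """Assign points to folds so that every point in a square block shares a fold."""
    xy = np.asarray(xy, dtype=float)
    cells = np.floor(xy / block_size).astype(int)   # block row/column per point
    _, block_id = np.unique(cells, axis=0, return_inverse=True)
    block_id = block_id.ravel()                     # one integer ID per point
    n_blocks = block_id.max() + 1
    rng = np.random.default_rng(seed)
    fold_of_block = rng.permutation(n_blocks) % n_folds  # spread blocks over folds
    return fold_of_block[block_id], block_id

# Hypothetical 10 m grid of 8 x 6 = 48 sampling points
gx, gy = np.meshgrid(np.arange(8) * 10.0, np.arange(6) * 10.0)
xy = np.column_stack([gx.ravel(), gy.ravel()])
folds, blocks = spatial_block_folds(xy, block_size=30.0, n_folds=3)

for k in range(3):
    test_idx = np.flatnonzero(folds == k)
    train_idx = np.flatnonzero(folds != k)
    print(k, len(train_idx), len(test_idx))
```

The resulting index arrays could be passed to any estimator in place of a random split; a buffer between training and test blocks would reduce leakage further but is omitted here for brevity.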

Random Forest models were additionally implemented as comparative baseline models using scikit-learn (v1.3.0) with 500 estimators under default parameter settings. Model performance was evaluated using stratified 80–20 train–test splits separated by age group to ensure balanced representation across analytical subsets. Evaluation metrics included the coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE). Paired t-tests across cross-validation folds were further employed to assess the statistical significance of performance differences between XGBoost and Random Forest models. The two age groups (6–8 years and 9–12 years) were trained as separate analytical models to capture developmental differences in proxy environment–outcome associations across age-group-specific proxy assessment frameworks.
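For reference, the three evaluation metrics can be written out directly from their definitions. The sketch below uses made-up observed and predicted values; the function name `regression_metrics` is an assumption, and in practice the equivalent scikit-learn metric functions would give the same results.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R^2, MAE and RMSE for paired observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "mae": float(np.mean(np.abs(resid))),
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
    }

# Hypothetical proxy scores and model outputs on the 1-5 scale
y_true = [2.0, 3.0, 4.0, 5.0]
y_pred = [2.5, 2.5, 4.5, 4.5]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```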

3.6. Explainability Analysis

The TreeExplainer algorithm implemented in the SHAP Python library (v0.42.0) was used to compute SHAP values for all XGBoost models. SHAP analysis provided both global rankings of feature importance and local explanations for individual model outputs. Global feature importance was quantified as the mean absolute SHAP value for each feature across all sampling locations, thereby identifying environmental characteristics exhibiting the strongest and most consistent associations with proxy model outputs. Comparisons of feature-importance rankings across age-group models were further used to examine developmental differences in proxy environment–outcome associations derived from the literature-informed weighting structure.
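The global importance measure, the mean absolute SHAP value per feature, can be sketched on a small matrix. The SHAP values and feature names below are invented for illustration; in a real workflow the matrix would come from the SHAP library, for example via `shap.TreeExplainer(model).shap_values(X)`, which is not run here.

```python
import numpy as np

def global_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across all samples."""
    mean_abs = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]   # largest importance first
    return [(feature_names[i], float(mean_abs[i])) for i in order]

# Hypothetical SHAP matrix: rows are sampling locations, columns are features
feature_names = ["walking_distance", "openness_child", "signage_child"]
shap_values = np.array([
    [-0.10,  0.20,  0.00],
    [ 0.30, -0.10,  0.05],
    [-0.20,  0.15, -0.05],
])
ranking = global_importance(shap_values, feature_names)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Taking the absolute value before averaging means that features with large positive and negative contributions at different locations are not cancelled out.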

To interpret feature contributions at the local scale, SHAP-based explanations were generated for representative locations exhibiting high and low proxy index scores. Force plots were used to illustrate how individual environmental variables shifted model outputs above or below baseline values for specific sampling locations. Dependence plots were additionally employed to visualise nonlinear associations between environmental variables and proxy index outcomes, thereby enabling the identification of threshold effects and interaction patterns within the exploratory analytical framework.

3.7. Spatial Analysis and Mismatch Mapping

In ArcGIS, the proxy index scores were spatially interpolated across the entire cultural complex using inverse distance weighting (IDW). This interpolation process generated continuous proxy-based spatial distribution maps for each of the four analytical dimensions (safety, comfort, enjoyment, and legibility) across the two age-group-specific proxy assessment models. Heatmaps were visualised using a red-to-green gradient representing relatively low and high proxy index values, respectively, with contour intervals of 0.1 units to facilitate the identification of spatial concentration patterns and potential environmental problem areas.
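The principle behind inverse distance weighting can be shown with a compact sketch. The interpolation in this study was performed in ArcGIS; the code below is only an illustration of the technique, and the power parameter of 2, the coordinates, the scores, and the function name `idw` are assumptions, since the text does not report the interpolation settings.

```python
import numpy as np

def idw(known_xy, known_values, query_xy, power=2.0):
    """Inverse-distance-weighted estimate at each query point."""
    known_xy = np.asarray(known_xy, dtype=float)
    known_values = np.asarray(known_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Pairwise distances: one row per query point, one column per sample
    dist = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=2)
    out = np.empty(len(query_xy))
    for i, row in enumerate(dist):
        coincident = row == 0
        if coincident.any():
            out[i] = known_values[coincident][0]   # query sits on a sample point
        else:
            w = 1.0 / row ** power
            out[i] = np.sum(w * known_values) / np.sum(w)
    return out

# Two hypothetical sampling points 10 m apart with proxy scores 1 and 3
known_xy = [(0.0, 0.0), (10.0, 0.0)]
known_values = [1.0, 3.0]
estimates = idw(known_xy, known_values, [(5.0, 0.0), (2.5, 0.0), (0.0, 0.0)])
print(estimates)
```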

Mismatch zones were classified into two analytical categories. The first category comprised locations with comparatively high environmental quality indicators but relatively low proxy index scores, indicating areas where favourable physical environmental conditions did not correspond with comparatively positive proxy assessment outcomes. Within the proof-of-concept framework, these areas were interpreted as locations where spatial configurations may warrant further empirical investigation using child-centred observational or participatory methods. The second category included locations characterised by comparatively low environmental quality indicators and low proxy index scores, representing spatial areas where both environmental conditions and proxy assessment outcomes were comparatively unfavourable. These zones were identified as priority areas for future child-sensitive environmental review and design evaluation within the proxy-based assessment framework.

4. Results

4.1. Environmental Feature Distribution and Height-Based Variations

The semantic segmentation analysis generated environmental feature indicators across 480 sampling locations, revealing substantial differences between adult-height (1.6 m) and child-height (1.2 m) viewpoints. Table 1 presents descriptive statistics for all extracted environmental variables. The mean greenness at adult height was 0.291, compared with 0.340 at child height, representing an approximately 17% increase at the lower viewpoint. Openness-related environmental conditions moved in the opposite direction, with mean openness values decreasing from 0.504 at adult height to 0.406 at child height, corresponding to an approximately 19% reduction in visible openness within the child-height imagery. Conversely, enclosure-related environmental conditions were more pronounced at child height, with the mean enclosure ratio increasing from 0.491 at adult height to 0.637 at child height, representing an approximately 30% increase in enclosure exposure within the child-height viewpoint.

Table 1 further illustrates the distributional differences between adult-height and child-height environmental viewpoints. The most pronounced disparity was observed in signage visibility, where the mean child-height value (0.068) represented only approximately 34% of the signage visibility recorded at adult height (0.198). In addition, more than half of the child-height images (55%) exhibited comparatively low wayfinding-related visibility scores.

By contrast, activity-related environmental elements were more prominent within the child-height imagery, with mean values increasing from 0.194 at adult height to 0.291 at child height, indicating a greater proportion of activity-oriented visual features within lower-height environmental exposure conditions. Visual complexity also increased at the 1.2 m viewpoint, rising from 0.486 at adult height to 0.593 at child height, representing an approximately 22% increase. This difference reflects variations in the visible composition of façades, surfaces, and near-ground environmental elements across the two viewing perspectives.
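The percentage differences quoted in this subsection follow from the reported means as (child − adult)/adult. The short calculation below reproduces them from the values given in the text; the dictionary keys and the function name `relative_change` are labels introduced for illustration.

```python
# (adult-height mean, child-height mean) as reported in this subsection
means = {
    "greenness": (0.291, 0.340),
    "openness": (0.504, 0.406),
    "enclosure": (0.491, 0.637),
    "signage": (0.198, 0.068),
    "activity": (0.194, 0.291),
    "visual_complexity": (0.486, 0.593),
}

def relative_change(adult, child):
    """Change at child height relative to the adult-height value."""
    return (child - adult) / adult

changes = {
    name: round(100.0 * relative_change(adult, child), 1)
    for name, (adult, child) in means.items()
}
signage_ratio = means["signage"][1] / means["signage"][0]

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
print(f"signage child/adult ratio: {signage_ratio:.2f}")
```

For signage, the child-height mean is about 0.34 of the adult-height mean, which corresponds to a relative reduction of roughly 66%.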

4.2. Perception Scores and Age-Group Differences

Table 2 presents the proxy index scores across the four analytical dimensions. The safety-related proxy scores were lower in the younger age-group model (6–8 years; 2.663) than in the older age-group model (9–12 years; 3.108), representing a difference of 0.444. In contrast, only minor differences were observed in comfort-related proxy scores between the two age-group models (3.200 vs. 3.205), indicating relatively similar comfort-related environmental conditions across the analysed spatial settings.

The enjoyment-related proxy scores were comparatively higher in the younger age-group model (3.327) than in the older age-group model (3.135). Both age-group models exhibited comparatively lower legibility-related proxy scores (2.825 vs. 2.951), suggesting reduced wayfinding-related environmental clarity across substantial portions of the cultural complex. Within the proxy assessment framework, these comparatively lower legibility scores indicate that signage visibility and spatial orientation conditions were less favourable relative to the other analytical dimensions.

Comparisons between the two age-group-specific proxy assessment models are illustrated in Figure 3.

4.3. Machine Learning Model Performance

Table 3 and Table 4 present the model-performance metrics for the proxy assessment framework. Within the younger age-group model (6–8 years), XGBoost produced R² values ranging from 0.185 for the comfort-related proxy index to 0.356 for the legibility-related proxy index. The highest explanatory performance within this age-group model was observed for the legibility proxy index (R² = 0.356; MAE = 0.485; RMSE = 0.596). The safety-related proxy index achieved an R² value of 0.315, with corresponding MAE and RMSE values of 0.404 and 0.514, respectively.

Random Forest models produced broadly comparable performance patterns, with the highest explanatory performance also observed for the legibility-related proxy index (R² = 0.390). Within the older age-group model (9–12 years), the model R² values ranged from 0.141 to 0.347. For the safety-related proxy index, Random Forest achieved comparatively higher explanatory performance than XGBoost (R² = 0.346 vs. 0.256).

As illustrated in Figure 4, the environmental variables explained approximately 14–39% of the variance in the literature-informed proxy perceptual indices across the different analytical dimensions and age-group-specific models. These results indicate moderate exploratory associations between environmental characteristics and proxy assessment outcomes within the proof-of-concept framework, rather than direct prediction of children’s actual perceptual or behavioural responses.

4.4. SHAP Feature Importance Analysis

Figure 5 presents SHAP-based feature-importance rankings for the safety-related proxy index within the younger age-group model. Walking distance emerged as the most influential environmental variable (mean |SHAP| = 0.118), with greater distances generally associated with comparatively lower proxy safety scores across sampling locations. Child-height openness ranked second in importance (0.111), followed by adult-height sky-view factor and child-height signage visibility. Visual complexity exhibited a comparatively negative association with the safety-related proxy index (0.086), whereby higher visual-complexity values were generally associated with lower proxy safety outcomes within the model outputs.

Figure 6 presents the SHAP beeswarm plot, which reveals several nonlinear relationships between environmental variables and safety-related proxy outcomes. Walking distance demonstrated a relatively stable directional association, with increasing distances generally corresponding to comparatively lower proxy safety scores. Openness exhibited a nonlinear distribution pattern in which intermediate openness values were associated with comparatively higher proxy safety scores, whereas both comparatively low and comparatively high openness values corresponded to reduced proxy outcomes. Differences in SHAP-value distributions between the younger and older age-group models further suggest variations in how environmental characteristics are associated with literature-informed proxy perceptual indices across the two developmental analytical frameworks.

4.5. Spatial Distribution Patterns

Figure 7 illustrates the spatial distribution of predicted proxy perception scores for the younger age-group model. To ensure consistency with the predefined 1–5 Likert-type proxy scale, all mapped prediction values were verified and constrained to this range before visualization. A unified colour scale was applied across all four dimensions, where lower values indicate comparatively less favourable proxy conditions and higher values indicate comparatively more favourable proxy conditions.

Safety-related proxy scores exhibited substantial spatial variability, with comparatively lower-score zones mainly distributed along peripheral regions and comparatively higher-score zones concentrated around major circulation areas. Comfort-related proxy scores displayed a comparatively more uniform spatial pattern, with higher values occurring in areas characterised by greater greenness and increased provision of environmental amenities. Enjoyment-related proxy scores showed moderate-to-high values across large portions of the study area, indicating generally favourable proxy assessment outcomes across multiple circulation and activity zones.

By contrast, legibility-related proxy scores demonstrated more extensive lower-score distributions across several spatial regions, suggesting comparatively limited wayfinding-related visibility and orientation conditions within substantial portions of the cultural complex.

4.6. Spatial Mismatch Analysis

Figure 8 illustrates the spatial relationship between environmental quality indicators and proxy-predicted perception scores. Mismatch analysis identified 47 locations (9.8%) classified as Higher Environmental Quality × Lower Proxy Perception and 73 locations (15.2%) classified as Lower Environmental Quality × Higher Proxy Perception.

The Higher Environmental Quality × Lower Proxy Perception category referred to locations where comparatively favourable environmental quality conditions corresponded with comparatively lower proxy perception outcomes within the analytical framework. Conversely, the Lower Environmental Quality × Higher Proxy Perception category represented locations where comparatively lower environmental quality conditions corresponded with comparatively higher proxy perception outcomes.
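The two mismatch categories can be expressed as a simple threshold rule. The cut-off values, the example data, the label strings, and the function name `classify_mismatch` are assumptions, because the thresholds used to separate higher from lower values are not stated in the text.

```python
import numpy as np

def classify_mismatch(env_quality, proxy_score, env_cut, proxy_cut):
    """Label each location by how environmental quality and proxy score relate."""
    env_quality = np.asarray(env_quality, dtype=float)
    proxy_score = np.asarray(proxy_score, dtype=float)
    labels = np.full(env_quality.shape, "consistent", dtype=object)
    high_env = env_quality >= env_cut
    high_proxy = proxy_score >= proxy_cut
    labels[high_env & ~high_proxy] = "higher_env_lower_proxy"
    labels[~high_env & high_proxy] = "lower_env_higher_proxy"
    return labels

# Hypothetical values for four locations; cut-offs are illustrative only
env_quality = [0.8, 0.2, 0.7, 0.3]
proxy_score = [2.0, 4.0, 4.5, 1.5]
labels = classify_mismatch(env_quality, proxy_score, env_cut=0.5, proxy_cut=3.0)
print(labels.tolist())
```

Locations where both measures fall on the same side of their cut-offs are labelled as consistent and would not be counted in either mismatch category.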

Spatially, Higher Environmental Quality × Lower Proxy Perception locations were more frequently observed in the eastern and northern peripheral areas of the study site, whereas Lower Environmental Quality × Higher Proxy Perception locations appeared more commonly near central gathering and circulation zones. Overall, approximately 25% of the sampling locations exhibited discrepancies between environmental quality indicators and proxy-based perception assessments.

Importantly, these mismatch categories should not be interpreted as direct evidence of children’s actual subjective dissatisfaction or behavioural difficulty, because the perception measures were derived from literature-informed proxy indicators rather than direct child-reported evaluations. Instead, the observed mismatch patterns indicate areas where environmental quality metrics and proxy perception-related spatial characteristics exhibited comparatively inconsistent spatial relationships within the analytical framework.

This local mismatch pattern echoes broader evidence that built-environment provision may not always correspond closely with population experience or usage patterns across urban contexts [45], suggesting that design–perception inconsistencies may represent a broader structural challenge rather than a purely site-specific phenomenon.

5. Discussion

5.1. Age-Related Differences and Design Implications

The proxy index analysis indicated age-related differences in proxy perceptual conditions for safety, enjoyment, and legibility, whereas comfort-related scores were nearly identical across the two age-group models. The younger age-group model produced proxy safety scores 0.444 points lower than the older age-group model and showed greater sensitivity to visual complexity, enclosure, and navigational visibility conditions. These findings are consistent with developmental cognition research suggesting that intrinsic spatial abilities develop earlier than extrinsic spatial abilities during childhood, thereby increasing children’s reliance on proximal visual references and immediate environmental cues in large-scale navigation contexts [5,6,7,13,34]. Similar patterns have also been identified in child-friendly street studies, where environmental visibility, enclosure, and spatial complexity were found to influence children’s mobility behaviour and environmental interpretation [16,17]. The marginally higher explanatory performance of the XGBoost models for younger children (R² up to 0.356) relative to older age-group models (R² up to 0.347) is directionally consistent with Liu et al. [36], who suggested that environmental characteristics may exert comparatively stronger deterministic influences on child users within child-friendly environments, although the difference observed here is small. More broadly, the findings support environmental psychology research indicating that children’s spatial behaviour and environmental interpretation are closely associated with the availability, accessibility, and legibility of environmental affordances within large public environments [46].

Previous developmental studies indicate that older children have more advanced spatial processing skills, including greater proficiency in interpreting abstract wayfinding cues and in forming mental images of spatial layouts. Nevertheless, the older age-group model yielded enjoyment scores 0.192 points lower, which may suggest that environments attractive to younger users do not offer older users sufficient challenge. This age-related divergence partially addresses the first research question regarding how developmental differences may influence proxy spatial perception within large-scale cultural complexes. The results suggest that immediate visual environmental characteristics were more strongly associated with younger age-group proxy outcomes, whereas older age-group models demonstrated comparatively broader associations with organisational and accessibility-related variables. This developmental divergence supports a spatially differentiated approach to cultural complexes, in which primary activity zones are designed for particular age groups and shared circulation spaces serve all users through layered design strategies.

The machine learning models showed marginally higher exploratory explanatory performance for the younger age group (R² up to 0.356) than for the older age group (R² up to 0.347), which may imply that environmental factors are somewhat more strongly associated with proxy outcomes for younger users. The lower explained variance for older children may reflect greater individual variation, social conditions, and prior experiences that are not captured in a visual analysis of the environment. This interpretation aligns with environmental psychology research emphasising that children’s spatial experience is shaped not only by physical visibility conditions, but also by social interaction, behavioural familiarity, memory, and environmental affordances [10,12,16]. This result suggests that although environmental change could enhance the experience of younger children, older children may also need other interventions that consider social dynamics, the structural diversity of programmes, and opportunities for mental engagement.

The age-group analysis is consistent with the theoretical framework in which younger children rely more on extrinsic environmental support to develop spatial understanding. Interventions in spaces mainly used by 6–8-year-olds should focus on improving legibility, simplifying visual spaces, and providing clearer supervision cues, whereas spaces serving 9–12-year-olds should offer more complexity, interactive exploration, and opportunities for autonomous navigation.

5.2. Advantages of the Proxy Assessment Framework over Conventional Methods

The proposed proxy assessment framework demonstrated notable advantages compared with traditional child-friendly design assessment methods. Conventional techniques based on human judgement or small-scale surveys offer limited spatial coverage, depend on subjective interpretation, and are costly in terms of data collection. This study systematically examined 480 locations, providing a proxy-based assessment of spatial conditions at 10 m intervals within a 95,000 m² complex. This level of granular spatial analysis is not achievable with conventional survey approaches, which would require large-scale participant recruitment and supervised data-collection sessions with children.

The semantic segmentation method, which derives features from imagery at two heights (1.6 m and 1.2 m), captures viewpoint-related differences that cannot be evaluated through adult description or architectural plan analysis alone. The markedly lower signage visibility at child height (approximately 34% of the adult-height value) and the 30% greater enclosure would likely go unnoticed in a typical design review, which could lead to environments that seem satisfactory to adults but are problematic for children. Objective measurement of these differences provides quantitative grounds for child-centred design revisions.

At the same time, the findings should be interpreted cautiously within the broader context of AI-assisted environmental perception research. The existing literature on environmental psychology and affordance theory suggests that spatial experience is not determined solely by visible environmental characteristics, but also by embodied interaction, emotional attachment, social behaviour, and contextual familiarity [10,16,40]. Consequently, semantic segmentation and explainable machine learning should not be interpreted as direct measurements of children’s actual perception, but rather as scalable proxy representations of environmental exposure patterns.

SHAP-based explainability translated the analytical models into actionable indications by identifying the specific environmental features most strongly associated with proxy outcomes. Rather than producing opaque perception scores, the framework indicates that walking distance, openness, and signage visibility are the most influential levers for the safety-related proxy index. This partially addresses the second research question concerning whether AI-assisted dual-height analysis can identify spatial mismatches that are difficult to detect through conventional adult-centred assessment approaches. The mismatch analysis and SHAP-based associations suggest that child-height environmental conditions differ substantially from adult-height interpretations in several spatial zones, particularly with regard to enclosure, visibility, and circulation-related variables.

Such an understanding also allows design decisions to be supported by proxy-based diagnostic evidence and enables the formulation of testable questions about intervention effectiveness. The mismatch analysis identified 47 locations where comparatively high environmental quality did not correspond with favourable proxy perception scores, which can help direct resources toward high-impact retrofit areas.

Nevertheless, the present study also reflects broader limitations frequently discussed in intelligent urban analysis research. While AI-based image analysis can systematically quantify environmental characteristics across large spatial areas, it may oversimplify human perceptual processes by privileging visually measurable attributes over behavioural, emotional, and socially mediated dimensions of experience [3,38,42]. This limitation is particularly relevant in child-friendly environment studies, where perception is closely connected to movement, play behaviour, and social interaction.

5.3. Limitations and Future Research Directions

Several limitations should be noted. Although the proxy perceptual indices were constructed from the developmental psychology literature, they have not been validated against children’s self-reports. To reduce reliance on the assumptions embedded in the proxy index models, future studies should collect empirical survey data to test whether proxy-derived spatial patterns are associated with measured responses. The proposed framework should therefore be interpreted as an exploratory spatial proxy assessment approach rather than a substitute for direct child-based perceptual evaluation. Future studies could combine AI-assisted environmental analysis with child-reported surveys, behavioural observation, participatory mapping, eye-tracking, or VR-based perception experiments to establish stronger convergent validity between proxy indices and embodied spatial experience. In addition, the reported R² values should be interpreted as preliminary exploratory explanatory associations rather than robust high-precision predictive outcomes. Future research should further incorporate spatial block cross-validation or leave-zone-out validation strategies to minimise potential spatial leakage effects and improve the robustness of spatial generalisation assessment.

Reliance on visual environmental features alone does not account for acoustic conditions, thermal comfort, crowding levels, or social dynamics, all of which play major roles in shaping children’s experience. These additional dimensions could be captured through multimodal sensing, including thermal cameras, acoustic monitors, and occupancy tracking.

The static analysis assumes constant environmental conditions, disregarding changes in perception over time and with lighting, weather, and programmed activities. Longitudinal data could reveal dynamic patterns in perception and in the relationships between features and experiences over time and across events and seasons. In addition, findings from this single cultural complex in Shenzhen may not be generalisable to other architectural, cultural, or urban typologies.

The mismatch analysis identified problem areas but lacked longitudinal follow-up to assess whether design interventions benefit children. The next stage of research should introduce controlled changes in the previously identified mismatch areas, with post-intervention sampling used to test both the diagnostic utility of the framework and the effectiveness of the applied interventions.

6. Conclusions

This study advances current child-friendly environment research by proposing a dual-height AI-assisted proxy assessment framework capable of systematically characterising child-relevant environmental exposure within large-scale cultural complexes. Rather than relying exclusively on conventional adult-centred evaluation approaches, the framework integrates semantic segmentation, GIS-based spatial analysis, explainable machine learning, and spatial mismatch mapping to identify spatial conditions that may remain insufficiently visible in traditional design assessment workflows. In doing so, the study contributes a scalable methodological approach for incorporating child-height environmental analysis into early-stage architectural and urban design evaluation.

Application of the framework at the Longgang Cultural Centre in Shenzhen demonstrated that environmental characteristics captured through dual-height image analysis exhibit differentiated associations with proxy perceptual conditions across age-group models, particularly in relation to visibility, enclosure, openness, and circulation-related environmental variables. These findings extend existing environmental psychology and child-friendly urbanism research by suggesting that developmental differences in spatial experience may be associated not only with general environmental quality, but also with age-sensitive variations in visual accessibility, spatial legibility, and environmental affordances within large public environments.

The framework demonstrated notable advantages over conventional assessment approaches in terms of spatial coverage, systematic characterisation, and spatially explicit design guidance. SHAP analysis revealed non-linear associations between environmental features and proxy perception outcomes, which may indicate threshold ranges relevant to design. In addition, spatial mismatch mapping identified 47 locations where high environmental quality did not coincide with favourable proxy perception scores and 73 locations where low environmental quality coincided with proxy perception scores above expectation.
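The two mismatch categories can be illustrated with a minimal sketch. It assumes that each location carries one environmental quality score and one proxy perception score, and that "high" and "low" are defined by a median split. The median split, the function name `classify_mismatch`, and the toy values are illustrative assumptions, not the rule or data used in the study.

```python
from statistics import median


def classify_mismatch(quality, perception):
    """Label each location by comparing both scores with their medians.

    The median split is an illustrative assumption, not the study's rule.
    """
    q_cut = median(quality)
    p_cut = median(perception)
    labels = []
    for q, p in zip(quality, perception):
        if q > q_cut and p <= p_cut:
            labels.append("high quality / low perception")
        elif q <= q_cut and p > p_cut:
            labels.append("low quality / high perception")
        else:
            labels.append("matched")
    return labels


# Hypothetical toy data for five sampling locations (not from the study).
quality = [0.9, 0.8, 0.2, 0.3, 0.5]      # environmental quality, 0-1
perception = [2.0, 4.5, 4.0, 1.5, 3.0]   # proxy perception, 1-5 scale

labels = classify_mismatch(quality, perception)
for location, label in enumerate(labels):
    print(location, label)
```

Counting the labels over all sampling locations would yield totals comparable in kind to the 47 and 73 reported above, although the actual counts depend on the thresholds the study applied.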

Methodologically, the study contributes to emerging AI-assisted built-environment research by demonstrating how explainable machine-learning techniques can be combined with spatial environmental analysis to support interpretable child-relevant environmental screening rather than purely predictive modelling. At the same time, the study emphasises that AI-derived proxy indices should be interpreted cautiously as exploratory representations of environmental exposure rather than direct measurements of children’s embodied perceptual experience. Accordingly, the proposed framework should be understood as a supplementary spatial diagnostic tool intended to support child-friendly design review, prioritisation, and hypothesis generation, rather than as a substitute for child-centred participatory evaluation or behavioural validation.

In practice, the framework equips designers with proxy-based diagnostic tools to screen and prioritise child-friendly spatial interventions in high-density urban environments, with findings constituting early-stage evidence for design review rather than a substitute for child-reported validation. Future research should therefore combine AI-assisted spatial analysis with child-reported surveys, behavioural observation, longitudinal validation, and multi-height environmental simulation to further strengthen the empirical applicability and generalisability of proxy-based child-friendly assessment frameworks.

Author Contributions

Conceptualization, Y.S. and F.Z.; methodology, Y.S.; software, Y.S.; validation, Y.S. and F.Z.; formal analysis, Y.S.; investigation, Y.S.; resources, Y.S.; data curation, F.Z.; writing—original draft preparation, Y.S. and S.Z.; writing—review and editing, S.Z.; visualization, Y.S.; supervision, S.Z.; project administration, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the China Postdoctoral Science Foundation (Grant No. 2025M781541) and the Postdoctoral Fellowship Program of the China Postdoctoral Science Foundation (Grant No. GCZ20251117).

Institutional Review Board Statement

Ethical review and approval were not required for this study in accordance with the Measures for the Review of Scientific and Technological Ethics (Trial), jointly issued by the Ministry of Science and Technology of China and other national ministries and commissions, because the research did not involve human participants, identifiable personal data, behavioural experiments, or clinical intervention.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Geographic context and location of the study site.

Figure 2. Flowchart.

Figure 3. Children’s spatial perception scores by age group.

Figure 4. Model performance.

Figure 5. Mean SHAP value.

Figure 6. SHAP value plot: impact of features on safety perception (younger children).

Figure 7. Spatial distribution of predicted proxy perception scores for younger children aged 6–8 years. All mapped values were constrained to the predefined 1–5 Likert-type proxy scale, with lower values indicating less favourable proxy conditions and higher values indicating more favourable proxy conditions.

Figure 8. Spatial relationship between environmental quality and proxy perception scores.

Table 1. Descriptive statistics.

Variable	Mean	Std Dev	Min	Max
Greenness (adult)	0.291	0.153	0.007	0.738
Openness (adult)	0.504	0.192	0.030	0.969
Enclosure (adult)	0.491	0.217	0.000	1.000
Signage (adult)	0.198	0.128	0.006	0.622
Activity elements (adult)	0.194	0.132	0.001	0.743
Visual complexity (adult)	0.486	0.163	0.152	0.927
Sky view (adult)	0.503	0.195	0.000	0.970
Greenness (child)	0.340	0.170	0.000	0.866
Openness (child)	0.406	0.199	0.000	0.912
Enclosure (child)	0.637	0.219	0.033	1.000
Signage (child)	0.068	0.106	0.000	0.570
Activity elements (child)	0.291	0.140	0.012	0.798
Visual complexity (child)	0.593	0.182	0.066	1.000
Sky view (child)	0.409	0.205	0.000	1.000
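
The height-based differences implied by Table 1 can be checked with simple arithmetic. The sketch below computes the relative change of each child-height mean against its adult-height counterpart, (child − adult) / adult. The means are copied from Table 1; the function name `relative_change` is a hypothetical helper. With these means, enclosure comes out roughly 30% higher at child height, and the child-height signage mean is roughly a third of the adult-height value.

```python
# Means copied from Table 1 (adult height 1.6 m, child height 1.2 m).
adult_mean = {
    "Greenness": 0.291,
    "Openness": 0.504,
    "Enclosure": 0.491,
    "Signage": 0.198,
    "Activity elements": 0.194,
    "Visual complexity": 0.486,
    "Sky view": 0.503,
}
child_mean = {
    "Greenness": 0.340,
    "Openness": 0.406,
    "Enclosure": 0.637,
    "Signage": 0.068,
    "Activity elements": 0.291,
    "Visual complexity": 0.593,
    "Sky view": 0.409,
}


def relative_change(adult, child):
    """Return (child - adult) / adult for every feature in `adult`."""
    return {name: (child[name] - adult[name]) / adult[name] for name in adult}


rel = relative_change(adult_mean, child_mean)

# Print features from the largest decrease to the largest increase.
for name, value in sorted(rel.items(), key=lambda item: item[1]):
    print(f"{name:<18} {value:+.1%}")
```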

Table 2. Mean perception scores by age group.

Perception Dimension	Younger Children (6–8 years)	Older Children (9–12 years)	Difference (older − younger)
Safety	2.663	3.108	0.444
Comfort	3.200	3.205	0.005
Enjoyment	3.327	3.135	−0.192
Legibility	2.825	2.951	0.126

Table 3. Model performance metrics—younger children (6–8 years).

Perception	XGBoost R²	XGBoost MAE	XGBoost RMSE	RF R²	RF MAE	RF RMSE
Safety	0.3153	0.4037	0.5135	0.2938	0.4109	0.5216
Comfort	0.1847	0.5091	0.6818	0.2863	0.4916	0.6379
Enjoyment	0.2358	0.4129	0.5387	0.2913	0.3959	0.5187
Legibility	0.3563	0.4845	0.5963	0.3897	0.4544	0.5807

Table 4. Model performance metrics—older children (9–12 years).

Perception	XGBoost R²	XGBoost MAE	XGBoost RMSE	RF R²	RF MAE	RF RMSE
Safety	0.2562	0.4706	0.5963	0.3463	0.4414	0.5590
Comfort	0.1775	0.4765	0.5941	0.2454	0.4425	0.5690
Enjoyment	0.1410	0.4916	0.6148	0.1835	0.4750	0.5993
Legibility	0.1497	0.5776	0.7094	0.2467	0.5416	0.6677
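
Tables 3 and 4 can be read side by side with a short comparison. The sketch below takes the R² values from both tables, reports which model scores higher for each age group and perception dimension, and prints the overall range. The values are copied from the tables; the dictionary layout and the helper `higher_r2` are illustrative assumptions, and "RF" is assumed to denote a random forest model.

```python
# R² values copied from Tables 3 and 4 as (XGBoost, RF) pairs.
r2 = {
    "younger (6-8)": {
        "safety": (0.3153, 0.2938),
        "comfort": (0.1847, 0.2863),
        "enjoyment": (0.2358, 0.2913),
        "legibility": (0.3563, 0.3897),
    },
    "older (9-12)": {
        "safety": (0.2562, 0.3463),
        "comfort": (0.1775, 0.2454),
        "enjoyment": (0.1410, 0.1835),
        "legibility": (0.1497, 0.2467),
    },
}


def higher_r2(xgb, rf):
    """Name the model with the larger R² for one table row."""
    return "XGBoost" if xgb > rf else "RF"


summary = {
    (group, dimension): higher_r2(*pair)
    for group, dimensions in r2.items()
    for dimension, pair in dimensions.items()
}
all_values = [
    value
    for dimensions in r2.values()
    for pair in dimensions.values()
    for value in pair
]

for (group, dimension), model in summary.items():
    print(f"{group:<14} {dimension:<11} {model}")
print(f"R² range: {min(all_values):.4f} to {max(all_values):.4f}")
```

A higher R² in these tables does not by itself establish that one model is preferable, since the differences are small and all values indicate modest explanatory power.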
