1. Introduction
Amidst globalization, spontaneous living spaces formed by residents’ habits and complex neighbourhood relationships are ubiquitous within neighbourhoods [
1]. Spontaneous living spaces refer to informally produced micro-spatial interventions created by residents in response to everyday needs, typically emerging within public or semi-public urban interstices, such as alleyways, building setbacks, or residual and underutilized land [
2]. These spaces are not part of formal planning or architectural design but are continuously adapted through daily practices, serving functions such as social interaction, temporary storage, resting, and small-scale domestic activities [
3]. Community-led construction creatively utilizes limited resources and local materials, producing rich spatial and regional characteristics [
4]. These spontaneous living spaces also serve as authentic reflections of urban life [
5,
6], capturing both the lifestyles of local residents and the distinctive fabric of historic districts [
7]. Historic districts, with their unique architectural styles and profound historical legacy, act as valuable repositories of urban memory [
8,
9]. Consequently, spontaneous living spaces within these districts play a crucial role in understanding urban social structures and cultural transmission [
10].
From the mid-twentieth century to the present, both domestic and international research has increasingly focused on spontaneous living spaces in historic districts [
11,
12], reflecting growing recognition of urban cultural diversity and community autonomy. Jane Jacobs’s The Death and Life of Great American Cities highlighted the importance of spontaneous activities in old towns for fostering vitality and innovation [
13]. Urban streets are not fully designed but emerge from the daily actions of countless individuals; genuine urban communities are “organic and spontaneous” [
14]. Yoshinobu Ashihara’s theory of the “First Contour” and “Second Contour” in The Aesthetic Townscape similarly captures the tension and unity between architecture and spontaneous spaces [
15]. In 1966, Aldo Rossi noted in The Architecture of the City that the preservation of urban memory depends not only on physical forms but also on the continuous inscription of daily life [
16], offering theoretical support for community-based spontaneous construction. Jan Gehl, in Life Between Buildings, similarly proposed that life within public spaces represents a ‘potential activity’—one realised only when spatial conditions allow [
17]. Accordingly, accessibility, facilities for lingering, and visual connections are pivotal for stimulating spontaneous behavior [
18].
Since the beginning of the 21st century, Europe has witnessed numerous examples of spontaneous living spaces [
19,
20]. Amsterdam’s De Ceuvel Sustainable Community, through self-organised practices by residents and designers, has created social hubs, including shared gardens and open-air theatres, which are continually innovated to extend their lifecycle [
21]. In Copenhagen’s Strøget, urban administrators have supported and guided residents’ spontaneous activities, successfully integrating them into urban planning and significantly enhancing the street’s vitality [
22,
23]. Community vitality refers to the extent and quality of social interaction and civic engagement among residents within a defined urban neighbourhood, reflecting the frequency and meaningfulness of interpersonal interactions, the active use of communal spaces [
24,
25] and the residents’ perceived social well-being. This concept emphasises social cohesion, collective practices, and neighbourhood-level participation, rather than general population density or isolated activities. In Japan, “Gradual Protection Zones” allow up to 15% spontaneous spatial increments [
26]. These spontaneous spaces on Japanese streets not only satisfy residents’ quality-of-life needs but also foster urban cultural diversity and creativity. Such cases offer valuable references for studying spontaneous living spaces in China. Previous research on historic districts has predominantly focused on preservation and renewal in high-density, rapidly developing urban contexts. However, the study of spontaneous construction within these districts remains limited [
27], highlighting the urgent need for deeper investigation into the types and formation mechanisms of these spaces.
Spontaneous living spaces enhance community cohesion and promote social interaction, thereby boosting community vitality [
28]. Nevertheless, traditional architecture and aesthetics often conflict with modern development demands [
29], and unplanned use of these spaces can result in inefficient spatial utilisation or environmental issues [
30]. Excessive reliance on spontaneous living spaces may also create management challenges, while unequal distribution of spatial resources can provoke community conflicts and complicate preservation efforts in historic districts [
31]. Within the context of historic district preservation, the dynamic nature of spontaneous spaces may clash with requirements for historical authenticity. Unauthorised additions by residents can compromise the historical integrity of building facades [
32], as exemplified by the tension between self-built structures in Beijing’s hutongs and cultural heritage protection standards [
33]. Furthermore, the informal nature of these spaces can strain infrastructure; in Cairo’s historic city, street vendors occupying thoroughfares obstructed fire lanes, significantly delaying emergency response times [
34]. Therefore, regulating spontaneous living space formation requires a balanced framework that integrates historic district conservation with community vitality. This can be achieved through tiered management, adaptive design, and social co-governance [
35], alongside detailed regulatory policies to protect historic character while improving residents’ quality of life.
Recent advancements in urban spatial analysis have increasingly utilised street-view imagery as a valuable data source for studying informal spaces [
36], including spontaneous living spaces. Street-view imagery provides an efficient and scalable method for capturing high-resolution visual data that reflects both the physical characteristics and social interactions within urban environments [
37]. This approach enables large-scale studies in cities, including historic districts, where traditional surveying methods may be limited.
Building on previous studies [
38], this research employs Mask R-CNN [
39], a deep learning model, to automate the identification and segmentation of spontaneous living spaces in the Tanhualin Historic District. By integrating street-view images with GIS and random forest regression models, we quantify the relationships between community vitality and spontaneous space usage, providing insights into how these informal spaces contribute to urban life.
Street-view imagery offers multiple advantages: it captures real-time, dynamic changes in the urban fabric, supports large-scale data collection, and enables automated identification of spatial features via machine learning [
40,
41]. This approach allows a comprehensive and accurate analysis of the interactions between spontaneous living spaces and the built environment, particularly in historic districts where balancing architectural preservation with social vitality is critical [
42]. Employing these techniques facilitates the standardization of spontaneous living space formation while accommodating residents’ needs, thereby avoiding conflicts with historic district character, refining preservation practices, and enhancing understanding of these spaces’ impact on neighborly relations and community vitality. Formation mechanism refers to the bottom-up processes through which spontaneous living spaces emerge in urban neighborhoods, shaped by the everyday practices and adaptive behaviors of residents within the constraints of the built environment [
43]. This concept emphasizes incremental, locally-driven spatial interventions rather than formally planned or top-down developments. The formation mechanism captures how resident needs, spatial opportunity, and social interactions collectively give rise to informal micro-spaces within residual or underutilized urban areas.
The core distinction between Mask R-CNN and traditional spatial measurement models lies in their data processing methods and capabilities. Conventional measurement approaches, such as threshold segmentation or edge detection algorithms, typically rely on manually designed features and predefined rules. These methods exhibit limited generalization when faced with complex backgrounds, variable object shapes, or uneven lighting conditions, often resulting in false positives or false negatives. They also require extensive manual parameter tuning for specific scenarios. In contrast, Mask R-CNN, a deep learning-based approach, automatically learns object features through end-to-end training. It simultaneously classifies and localizes objects while generating pixel-level masks, precisely delineating the contours of each target object. This capability offers a significant advantage in handling complex, multi-object scenarios, allowing the identification and segmentation of overlapping or occluding objects with robustness and generalization far exceeding traditional methods.
Similarly, the distinction between Random Forest regression models and traditional spatial network models lies in their approaches to data processing and capturing spatial dependencies. Traditional spatial network models, such as Spatial Autoregressive (SAR) models or Spatial Error Models (SEM), rely on predefined spatial weight matrices to quantify geographical proximity. These models, grounded in explicit theoretical assumptions, excel at explaining spatial autocorrelation but have limited capacity for capturing nonlinear or complex interactions. They are also sensitive to collinearity among input variables. In contrast, the Random Forest regression model, as a non-parametric machine learning method, does not require a predefined spatial structure. Its strength lies in automatically handling complex nonlinear relationships and high-dimensional data. By constructing multiple decision trees and averaging their outputs, it enhances predictive robustness and accuracy while remaining relatively insensitive to outliers and collinearity. However, Random Forest models do not directly account for spatial dependencies; to address spatial autocorrelation, geographical coordinates or spatial features must typically be included as input variables. While traditional spatial models utilize spatial structure explicitly, Random Forests learn it implicitly by treating spatial information as features, which offers greater flexibility and often stronger predictive performance.
2. Research Methodology
2.1. Methodological Framework
This study focuses on spontaneous living spaces in the Tanhualin district, examining the tension between urban renewal processes and the daily needs of residents. Field investigations were conducted to document the morphological characteristics, spatial dimensions, and functional properties of these spaces [
44], establishing a typological framework [
45], aligned with local conditions while distilling their spatial organization logic and patterns of community interaction. The research explores these informal spaces, naturally formed through residents’ daily practices, as vital components of the city’s self-organizing fabric. It investigates the dynamic equilibrium they establish between meeting contemporary living needs and preserving the district’s historical context.
Considering the current state of the Tanhualin Historic District, this study addresses the following aspects:
- (1) Field Surveys: Cameras (Canon EOS 5D, Canon Inc., Tokyo, Japan) were used to photograph and collect fundamental information on spontaneous living spaces within the site, including spatial types, functional categories, and user demographics. These data were organized to produce preliminary classifications.
- (2) Spatial Identification and Typology: Mask R-CNN (implemented using PyTorch, Version 1.13, Meta Platforms, Inc., Menlo Park, CA, USA) was employed to identify spontaneous living spaces from the collected data. Spatial typology was applied to categorize existing spontaneous living spaces within the Tanhualin historical and cultural block into two primary classifications: functional and morphological. Subsequently, the Formal Grammar Method was used to analyze spatial characteristics, incorporating actual site conditions as judgmental factors for classification. Demographic information of residents was also collected alongside spatial data.
- (3) Spatial Analysis and Community Assessment: The Tanhualin district was subdivided into zones, and statistical data on spontaneous living spaces were used to identify the distinct attributes of each area. GIS technology (ArcGIS Pro, Version 3.0, Esri, Redlands, CA, USA) was employed to generate heat maps of these zones. Concurrently, a questionnaire survey assessed local residents’ usage patterns and satisfaction levels regarding spontaneous living spaces. Building upon this, a Random Forest regression analysis was conducted to examine the relationships between community vitality and factors such as the type, quantity, and satisfaction levels of spontaneous living spaces within the block.
To facilitate understanding of the overall research process,
Figure 1 presents the study workflow. The flowchart integrates all stages, from data collection to the development of analytical models and strategy proposals, clearly illustrating the connections between each step. First, the data acquisition layer includes the collection of street-view images, questionnaire responses, and heatmap data. Next, the Mask R-CNN model was used for automatic extraction and classification of spatial features of spontaneous living spaces. These data were then input into a Random Forest regression model to analyze the nonlinear relationships between spontaneous living spaces and community vitality. Finally, SHAP was applied to interpret the predictive results, revealing the key factors influencing community vitality.
2.2. Scope of the Study
The scope of this research focuses on the Tanhualin district in Wuchang, Wuhan (
Figure 2). This historic district, located in Wuchang District, Wuhan City, Hubei Province, China, is a well-established neighborhood. Its architecture combines both Chinese and Western styles, and the district includes residential areas, historical landmarks, schools, hospitals, shops, and other infrastructures, contributing to its rich historical and cultural heritage. Most buildings in the Tanhualin Historic District were constructed during the last century and underwent renovation and reconstruction around 2004. The area still retains numerous spontaneous living spaces created by original residents to adapt to modern urban living.
Predominantly composed of older residential buildings, the district faces challenges such as outdated infrastructure and limited space. In response, residents have developed spontaneous living spaces to meet daily needs. These include creating communal rest areas by adding eaves or installing tables and chairs, as well as constructing carports and rain shelters beneath buildings. Such spaces not only preserve the alleyway structure but also adapt flexibly to contemporary living conditions through spatial collage [
46,
47], forming unplanned sources of daily vitality within historic districts. They underscore the unique role of informal spatial creation in maintaining community resilience and balancing the tension between preservation and development.
Within urban morphology, space is defined as the foundation of a city’s form, embodying the materialization of social relations and cultural concepts [
48]. Space is not merely a physical void but a vessel for human activity and interaction. Traditionally, space refers to formally planned, designed, and constructed areas with clear functions and boundaries, such as squares, parks, tree-lined avenues, and grid-patterned street networks. These spaces reflect social orders, power structures, and aesthetic ideals, with their formation governed by laws, regulations, and professional expertise. They constitute the skeletal framework of urban morphology and serve primary functions such as public assembly, celebrations, and transportation.
Conversely, spontaneous spaces emerge organically through residents’ daily activities and needs, without formal planning interventions [
49]. These are typically informal zones situated between traditional spaces—such as alleyways, lanes, market stalls, or the rear and corners of buildings. They are rich with traces of lived experience, reflecting residents’ creativity, adaptability, and social networks. Due to their flexibility and organic nature, spontaneous spaces often better embody community vitality and cultural distinctiveness, adding nuanced layers to urban life. Together, traditional and spontaneous spaces weave the complex fabric of the city, collectively shaping its unique character and dynamism.
The spontaneous living spaces examined in this study represent a subcategory of spontaneous spaces, specifically denoting informal areas created and adapted by residents to fulfill daily living needs. Typically located within residential zones, these spaces are characterized by flexibility, multifunctionality, and palpable vitality. They emerge in response to ‘gaps’ or ‘shortcomings’ in formal planning, serving as direct manifestations of community vitality. Unlike spaces solely dedicated to transport or commerce, spontaneous living spaces host diverse social activities, including neighborhood chats, children’s play, informal gatherings, and community gardening. Physically, these spaces are equally diverse, incorporating canopies, staircase or ramp additions, and portable furniture. Collectively, these elements form the material foundation of spontaneous living spaces, reflecting residents’ micro-alterations and engagement with public areas.
In studying spontaneous living spaces within neighborhoods, automated identification and localization can be achieved using Mask R-CNN. Researchers no longer need to manually annotate canopies, plants, or portable furniture in each street-view photograph. By inputting large volumes of street-view imagery into a trained Mask R-CNN model, it can automatically and rapidly identify all material elements constituting spontaneous living spaces. The pixel-level masks generated by Mask R-CNN enable precise calculation of these elements’ area, density, and distribution patterns. For example, the model can quantify total green plant coverage within a block or count specific object types, transitioning the analysis from qualitative to quantitative. Simultaneously, the spatial location data of identified objects can be integrated with Geographic Information Systems (GIS). By analyzing the spatial relationships between these elements and roads, buildings, and residential blocks, researchers can further understand how spontaneous living spaces interact with and coexist alongside traditionally planned spaces.
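The step from pixel-level masks to counts and coverage figures can be illustrated with a minimal sketch. It assumes the model output has already been converted to one boolean mask per detected object plus a class label; the function name `summarise_masks` and the toy masks are hypothetical and are not taken from the study's code.

```python
import numpy as np

def summarise_masks(masks, labels):
    """Summarise binary instance masks per class.

    masks  : list of 2-D boolean arrays, all with the image's shape
    labels : list of class names, one per mask
    Returns {label: {"count", "pixel_area", "coverage"}} where coverage is
    the share of image pixels covered by the union of that class's masks.
    """
    per_class = {}
    for mask, label in zip(masks, labels):
        mask = np.asarray(mask, dtype=bool)
        entry = per_class.setdefault(
            label, {"count": 0, "union": np.zeros(mask.shape, dtype=bool)}
        )
        entry["count"] += 1
        entry["union"] |= mask  # union avoids double counting overlaps
    return {
        label: {
            "count": e["count"],
            "pixel_area": int(e["union"].sum()),
            "coverage": float(e["union"].mean()),
        }
        for label, e in per_class.items()
    }

# Toy 10 x 20 image with two plant instances and one canopy (illustrative).
h, w = 10, 20
plant_a = np.zeros((h, w), dtype=bool); plant_a[0:5, 0:4] = True
plant_b = np.zeros((h, w), dtype=bool); plant_b[5:10, 10:14] = True
canopy = np.zeros((h, w), dtype=bool); canopy[0:2, 5:20] = True

summary = summarise_masks([plant_a, plant_b, canopy],
                          ["plant", "plant", "canopy"])
print(summary)
```

Converting pixel areas to square metres would additionally require a ground-sampling scale for each photograph, which this sketch does not attempt.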
Within the study area, spontaneous living spaces were first identified at the instance-observation level. After excluding street-view images that failed to meet the identification criteria (e.g., insufficient visibility or spatial ambiguity), 1693 valid area views (defined as coherent and interpretable spatial scenes) were retained, and a total of 2512 visual instance observations were documented within them. Because the same physical spontaneous living space could be captured multiple times from different viewpoints or during different survey periods, a consolidation process was conducted. Through this process, repeated observations were merged, resulting in 1249 unique instances of spontaneous living spaces that constitute the final spatial analytical population.
Given the ongoing influence of local government-led redevelopment and demolition activities, the spatial configuration of spontaneous living spaces in the study area is inherently dynamic. To reduce temporal bias, multiple field surveys were conducted, and repeated observations were averaged or consolidated to ensure spatial statistical consistency. Based on a comparative analysis of formal characteristics and spatial configuration, the 1249 compiled instances were classified into three spatial interface types using a spatial typology approach: base interfaces, side interfaces, and top interfaces.
Base interfaces refer to spontaneous living spaces located at ground level or beneath buildings, commonly used for storage, parking, or communal activities. Side interfaces include spaces attached to the vertical sides of buildings, such as staircases, ramps, or small extensions, and are typically associated with personal or semi-public functions. Top interfaces correspond to rooftop spaces or upper-level areas of buildings, frequently used for activities such as drying clothes, small-scale gardening, or creating sheltered auxiliary spaces. These spatial categories were determined through an integrated assessment of physical attributes, spatial position, and observed usage patterns derived from field investigation.
In addition to spatial interface classification, the identified spontaneous living spaces were functionally categorized into four types: fundamental spontaneous living spaces, quality-enhancing spontaneous living spaces, social-oriented spontaneous living spaces, and safety- and privacy-oriented spontaneous living spaces. This functional taxonomy captures differences in usage intensity, social engagement, and environmental contribution.
For model development, satellite imagery and street-view data of the Tanhualin neighbourhood were first acquired from Google Maps and supplemented with street-view photographs captured during a two-week field survey. After quality control, 4217 high-resolution street-view photographs were retained as the primary visual input. Given the substantial heterogeneity in the form and function of spontaneous living spaces, all images were manually annotated to construct a high-quality supervised learning dataset. During the annotation process, the Roboflow platform was employed to label spontaneous living space instances, generating annotation files compliant with the COCO dataset format, including semantic labels, bounding boxes, and pixel-level masks. This annotation framework enabled the predictive model to effectively adapt to the task of identifying spontaneous living spaces within complex urban street environments.
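For readers unfamiliar with the COCO layout mentioned above, the following sketch builds one annotation record by hand. The file name, category name, and polygon are invented for illustration; only the field names (`images`, `categories`, `annotations`, `segmentation`, `bbox`, `area`, `iscrowd`) follow the COCO convention, and the helper functions are hypothetical rather than part of any annotation platform's API.

```python
import json

def polygon_area(flat_xy):
    """Shoelace formula for a polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0

def bbox_from_polygon(flat_xy):
    """Axis-aligned box in COCO order: [x_min, y_min, width, height]."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]

# Hypothetical canopy outline: a 200 x 100 pixel rectangle.
polygon = [100, 50, 300, 50, 300, 150, 100, 150]

coco = {
    "images": [
        {"id": 1, "file_name": "example_0001.jpg", "width": 1024, "height": 768}
    ],
    "categories": [{"id": 1, "name": "canopy"}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "segmentation": [polygon],           # pixel-level outline
            "bbox": bbox_from_polygon(polygon),  # bounding box geometry
            "area": polygon_area(polygon),       # mask area in pixels
            "iscrowd": 0,
        }
    ],
}

print(json.dumps(coco["annotations"][0], indent=2))
```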
2.3. Random Forest-Based Regression Analysis Model
2.3.1. Introduction to the Random Forest Regression Model
To effectively capture the nonlinear relationships between community vitality density and various independent variables, this study employs a Random Forest Regression model as the analytical tool. This non-parametric statistical method, based on ensemble learning principles, improves predictive accuracy and model robustness by constructing and aggregating multiple regression decision trees [
50]. Specifically, during training, the model uses bootstrap sampling to generate multiple subsamples from the original dataset. A separate regression tree is trained on each subsample, and the random selection of feature subsets at each node split further enhances model diversity. All features used in the Random Forest model were extracted exclusively from the training set and subsequently applied to the validation and test sets without recalibration, ensuring consistency with the data partitioning strategy and preventing feature leakage.
The final prediction is obtained through the weighted average of all base learners’ outputs, with each tree weighted equally, mathematically expressed as:
$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n} T_i(X)$$
where $T_i(X)$ denotes the prediction from the $i$th decision tree for the input variables $X$, calculated by assessing each variable’s contribution to the dependent variable $Y$ through node splitting, and $n$ represents the total number of decision trees. During model training, the random forest aggregates predictions from multiple independently trained decision trees, with the final output being a weighted average of all individual tree outputs.
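The averaging step can be checked numerically. The sketch below uses scikit-learn's `RandomForestRegressor` on synthetic data (an assumption: the source does not state which library or hyperparameters were used for the regression) and compares the forest's prediction with the plain mean of its individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for four predictors and a nonlinear response.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=200)

# Illustrative settings only; not the study's configuration.
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Stack the per-tree predictions: shape (n_trees, n_samples).
tree_preds = np.stack([tree.predict(X) for tree in forest.estimators_])

manual_mean = tree_preds.mean(axis=0)  # equal-weight average over trees
forest_pred = forest.predict(X)

print(np.allclose(manual_mean, forest_pred))
```

Each tree is fitted on its own bootstrap sample, so individual trees disagree; the ensemble output is their equally weighted mean.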
2.3.2. Feature Selection and Data Pre-Processing
In this study, the dependent variable is community vitality, derived from field surveys and GIS heatmap analysis. It comprehensively reflects residents’ interactions, spatial utilization, and social engagement. Community vitality is a key indicator in urban studies, directly influencing the sustainability of public spaces and the cohesion of local communities (
Table 1 and
Table 2).
To examine the factors influencing community vitality, four major independent variables were considered:
- (1) Crowd density and dwell time (X1): This variable reflects the intensity and persistence of space usage, serving as a key indicator of spatial attractiveness and utilization efficiency. Higher crowd density combined with moderate dwell time typically signifies stronger social interaction within a space, thereby contributing to community vitality.
- (2) Satisfaction with various types of spontaneous living spaces (X2): This variable captures residents’ subjective evaluation of the quality and functionality of different spontaneous living spaces. It was measured using a 5-point Likert scale, assessing spaces such as personal clotheslines, public clotheslines, and fitness equipment. Satisfaction levels influence residents’ engagement with their environment and thus affect overall community vitality.
- (3) Number of spontaneous living spaces (X3): This variable measures the count of different spontaneous living spaces (such as personal clotheslines, fitness equipment, and canopies) within a 500 m² area. Densities of these spaces were identified using the Mask R-CNN model and included as a key predictor of community vitality.
- (4) Frequency of use of spontaneous living spaces (X4): This variable reflects how often residents interact with spontaneous living spaces on a weekly basis. High-frequency use indicates that these spaces fulfill essential daily functions and contribute substantially to community vitality.
A Random Forest regression model was employed to analyze these variables and their relationships with community vitality. These features, comprising both behavioral and satisfaction indicators as well as image-derived metrics, are comprehensively described in the Methods section and summarized in
Table 3. Units, descriptive statistics, and the correlation matrix for each feature are provided to ensure clarity and transparency in the analysis.
Feature importance and interactions were further interpreted using SHAP (Shapley Additive Explanations) analysis. A detailed examination of these variables, including their correlations with community vitality and interaction effects, is presented in
Section 5.
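The idea behind Shapley-based attribution can be shown with a brute-force sketch. It enumerates all feature orderings for a hand-written toy function, so it is not the SHAP library's tree algorithm and not the study's fitted model; `toy_vitality`, the input values, and the zero baseline are all hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Features start at their baseline value and are switched to the
    observed value one at a time; each feature is credited with the
    change in prediction it causes, averaged over all orderings.
    """
    n = len(x)
    contrib = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for j in order:
            current[j] = x[j]
            new = predict(current)
            contrib[j] += new - prev
            prev = new
    return [c / len(orders) for c in contrib]

def toy_vitality(features):
    """Hypothetical response with an interaction between X3 and X4."""
    x1, x2, x3, x4 = features
    return 2.0 * x1 + 1.0 * x2 + 0.5 * x3 * x4

x = [1.0, 2.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0, 0.0]
phi = shapley_values(toy_vitality, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split evenly between the two features involved. Enumeration grows factorially with the number of features, which is why practical tools rely on model-specific shortcuts.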
Multicollinearity test: Variance Inflation Factors (VIF) for all independent variables were below 4.5 (the highest, for personal clothesline satisfaction vs. density, was VIF = 4.12), indicating no serious multicollinearity issues. The variables can therefore be safely included in the random forest model, which is itself robust to collinearity; the test is reported here for rigour.
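As a reminder of how a VIF is obtained, each predictor is regressed on the remaining predictors and VIF = 1 / (1 − R²). The sketch below implements this with NumPy on synthetic columns; the data and the function name are illustrative assumptions and do not reproduce the study's variables or values.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X using ordinary least squares."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # add intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs

rng = np.random.default_rng(42)
a = rng.normal(size=500)
b = rng.normal(size=500)             # independent of a
c = a + 0.1 * rng.normal(size=500)   # nearly collinear with a

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print([round(v, 2) for v in vifs])
```

In this toy case the independent column has a VIF close to 1, while the two nearly collinear columns have very large values.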
3. Identification and Classification of Spontaneous Living Spaces Using Mask R-CNN
Spontaneous living spaces in China, serving as authentic conduits for grassroots social activities, typically fulfill dual functions: bridging gaps in public services and sustaining community bonds. Their forms directly reflect local infrastructure standards and residents’ practical knowledge. The emergence of these spaces is rooted in the logic of daily life. On one hand, they result from residents’ functional enhancements to their living environments, such as utilizing courtyard gaps to establish communal kitchens or adding storage spaces to building ground levels. On the other hand, they arise from self-organized neighborhood practices aimed at enhancing community vitality, exemplified by spontaneously formed drying areas or public recreational seating.
Characterized by incremental renewal, these informal interventions flexibly occupy liminal spaces such as eaves and courtyard corners. Everyday elements, including clotheslines and plant stands, act as spatial demarcators, preserving the tactile memory of street-scale textures while employing low-tech strategies to reconfigure multifunctional living scenarios for contemporary needs. In this way, spontaneous living spaces serve as vital conduits for sustaining community cultural resilience.
This study employs the Roboflow platform to construct a street-view–based computer vision analysis system for identifying spontaneous living spaces. To eliminate ambiguities in data scale and unit definitions, a unified and hierarchical data pipeline was established, in which each processing stage corresponds to a clearly defined analytical unit, including photographs, area views, spatial instances, and image-level training samples [
51].
First, raw visual data were collected from two sources: (1) Google Maps satellite and street-view imagery, and (2) on-site street-view photographs captured during a two-week field survey in the Tanhualin neighbourhood. After removing duplicated frames and low-quality images (e.g., severe occlusion or motion blur), a total of 4217 unique street-view photographs were retained. At this stage, the analytical unit is the photograph, defined as a single camera frame.
Second, each photograph was screened for the presence of spontaneous living spaces. Photographs providing a coherent and interpretable spatial field were defined as area views, representing visually continuous spatial scenes suitable for spatial analysis. Through this screening process, 1693 valid area views were identified.
Third, within the identified area views, individual spontaneous living spaces were delineated as visual instances, defined as spatially discrete and functionally interpretable units. This process yielded 2512 instance-level observations. To ensure spatial statistical consistency, multi-angle and multi-temporal observations of the same physical entity were consolidated, resulting in 1249 unique spontaneous living space instances. These instances were further categorized into three spatial interface types—base, side, and top interfaces—which serve as spatial attributes associated with each instance rather than independent object classes.
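The consolidation of repeated observations can be sketched as a simple grouping operation. The sketch assumes each observation already carries an identifier for the physical space it depicts (here a hypothetical `space_id` field); how such identifiers were assigned in the study is not specified in the text, and the majority vote on interface type is likewise an illustrative choice.

```python
from collections import Counter, defaultdict

def consolidate(observations):
    """Merge multi-angle / multi-date observations of the same space.

    observations: list of dicts with keys "space_id", "interface", "view".
    Returns one record per space_id, keeping the most frequently
    observed interface type and the number of merged observations.
    """
    grouped = defaultdict(list)
    for obs in observations:
        grouped[obs["space_id"]].append(obs)

    unique = []
    for space_id, group in sorted(grouped.items()):
        interface = Counter(o["interface"] for o in group).most_common(1)[0][0]
        unique.append(
            {
                "space_id": space_id,
                "interface": interface,
                "n_observations": len(group),
            }
        )
    return unique

# Five hypothetical observations covering three physical spaces.
observations = [
    {"space_id": "S001", "interface": "base", "view": "photo_0012"},
    {"space_id": "S001", "interface": "base", "view": "photo_0013"},
    {"space_id": "S002", "interface": "side", "view": "photo_0013"},
    {"space_id": "S003", "interface": "top", "view": "photo_0040"},
    {"space_id": "S003", "interface": "top", "view": "photo_0107"},
]

unique_instances = consolidate(observations)
print(len(observations), "observations ->", len(unique_instances), "unique")
```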
Fourth, to prevent data leakage and ensure rigorous model evaluation, the 4217 original photographs were partitioned prior to annotation and augmentation into training (70%), validation (20%), and test (10%) sets using stratified sampling. This image-level partitioning ensures that different augmented versions of the same original photograph cannot appear across different dataset splits.
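A photograph-level stratified split of this kind can be sketched with the standard library alone. The stratum labels below are hypothetical, since the text does not state which attribute was used for stratification, and the function is an illustration rather than the study's implementation.

```python
import random
from collections import defaultdict

def stratified_split(photo_ids, strata, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split photo ids into train/val/test within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for pid, stratum in zip(photo_ids, strata):
        by_stratum[stratum].append(pid)

    splits = {"train": [], "val": [], "test": []}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits

# 100 hypothetical photographs with an assumed stratum label each.
photo_ids = [f"photo_{i:04d}" for i in range(100)]
strata = ["base"] * 50 + ["side"] * 30 + ["top"] * 20

splits = stratified_split(photo_ids, strata)
print({name: len(ids) for name, ids in splits.items()})
```

Because the split is made on original photograph identifiers, any augmented copy created later can inherit its parent's split, which is what keeps variants of one photograph from appearing on both sides of the evaluation boundary.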
Fifth, a deep learning–oriented annotation and augmentation workflow was implemented. Each spontaneous living space instance was manually annotated using a multidimensional scheme that includes semantic object labels, bounding box geometry, and pixel-level segmentation masks, with all annotations exported in COCO format. Data augmentation was applied exclusively to the training set, including horizontal flipping (50% probability) and random rotations ranging from −15° to +15° [
52]. Through this process, the training split was expanded into 9325 image–annotation pairs, which constitute the final set of training samples used for model learning.
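When an image is flipped, its annotations have to be transformed with it. The sketch below shows the horizontal-flip case for a mask and a COCO-style `[x, y, width, height]` box using NumPy; rotation is omitted because it requires resampling the mask and recomputing the box. The function names are hypothetical and this is not the augmentation code used in the study.

```python
import numpy as np

def hflip_sample(image, mask, bbox):
    """Flip image and mask left-right and mirror the COCO bbox."""
    width = image.shape[1]
    x, y, w, h = bbox
    flipped_bbox = [width - x - w, y, w, h]  # only the x origin changes
    return image[:, ::-1].copy(), mask[:, ::-1].copy(), flipped_bbox

def maybe_hflip(image, mask, bbox, rng, p=0.5):
    """Apply the flip with probability p (0.5 matches the stated setting)."""
    if rng.random() < p:
        return hflip_sample(image, mask, bbox)
    return image, mask, bbox

# Toy 6 x 8 single-channel image with one object in the left part.
image = np.arange(6 * 8).reshape(6, 8)
mask = np.zeros((6, 8), dtype=bool)
mask[1:3, 0:3] = True
bbox = [0, 1, 3, 2]

f_img, f_mask, f_bbox = hflip_sample(image, mask, bbox)
print(f_bbox)

rng = np.random.default_rng(0)
aug_img, aug_mask, aug_bbox = maybe_hflip(image, mask, bbox, rng)
```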
Finally, by explicitly distinguishing between photographs, area views, spatial instances, and augmented image-level samples, the proposed workflow establishes a transparent, leakage-free, and reproducible data pipeline. Data augmentation was applied exclusively to the training set after dataset partitioning. The validation and test sets remained unaugmented and were used solely for model tuning and performance evaluation, respectively. This design ensures strict separation between training and evaluation data and eliminates the risk of information leakage across dataset splits. This pipeline ensures balanced representation across spatial interface types and provides a robust foundation for deep learning–based identification and analysis of spontaneous living spaces in complex urban street environments (
Figure 3).
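The ordering described above (partition the original photographs first, then augment only the training split) can be illustrated with a minimal sketch. The stratification variable is not specified in the text, so the `labels` mapping below is a hypothetical stand-in, and the toy set of 100 photographs and the helper names are illustrative assumptions rather than the study's actual implementation.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.2), seed=0):
    """Split photo ids into train/val/test, keeping proportions within each stratum.

    labels: dict photo_id -> stratum (hypothetical; the paper does not name the variable).
    fractions: (train, val) shares; the remainder goes to the test split.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for photo_id, stratum in labels.items():
        by_stratum[stratum].append(photo_id)

    splits = {"train": [], "val": [], "test": []}
    for stratum in sorted(by_stratum):
        ids = sorted(by_stratum[stratum])
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]
    return splits

def hflip_bbox(bbox, image_width):
    """Mirror a COCO-style [x, y, width, height] box about the vertical centre line."""
    x, y, w, h = bbox
    return [image_width - x - w, y, w, h]

def augment_training_set(train_ids, copies, seed=0):
    """Draw (photo_id, flip, angle) specs for training photos only.

    Flip probability 0.5 and rotation in [-15, +15] degrees follow the text;
    the number of copies per photo is an illustrative assumption.
    """
    rng = random.Random(seed)
    specs = []
    for photo_id in train_ids:
        for _ in range(copies):
            specs.append((photo_id, rng.random() < 0.5, rng.uniform(-15.0, 15.0)))
    return specs

# Toy example: 100 photographs in three hypothetical strata.
labels = {i: ("base", "side", "top")[i % 3] for i in range(100)}
splits = stratified_split(labels)
specs = augment_training_set(splits["train"], copies=2)
```

Because augmentation specs are generated from the training ids alone, no augmented variant of a validation or test photograph can exist, which is the property the image-level partitioning is meant to secure.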
3.1. Annotation Protocol and Inclusion Criteria
All image annotations were conducted following a predefined and standardized annotation protocol. The annotation team consisted of three graduate researchers with backgrounds in urban design and spatial analysis, all of whom received unified training prior to the annotation process. The training session included detailed instructions on the definition of spontaneous living spaces, spatial interface classification, and annotation consistency requirements.
Inclusion criteria required that a spontaneous living space be (1) visually identifiable in street-view imagery, (2) spatially discrete from adjacent structures, and (3) functionally interpretable based on observable physical features and usage cues. Images were excluded if spontaneous living spaces were severely occluded, visually ambiguous, or insufficiently resolved for reliable interpretation.
To ensure annotation consistency, all annotations were reviewed by a senior researcher with expertise in urban spatial analysis. Discrepancies were resolved through group discussion, and consensus-based corrections were applied to the final dataset. This procedure ensured a consistent and reproducible annotation standard across the entire image corpus.
3.2. Model Selection
This study employs the Mask R-CNN model for the precise identification and classification of spontaneous living spaces within the Tanhualin neighbourhood. Mask R-CNN, an extension of Faster R-CNN, adopts a two-stage framework for object detection and instance segmentation. Model selection comparisons were conducted based on target characteristics (
Table 4):
For the multi-object detection and segmentation of spontaneous living spaces in the Tanhualin neighbourhood, Mask R-CNN demonstrates clear advantages in applicability. Its two-stage detection framework, comprising a Region Proposal Network (RPN) and RoIAlign, integrates an end-to-end optimized Mask Branch to generate pixel-level segmentation masks. This capability satisfies the quantitative analysis requirements of irregularly shaped living spaces, including area measurement and boundary curvature analysis. In contrast to DeepLabv3+, which relies solely on semantic segmentation, Mask R-CNN’s instance segmentation capability effectively distinguishes densely arranged adjacent objects, thereby avoiding statistical errors caused by mask overlap.
At the model training level, transfer learning based on the COCO dataset enables high-precision convergence with limited training samples. By freezing backbone parameters and fine-tuning the detection head, this strategy significantly reduces training costs and mitigates overfitting risks. Moreover, compared with the general-purpose segmentation model SAM (Segment Anything Model), which contains approximately 632 million parameters, Mask R-CNN achieves a 93% reduction to 44 million parameters. This reduction preserves segmentation performance while improving deployment efficiency (8.2 FPS), making the model suitable for resource-constrained edge computing scenarios. Although YOLOv8-seg offers superior inference speed (23.5 FPS), its single-stage architecture exhibits reduced accuracy in small-object detection, limiting its suitability for fine-grained spatial analysis. Conversely, SAM demonstrates insufficient segmentation accuracy due to limited domain adaptability. Overall, Mask R-CNN outperforms the comparison models in terms of accuracy–efficiency balance, transfer learning compatibility, and engineering feasibility.
In its first stage, the Region Proposal Network scans input images to generate candidate bounding boxes and filters regions likely to contain target objects. In the second stage, feature extraction, classification, and bounding box refinement are performed on these candidate regions, while a dedicated segmentation branch generates pixel-level masks. Compared with traditional object detection methods, Mask R-CNN effectively decouples classification and segmentation tasks. The RoIAlign operation further mitigates quantization errors in candidate boxes, thereby improving detection accuracy.
Additionally, the model incorporates a Feature Pyramid Network (FPN) to extract multi-scale features and reconstructs the anchor parameter system of the RPN. Based on the size distribution of objects derived from statistics of the annotated data, a multi-level anchor box system is established, including small-scale (64² pixels), medium-scale (128² pixels), and large-scale (256² pixels) anchors combined with differentiated aspect ratios (1:1, 1:2, 2:1). This configuration significantly enhances detection sensitivity for small-scale targets and irregularly shaped objects. As a result, the model demonstrates strong robustness in capturing detailed and diverse targets within complex neighbourhood scenes, providing reliable technical support for neighbourhood spatial morphology research (
Figure 4).
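As a rough illustration of the anchor system described above, the following sketch enumerates anchor shapes for the three scales and three aspect ratios. It assumes that each ratio is read as height:width and that every anchor preserves the area of its scale (scale²); both conventions are common but are assumptions here, and the function name is hypothetical.

```python
import math

def make_anchors(scales=(64, 128, 256), ratios=(1.0, 0.5, 2.0)):
    """Return (width, height) pairs, one per scale/ratio combination.

    ratio = height / width (assumed convention); area is kept at scale**2.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((width, height))
    return anchors

anchors = make_anchors()
for width, height in anchors:
    print(f"{width:7.1f} x {height:7.1f}")
```

Three scales combined with three ratios give nine anchor shapes; in an FPN-based detector each scale would typically be assigned to its own pyramid level.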
3.3. Model Training
3.3.1. Model Architecture Construction and Customized Adjustments
Based on the two-stage Mask R-CNN framework, a complete architecture incorporating the Feature Pyramid Network (FPN), Region Proposal Network (RPN), and mask prediction branch is constructed. To address multi-class recognition demands in spontaneous spatial contexts, the category output dimension of the Classifier Head is reconfigured to strictly align with the target number of classes. Concurrently, the segmentation granularity of the Mask Head is adjusted by extracting high-resolution feature maps (typically 28 × 28 pixels) via the RoIAlign layer, enabling precise segmentation of irregular spatial boundaries within complex street scenes.
3.3.2. Data Engineering and Training Parameter Configuration
For this study, a total of 4217 high-resolution street-view photographs were collected from the Tanhualin neighborhood, containing 2512 visual instances of spontaneous living spaces documented across multiple surveys. After excluding images that did not meet the identification criteria, 1249 valid area views were retained for subsequent analysis.
Manual annotation was conducted using the Roboflow tool. Labeling followed predefined categories—such as canopies, plants, and portable furniture—which were classified according to their physical presence and functional roles within spontaneous living spaces. The annotation files were formatted according to the COCO dataset standard, ensuring compatibility with the model training process.
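A DataLoader was implemented to perform preprocessing operations on both images and annotations (COCO format), including normalization, random cropping, and scale jittering. The batch size was set between 8 and 16, depending on GPU memory capacity. Model optimization employed the default Adam optimizer, with the parameter update formula defined as follows:
$$\theta_{t+1} = \theta_t - \eta \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$
where $\hat{m}_t$ and $\hat{v}_t$ represent the bias-corrected estimates of the first-order and second-order moments, respectively, with $\eta$ denoting the learning rate (initialized at $3 \times 10^{-4}$). The loss function inherits Mask R-CNN’s default multi-task joint optimization strategy, comprising the following three components: (1) Classification Loss: employs cross-entropy loss to measure the discrepancy between the predicted probability distribution $p$ and the ground truth labels $y$:
$$L_{cls} = -\sum_{c} y_c \log p_c$$
(2) Bounding Box Regression Loss: optimizes detection box coordinates $x$, $y$, $w$, $h$ using Smooth L1 Loss:
$$L_{box} = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L1}(t_i - t_i^*), \qquad \mathrm{smooth}_{L1}(d) = \begin{cases} 0.5\, d^2 / \beta, & |d| < \beta \\ |d| - 0.5\, \beta, & \text{otherwise} \end{cases}$$
where $\beta$ is the smoothing threshold (default 1.0), $t_i$ denotes the predicted offset, and $t_i^*$ represents the ground truth offset.
(3) Mask Segmentation Loss: evaluates the accuracy of each pixel’s mask prediction using Binary Cross-Entropy:
$$L_{mask} = -\frac{1}{N} \sum_{j=1}^{N} \left[ m_j \log \sigma(s_j) + (1 - m_j) \log\left(1 - \sigma(s_j)\right) \right]$$
where $s_j$ is the logit output for pixel $j$, $m_j$ is the corresponding ground truth mask value, $\sigma$ denotes the Sigmoid function, and $N$ represents the total number of pixels within the region of interest. The total loss is the weighted sum of these three components:
$$L = \lambda_1 L_{cls} + \lambda_2 L_{box} + \lambda_3 L_{mask}$$
(default $\lambda_1 = \lambda_2 = \lambda_3 = 1$).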
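The three loss components named above (cross-entropy, Smooth L1, and per-pixel binary cross-entropy) can be sketched in plain Python for a single region. This is a minimal illustration under stated assumptions, not the training code used in the study: function names are hypothetical, inputs are plain lists, and equal weights are assumed for the total loss.

```python
import math

def cross_entropy(probs, one_hot):
    """Cross-entropy between predicted class probabilities and a one-hot label."""
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot) if y > 0)

def smooth_l1(pred_offsets, true_offsets, beta=1.0):
    """Smooth L1 summed over box offsets; quadratic below beta, linear above."""
    total = 0.0
    for t, t_star in zip(pred_offsets, true_offsets):
        d = abs(t - t_star)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def mask_bce(logits, targets):
    """Mean binary cross-entropy over the pixels of one region."""
    n = len(logits)
    return -sum(
        y * math.log(sigmoid(s)) + (1 - y) * math.log(1 - sigmoid(s))
        for s, y in zip(logits, targets)
    ) / n

def total_loss(l_cls, l_box, l_mask, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three components (equal weights assumed)."""
    return weights[0] * l_cls + weights[1] * l_box + weights[2] * l_mask

l_cls = cross_entropy([0.7, 0.2, 0.1], [1, 0, 0])
l_box = smooth_l1([0.1, 0.2, 1.5, 0.0], [0.0, 0.0, 0.0, 0.0])
l_mask = mask_bce([2.0, -1.0, 0.5], [1, 0, 1])
print(total_loss(l_cls, l_box, l_mask))
```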
3.3.3. Iterative Training and Model Convergence
Street view images and annotated data are fed in batches through the data loader. During forward propagation, the network generates predictions including category probabilities, bounding box coordinates, and mask matrices, while backpropagation computes gradients and updates network parameters. A learning rate decay strategy is applied throughout training, which terminates once the validation mAP@0.5 metrics stabilize. The final model simultaneously outputs target spatial locations, category labels, and pixel-level masks, satisfying the requirements for both morphological quantification and dynamic monitoring of spontaneous living spaces.
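The stopping rule described above (terminate once validation mAP@0.5 stabilizes) is not specified in detail. One plausible reading is a patience-based plateau check, sketched below; the `patience` and `min_delta` values and the function name are illustrative assumptions.

```python
def has_plateaued(history, patience=5, min_delta=1e-3):
    """Return True when the best score in the last `patience` epochs
    improves on the earlier best by less than `min_delta`.

    history: validation mAP@0.5 per epoch, oldest first.
    """
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent - best_before < min_delta

# Hypothetical validation curve: rises, then flattens.
val_map = [0.50, 0.62, 0.71, 0.78, 0.80, 0.80, 0.80, 0.80]
print(has_plateaued(val_map, patience=3))
```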
3.4. Optimization Strategy
To optimize data augmentation and input preprocessing, a dynamic multi-scale image processing strategy was adopted. Input images were standardized to a fixed resolution suitable for capturing fine-grained street-view features. In addition, random transformations were applied, including rotation, probabilistic multi-scale random cropping (scaling ratios ranging from 0.5 to 2.0), and perturbations in the HSV color space. These augmentation techniques increase data diversity and enhance the model’s generalization ability under complex lighting conditions, varying viewpoints, and heterogeneous target distributions.
To address challenges in small-object detection, Mosaic Augmentation was introduced. This technique combines four independent training images into a single composite sample, encouraging the model to learn richer local contextual information.
Within the transfer learning strategy, the backbone network was initialized using pre-trained weights from the COCO dataset (mask_rcnn_coco.h5). The first three convolutional stages (Stages 1–3) of the ResNet-50 backbone were frozen to preserve general feature extraction capability, while Stage 4 and the subsequent detection heads were unfrozen for domain-specific fine-tuning. The initial learning rate was set to 1 × 10⁻⁴, and a cosine annealing learning rate schedule was applied, with the cycle length set to one-quarter of the total training epochs. This configuration balances convergence speed and training stability.
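A cosine annealing schedule with a cycle length of one-quarter of the training epochs can be sketched as follows. The text does not state whether the rate restarts at the start of each cycle or what the minimum rate is; the sketch assumes warm restarts and a minimum of zero, and the total of 40 epochs is purely illustrative.

```python
import math

def cosine_annealing_lr(epoch, total_epochs, lr_max=1e-4, lr_min=0.0):
    """Learning rate at `epoch` for cosine annealing with warm restarts.

    The cycle length is one quarter of the total epochs; restarting at the
    top of each cycle and lr_min = 0 are assumptions.
    """
    cycle = max(1, total_epochs // 4)
    position = epoch % cycle  # epochs elapsed within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * position / cycle))

schedule = [cosine_annealing_lr(e, total_epochs=40) for e in range(40)]
print(schedule[:12])
```

Within each cycle the rate falls smoothly from the initial value towards the minimum, then returns to the initial value at the next cycle boundary.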
For loss optimization, a hybrid loss function was introduced to address class imbalance and ambiguous segmentation boundaries, which are common challenges in street-scene segmentation tasks.
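(1) Classification loss employs Focal Loss instead of standard cross-entropy, using an adjustable focusing parameter ($\gamma$) to mitigate negative sample dominance. Its expression is:
$$FL(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)$$
where $p_t$ is the predicted probability of the true class and $\alpha_t$ is a class-balancing weight.
(2) For bounding box regression, GIoU loss (Generalized IoU Loss) replaces Smooth L1 loss. This enhances localization accuracy by optimizing the spatial overlap between detected and ground-truth boxes:
$$L_{GIoU} = 1 - IoU(A, B) + \frac{|C \setminus (A \cup B)|}{|C|}$$
where $A$ and $B$ are the predicted and ground-truth boxes, and $C$ denotes the minimum bounding box enclosing both.
(3) Mask segmentation introduces a linear combination of Dice loss and binary cross-entropy. Because the Dice coefficient is insensitive to the area of segmented regions, this combination alleviates optimization challenges for small objects:
$$L_{mask} = \lambda_{1} L_{Dice} + \lambda_{2} L_{BCE}, \qquad L_{Dice} = 1 - \frac{2 \sum_j p_j m_j}{\sum_j p_j + \sum_j m_j}$$
where $p_j$ is the predicted mask probability and $m_j$ the ground-truth mask value for pixel $j$.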
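The three components of the hybrid loss can be sketched in plain Python to make their behaviour concrete. The values `gamma=2.0` and `alpha_t=0.25` are commonly used focal loss defaults and are assumptions here, not the settings reported in this study; function names and the corner-format boxes `(x1, y1, x2, y2)` are likewise illustrative.

```python
import math

def focal_loss(p_t, gamma=2.0, alpha_t=0.25):
    """Focal loss for one sample; p_t is the predicted probability of the true class."""
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

def giou_loss(a, b):
    """1 - GIoU for two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Smallest axis-aligned box enclosing both a and b.
    enclosing = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = inter / union - (enclosing - union) / enclosing
    return 1.0 - giou

def dice_loss(probs, targets, eps=1e-6):
    """1 - Dice coefficient over the pixels of one mask; eps avoids division by zero."""
    inter = sum(p * y for p, y in zip(probs, targets))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)

print(focal_loss(0.9), focal_loss(0.1))
print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
print(dice_loss([0.9, 0.8, 0.1], [1, 1, 0]))
```

Well-classified samples (`p_t` near 1) contribute almost nothing to the focal loss, and unlike plain IoU the GIoU term still provides a gradient signal when the two boxes do not overlap.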
This approach employs an end-to-end joint optimization mechanism. While maintaining model compactness, it enhances detection recall and segmentation boundary coherence for resident-modified spaces in complex street scenes. The experimental validation phase requires ablation studies that quantify the contribution of each optimization module to the mAP@0.5 and IoU metrics, in order to guide trade-offs during engineering deployment.
3.5. Experimental Results Analysis
Based on a systematic analysis of experimental data and model performance (
Figure 5), the proposed Mask R-CNN enhancement demonstrates substantial engineering utility and academic value for spontaneous living space recognition tasks. Area and positional errors were evaluated using standard segmentation metrics. The Mean Absolute Error (MAE) was calculated to quantify deviations between predicted and ground-truth values. In addition, Intersection over Union (IoU) was employed to assess the overlap between predicted and ground-truth bounding boxes, providing a robust measure of localization accuracy.
Through an end-to-end joint optimization framework, the model achieves an mAP50 of 0.842 (95% confidence interval: 0.821–0.863) on the test set, indicating high detection confidence for multi-scale objects in complex street scenes. The segmentation masks yield a Dice similarity coefficient of 0.791 and an IoU of 0.752 when compared with human annotations, confirming the geometric accuracy of pixel-level instance segmentation. This performance is particularly evident in the edge continuity of irregularly shaped objects, such as rooftop extensions and ground-level canopies.
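For reference, the two mask-agreement measures reported above can be computed from a predicted and a ground-truth binary mask as in the following minimal sketch. Masks are represented as nested lists of 0/1 for simplicity; the function name and the toy masks are illustrative assumptions.

```python
def mask_iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two same-sized binary masks given as 2D lists of 0/1."""
    inter = sum(p & t for pred_row, truth_row in zip(pred, truth)
                for p, t in zip(pred_row, truth_row))
    pred_area = sum(map(sum, pred))
    truth_area = sum(map(sum, truth))
    union = pred_area + truth_area - inter
    iou = inter / union if union else 1.0
    dice = 2 * inter / (pred_area + truth_area) if (pred_area + truth_area) else 1.0
    return iou, dice

pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
iou, dice = mask_iou_and_dice(pred, truth)
print(iou, dice)
```

For a single pair of masks the two measures are linked by Dice = 2·IoU / (1 + IoU); dataset-level averages need not satisfy this identity exactly because they are means over many instances.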
Ablation experiments using a control variable approach demonstrate that the data augmentation strategy (random rotation combined with multi-scale cropping) improves mAP50 by 7.9%, while transfer learning (COCO pre-training followed by staged fine-tuning) contributes a 17.2% increase in mAP50–95. Furthermore, the enhanced hybrid loss function (Focal Loss + GIoU + Dice Loss) significantly improves boundary alignment under class-imbalanced conditions, achieving a 6.8% increase in the Dice coefficient relative to the baseline.
The model-generated bounding box coordinates (dashed-line rectangles) and classification labels (with preset category confidence scores) achieve sub-pixel spatial localization accuracy, with a mean positioning error of less than 0.5 pixels. When combined with a semi-transparent mask overlay visualization engine, the model constructs a digital analytical base map that integrates geometric attributes (area error rate ≤ 3.2%, perimeter agreement ≥ 94%) with semantic labels.
This solution not only meets real-time calibration requirements for spontaneous living space elements (inference speed: 8.2 FPS; GPU memory usage < 4 GB) but also enables quantitative urban planning decisions through the extraction of morphological parameters, such as aspect ratio dispersion analysis (
Table 5) and functional zone area proportion statistics. Its lightweight architecture, comprising 44 million parameters, further facilitates deployment in edge computing scenarios, providing a robust technical paradigm for intelligent monitoring of the built environment.
4. Analysis of Spontaneous Living Space Types
This study employed the Mask R-CNN deep learning model to perform comprehensive image recognition and segmentation within the Tanhualin Historic and Cultural District, resulting in the identification of 1249 instances of spontaneous living spaces. Based on their primary functional attributes, these spaces were classified into four major types: Fundamental, Quality-enhancing, Social-oriented, and Safety and Privacy-oriented.
Following extensive training and validation of the image segmentation model, a final dataset comprising 1249 images was selected for spatial identification and analysis in this chapter.
Table 6 presents the frequency and corresponding percentage of each major category and its subcategories, with consistent totals, thereby providing a robust foundation for subsequent analyses. The following sections proceed with a detailed morphological and functional examination of each major type.
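The category shares reported in the following subsections can be checked directly from the instance counts. The short sketch below uses the four counts stated in Sections 4.1 to 4.4; the variable names are illustrative.

```python
# Instance counts per major category, as reported in Sections 4.1-4.4.
counts = {
    "Fundamental": 646,
    "Quality-enhancing": 331,
    "Social-oriented": 116,
    "Safety and privacy-oriented": 156,
}

total = sum(counts.values())
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {counts[name]} instances, {share}%")
```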
4.1. Fundamental Spontaneous Living Spaces
Fundamental spontaneous living spaces (
Figure 6) constitute the most prevalent category within the Tanhualin Historic District, accounting for 51.72% of the total with 646 identified instances. Analysis of their spatial distribution indicates a predominant concentration on building top surfaces, middle surfaces, and bottom surfaces. These spaces typically exhibit function-driven complexity and demand-oriented differentiation, achieving multifunctional layering through both horizontal and vertical spatial expansion. Their flexible forms and blurred boundaries maintain visual connectivity with public passageways while creating transitional privacy. In some cases, they utilize inter-building voids to achieve dynamic functional coverage.
Based on functional use, Fundamental spontaneous living spaces can be further classified into storage spaces, addition spaces, public clotheslines, and canopies (
Figure 7). Public clotheslines are primarily located on building top and bottom surfaces, addressing essential drying needs in high-density residential environments while simultaneously fostering informal community connections. Shared drying practices encourage neighborly interaction and enhance a sense of community belonging. This subtype accounts for 32.66% of the category, with 211 instances.
Addition spaces are mainly distributed along building middle and bottom surfaces. By vertically extending living functions through added drying, storage, or leisure areas, these spaces alleviate indoor spatial constraints while flexibly utilizing underused areas. Inefficient rooftops or facades are transformed into practical living environments that accommodate diverse household needs. This subtype represents 28.33% of the category, with 183 instances.
Storage spaces are predominantly located on building bottom surfaces, with a smaller proportion found on top surfaces. These spaces are typically enclosed using materials such as wooden planks or wire mesh and feature lightweight, removable, and reconfigurable structures. They address insufficient internal storage capacity within buildings and reflect residents’ emphasis on economy and practicality, accounting for 25.85% of the category, or 167 instances.
Canopies are mainly situated along building middle surfaces, providing shelter from rain and intense sunlight for vehicles, stored items, or activity areas. Beyond their primary protective function, canopies support additional uses such as temporary storage or neighborhood gatherings. This subtype accounts for 13.16% of Fundamental spontaneous living spaces, with 85 instances.
Overall, Fundamental spontaneous living spaces emerge in diverse forms in response to varying resident needs, construction constraints, and functional integration strategies. They embody authentic traces of everyday life, contributing to dynamic community memory while efficiently addressing essential residential demands and enhancing daily convenience at low cost. However, these spaces also pose challenges, including unregulated construction practices and the fragmentation of historic neighborhood character.
4.2. Quality-Enhancing Spontaneous Living Spaces
Quality-enhancing spontaneous living spaces (
Figure 8) account for 26.50% of the total, with 331 identified instances, and are predominantly distributed along the middle and bottom surfaces of buildings. These spaces are generally characterized by decorative attributes, spatial flexibility, and semi-public qualities. They contribute to environmental beautification, the efficient use of limited space, and improved accessibility. In terms of materiality, greater emphasis is placed on aesthetics and durability, with a preference for modular, movable, and lightweight components. Such material strategies accommodate the spatial constraints of high-density environments while allowing residents to adapt the spaces flexibly in response to changing needs.
Based on functional characteristics, Quality-enhancing spontaneous living spaces can be classified into plants, personal clotheslines, and staircase/ramp additions (
Figure 9). Personal clotheslines constitute the largest proportion, accounting for 59.52% of the category with 197 instances, and are mainly distributed along building middle surfaces. Through cantilevered structures, these installations expand drying capacity in dense residential settings, efficiently utilize vertical space, and maintain spatial permeability and visual continuity. This configuration mitigates indoor dampness while minimizing obstruction to natural lighting.
Plant installations account for 27.79% of the category, with 92 instances, and are primarily located on building bottom surfaces, with a smaller proportion on middle surfaces. By introducing greenery on balconies and windowsills, these spaces enhance visual quality, improve microclimatic conditions and air quality, and soften spatial boundaries. They function as semi-private natural partitions and facilitate informal neighborhood interaction.
Staircase and ramp additions represent 12.69% of quality-enhancing spontaneous living spaces, with 42 instances, and are concentrated on building bottom surfaces. These interventions improve accessibility by optimizing vertical circulation, addressing entry and exit difficulties in older buildings, restructuring movement patterns, and creating continuous pathways across different elevation levels.
Overall, Quality-enhancing spontaneous living spaces directly respond to residents’ everyday challenges, such as limited drying space and inefficient barrier-free access. Their frequent use contributes to enhanced community vitality. Design strategies, including greening and modular construction, improve both spatial aesthetics and functional adaptability, while their integration with the historical fabric helps preserve the character of historic districts.
4.3. Social-Oriented Spontaneous Living Spaces
Social-oriented spontaneous living spaces (
Figure 10) account for 9.29% of the total, comprising 116 instances, and are predominantly distributed along the bottom surfaces of buildings. As public activity carriers formed through self-organization, their spatial characteristics and patterns of use are defined by openness, shared accessibility, and functional flexibility. These spaces typically adopt low-intervention and highly adaptable layouts, making use of idle areas such as street corners and front courtyards. Randomly arranged facilities can be reconfigured at any time, allowing these areas to function as multifunctional social nodes.
With blurred spatial boundaries and composite activity scenarios, these spaces provide shaded and sheltered gathering environments that can simultaneously support board games, temporary markets, and community discussions. The facilities are generally lightweight and movable, minimizing intervention in the existing environment while enabling residents to dynamically adjust spatial configurations in response to changing activity demands.
Based on functional attributes, socially oriented spontaneous living spaces can be classified into portable furniture (
Figure 11), fitness equipment, and leisure spaces, all of which are primarily arranged along the building’s bottom surfaces. Portable furniture represents the largest proportion, accounting for 43.97% with 51 instances. By occupying street corners, areas beneath trees, and similar marginal locations, these elements create flexible resting nodes that encourage informal social interaction. Low-cost, spontaneous gatherings for tea drinking, chess, or card games effectively activate public spaces and enhance neighbourhood vitality.
Leisure spaces account for 32.76% of the category, with 38 instances, and provide shaded and sheltered gathering places. Their semi-enclosed configurations support all-weather social and recreational activities. Fitness equipment accounts for 23.27%, with 27 instances, and addresses basic exercise needs across different age groups through modular installations. By transforming underutilized areas, such as inter-building voids and circulation paths, into community health activity points, these facilities reinforce neighbourhood networks through incidental encounters and everyday conversations during exercise.
Overall, socially oriented spontaneous living spaces prioritize openness and shared use, flexibly activating marginal or underutilized areas to provide temporary gathering points for residents. They contribute to the preservation of community cultural memory and facilitate informal interaction across age groups. However, disorderly facility placement may encroach upon pedestrian circulation, while lightweight equipment and temporary seating can weaken spatial order and visual coherence within the neighbourhood.
4.4. Safety and Privacy-Oriented Spontaneous Living Spaces
Safety and privacy-oriented spontaneous living spaces (
Figure 12) account for 12.49% of the total, comprising 156 instances. These spaces are inherently defensive, created by residents to address safety and privacy concerns through physical isolation measures. Their spatial characteristics and patterns of use exhibit both functional rigidity and socio-psychological attributes. Metal fences or mesh structures are commonly installed to enclose balconies and windows, forming dual barriers of physical separation and visual shielding. In some cases, differentiated designs in height and density are employed to delineate boundaries between private domains and public areas.
Based on functional attributes, these spaces can be classified into security meshes (
Figure 13) and fences. Security meshes constitute 58.33% of the category, with 91 instances, and are primarily distributed along the bottom and middle surfaces of buildings. They typically employ modular materials such as metal grilles, stainless steel mesh, or invisible steel wires to form dense layers of physical isolation. These installations serve both anti-burglary and fall-prevention functions while obstructing views and reducing the visibility of interior spaces from public streets. However, excessive enclosure through security meshes may disrupt the visual continuity of building façades and weaken neighbourhood affinity.
Fences account for 41.67% of the category, with 65 instances, and achieve boundary demarcation between private and public areas through variations in height, density, and material. They effectively prevent intrusion by pedestrians, vehicles, or animals, while in some cases incorporating aesthetic elements as decorative fences that soften spatial boundaries and improve visual quality.
Overall, Safety and privacy-oriented spontaneous living spaces prioritize security and seclusion. Facilities such as security meshes and fencing reinforce spatial isolation, with rigid barriers clearly defining property boundaries and alleviating safety concerns for ground-floor dwellings and street-facing residences. However, this approach carries the risk of visual fragmentation and spatial confinement. The industrial mesh patterns of security grilles and the materials and colours of colour-coated steel fences often conflict with the traditional aesthetic of historic districts, while protruding installations may compromise the integrity of building facades.
5. An Investigation into Factors Influencing Community Vitality in the Tanhualin District
This study focuses on Wuhan’s Tanhualin community, employing field surveys and resident questionnaires to examine the formation mechanisms of spontaneous living spaces and the factors sustaining community vitality. As a historic district within the urban core, Tanhualin blends traditional fabric with contemporary expressions, resulting in diverse informal uses of alleyway spaces. Guided by Jane Jacobs’ theory of urban diversity, the study emphasizes the importance of street vitality, functional mixing, and small-scale interventions in fostering livable communities. Drawing on Henri Lefebvre’s concept of The Production of Space, it posits that residents continuously reconstruct and empower spaces through daily actions, generating a locally rooted system of spontaneous spaces. Survey data further indicate residents’ strong demand for and positive perception of neighborhood spaces that are “habitable, usable, and participatory.” Overall, the study aims to reveal how historic communities maintain livability and community vitality through micro-spatial practices under modern urban pressures.
To investigate the mechanisms shaping spontaneous living spaces within the Tanhualin Historic Quarter, 30 field surveys and questionnaire-based interviews were conducted between December 2024 and June 2025. The research sought to comprehensively understand the usage behaviors and needs of different residential groups—permanent residents, long-term and short-term tenants, and short-term guests in private accommodations—highlighting their intrinsic connection to the neighborhood’s spatial evolution. Field observations prioritized peak activity periods (8:00–10:00 a.m., 4:00–6:00 p.m., and 8:00–10:00 p.m.) to capture authentic behavioural patterns. The questionnaires involved face-to-face interviews with 200 randomly selected respondents from 150 residential units, covering demographics, frequency of spontaneous space usage, user perceptions, and psychological evaluations.
By integrating field observations, questionnaire responses, and interview transcripts, this study aims to depict the dynamic formation of spontaneous spaces within the neighbourhood. It explores residents’ motivations across multiple dimensions—including daily necessities, safety concerns, and social needs—providing empirical support for the renewal and development of historic districts.
5.1. A GIS-Based Study on the Correlation Between Crowd Density and Community Vitality
This study employs Geographic Information System (GIS) technology to perform spatial analysis and visualization of neighborhood crowd dynamics. During data processing, discrete point data are converted into continuous spatial information using GPS positioning and footfall statistics. Specifically, Kernel Density Estimation (KDE) is applied to smooth crowd flow data within each analysis unit. Density values are calculated as persons per unit area (per square meter), generating continuous crowd density heatmaps. KDE effectively captures the clustering characteristics of pedestrian distribution, transforming discrete footfall data into intuitive visualizations that reveal spatial patterns of crowd density throughout the neighborhood.
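The conversion of discrete footfall points into a continuous density surface can be illustrated with a minimal kernel density sketch. GIS packages offer several kernel shapes; a Gaussian kernel is assumed here for simplicity, and the bandwidth, coordinates, and function name are hypothetical. Each point contributes a kernel that integrates to one, so the result is expressed in persons per unit area.

```python
import math

def kde_density(points, query, bandwidth):
    """Crowd density at `query` (persons per unit area) from 2D Gaussian kernels.

    points: list of (x, y) observed positions, in metres.
    bandwidth: kernel standard deviation in metres (an assumed smoothing parameter).
    """
    h2 = bandwidth ** 2
    total = 0.0
    for x, y in points:
        d2 = (query[0] - x) ** 2 + (query[1] - y) ** 2
        total += math.exp(-d2 / (2 * h2))
    return total / (2 * math.pi * h2)

# Hypothetical positions: a cluster near the origin and one isolated pedestrian.
people = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0)]
print(kde_density(people, (0.0, 0.0), bandwidth=1.0))
print(kde_density(people, (5.0, 5.0), bandwidth=1.0))
```

Evaluating the function on a regular grid of query points yields the raster that is rendered as a heatmap.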
During the GIS mapping phase, this study produced two key maps: a crowd density heatmap and a community vitality heatmap (
Figure 14), followed by a detailed spatial overlay analysis. The crowd density heatmap visually represents population distribution patterns across different time periods using graduated color gradients, capturing the intensity and concentration of pedestrian flows. Similarly, the community vitality heatmap employs color coding to indicate the levels of social engagement, functional activity, and neighborhood liveliness across various zones. Overlay analysis within the GIS platform revealed a high degree of spatial coincidence between high-density areas and high-vitality zones. This spatial correlation was confirmed both visually and quantitatively through established spatial autocorrelation metrics and clustering analyses, demonstrating that high-density population areas are not merely physical gathering points but also core drivers of social vitality, where recreational, cultural, and community interactions are most concentrated.
The application of GIS in this study extends beyond static mapping and visualization. Time-series analyses were conducted across weekdays, weekends, and public holidays to capture temporal fluctuations in high-density zones and compare the dynamics of crowd flow with changes in vitality. This approach enabled a comprehensive understanding of how pedestrian patterns correspond to social activity, showing that pedestrian carrying capacity is closely tied to residents’ daily behaviors, preferences, and interaction habits. High-density areas function not only as transit spaces but as attractive, interactive public environments, serving as key indicators of community vitality and collective engagement.
Spatial autocorrelation was primarily measured using Moran’s I, which quantifies global clustering or dispersion of variables—here, crowd density and community vitality—across the study area. The Getis–Ord Gi* statistic was applied to detect localized clusters and statistically significant hotspots, identifying areas with concentrated high or low values, such as regions where elevated crowd density coincides with heightened community vitality. Results from Moran’s I indicated significant positive spatial autocorrelation (p < 0.05), confirming that areas with dense pedestrian presence tend to cluster with zones exhibiting high vitality. Getis–Ord Gi* analyses identified statistically significant hotspots (p < 0.01) in locations such as the community park, commercial corridors, and residential courtyards, highlighting spatial nodes of intense interaction and activity.
To ensure accurate spatial modeling, queen contiguity weights were employed for Moran’s I, while distance-based weight matrices (threshold = 100 m) were applied for Getis–Ord Gi*, capturing both direct neighborhood relationships and broader spatial interactions. Temporal variations were further examined by analyzing differences in crowd density and vitality across weekdays, weekends, and holidays. Detailed results of these temporal comparisons are provided in
Appendix A. Permutation tests for both metrics were conducted to verify robustness and statistical significance, with all
p-values below 0.05, supporting the observed significant clustering patterns and confirming the strong spatial association between pedestrian density and community vitality.
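As a rough illustration of the global test, the sketch below computes Moran’s I with binary queen-contiguity weights on a regular grid and estimates a one-sided pseudo p-value by permutation. The grid layout, the values, the number of permutations, and all function names are assumptions made for the example; the study’s own spatial units and software are not specified here.

```python
import random

def queen_weights(rows, cols):
    """Binary queen-contiguity neighbours for each cell of a rows x cols grid."""
    neighbours = {}
    for r in range(rows):
        for c in range(cols):
            neighbours[r * cols + c] = [
                rr * cols + cc
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if (rr, cc) != (r, c) and 0 <= rr < rows and 0 <= cc < cols
            ]
    return neighbours

def morans_i(values, neighbours):
    """Global Moran's I for binary (unstandardised) weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nb in neighbours.items() for j in nb)
    return (n / s0) * cross / sum(d * d for d in dev)

def permutation_p_value(values, neighbours, n_perm=999, seed=0):
    """One-sided pseudo p-value: share of shuffles with I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbours) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical 4 x 4 grid with high values clustered in one corner.
grid_values = [9, 8, 2, 1,
               8, 7, 2, 1,
               2, 2, 1, 1,
               1, 1, 1, 0]
w = queen_weights(4, 4)
moran, p_value = permutation_p_value(grid_values, w)
```

A positive `moran` together with a small `p_value` indicates clustering of similar values, which is the pattern reported above for crowd density and vitality.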
5.2. Data Analysis Based on the Random Forest Regression Model
The Random Forest regression model was trained and evaluated using the 18 features listed in
Table 7. The original four variables (X1–X4) were further decomposed into more fine-grained, object-level indicators of satisfaction and spatial density to enhance both explanatory power and model interpretability. The results from the random forest regression model analysis are presented in the table below (
Table 7).
Mean Absolute Error (MAE) represents the average absolute difference between predicted and actual values, quantifying the magnitude of deviations in model predictions. Mean Squared Error (MSE) denotes the average of the squared differences between predicted and actual values, amplifying larger errors and imposing stronger penalties on substantial discrepancies. The coefficient of determination (R²) reflects the model’s explanatory power, indicating the proportion of variance in the dependent variable accounted for by the independent variables. As a measure of goodness-of-fit, it typically ranges from 0 to 1, with values closer to 1 indicating a better fit.
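The three metrics can be written out directly from their definitions. The following is a minimal sketch; the function name `regression_metrics` and the observed and predicted vitality scores are hypothetical and do not reproduce the values in Table 7.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, r2

# Hypothetical observed vs. predicted vitality scores on the 5-point scale.
observed = [2, 3, 4, 5]
predicted = [2.5, 3.0, 4.0, 4.5]
mae, mse, r2 = regression_metrics(observed, predicted)
```

Computed this way, R² can fall below 0 on held-out data when predictions are worse than simply using the mean of the observed values.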
Figure 15 presents a scatter plot comparing the predicted and observed values of community vitality. The horizontal axis represents the observed values, while the vertical axis indicates the model predictions. The black dashed line (y = x) denotes perfect agreement between predicted and actual values.
In the questionnaire survey, residents rated their satisfaction with different types of spontaneous living spaces using a 5-point Likert scale, where a value of 2 corresponds to “somewhat dissatisfied” and a value of 5 indicates “very satisfied.” These ratings were incorporated as categorical variables to examine the relationship between satisfaction levels and spatial features.
As shown in
Figure 15, most data points cluster closely around the y = x line, indicating high predictive accuracy and good model fit. This pattern demonstrates the Random Forest model’s effectiveness in capturing the relationship between the selected features and community vitality. However, several points deviate from the ideal line, particularly at observed vitality levels of 2 and 5. In these cases, the predicted values exhibit a wider dispersion. This suggests increased uncertainty in model predictions at certain vitality levels, especially for high-vitality communities (actual value = 5), where predictions range between approximately 4.0 and 5.0.
Overall, the results indicate that the Random Forest regression model performs well in predicting community vitality across most observations, while also revealing limitations in accurately capturing extreme values. These deviations provide useful insights for further refinement of feature selection and model optimization in future studies.
5.2.1. The Impact of Satisfaction Levels with Various Types of Spontaneous Living Spaces on Community Vitality
This study utilised a random forest regression model applied to survey data from 200 questionnaires, aiming to quantify and reveal the key factors influencing community vitality and their relative importance. The analysis indicates significant variations in the impact of satisfaction levels with various types of spontaneous living spaces on community vitality, providing crucial insights for understanding and enhancing community vitality.
As shown in
Figure 16, satisfaction with personal clotheslines emerges as the most influential determinant of community vitality, with the highest feature importance weight of 0.23. This value is notably higher than those of all other variables, underscoring its central role in shaping community vitality. This result likely reflects residents’ strong demand for quality-enhancing spontaneous living spaces that directly support everyday domestic activities within dense community environments.
Additional space ranks second, with an importance weight of 0.18, indicating that fundamental spontaneous living spaces also play a critical role. Fitness equipment follows closely, with a weight of 0.16. These two variables form a second tier of influential factors behind the primary determinant. The combined importance of the top three factors reaches 0.57, meaning that they account for more than half of the model’s total feature importance. Accordingly, these elements should be prioritized in future community planning and intervention strategies.
Space utilization frequency and storage space rank fourth and fifth, with importance weights of 0.07 and 0.06, respectively. Although their influence is weaker than that of the top three variables, they remain meaningful secondary contributors and should not be disregarded.
Other factors—including security mesh, fencing, canopies, plants, staircase or ramp additions, public clotheslines, portable furniture, and leisure spaces—exhibit relatively low importance, with most weights ranging between 0.03 and 0.04. This indicates that safety- and privacy-oriented spaces, as well as lifestyle-driven facilities, exert a comparatively limited influence on overall community vitality.
In summary, the Random Forest analysis clearly identifies the principal drivers of community vitality and provides a quantitative ranking of their relative importance. The findings highlight that strengthening community vitality requires prioritizing spontaneous living spaces that address residents’ core functional needs. On this basis, the incremental optimization of public facilities and spatial utilization can further contribute to a comprehensive improvement of the community environment.
5.2.2. SHAP-Based Causal Explanation and Interaction Analysis of Community Vitality Drivers
To further examine the causal relationships between different types of spontaneous living spaces and community vitality, as well as the interaction effects among multiple features, this study applies SHAP (Shapley Additive Explanations) to interpret the Random Forest regression model. SHAP values quantify each feature’s contribution to the predicted outcome at the individual-sample level, thereby indicating whether a feature exerts a positive or negative influence on community vitality and revealing non-linear interactions among variables.
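The idea of a per-sample contribution can be sketched with an exact Shapley computation on a toy model. This is not the SHAP library and not the study’s Random Forest: `toy_model`, the sample, and the single baseline sample are hypothetical, and using one baseline instead of averaging over a background dataset is a simplifying assumption.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample, averaged over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]  # switch feature i from its baseline to its actual value
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [p / len(orderings) for p in phi]

def toy_model(f):
    # Hypothetical vitality predictor with one interaction term (features 0 and 2).
    return 1.0 + 0.4 * f[0] + 0.1 * f[1] + 0.05 * f[0] * f[2]

sample = [5, 3, 4]    # hypothetical Likert-style feature levels
baseline = [3, 3, 3]  # hypothetical reference sample
phi = shapley_values(toy_model, sample, baseline)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, and a feature that equals its baseline value (feature 1 here) receives a contribution of zero.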
The SHAP Summary Plot (
Figure 17) visually illustrates both the direction and magnitude of each feature’s influence on the model output. In this plot, high-value observations of “Personal clothesline,” “Addition space,” and “Fitness equipment” are predominantly distributed along the positive SHAP axis, indicating that higher values of these features substantially increase predicted community vitality. In contrast, the SHAP value distribution for “Space utilization frequency” is more dispersed, suggesting that its effect varies depending on the specific usage intensity.
The SHAP Feature Importance Plot (
Figure 18) further quantifies these effects by ranking features according to their mean absolute SHAP values. The results show that “Personal clothesline” ranks first, with an average SHAP value of 0.41, making it the most influential predictor in the model. This is followed by “Addition space” and “Fitness equipment,” with average SHAP values of 0.36 and 0.33, respectively, confirming their critical roles in shaping model predictions. By contrast, features such as “Leisure space” and “Portable furniture” exhibit average SHAP values below 0.05, indicating a relatively limited contribution to overall community vitality.
In summary, the SHAP analysis clearly demonstrates that “Personal clothesline,” “Addition space,” and “Fitness equipment” are the three dominant drivers of community vitality in the model. These features exert the strongest and primarily positive influence on predicted outcomes, highlighting their central importance in sustaining community vitality.
The SHAP dependency plots presented in
Figure 19 and
Figure 20 illustrate the non-linear contribution patterns of individual features to the model’s predictions. By integrating scatter distributions with grayscale bar charts, these plots demonstrate how variations in feature values correspond to changes in SHAP values, thereby quantifying the positive or negative influence of each feature on the predicted outcome.
The results indicate that the effects of several features do not follow simple linear relationships. For example, addition space (
Figure 19) and fitness equipment generally exert a positive influence on predicted community vitality when their values are low (levels 1–2), as reflected by positive SHAP values. However, as these values increase to higher levels (4–5), their SHAP values rapidly shift to negative. This suggests that an excessive provision of additional space, fitness equipment, or fencing may undermine community vitality, implying that moderate levels are beneficial, whereas over-concentration can be counterproductive in specific contexts.
In contrast, other features exhibit an opposite trend. Public clotheslines (
Figure 20) and portable furniture display predominantly negative SHAP values at lower levels, indicating that limited quantities contribute little or may even exert a negative effect on model predictions. As their values increase—particularly at level 5—the corresponding SHAP values become strongly positive. This pattern suggests that these features are positively evaluated only after reaching a certain scale, at which point they become important contributors to predicted community vitality.
Overall, the SHAP dependency plots provide critical insights into the internal logic of the model. They not only identify the most influential features but also reveal complex, non-linear relationships between feature intensity and predictive impact. Such insights are essential for interpreting model behavior and offer clear guidance for future feature engineering and model refinement. By recognizing these non-linear thresholds, planners and researchers can focus on optimizing key features within appropriate value ranges.
The SHAP interaction heatmaps (
Figure 21 and
Figure 22) provide a detailed assessment of how combinations of features jointly influence the model’s predictions. The results show that the interactions between Fitness equipment and Personal clothesline, Addition space and Personal clothesline, and Storage space and Personal clothesline are the most pronounced, with SHAP interaction values of 0.23, 0.17, and 0.18, respectively. These relatively high values indicate that when these features occur at specific levels simultaneously, their combined contribution to the model output substantially exceeds the sum of their individual effects, reflecting strong non-linear synergistic interactions. For instance, the combination of Fitness equipment at level 5 and Personal clothesline at level 1 exerts a distinctly positive influence on the model, illustrating how feature interplay can amplify predictive outcomes.
By contrast, interactions between other feature pairs—such as Public clothesline and Fitness equipment, or Staircase/Ramp addition and Addition space—are considerably weaker, with SHAP interaction values generally below 0.06. This suggests that these features operate largely independently within the model, and changes in one feature do not substantially intensify the effect of the other. Similarly, interactions between Security mesh and Personal clothesline, as well as between Plants and Personal clothesline, are minimal, as reflected by their low SHAP values.
Overall, the interaction analysis highlights that only specific feature combinations produce strong synergistic effects on community vitality, whereas many others contribute in a largely additive or independent manner. These findings underscore the importance of considering not only individual features but also their interactions when interpreting model behaviour and formulating targeted spatial interventions.
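A pairwise interaction value measures how far the joint effect of two features departs from the sum of their separate effects. The sketch below assumes the Shapley interaction index in the form commonly used for SHAP interaction values, evaluated exactly on a toy model; `toy_model`, the sample, and the baseline are hypothetical and do not reproduce the values in the heatmaps.

```python
from itertools import combinations
from math import factorial

def shap_interaction(predict, x, baseline, i, j):
    """Exact pairwise Shapley interaction value between features i and j."""
    n = len(x)
    others = [k for k in range(n) if k not in (i, j)]

    def value(subset):
        # Features in the subset take the sample's value, the rest stay at baseline.
        return predict([x[k] if k in subset else baseline[k] for k in range(n)])

    total = 0.0
    for size in range(len(others) + 1):
        weight = factorial(size) * factorial(n - size - 2) / (2 * factorial(n - 1))
        for combo in combinations(others, size):
            s = set(combo)
            # Joint effect minus the two separate effects.
            delta = value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
            total += weight * delta
    return total

def toy_model(f):
    # Hypothetical predictor: only features 0 and 2 interact.
    return 1.0 + 0.4 * f[0] + 0.1 * f[1] + 0.05 * f[0] * f[2]

sample = [5, 3, 4]
baseline = [3, 3, 3]
pair_02 = shap_interaction(toy_model, sample, baseline, 0, 2)
pair_01 = shap_interaction(toy_model, sample, baseline, 0, 1)
```

In this toy case only the pair (0, 2) receives a non-zero value, which mirrors the distinction drawn above between synergistic pairs and pairs that act largely independently.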
The SHAP dependency plots (
Figure 23 and
Figure 24) offer a detailed evaluation of how individual features contribute to the model’s prediction outcomes. The results indicate that Personal clothesline and Public clothesline are the most influential features, both exerting a strong positive effect on the model’s predictions.
For the Personal clothesline, the SHAP plot reveals a clear positive relationship, with a slope of 0.42 and an R² value of 0.74. This demonstrates that increases in the proportion of Personal clotheslines consistently and substantially elevate the predicted values of the model. Similarly, Public clothesline shows a pronounced positive influence, characterized by a smaller slope of 0.08 but a notably high R² value of 0.86, the highest among all examined features. This indicates that variations in the proportion of Public clotheslines explain a large share of the variance in SHAP values, underscoring its critical role in the prediction process.
In contrast, Fitness equipment emerges as the feature with the strongest negative impact. Its SHAP dependency plot exhibits a clear negative correlation, with a slope of −0.24 and an R² value of 0.62. This suggests that higher proportions of Fitness equipment are associated with a substantial reduction in the model’s predicted values, highlighting its adverse influence on the outcome.
In addition, Plants and Portable furniture display moderate associations with the model output. For Plants, SHAP values decrease as the feature proportion increases, indicating a negative effect on predictions. Conversely, Portable furniture demonstrates a positive contribution. Both features have R² values of approximately 0.5, suggesting a moderate explanatory capacity that is weaker than that of the dominant features, such as Personal clothesline and Public clothesline.
By comparison, Security mesh and Storage space exhibit the weakest influence on the model’s predictions. Both features are characterized by very low R² values and high p-values, indicating the absence of a statistically meaningful linear relationship with the model output. This implies that these variables play a limited role in the model’s decision-making process.
Overall, this analysis provides important insights into the internal behavior of the predictive model by clarifying the relative influence of different features. By explicitly examining the relationships between feature values and SHAP outputs, the findings enhance the interpretability of the model and offer clear guidance for future feature selection, engineering, and model optimization.
5.2.3. The Impact of Utilization Frequency of Spontaneous Living Spaces on Community Vitality Density Values
The two SHAP dependency plots (
Figure 25 and
Figure 26) provide a comprehensive examination of the relationship between the utilization frequency of spontaneous living spaces and community vitality. As shown in
Figure 25, a clear and strong positive linear relationship is observed, with a slope of 0.20. This indicates that higher utilization frequency of spontaneous living spaces is associated with a substantial increase in community vitality. The relationship is highly statistically significant, with a
p-value well below 0.05 and an R² value of 0.75, suggesting that approximately 75% of the variance in community vitality can be explained by changes in utilization frequency. These results identify utilization frequency as a key predictor of community vitality.
Figure 26 adds further insight by incorporating a histogram that illustrates the distribution of the observed data. The results show that most data points are concentrated around several discrete frequency levels, particularly 2.0%, 3.0%, 4.0%, and 5.0%. Although the overall trend remains positive, noticeable variability in SHAP values is observed at identical frequency levels. For instance, at a utilization frequency of 4.0%, SHAP values range from 0.1 to 0.4. This dispersion indicates that, beyond utilization frequency, additional factors also contribute to variations in community vitality, reflecting the influence of other interacting features within the model.
In summary, the utilization frequency of spontaneous living spaces represents a critical indicator for assessing and enhancing community vitality. At the same time, the results highlight that community vitality emerges from a complex system shaped by multiple interacting factors. Future research should therefore incorporate additional variables and examine their interactions in order to develop a more comprehensive and robust explanatory model of community vitality.
5.2.4. Positive and Negative Impacts on the Character of Historic Districts
The impact of spontaneous living spaces within historic districts on their character and physical fabric demonstrates a dialectical and dynamic complexity. On the positive side, these informal spaces function as catalysts for revitalizing historic districts and serve as important carriers of local authenticity. By occupying residual spaces between buildings and emerging through residents’ creative practices, they generate diverse spatial nodes that enhance neighbourhood density and spatial legibility (
Figure 27). At the same time, they closely integrate residents’ daily activities and social interactions with the built environment, reinforcing the sense of place and strengthening community vitality.
However, their negative effects cannot be ignored. When spontaneous living spaces proliferate without regulation, they may undermine streetscape integrity and disrupt spatial order. Such encroachments can alter the original proportions and silhouettes of historic buildings, weakening aesthetic coherence and diminishing the historical gravitas of heritage districts. Moreover, unregulated spatial occupation may blur the boundaries between public and private realms, obstruct pedestrian circulation, and potentially create safety risks.
Accordingly, the management of these organic and spontaneous living spaces should avoid both rigid prohibition and laissez-faire tolerance, instead seeking a dynamic balance. Professional planning and governance should acknowledge their role in fostering community vitality, while employing careful design guidance and regulatory frameworks to integrate them harmoniously into the historic district. In this way, spontaneous living spaces can be accommodated as complementary elements that enrich, rather than erode, the intrinsic value of historic environments.
6. Discussion
6.1. Core Contributions of the Research Methodology
The core contribution of this research methodology lies in the establishment of an interdisciplinary research paradigm that integrates deep learning, machine learning, and explainability techniques. First, the Mask R-CNN deep learning model was employed to achieve automated, pixel-level identification and segmentation of Spontaneous Living Spaces within Historic Districts. This approach overcomes the limitations of traditional manual surveys and substantially improves the efficiency and accuracy of data acquisition.
Second, the application of a random forest regression model enabled a detailed examination of the complex non-linear relationships between spontaneous living spaces and community vitality, offering a flexible and robust framework for quantitative analysis. Finally, the innovative use of SHAP technology provided transparent interpretations of the model’s decision-making process at both global and local scales.
Together, this integrated research paradigm offers reliable technical support for quantitatively investigating the morphology and dynamic mechanisms of spontaneous living spaces in historic districts. It also provides new analytical perspectives and evidence-based decision-making tools for urban planning and community development.
6.2. Design Strategy Guidance
This research seeks to establish a novel data-driven design paradigm. By employing Mask R-CNN to identify and quantify spatial elements, in combination with field surveys and questionnaire investigations, the study captures residents’ behavioural patterns with a high degree of precision. Analytical models such as Random Forest Regression are then applied to reveal the key factors influencing community vitality.
Based on these findings, a set of targeted design strategies is distilled, aiming to enhance community vitality through precise interventions in spontaneous living spaces and to foster symbiotic interactions between spatial facilities and residents’ behaviors. Design optimization is guided by interpretable models, thereby providing an evidence-based approach to effectively improving community vitality.
6.2.1. Explainable Model-Driven Optimization Design
This strategy leverages interpretable models, such as SHAP, as core tools to ensure transparency and enable precise optimisation within the design process. The research illustrated in the figure demonstrates that model quality depends not only on predictive accuracy but also critically on interpretability. SHAP analysis allows clear observation of each feature’s contribution to model predictions; for example, the SHAP value for ‘Personal clothesline’ indicates a highly positive effect, whereas other features show non-linear or minimal correlations. This approach shifts design decisions from reliance on experience or intuition to a data-driven, evidence-based framework.
6.2.2. Reinforcement Design Based on “Positive Correlation”
In the context of community space design, interpretability analysis enables precise identification and reinforcement of elements that most significantly influence resident satisfaction, avoiding ineffective investments and facilitating efficient iteration and optimisation of design proposals. This strategy focuses on quantitatively identifying and amplifying features positively correlated with design objectives. As indicated in the figure, the slope of the SHAP dependency relationship for ‘Personal clothesline’ reaches 0.42, demonstrating that each incremental unit substantially raises the model’s predicted value. Consequently, future community planning or residential unit design should intentionally improve drying facilities to increase convenience and accessibility. Possible interventions include foldable or concealed drying spaces or smart drying racks in communal areas, with their placement and quantity optimised through data analysis. This ensures limited resources target design elements with the greatest positive impact, maximising overall project effectiveness.
6.2.3. Flexible Interventions for Predictive Boundaries
Additionally, this strategy applies data modelling to predict spontaneous behaviours, enabling flexible and non-intrusive interventions at spatial boundaries. The figure highlights the presence of Spontaneous Living Spaces and their interaction with planned areas, suggesting that rigid physical demarcations are counterproductive. Instead, dynamic approaches are recommended: for example, analysing heatmaps of social behaviours can predict likely spontaneous gathering points, allowing designers to strategically pre-position movable seating, lighting, or modular facilities. This flexible delineation supports organically emerging ‘boundaries’ while respecting user autonomy, preventing disorderly sprawl, and promoting a harmonious coexistence between planned structures and spontaneous behaviours. Such an approach ensures the enduring vitality and adaptability of community spaces.
6.2.4. Symbiotic Interaction Design Between Facilities and Behaviour
This strategy emphasises facilities as active mediators, fostering bidirectional, symbiotic interactions with user behaviour. The illustrated study examines the behavioural impacts of facilities such as Personal clotheslines, Public clotheslines, potted plants, and portable furniture. The findings demonstrate that design must move beyond static physical configurations to operate as dynamic systems capable of responding to and shaping user behaviour. For example, a shared seating system could adjust its layout automatically in response to user clustering patterns, while these spatial adjustments, in turn, influence interaction dynamics, creating a continuous feedback loop. Such designs actively encourage user participation in the co-creation of space, transforming facilities into catalysts for social and spatial behaviour. Through ongoing data monitoring, designers can gain real-time insights into facility usage and resident feedback, enabling iterative optimisation and ensuring that facilities closely align with actual user needs. This approach promotes the co-evolution of design and behaviour, enhancing both functional efficiency and community vitality.
6.2.5. Micro-Renewal Strategy Thresholds and Constraints
Based on our findings, we propose actionable micro-renewal strategy thresholds to guide the development of spontaneous living spaces that effectively enhance community vitality. These thresholds are derived from the density, coverage, and combinations of key features such as Personal clotheslines, Fitness equipment, and ancillary spaces.
- (1) Density and Coverage Thresholds:
To achieve significant improvements in community vitality, we recommend implementing Personal clotheslines and Fitness equipment at a minimum density of 10 items per 500 m² of neighbourhood area. These thresholds are informed by our feature importance analysis and SHAP dependency plots. Beyond this point, the benefits begin to diminish due to overcrowding. Our study indicates that a combination of 15 Fitness equipment units and 15 Personal clotheslines per 500 m² yields optimal outcomes for social interaction and vitality enhancement.
- (2) Combination of Features and Marginal Returns:
Personal clotheslines and Fitness equipment demonstrate a synergistic effect when positioned within 50 m of each other. However, once the number of these features exceeds 30 units each, the marginal gains in vitality decrease due to space limitations. This finding aligns with our SHAP interaction analysis, which shows that Fitness equipment at level 5 combined with Personal clotheslines at level 1 significantly boosts community vitality, whereas other feature pairings—such as Public clotheslines and portable furniture—exert considerably weaker effects.
- (3) Design Constraints:
Although increasing the number of Fitness equipment units and Personal clotheslines can enhance vitality, their placement must not obstruct fire exits, block emergency pathways, or compromise structural safety. We recommend maintaining a minimum distance of 1 m from fire exits and corridors to ensure accessibility and compliance with safety codes. In the context of the Tanhualin Historic District, attention must also be given to the visual and cultural impact of spontaneous living spaces. Temporary and reversible interventions, such as foldable clotheslines or modular Fitness equipment, are advised. This approach allows for the enhancement of community vitality while preserving the architectural heritage and historic character of the district.
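The density threshold and the clearance constraint above lend themselves to a simple screening check. The sketch below is illustrative only: the function names, the site area, the item count, and the coordinates are hypothetical, while the defaults of 10 items per 500 m² and 1 m of clearance are taken from the recommendations above.

```python
import math

def density_per_500m2(count, area_m2):
    """Number of items normalised to a 500 m² reference area."""
    return count * 500.0 / area_m2

def meets_minimum_density(count, area_m2, minimum=10.0):
    """True when the normalised density reaches the recommended minimum."""
    return density_per_500m2(count, area_m2) >= minimum

def has_clearance(item_xy, exit_xy, min_distance_m=1.0):
    """True when an item keeps at least the minimum distance from a fire exit."""
    dx = item_xy[0] - exit_xy[0]
    dy = item_xy[1] - exit_xy[1]
    return math.hypot(dx, dy) >= min_distance_m

# Hypothetical block: 24 personal clotheslines on 1000 m² of neighbourhood area.
clothesline_density = density_per_500m2(24, 1000.0)
dense_enough = meets_minimum_density(24, 1000.0)
# Hypothetical item placed 0.6 m from a fire exit (coordinates in metres).
clear_of_exit = has_clearance((2.0, 3.0), (2.0, 3.6))
```

A check of this kind only screens proposals against the stated thresholds; fire-safety compliance and heritage impact would still require on-site review.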
6.3. Research Limitations and Future Directions
This study, centred on the Wuhan Tanhualin site, provides preliminary insights into the relationships among crowd density, resident satisfaction with various types of Spontaneous Living Spaces, utilisation frequency of these spaces, and overall community vitality within historic districts. However, several limitations point toward directions for future research.
First, there are geographical constraints in data collection. The study is primarily based on the specific context of Tanhualin, whose unique historical background, architectural character, and community culture may limit the generalizability of the findings. For example, as a modern historical quarter blending Chinese and Western elements, the formation and activation mechanisms of spontaneous living spaces in Tanhualin may differ significantly from those observed in more traditional historical quarters elsewhere in China. Future research should expand data collection across multiple regions and cultural contexts to enable comparative analyses, thereby elucidating how diverse historical backgrounds and urban textures influence spontaneous living spaces and community vitality. Additionally, the absence of long-term monitoring restricts the ability to capture the dynamic evolution of community vitality as Tanhualin continues to transform.
Second, the explanatory power of the model requires further refinement. Although SHAP was applied to reveal non-linear relationships within the model, it remains challenging to quantify intangible factors unique to spontaneous living spaces in historic districts like Tanhualin. These include residents’ collective memories associated with heritage structures, such as churches and old schoolhouses, as well as informal social networks that emerge through everyday interactions. Such factors may exert a greater influence on community vitality than physical environmental elements. Future research should integrate qualitative methods, including ethnography and in-depth interviews, to incorporate these intangible dimensions into analytical frameworks, thereby achieving a more comprehensive understanding of the sources of community vitality within historic districts.
Finally, in terms of practical application, the conclusions of this study remain largely analytical and have not yet been translated into actionable decision-support tools. The conservation and renewal of Tanhualin constitute a complex and sensitive process requiring precise guidance. Future work could focus on developing an integrated model that combines principles of historic district conservation with strategies for enhancing community vitality. Such a tool would enable planners and designers to assess, in real time, the potential impacts of interventions on community vitality during renovation projects, effectively stimulating intrinsic vitality while preserving the historic character of districts like Tanhualin.
6.4. Recommendations and Planning Implications
Based on our empirical analysis, the following planning and design recommendations are proposed for Spontaneous Living Spaces to enhance community vitality. These recommendations are directly grounded in our research findings, providing urban planners with specific and actionable guidance:
6.4.1. Prioritize the Design of High-Density Spontaneous Living Spaces
Our findings indicate that high-density spontaneous living spaces have a significant impact on community vitality. We recommend prioritising the integration of such spaces in urban renewal plans, particularly in high-density residential areas. These spaces effectively foster social interactions among residents, thereby enhancing community cohesion.
6.4.2. Optimize the Use of High-Frequency Spontaneous Living Spaces
The study shows that the frequency of use of public clotheslines and other spontaneous living spaces directly influences community vitality. Urban planners are encouraged to optimise the utilisation of these areas by designing flexible and adaptable spaces. For instance, the provision of modular seating areas or community gardens that accommodate residents’ daily needs can further strengthen social engagement and overall vitality.
6.4.3. Use GIS Technology to Identify High-Density Spontaneous Living Spaces
Planners should employ GIS technology and heatmaps to identify where spontaneous living spaces are most densely concentrated and most frequently used, especially those that actively promote social interaction. These spatial data can guide area planning and help integrate spontaneous spaces with formal urban infrastructure, thereby enhancing the public, interactive character of the community.
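As a rough illustration of the heatmap idea, the sketch below bins point locations into square grid cells and ranks the cells by count. It is a minimal stand-in for a GIS density surface, not the workflow used in this study: the function names, the 10 m cell size, and the sample coordinates are all hypothetical, and the points are assumed to be in a projected coordinate system measured in metres.

```python
from collections import Counter


def grid_density(points, cell_size):
    """Count points per square grid cell; returns {(col, row): count}."""
    counts = Counter()
    for x, y in points:
        # Floor division assigns each point to the cell that contains it.
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


def hotspots(counts, top_n):
    """Return the top_n densest cells, highest count first (ties by cell index)."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical projected coordinates in metres (illustrative, not study data).
sample_points = [
    (3.0, 4.0), (7.5, 2.1), (8.0, 9.0),   # three points in cell (0, 0)
    (12.0, 3.0), (14.9, 8.2),             # two points in cell (1, 0)
    (41.0, 40.0),                         # one isolated point in cell (4, 4)
]

density = grid_density(sample_points, cell_size=10.0)
top_cells = hotspots(density, top_n=2)
print(top_cells)
```

In practice a kernel density estimate in a GIS package would give a smoother surface; the grid count is shown only because it makes the ranking logic explicit. Weighting each point by an observed usage frequency, rather than counting it once, would be one way to rank cells by frequency of use instead of density.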
6.4.4. Promote Social Interaction While Respecting Historical Heritage
In renewal projects within historic districts, the design of new spontaneous living spaces should simultaneously preserve historical heritage and encourage social interaction. Flexible design solutions, such as modular furniture and communal seating areas, are recommended to optimise space utilisation and facilitate resident engagement, while maintaining the distinctive architectural characteristics of historical buildings.
7. Conclusions
This study investigates the formation mechanisms of spontaneous living spaces within historic cultural districts and their influence on community vitality. It uniquely integrates the Mask R-CNN deep learning model with the Random Forest Regression Model to conduct an empirical analysis of Wuhan’s Tanhualin district. Findings reveal that spontaneous living spaces are informal areas organically created by residents to meet daily needs and compensate for gaps in official urban planning. These spaces form integral components of the urban fabric, effectively enhancing community cohesion, fostering social interaction, and thereby elevating overall community vitality.
By identifying and classifying 1249 instances of spontaneous living spaces, the study categorises them into four types: fundamental spontaneous living spaces, quality-enhancing spontaneous living spaces, social-oriented spontaneous living spaces, and safety and privacy-oriented spontaneous living spaces. Among these, fundamental spaces represent the largest proportion, accounting for 51.72% of the total. This predominance underscores residents’ heavy reliance on informal, self-constructed spaces that satisfy daily needs, highlighting gaps in official planning and emphasising the critical role of spontaneous living spaces in providing essential community functions. A decision-making matrix comparing all four categories further clarifies their functional implications (Table 8).
At the technological level, traditional surveys have long been used in urban studies to assess community vitality and spatial utilisation. However, such approaches often rely on self-reported data, which may be biased or incomplete. They also struggle to capture nuanced, spatially embedded behaviours within informal, spontaneously created spaces. For instance, previous studies assessing public space usage in European cities found it challenging to identify the vibrancy and informal interactions that underpin community vitality. While questionnaires and interviews provide general insights, they often overlook the dynamic, evolving nature of self-organised spaces critical to sustaining social interaction and cohesion.
The enhanced Mask R-CNN approach employed in this study offers substantial advantages in object recognition and segmentation. Through optimisation strategies—including data augmentation, transfer learning, and an improved hybrid loss function—the model achieved an mAP50 of 0.842 on the test dataset and a high Dice similarity coefficient of 0.791, demonstrating its ability to accurately identify and segment irregular objects in complex street scenes. By combining deep learning with spatial analysis tools such as GIS, this study captures the complexity of spontaneous living spaces in ways traditional surveys cannot. This approach enables the identification and quantification of spatially distributed elements, including temporary structures and informal additions, often overlooked in conventional research. Consequently, it improves data accuracy and provides novel insights into the spatial interactions that shape community vitality. Additionally, the Random Forest Regression Model effectively revealed non-linear relationships between community vitality and key factors such as spatial types, resident satisfaction, and crowd density.
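For readers less familiar with the segmentation metrics reported above, the following sketch computes the Dice similarity coefficient and the intersection over union (IoU) for a pair of binary masks, using their standard definitions. It is not the evaluation code of this study: the function name and the toy 3 × 3 masks are hypothetical, and the 0.5 threshold reflects the usual convention behind mAP50, under which a predicted mask counts as a match when its IoU with the ground truth is at least 0.5.

```python
def dice_and_iou(pred, truth):
    """Dice and IoU for two equal-shaped binary masks (nested lists of 0/1)."""
    inter = pred_sum = truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += p & t      # pixel is foreground in both masks
            pred_sum += p       # foreground pixels in the prediction
            truth_sum += t      # foreground pixels in the ground truth
    union = pred_sum + truth_sum - inter
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + truth_sum)
    iou = inter / union
    return dice, iou


# Toy masks (illustrative only): the prediction misses one ground-truth pixel.
pred_mask = [[1, 1, 0],
             [1, 0, 0],
             [0, 0, 0]]
true_mask = [[1, 1, 0],
             [1, 1, 0],
             [0, 0, 0]]

dice, iou = dice_and_iou(pred_mask, true_mask)
matches_at_iou_50 = iou >= 0.5  # the matching rule conventionally used for mAP50
print(round(dice, 3), iou, matches_at_iou_50)
```

The two measures are monotonically related (Dice = 2·IoU / (1 + IoU)), so Dice is always at least as large as IoU for the same pair of masks.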
In summary, spontaneous living spaces serve as indispensable sources of vitality within historic districts, playing a pivotal role in maintaining dynamic equilibrium between urban conservation and development pressures. This study establishes a technical paradigm and a quantitative basis for decision support in guiding the development of spontaneous living spaces, with the aim of enhancing residents’ quality of life and urban vitality while preserving historical character.