1. Introduction
The optimization of urban spatial planning to enhance livability has emerged as a critical focus in contemporary urban research, driven by escalating societal demands for improved quality of life. A systematic assessment of urban spatial configurations through multidimensional evaluation frameworks provides essential empirical foundations for evidence-based urban management [
1]. Within urban ecosystem planning paradigms, green infrastructure development constitutes a vital component of sustainable environmental systems, requiring integrated approaches that balance ecological functionality with human-centric design principles [
2]. The investigation of urban green space (UGS) in Chinese megacities has traditionally leveraged multi-source social media datasets, including points-of-interest (POI) mapping, geotagged Weibo check-ins, and Dazhong Dianping data [
3,
4,
5]. Satellite remote sensing historically dominated UGS research prior to street view imagery, offering broad spatial coverage and temporal resolution [
6,
7]. Core vegetation indices included the NDVI for spectral analysis and the FVC for structural assessment [
8,
9]. Zhao et al. analyzed green space evolution in Nanjing and Greater Manchester using remote sensing-derived land use data and spatial statistics [
10]. Li et al. investigated NDVI dynamics in Beijing through multispectral processing and geospatial modeling, identifying key drivers of green space variations [
11].
Research has shown that approximately 90% of the environmental information that is perceived by humans is acquired through visual channels [
12]. With advancements in street view imagery technology and the comprehensive studies conducted by Japanese scholars on the Green View Index (GVI), which include quantitative assessments of environmental greenery, psychological response mechanisms, and landscape perception evaluations [
13,
14], the GVI has evolved into a fundamental three-dimensional metric for assessing urban green spaces. The GVI quantifies pedestrian-level environmental perceptions, capturing spatiotemporal variations and three-dimensional community greening composition [
15,
16]. Compared to traditional indices like the Greening Rate (GR) and Green Space Ratio (GSR), studies demonstrate that the GVI more accurately reflects public green space quality and aligns with daily human activities [
17]. Villeneuve et al. identified statistically significant associations between the GVI and summer recreational time, outperforming the NDVI [
18]. As a robust three-dimensional metric, the GVI addresses critical limitations of conventional two-dimensional indices by quantifying pedestrian-scale vegetation exposure with high precision [
19]. Its demonstrated correlations with human activity patterns and recreational behaviors align with contemporary demands for human-centric urban design, while complementing the traditional remote sensing approaches that have dominated urban green space research. Current methodologies for GVI extraction and calculation primarily involve HSV color space analysis, semantic segmentation, and supervised classification techniques. Zheng et al. integrated an HSV-based GVI and sky view factor (SVF) with population heatmaps, revealing significant correlations between street hierarchy and GVI values in historic districts [
20]. Ye et al. combined SegNet for vegetation extraction from Google Street View (GSV) imagery with spatial design network analysis (sDNA) using OpenStreetMap (OSM) data to calculate street accessibility metrics [
21]. Comparative analyses by Feng et al. demonstrated street view imagery’s superiority over multispectral remote sensing in pedestrian-perspective greenery monitoring [
22]. These multimodal approaches enable comprehensive urban planning through complementary spatiotemporal insights.
Recent advancements in urban informatics have demonstrated the efficacy of integrating deep learning with geospatial analysis. Guo Jinhuan et al. used the DeepLabv3+ model to process street view images [
23,
24]. Based on the results of semantic segmentation, they combined multi-source data to analyze the impact of environmental features, architectural features, and neighborhood characteristics on housing prices on Xiamen Island. Concurrently, Hu et al. conducted comparative evaluations of segmentation performance on multispectral remote sensing datasets, with empirical results demonstrating the superior Intersection over Union (IoU) and mean IoU (mIoU) metrics achieved by DeepLabv3+ across all land cover categories [
25]. Within urban greening assessment frameworks, street view imagery and remote sensing data exhibit complementary strengths: pedestrian-centric visual exposure metrics derived from street-level perspectives contrast with the NDVI and FVC obtained from aerial platforms, collectively enabling the multi-scale urban ecosystem monitoring that is essential for sustainable planning. Li Miaoyi et al. calculated the GVI from the greenery identified by SegNet in street view images and the NDVI from Landsat 8 Operational Land Imager (OLI) remote sensing images from the same month, to clarify the greening differences between the satellite scale and the human scale, concluding that human-scale greening evaluation is more promising [
26]. However, street view images can only be captured along streets, making it difficult to evaluate urban green space in non-street areas. As an index describing urban green environments, the GVI is expected to correlate with the NDVI and similar measures. Ming Tong et al. used Nanjing as a case study to explore the relationship between the GVI and NDVI, showing strong correlations between the two [
27]. Limited research has addressed how to exploit their relationship to overcome the spatial constraints of street view imagery collection.
Given the temporal inconsistency and spatial sparsity of street view image acquisition, this study first explored the correlations between remote sensing-derived vegetation indices and street-level GVI values. Subsequently, predictive models were established to estimate GVI-equivalent metrics in non-street areas. These methodological advancements provided critical data support for evidence-based urban green space planning and evaluation.
3. Methodology
Street view imagery and remote sensing imagery characterize the distribution of urban greenery from different perspectives. Considering the inaccessibility of street view images in non-road areas, GVI values derived from the street view images in road-adjacent zones were analyzed to predict non-road GVIs through Sentinel-2 data integration. First, the GVI values in the road areas were quantified using the DeepLabv3+ semantic segmentation model. Correlation analyses were conducted between Sentinel-2 spectral bands (blue, green, red, and near-infrared), vegetation indices (the NDVI and FVC), and the GVI to identify key predictors and to determine the optimal buffering radii for remote sensing indices. Multiple machine learning regression and classification models were evaluated to establish the optimal predictor for the non-road GVI and green vision perception levels. Finally, the optimal model was selected to predict spatial green coverage in non-road areas within Beijing’s Third Ring Road.
3.1. Street View-Based GVI Calculation
In this study, the DeepLabv3+ model was selected for the binary semantic segmentation of the street view images. DeepLabv3+ is an advanced neural network architecture designed for semantic image segmentation that combines atrous convolution with an Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale contextual information. It improves on the previous DeepLab architectures by introducing an encoder–decoder structure, where the encoder extracts features and the decoder refines the segmentation maps for more precise results. The model is highly effective in applications requiring detailed image segmentation, such as autonomous driving and medical imaging, owing to its ability to accurately segment objects of varying sizes and shapes within an image [
29]. To reduce the computational cost, the Xception backbone was replaced with MobileNet v2 as the feature extraction network. A DeepLabv3+ semantic segmentation model pre-trained on the Cityscapes dataset was combined with manually labeled greening samples from the street view images and retrained, with a grid search strategy implemented to optimize hyperparameters such as the learning rate, output size, and batch size. Readers are referred to Wang et al. for detailed segmentation descriptions [
30].
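The grid search step described above can be sketched generically. In this minimal illustration, the grid values and the `train_and_evaluate` function are hypothetical stand-ins, not the study's actual training code; `output_stride` is named after DeepLab's output stride parameter, which is an assumption about what "output size" refers to.

```python
from itertools import product

# Hypothetical hyperparameter grid (values are illustrative, not the study's)
grid = {
    "learning_rate": [1e-4, 1e-3],
    "output_stride": [8, 16],
    "batch_size": [4, 8],
}

def train_and_evaluate(cfg):
    """Stand-in for retraining DeepLabv3+ on the labeled street view
    samples and returning a validation score; a dummy function here
    so the search loop is runnable."""
    return -abs(cfg["learning_rate"] - 1e-3) - cfg["output_stride"] * 1e-5

# Exhaustive grid search: evaluate every combination and keep the best
best_cfg, best_score = None, float("-inf")
for values in product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    score = train_and_evaluate(cfg)
    if score > best_score:
        best_cfg, best_score = cfg, score
print(best_cfg)
```

In practice the inner function would run a full retraining pass on the labeled samples and return a validation metric such as mIoU.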
The GVI of each street view image is calculated as the ratio of green-classified pixels to the total pixel count, based on the binary semantic segmentation results for the green elements (Equation (1)). The GVI ranges from 0 to 1; larger values indicate a greater proportion of greenery in the street view perspective.

$$\mathrm{GVI} = \frac{N_{\mathrm{green}}}{N_{\mathrm{total}}} \quad (1)$$

In the formula, $N_{\mathrm{total}}$ represents the total number of pixels in the image, obtained by multiplying the number of pixel rows by the number of pixel columns (usually a constant of 4096 pixels × 1408 pixels), and $N_{\mathrm{green}}$ represents the number of pixels occupied by the green elements extracted from the image.
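Equation (1) reduces to a simple pixel ratio over the binary segmentation mask. A minimal NumPy sketch (the toy mask below is illustrative, not an actual segmentation result):

```python
import numpy as np

def green_view_index(mask: np.ndarray) -> float:
    """Compute the GVI of one street view image from a binary
    segmentation mask (1 = green vegetation pixel, 0 = other)."""
    total_pixels = mask.size        # rows x columns, e.g., 4096 x 1408
    green_pixels = int(mask.sum())  # count of green-classified pixels
    return green_pixels / total_pixels

# Toy example: a 4 x 4 mask with 4 green pixels -> GVI = 0.25
toy_mask = np.zeros((4, 4), dtype=np.uint8)
toy_mask[:2, :2] = 1
print(green_view_index(toy_mask))  # 0.25
```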
3.2. Correlation Analysis
The Sentinel-2 10 m resolution red, green, blue, and near-infrared bands, along with the greenness-related NDVI and FVC indices (six variables in total), were analyzed against 4070 GVI values from street view images to explore their correlations. The NDVI is a widely used remote sensing index that measures vegetation health and density via satellite imagery (Equation (2)) [
8], while the FVC refers to the proportion of green vegetation coverage relative to total ground area (Equation (3)) [
31]. It is a key biophysical parameter for assessing vegetation density and distribution in ecological studies. The FVC is expressed as a percentage that represents live vegetation coverage.
Since a GVI value represents green vision perception within a specific area, pixel-based band values (four bands) and indices (two indices) were aggregated to the corresponding spatial units. To account for varying street widths, three circular buffering strategies were applied to GVI locations as follows: fixed 25 m, fixed 45 m, and dynamic radii (45 m for the ring streets, 35 m for the main streets, and 25 m for the secondary streets;
Figure 6). ArcGIS software was used to create buffers around each street view image location, within which the red, green, blue, near-infrared band, NDVI, and FVC values were averaged.
$$\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}} \quad (2)$$

where $\rho_{\mathrm{NIR}}$ and $\rho_{\mathrm{Red}}$ represent the spectral reflectance of the near-infrared and red bands, respectively.

$$\mathrm{FVC} = \frac{\mathrm{NDVI} - \mathrm{NDVI}_{\mathrm{soil}}}{\mathrm{NDVI}_{\mathrm{veg}} - \mathrm{NDVI}_{\mathrm{soil}}} \quad (3)$$

where FVC is the fractional vegetation coverage; NDVI is the normalized vegetation index of each pixel in the image; $\mathrm{NDVI}_{\mathrm{soil}}$ is the value of areas covered entirely by bare soil pixels, taken as the NDVI value at the 5th percentile of the image in ascending order; and $\mathrm{NDVI}_{\mathrm{veg}}$ is the value of areas covered by pure vegetation pixels, taken as the NDVI value at the 95th percentile of the image in ascending order.
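Equations (2) and (3) can be sketched directly in NumPy, with the bare-soil and pure-vegetation endpoints taken as the 5th and 95th NDVI percentiles as described above; clipping the FVC to [0, 1] is an added assumption to handle pixels outside the two endpoints.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Equation (2): NDVI = (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def fvc(ndvi_img: np.ndarray) -> np.ndarray:
    """Equation (3): FVC via the dimidiate pixel model, using the
    5th and 95th NDVI percentiles of the scene as the bare-soil and
    pure-vegetation endpoints."""
    ndvi_soil = np.percentile(ndvi_img, 5)
    ndvi_veg = np.percentile(ndvi_img, 95)
    frac = (ndvi_img - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return np.clip(frac, 0.0, 1.0)  # assumption: clamp outliers to [0, 1]

# Illustrative reflectance values for three pixels
nir = np.array([0.6, 0.5, 0.2])
red = np.array([0.1, 0.2, 0.2])
print(ndvi(nir, red))  # approx [0.714, 0.429, 0.0]
```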
Prior to analyzing the correlations between the GVI and the Sentinel-2 data variables, we applied a normality test to check whether the Sentinel-2 variables followed a normal distribution. This test calculates the goodness-of-fit between the variable distribution and the normal distribution. In this study, given the small sample size (<5000), the Shapiro–Wilk (S-W) test was performed to analyze the degree of deviation from the normal distribution among the sampling data, where
p-values were calculated using SPSS 28.0 [
The
p-value of a Shapiro–Wilk test ranges from 0 to 1; the closer the
p-value is to 1, the weaker the evidence against normality and the more closely the sample approximates a normal distribution. Subsequently, the Pearson correlation coefficient [
33] was calculated to measure the degree of correlation between the GVI and each Sentinel-2 data variable. The Pearson correlation coefficient measures the linear relationship between two variables by dividing their covariance by the product of their standard deviations, and the value is always between −1 and 1. Coefficients closer to 1 or −1 indicate a stronger linear relationship, while values near 0 suggest a weak or no linear relationship.
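The study ran these tests in SPSS 28.0; an equivalent check can be reproduced with SciPy. The data below are synthetic and purely illustrative, standing in for the buffered Sentinel-2 variables and GVI samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
gvi = rng.normal(loc=0.2, scale=0.05, size=500)        # synthetic GVI sample
ndvi_buf = 0.8 * gvi + rng.normal(0, 0.02, size=500)   # correlated predictor

# Shapiro-Wilk normality test (appropriate for n < 5000)
w_stat, p_value = stats.shapiro(ndvi_buf)

# Pearson correlation between the GVI and the buffered predictor
r, r_p = stats.pearsonr(gvi, ndvi_buf)
print(f"S-W p = {p_value:.3f}, Pearson r = {r:.3f}")
```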
3.3. GVI Regression and Classification Prediction
The GVI, as a key three-dimensional metric for urban greenery assessment, is primarily derived from street view imagery. However, existing street view data are limited by temporal inconsistencies and insufficient spatial coverage. To address these limitations, we integrated street view imagery with remote sensing data and employed both regression and classification prediction models. The classification prediction model directly estimates urban residents' green perception levels in their living environments, providing data support for urban planning and greenery assessment. The regression prediction model establishes quantitative relationships between the GVI and remote sensing vegetation indices, overcoming the spatiotemporal constraints of street view imagery to deliver more accurate GVI estimations. It should be emphasized that, given the current scarcity of experimental studies in this field, we conducted comprehensive experiments using both regression and classification prediction approaches.
Based on the averaged Sentinel-2 data variables and corresponding GVI values, machine learning regression models were employed to predict specific GVI values. Five models were compared to determine the optimal regression prediction model, including K-Nearest Neighbor (KNN) regression, linear regression, decision tree (DT) regression, random forest (RF) regression, and Extra Trees Regressor. Based on the results of the regression prediction models, we selected the most stable performing models for classification prediction. We optimized the model hyperparameters using a grid search approach to ensure maximum prediction accuracy. The 4070 samples were randomly split into training and testing sets with a ratio of 4:1. The models were constructed using the open-source scikit-learn library (
https://scikit-learn.org/stable/api/sklearn.ensemble.html accessed on 19 June 2025).
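A condensed sketch of this model comparison follows, using synthetic data in place of the study's 4070 samples; the grid values are illustrative, not the study's actual hyperparameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the samples: 6 Sentinel-2 variables per point
rng = np.random.default_rng(0)
X = rng.random((400, 6))  # B2, B3, B4, B8, NDVI, FVC
y = 0.6 * X[:, 5] + 0.2 * X[:, 4] + rng.normal(0, 0.02, 400)  # pseudo-GVI

# 4:1 train/test split, as in the study
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "knn": KNeighborsRegressor(),
    "linear": LinearRegression(),
    "dt": DecisionTreeRegressor(random_state=0),
    "rf": GridSearchCV(RandomForestRegressor(random_state=0),
                       {"n_estimators": [50, 100]}, cv=3),  # illustrative grid
    "extra": ExtraTreesRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))  # R^2 on the test set
```

On real data, each model's grid would of course cover its own relevant hyperparameters rather than the single illustrative grid shown for the random forest.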
Three accuracy evaluation indicators of the regression prediction models were utilized: the root mean square error (RMSE, Equation (4)), the mean absolute error (MAE, Equation (5)), and the coefficient of determination (R², Equation (6)). The RMSE is the square root of the mean squared difference between the predicted and actual values. The MAE is the average absolute error between the predictions and observations, reflecting the magnitude of the prediction errors. Lower RMSE and MAE values indicate higher model accuracy. R² quantifies the proportion of variance in the dependent GVI variable explained by the independent Sentinel-2 variables; R² values closer to 1 demonstrate higher model accuracy [
34].
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (4)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (5)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \quad (6)$$

where $n$ is the number of test samples, $y_i$ is the actual GVI value from the street view images, $\hat{y}_i$ is the predicted value from the models, and $\bar{y}$ is the mean of the actual values of the test samples.
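These three indicators correspond directly to scikit-learn's metric functions. A toy check (the actual and predicted values below are illustrative):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([0.10, 0.20, 0.30, 0.40])  # actual GVI values
y_pred = np.array([0.12, 0.18, 0.33, 0.37])  # model predictions

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # Equation (4)
mae = mean_absolute_error(y_true, y_pred)           # Equation (5)
r2 = r2_score(y_true, y_pred)                       # Equation (6)
print(round(rmse, 4), round(mae, 4), round(r2, 4))
```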
One goal of this study was to explore urban green perception levels. Machine learning classification models were developed to directly predict green vision perception in non-road areas, rather than categorizing the levels based on regressed GVI values. It was hypothesized that classification accuracy could be improved through this approach.
Following Japan’s National Institute for Environmental Studies, green perception levels demonstrating positive psychological effects were defined as follows: poor [0–0.15), general [0.15–0.25), and good [0.25–1.00] [
35]. Given that the GVI values in [0–0.1) were primarily attributed to sparse vegetation or image obstructions, the [0–0.15) range was subdivided into [0–0.1) and [0.1–0.15). Decision tree (DT) and random forest (RF) classifiers were implemented using the same training samples and scikit-learn library as the regression models, categorizing the predictions into four green levels. The classification accuracy was evaluated using the confusion matrix, which tabulates correct vs. incorrect classifications in the statistical classification process [
36].
Both the precision (P) and recall (R) metrics, which exhibit a trade-off relationship, were calculated to evaluate the GVI level accuracy. The F1-score, defined as the weighted harmonic mean of precision and recall, serves as a comprehensive evaluation index; higher F1-scores indicate superior model classification performance. Accuracy (ACC) measures the overall prediction correctness of the classification models, defined as the proportion of correctly predicted instances in the dataset. ACC ranges from 0 to 1, with values approaching 1 signifying higher model accuracy.

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2PR}{P + R}, \quad \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP (True Positive) is the number of pixels correctly classified as belonging to the specific green level; FN (False Negative) is the number of pixels that actually belong to the specific green level but were classified otherwise; FP (False Positive) is the number of pixels that do not belong to the specific green level but were classified as belonging to it; and TN (True Negative) is the number of pixels correctly classified as not belonging to the specific green level.
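As a sketch of how these metrics follow from the confusion matrix, the scikit-learn equivalents can be applied to toy labels (the labels below are illustrative, not the study's data):

```python
from sklearn.metrics import (confusion_matrix,
                             precision_recall_fscore_support,
                             accuracy_score)

# Levels 0..3 stand for [0-0.10), [0.10-0.15), [0.15-0.25), [0.25-1.00]
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 0, 2]  # actual green perception levels
y_pred = [0, 1, 1, 1, 2, 3, 3, 3, 0, 2]  # predicted levels

cm = confusion_matrix(y_true, y_pred)  # rows: actual, columns: predicted
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
acc = accuracy_score(y_true, y_pred)
print(cm)
print(f"P = {p:.3f}, R = {r:.3f}, F1 = {f1:.3f}, ACC = {acc:.3f}")
```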
4. Results and Analysis
4.1. Sentinel-2 Variables Correlation Analysis
Table 1 presents the
p-values from the Shapiro–Wilk (S-W) test for the Sentinel-2 variables across different buffer radii. The
p-values of the B2, B3, B4, and B8 bands and the NDVI and FVC indices were all greater than 0.9, indicating that the variables followed a normal distribution regardless of the buffer radius.
Table 2 presents the correlation coefficient of each variable under different buffering radii, showing that a radius of 25 m was optimal with the highest average absolute correlation coefficient of 0.560, which is consistent with the study of Su et al. [
37]. The dynamic radius ranked second with a corresponding value of 0.549, while the 45 m radius had the weakest correlation, at 0.424. More specifically, for the optimal 25 m radius, the FVC showed the strongest positive correlation with the GVI, at 0.752, followed by the NDVI, with a correlation coefficient of 0.750. B8 also had a relatively strong positive correlation with the GVI, at 0.553. In contrast, B2, B3, and B4 were negatively correlated with the GVI, with B3 showing a relatively pronounced correlation of −0.514. Compared with the results for the 25 m radius, all variables from the dynamic radius, apart from B4, had weaker correlations with the GVI. The variables from the 45 m radius, as expected, had the lowest correlations with the GVI.
Remote sensing observes the Earth’s surface from a vertical viewing angle, with no observation bias for vegetation information within a region [
38]. In contrast, street view cameras capture street environments from pedestrian perspectives, potentially magnifying the weight of vegetation near the cameras with consideration for the depth of field (DOF) [
39]. Street view images from narrower streets tend to overestimate the GVI because more greenery is visible at close range, while images from wider streets tend to underestimate it, since distant scenes carry less green information. Our correlation analysis demonstrated that the fixed 25 m radius is optimal among the three buffering radius strategies, which may indicate that the 35 m and 45 m dynamic radii are still excessive relative to the greening space actually captured by the street view collection equipment.
4.2. GVI Prediction Results
Based on the results of the regression prediction models, we eliminated two unsuitable prediction approaches—KNN and linear regression—and selected the DT, RF, and Extra Trees models for GVI classification prediction.
Table 3 presents the final model parameters of the regression and classification prediction models that were optimized through the grid search method.
Figure 7 presents a comparison of the five GVI regression models, where the RF model had the best prediction performance, achieving the lowest RMSE and MAE of 0.063 and 0.045, respectively, as well as the highest R² of 0.787. The DT model ranked second, while the linear regression model performed worst, with corresponding values of 0.088, 0.061, and 0.590.
Table 4 presents the classification performance metrics, revealing that the Extra Trees classifier attained optimal results, with an F1-score of 0.735, demonstrating balanced performance with a 0.745 recall and 0.737 precision.
Based on the evaluation results from five regression models, this study conducted a comparative analysis of the decision tree, random forest, and Extra Trees models using both regression and classification approaches. We discretized the regression outputs from the decision tree, random forest, and Extra Trees models into three green perception levels—[0–0.15), [0.15–0.25), and [0.25–1.00]—and compared them with the corresponding classification results (
Table 5). In the classification prediction model, the accuracy evaluation metrics for the two classified intervals [0–0.10) and [0.10–0.15) were combined and averaged.
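The discretization of the regression outputs into perception levels can be sketched with NumPy; the thresholds come from the level definitions above, and the sample predictions are illustrative:

```python
import numpy as np

# Thresholds separating the poor / moderate / good green perception levels
BINS = [0.15, 0.25]

def to_levels(gvi_pred: np.ndarray) -> np.ndarray:
    """Map regressed GVI values to levels:
    0 = [0, 0.15), 1 = [0.15, 0.25), 2 = [0.25, 1.00]."""
    return np.digitize(gvi_pred, BINS)

preds = np.array([0.05, 0.14, 0.15, 0.24, 0.25, 0.60])
print(to_levels(preds))  # [0 0 1 1 2 2]
```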
The Extra Trees classification model achieved the best performance for green visual perception level prediction, demonstrating that direct classification outperforms regression with subsequent discretization.
Table 5 presents the R, P, F1, and ACC performance metrics for both the regression and classification prediction approaches using the decision tree, random forest, and Extra Trees models on the test set. In the regression predictions, all models achieved higher precision than recall. Conversely, the classification predictions showed consistently higher recall than precision, indicating that the regression predictions achieved a higher proportion of correct predictions among the predicted positive samples, while the classification predictions were more effective at identifying the actual positive samples. Considering both the comprehensive F1-score and the ACC metric, the Extra Trees classification model exhibited superior performance. Notably, the classification predictions generally achieved higher ACC values than the regression predictions, with the Extra Trees classifier reaching an ACC of 0.652.
Regarding the results of the regression and classification prediction models, the highest R² among the regression models was 0.787, while the highest F1-score among the classification models was 0.717. After standardizing the evaluation metrics between the regression and classification prediction results, we found that the classification prediction models performed noticeably better than the regression prediction models in both accuracy and overall precision. Analyzing these results, we identified two main factors. First, the presence of erroneous data in the training dataset was a key reason for the relatively low model accuracy. Second, the high data density near the threshold values of the green perception level classification constrained the classification models' accuracy, as it strongly influenced the models' predictive judgments.
4.3. GVI Distribution Analysis of the Study Area
Since the relationship between the GVI and the Sentinel-2 satellite data at the pixel scale remains unclear, and considering that the Extra Trees classification model outperformed both the discretized regression predictions and the other classifiers in predicting the green perception levels, this study adopted the Extra Trees classification model to predict the GVI levels in the non-street areas.
The prediction scope covered the areas within Beijing’s Third Ring Road, where the model’s performance in predicting the green perception levels of the non-road areas was evaluated for reasonableness and reliability. Within this study area, the non-street regions were systematically divided into 50,480 circles, each with the optimal buffering radius of 25 m. We calculated the mean values of the pixel-based metrics (including Band 8, as well as the NDVI and FVC) within each circle, which were then processed by the Extra Trees classification model for direct prediction. To enhance the visualization of the GVI spatial distribution, we generated 12,620 circles with a 50 m radius and applied a majority-rules principle to aggregate the predicted values from the 25 m buffer zones (see
Figure 8).
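The majority-rules aggregation step can be sketched as follows. The grouping of the 25 m circles under each 50 m circle is assumed to be given, and the tie-breaking rule (first-encountered level wins) is an assumption for illustration:

```python
from collections import Counter

def majority_level(member_levels):
    """Aggregate the predicted levels of the 25 m circles falling
    under one 50 m visualization circle by majority vote; ties are
    broken by the first-encountered level (an assumption)."""
    return Counter(member_levels).most_common(1)[0][0]

# Toy grouping: each 50 m circle lists its member 25 m predictions
groups = {
    "circle_A": [0, 0, 1, 0],
    "circle_B": [2, 3, 3],
}
aggregated = {k: majority_level(v) for k, v in groups.items()}
print(aggregated)  # {'circle_A': 0, 'circle_B': 3}
```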
Figure 8 presents the predicted green perception levels in the non-road areas. Within the study area, we employed a four-tier color gradient, from red to green, to represent the classification levels as follows: [0–0.10) very poor green vision perception (red), [0.10–0.15) poor green vision perception (orange), [0.15–0.25) moderate green vision perception (yellow), and [0.25–1.00] good green vision perception (green). The results revealed significantly inadequate green visual perception in the non-street areas within the Third Ring Road, with most regions (marked in red) exhibiting low GVI values ranging from 0 to 0.1.
The frequency and distribution of the green vision perception levels in non-street areas are shown in
Figure 9, which quantitatively indicates that the green vision perception level within Beijing’s Third Ring Road is poor. The poor green vision perception levels, ranging from 0 to 0.15, account for more than half of the area, at exactly 56.8%, where the ranges from 0 to 0.1 and from 0.1 to 0.15 represent 44.6% and 12.2%, respectively. The moderate green vision perception level, ranging from 0.15 to 0.25, is approximately 2 percentage points higher than the good level, which ranges from 0.25 to 1.00.
4.4. Spatial Patterns of Green Perception Distribution Within Beijing’s Third Ring Road
Figure 10 identifies four green perception areas as follows: (a) Area B of Fuli City Community in Shuangjing Street, (b) Fangcheng Community in the Fangzhuang area, (c) around the Bridge of Fenzhong Temple in the Shibalidian area, and (d) a mixed residential area on Dashilar Street. Areas (a)–(d) are ordered from the highest to the lowest green perception level in the prediction results, with orange-yellow denoting poor green perception zones and blue indicating better green perception areas. An examination of the corresponding real-world conditions reveals that both (a) and (b) are residential communities within the Second and Third Ring roads, with (a) being a recently developed neighborhood that exhibits superior greenery conditions compared to (b), owing to its higher economic value. Area (c), situated near a major Third Ring Road transportation hub surrounded by industrial facilities, demonstrates lower greening levels. Area (d), containing cultural landmarks and extensive hutong alley networks, displays particularly low green perception levels within the Second Ring Road, owing to the difficulties in implementing greening measures caused by the building typologies and area-specific constraints. A comparative analysis between the predicted results and the actual conditions across these four areas, which represent varying green perception levels, shows fundamental consistency, providing substantial validation of our experimental outcomes.
Based on the aforementioned comparative analysis conclusions, we conducted a spatial distribution analysis of the green perception levels in the non-road areas within Beijing’s Third Ring Road.
Figure 11 presents the community distribution within Beijing’s Third Ring Road, in combination with the prediction results. According to the results shown in
Figure 11a, the green vision perception levels within the Second Ring Road were found to be lower than those in the areas between the Second and Third Ring roads. The region within the Second Ring Road primarily comprises the Xicheng and Dongcheng districts, which are characterized by traditional alleyways and narrow lanes. Owing to rapid population growth, these areas have become densely built-up, making greening efforts challenging given the historical nature of the buildings and the spatial constraints of the region. Consequently, the green vision perception within the Second Ring Road remains relatively low [
40].
In detail, the areas with moderate to good green vision perception levels were primarily located in parks, universities, and some subdistrict communities. For example, parks along the southeast corner of Beijing’s Second Ring Road, such as the Temple of Heaven Park, Longtan Park, Taoranting Park, and Beijing Grand View Garden, exhibited good green vision perception levels, ranging from 0.25 to 1.0. At the northwest corner of the Third Ring Road, in communities such as Zizhuyuan, Beixiaguan, and Beitaiping, many universities, including Beijing Institute of Technology, Beijing Foreign Studies University, Beijing Normal University, and Beijing Jiaotong University, displayed moderate to good green vision perception levels exceeding 0.15. Another area with a good green vision perception level was located in the Ganjiakou community to the west of the Third Ring Road, where Yuyuantan Park, Zizhuyuan Park, and the zoo area contain abundant vegetation.
Large areas with relatively poor green vision perception levels, ranging from 0 to 0.1, were primarily located within Beijing’s Second Ring Road. These areas included high-speed railway stations, commercial and cultural districts, distinctive cultural landmarks, and hutong residential areas within the main urban areas. For example, as the cultural center of the Chinese capital, the Second Ring Road encompasses most of the commercial and cultural blocks, as well as characteristic cultural attractions. Notable locations include Liulichang Ancient Cultural Street in the Dashilan community, Wangfujing Commercial Pedestrian Street in the Dongsi community, and the Qianmen community, which is primarily composed of guild halls and museums, such as Madame Tussauds Beijing, the Red Star Yuanshenghao Museum, and the Pigment Guild Hall. Additionally, Beijing’s hutong residential areas, such as the Xichangan, Financial Street, and Xinjiekou communities, also exhibited low green vision perception levels.
6. Conclusions
We estimated the green vision perception levels in the non-street areas within Beijing’s Third Ring Road by combining Baidu Maps street view images, a correlation analysis between the street view image-based GVI and the Sentinel-2 B2, B3, B4, B8, NDVI, and FVC variables, and machine learning classification and regression prediction models. The correlation analysis demonstrated that the Sentinel-2 variables aggregated with a buffering radius of 25 m had the closest correlation with the GVI from the street view images, compared to the 45 m radius and the dynamic radii varying with street width. Among the six Sentinel-2 variables, the FVC and NDVI had strong positive correlations with the GVI, with correlation coefficients of 0.752 and 0.750, respectively. Sentinel-2 B2, B3, and B4 were negatively correlated with the GVI, with B3 showing the strongest negative correlation, with a coefficient of −0.514. Among the five machine learning regression models for predicting the GVI, the random forest regression model showed the best prediction performance, with RMSE, MAE, and R² values of 0.063, 0.045, and 0.787, respectively. However, the classification models outperformed the regression models in estimating the green vision perception levels, with the accuracy (ACC) of the classification strategy being 5.5% higher than that of the regression strategy. The spatial distribution of the green vision perception levels in the non-street areas within Beijing’s Third Ring Road showed that the green vision perception level within the Second Ring Road was lower than that in the area between the Second and Third Ring roads. Furthermore, 56.8% of the study area fell within the poor green perception level, from 0 to 0.15. The proportions of the moderate and good green perception levels were similar, at 22.6% and 20.6%, respectively.