Determining the locations that dominantly influenced the emotional characteristics of subjects is fundamental for further analyzing the potential role of urban spatial attributes. To refine the problem, the statistical analysis was carried out in three steps. First, by generating isovists at the ESPs, we extracted the architectural texture within the isovist scope and obtained basic data on the external spatial attributes of the surrounding buildings. Second, we computed statistics on the isovist parameters at the ESPs to judge the overall influence of these parameters on subject valence and to establish a regression model relating each isovist parameter to valence. Third, we recorded the urban scenery observed by the subjects while walking the route by taking photos, and applied visual entropy and the visual fractal dimension together to analyze the spatial attributes represented in these photos and to explore how they may affect valence.
4.1. Influence of Building Texture
The isovist concept has been used for spatial analysis since 1979 [43,44]. The principle is to abstract space into a collection of countless viewpoints, among which an isovist is the sub-collection of points that are mutually and directly visible from a given viewpoint. On this basis, the attributes of an isovist are defined through a series of geometric parameters; spatial mapping is then conducted to form an isovist field covering the entire research area. Isovists can be studied in both two and three dimensions; in this research, we limit the analysis to the 2D case.
Based on the P collection (11 sets) and N collection (9 sets) of the ESPs identified along the experiment route, we set the isovist radius threshold to 200 m and generated isovist boundaries in ArcGIS for all the sets of sampling points (Figure 4). Then, we extracted the building footprints within those boundaries and calculated the shape indices, including the mean area, area dispersion, degree of fragmentation and average distance between buildings. Equation (4) shows the calculation used for area dispersion and Equation (5) shows the calculation for the degree of fragmentation. All the shape indices were normalized by dividing them by their mean values to allow non-dimensional conversion and statistical analysis (Table 1).
Here, Si refers to the area of each architectural outline within the isovist, S̄ refers to the average area of an architectural outline, N refers to the number of buildings, and P refers to the overall length of an architectural outline.
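Since Equations (4) and (5) are not reproduced in this text, the following sketch illustrates one plausible way to compute the four shape indices for the footprints inside a single isovist; the dispersion and fragmentation formulas, the function name and the inputs are assumptions rather than the authors' exact definitions.

```python
# Illustrative shape-index calculation for the building footprints inside one isovist.
# The dispersion and fragmentation formulas are assumed stand-ins, not Equations (4)/(5).
import numpy as np

def shape_indices(areas, perimeters, centroids):
    """areas, perimeters: 1-D arrays, one value per footprint; centroids: (N, 2) array."""
    areas = np.asarray(areas, float)
    perimeters = np.asarray(perimeters, float)
    centroids = np.asarray(centroids, float)

    mean_area = areas.mean()
    area_dispersion = areas.std()                    # assumed: std. dev. of footprint areas
    fragmentation = perimeters.sum() / areas.sum()   # assumed: outline length per unit built area
    # Average centre-to-centre spacing: mean pairwise distance between footprint centroids.
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    n = len(centroids)
    mean_spacing = d[np.triu_indices(n, k=1)].mean() if n > 1 else 0.0

    return {"mean_area": mean_area, "area_dispersion": area_dispersion,
            "fragmentation": fragmentation, "mean_spacing": mean_spacing}
```

As described in the text, each index would then be divided by its mean value over all sampling points to obtain the non-dimensional values compared in Table 1.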
A railway divides the research area into two sites, an eastern portion and a western portion (S1 and S2); therefore, we analyzed the shape indices of the two sites under different valences (Figure 5). In S1, the shape indices of the building footprints corresponding to sites of different valences differ noticeably compared with those in S2. For example, when subjects exhibited positive emotions, the building footprints within the isovists in S1 tend to be larger, with comparatively small deviation. This is possibly due to the influence of building scale; the average center-to-center spacing between the buildings in S1 is large. Moreover, the building-texture fragmentation is higher in S1, reflecting a complicated overall outline and a spatial hierarchy within this area. In S2, the shape indices other than fragmentation are nearly collinear across the different valences, indicating that the factors affecting the subjects' emotions in S2 may not be triggered by these shape indices. Therefore, in S2, it can be speculated that the influence of urban form on emotions plays a comparatively secondary role.
An independent-sample t-test was further applied to the shape indices of the P collection and the N collection. The results showed that no indicator was significant at the 95% confidence level (p > 0.05). Therefore, although the shape indices allow us to compare differences in architectural texture among a few ESPs, they are difficult to generalize to other locations. Consequently, other spatial attributes (such as isovist parameters) must be employed for a deeper analysis to explore more dominant spatial influencing factors.
4.2. Isovist Analysis
We created an isovist analysis model in Depthmap, set the analytic accuracy to 10 m and selected six important isovist parameters to analyze their influence on the subjects' emotions: isovist area, isovist perimeter, isovist compactness, occlusivity, max-visibility length and min-visibility length. Of these, the formulas for isovist compactness and occlusivity are, respectively, as follows [45]:
Here, S refers to the isovist area, P refers to the isovist perimeter, and Pf refers to the overall length of the solid boundaries within the isovist area.
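The compactness and occlusivity formulas themselves are not reproduced in this text. Common definitions that are consistent with the variables listed here, given as assumptions rather than a verbatim copy of [45], are:

$$\text{compactness} = \frac{4\pi S}{P^{2}}, \qquad \text{occlusivity} = P - P_f$$

Under this reading, compactness approaches 1 for a circular isovist, and occlusivity measures the length of the isovist boundary that is not formed by solid surfaces.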
We performed spatial matching between the values of all isovist parameters and the 348 ESPs. The values were divided into two groups based on valence, and an independent-sample t-test was applied to determine whether significant differences occurred between the two groups' mean values. The results indicate that at the 95% confidence level, all isovist parameters show significant differences; at the 99% confidence level, most isovist parameters, except for min-visibility length, still show significant differences. Therefore, we can roughly assume that these isovist parameters may influence the subjects' emotions to a certain extent.
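A minimal sketch of this group comparison, assuming the parameter values for the 348 ESPs are held in a pandas DataFrame named `esp` with a binary `valence` column (1 = positive, 0 = negative); the column names are illustrative.

```python
# Independent-sample t-test for each isovist parameter between positive and negative ESPs.
import pandas as pd
from scipy import stats

PARAMS = ["area", "perimeter", "compactness", "occlusivity",
          "max_visibility", "min_visibility"]

def compare_valence_groups(esp: pd.DataFrame) -> pd.DataFrame:
    rows = []
    for param in PARAMS:
        pos = esp.loc[esp["valence"] == 1, param]
        neg = esp.loc[esp["valence"] == 0, param]
        t_stat, p_val = stats.ttest_ind(pos, neg)
        rows.append({"parameter": param, "t": t_stat, "p": p_val,
                     "sig_95": p_val < 0.05, "sig_99": p_val < 0.01})
    return pd.DataFrame(rows)
```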
Regression reveals relationships that may exist between one or more independent predictors and a dependent variable. Here, we applied binary logistic regression to analyze the probability that the isovist parameters influence the subjects' emotions. First, we used each individual isovist parameter for the regression; then, we included all the isovist parameters in the model simultaneously. Finally, the predictive effect was obtained by taking the combined parameters as variables. In this case, the response variable of the logistic regression is the valence, where 0 and 1 represent the two different states (0 represents negative emotions and 1 represents positive emotions). Assuming that the response variable equals 1 (positive), the probability is P, and the formula is as follows [46]:
Here, P ∈ [0, 1], Xi refers to the six isovist parameters selected in this paper, and Bi refers to the estimated coefficient of each variable.
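The regression equation is not shown in this text; the standard binary logistic form implied by the description, stated here as an assumption consistent with [46], is:

$$P = \frac{1}{1 + e^{-(B_0 + \sum_i B_i X_i)}}, \qquad \text{equivalently} \quad \ln\frac{P}{1-P} = B_0 + \sum_i B_i X_i$$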
The prediction efficiency of the regression model can be further inspected using receiver operating characteristic (ROC) analysis, which divides the predicted probabilities at a series of critical points and obtains the corresponding sensitivity and specificity at each critical point. Taking sensitivity and 1 − specificity as the coordinate axes, the points can be connected to form a curve. When the curve coincides with the diagonal line, sensitivity and specificity each account for 50%, indicating that the analysis result has no practical meaning. However, the closer the curve lies to the upper-left corner of the coordinate graph, the smaller the overlapping region between the samples and the stronger the discrimination. Overall, the area under the curve (AUC) intuitively reflects the model's accuracy: the larger the AUC, the higher the accuracy. The optimal critical point is determined where the Youden index is highest, as follows [47]:
where Y refers to the Youden index, Se refers to sensitivity, and Sp refers to specificity.
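The formula itself is not reproduced here; the standard definition of the Youden index, which matches the variables listed and is assumed to be the one intended in [47], is:

$$Y = S_e + S_p - 1$$

with the optimal critical point taken at the cut-off where Y is maximal.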
The results show that when only a single isovist parameter is used in the logistic regression, the Hosmer-Lemeshow coefficients, which reflect the model's overall goodness-of-fit, are all less than the set significance level (p < 0.05). This indicates that the regression model does not fully capture the data, and there is a significant difference between the model's predicted values and the observed values. The results also show that the ROC curve of every individual isovist parameter lies close to the coordinate diagonal (Figure 6a–f), which further confirms the unsatisfactory regression effect. Consequently, we judge that the subjects' emotions cannot be estimated accurately using any single isovist parameter. Therefore, after removing the isovist parameters with poor correlations, we combined the remaining isovist parameters into the regression model, choosing isovist compactness (X1), neighborhood degree (X2) and max-visibility length (X3) as the concomitant variables. To keep the estimated coefficients in proper proportion, the parameters were scaled correspondingly. By selecting the regression method with the best efficiency, we carried out an iterative computation. The Hosmer-Lemeshow coefficient reaches 0.128, which is larger than the set significance level (p > 0.05); thus, as a preliminary judgment, the null hypothesis of adequate model fit is accepted. The regression coefficients of this comprehensive parameter model are all higher than 0.3, which conveys a certain statistical importance, and the overall accuracy is 83.9%. The analysis shows that max-visibility length and isovist compactness have dominant influences on the model (Table 2). Through ROC curve analysis, the AUC value of the comprehensive parameter model (Figure 6g) is 0.849 (p < 0.05). As a rule of thumb, 0.7 < AUC < 0.9 indicates mid-level predictive accuracy, showing that the comprehensive isovist parameter model has quite good predictive power. Using the Youden index, the optimal probability division point of the comprehensive parameter model lies at approximately 0.51, which is close to the model's default threshold of 0.5. Therefore, we accept the predicted results judged by the model's division point.
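A minimal sketch of the combined-parameter model and its ROC evaluation, assuming the three concomitant variables are stored in the DataFrame `esp` under illustrative column names; the scaling and solver settings are stand-ins for, not a reproduction of, the authors' workflow (the Hosmer-Lemeshow test is omitted here).

```python
# Combined logistic model, AUC and Youden-optimal threshold for the three retained parameters.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score, roc_curve

def fit_combined_model(esp: pd.DataFrame):
    X = esp[["compactness", "neighborhood_degree", "max_visibility"]].copy()
    X = (X - X.mean()) / X.std()            # rescale so the coefficients stay comparable
    X = sm.add_constant(X)
    y = esp["valence"]

    model = sm.Logit(y, X).fit(disp=False)
    prob = model.predict(X)

    auc = roc_auc_score(y, prob)            # area under the ROC curve
    fpr, tpr, thresholds = roc_curve(y, prob)
    youden = tpr - fpr                      # equals Se + Sp - 1 at each threshold
    best_cutoff = thresholds[np.argmax(youden)]
    return model, auc, best_cutoff
```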
4.3. Analysis of Visual Entropy and Fractals
To gain a better understanding of how spatial attributes affect people's emotions through what they see, we further analyzed the photos taken by the subjects at the locations that show the significant clustering effects described earlier. When the human visual system perceives an image, attention is not evenly distributed. This uncertainty can be measured by visual entropy. The concept of entropy was initially used to describe the degree of disorder in thermodynamics and was later introduced into information theory to represent the uncertainty of a signal source [48]. Visual entropy (VE) is a quantitative description of the visual information perceived by a subject; here, it reflects the visual complexity and richness of images in an urban context. Because urban spaces are extremely complex, it is difficult to accurately measure the geometric parameters of all their details. Thus, this paper uses real digital photos of the effective sampling points and calculates VE values from these photos. This method has been widely applied in many psychological experiments and is highly credible [49,50,51,52]. By processing the photos into gray-scale maps with 0–255 discrete values, this paper considers each gray-scale unit as a different signal from the image signal source. The overall VE is then calculated from the distribution of pixels across the gray-scale units using the following formula:
where H denotes the image's overall VE and Pi refers to the probability that each gray-scale pixel value appears. To eliminate noise, the threshold value was set to 3%; signals below this threshold are not considered valid data, and only those gray-scale regions whose pixel counts exceed the threshold are evaluated. To simplify the calculations, this paper divides the image's gray scale into 25 grades. The luminance information of the green wave band is sufficient and offers better image contrast [53]; consequently, the gray-scale map of this band is used in the analysis.
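A minimal sketch of this VE calculation, assuming Pillow is used to read the photo; the 25 gray grades, the green band and the 3% noise threshold follow the text, while the function name, the logarithm base and the renormalization step are assumptions.

```python
# Visual entropy of a photo from the green band, quantized into 25 gray grades.
import numpy as np
from PIL import Image

def visual_entropy(path: str, levels: int = 25, threshold: float = 0.03) -> float:
    img = np.asarray(Image.open(path).convert("RGB"))
    green = img[:, :, 1].astype(float)                     # green band: better contrast
    grades = np.floor(green / 256.0 * levels).astype(int)  # map 0-255 values to 25 grades
    p = np.bincount(grades.ravel(), minlength=levels) / grades.size
    p = p[p > threshold]                                   # drop grades below the 3% noise threshold
    p = p / p.sum()                                        # renormalize over the retained grades
    return float(-(p * np.log2(p)).sum())                  # Shannon entropy H = -sum p_i log p_i
```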
Furthermore, the complexity of an urban spatial environment and its visual impact on subjects can also be measured by fractals [54]. It has been argued that nature is a complicated system characterized by irregularity and self-similarity [55]. Mandelbrot described these unordered and fragmented natural forms using the concept of a "fractal." A fractal dimension can take fractional values within the Euclidean dimensions. For instance, an irregular shoreline is neither a one-dimensional straight line nor a two-dimensional plane; its fractal dimension lies between these two dimensions, depending on the degree of convolution of the shoreline. The key to understanding fractals lies in the choice of measurement scale: objects with fractal characteristics measured at different scales yield different quantities. The fractal dimension describes the complexity and convolution of an image; the larger the fractal dimension, the more complicated the image. The operations in this step were also based on the analysis of the real photos and were conducted using the box-counting method as follows. First, the images were resized to 1450 × 950 pixels. The borders in all photos were intensified, and the photos were transformed into gray-scale maps. Taking the gray-scale value 128 as a segmentation point, the photos were further transformed into binary images containing only black and white pixels. Overlaying two-dimensional grids on the images, when the grid side length is d, the number of effective grid cells covering the white part is N(d). According to the fractal principle, N(d) is a power function of d, given by the following formula [56]:
For convenience of observation and calculation, a logarithmic transformation of Equation (8) was applied, and the function was plotted on a double-logarithmic coordinate graph. D is the fractal dimension of the image:
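A minimal sketch of the box-counting estimate described above, assuming Pillow's edge filter for the border intensification; the set of box sizes and the slope fit are illustrative choices, with D taken as the negative slope of log N(d) against log d.

```python
# Box-counting fractal dimension of a photo, following the preprocessing described in the text.
import numpy as np
from PIL import Image, ImageFilter

def box_counting_dimension(path: str) -> float:
    img = Image.open(path).convert("L").resize((1450, 950))   # gray-scale, 1450 x 950 pixels
    img = img.filter(ImageFilter.FIND_EDGES)                  # intensify borders
    binary = np.asarray(img) >= 128                           # white pixels = edges

    sizes = [2, 4, 8, 16, 32, 64, 128]                        # assumed box side lengths d
    counts = []
    for d in sizes:
        h = binary.shape[0] // d * d
        w = binary.shape[1] // d * d
        blocks = binary[:h, :w].reshape(h // d, d, w // d, d)
        counts.append(blocks.any(axis=(1, 3)).sum())          # boxes containing any white pixel

    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)   # fit log N(d) against log d
    return float(-slope)                                      # D is the negative slope
```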
By adding the VE and the fractal dimension together, a comprehensive visual index can be obtained as follows:
By comparing each location's comprehensive index with that of the previous location, we can observe the changing tendency of the data; the formula is as follows:
where the quantity on the left of the formula is the variability index of the visual index and VIi refers to the comprehensive visual index of the i-th sampling location. The values 1 and 0 are used to represent the positive and negative signs of the calculation, and each sampling location's variability index is matched with the valence.
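The two index formulas referenced above are not reproduced in this text. A plausible reading consistent with the surrounding description, given as an assumption rather than the authors' exact definitions, is:

$$VI_i = H_i + D_i, \qquad \Delta VI_i = VI_i - VI_{i-1}$$

where H_i and D_i are the visual entropy and fractal dimension at the i-th sampling location, and the sign of ΔVI_i is then coded as 1 (increase) or 0 (decrease) before being matched with the valence.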
In this analysis, 13 hot-spot clusters were selected from the P and N collections for sequencing (Figure 7), and the photos were matched with their shooting locations. We used a full-frame camera with a 35 mm by 24 mm CCD and set the focal length to 50 mm, which produces images similar to the human field of vision. Then, based on the photos, we calculated the VE and fractal dimension (Figure 8). Generally, photos in which all the elements are well-defined and integrated, resulting in high fractal and VE values, appear important in eliciting positive emotions; locations with positive emotions tend to present images with a strong sense of order and richness. In these areas, the buildings are arranged neatly and the images reflect enclosed space (No. 1 and No. 2). In addition to a compact and neat isovist form, the richness of the plant landscape and the greening hierarchy may also encourage positive emotions (No. 7 and No. 11). After the test, subjects reported that it is easier to feel a sense of safety in such spaces; according to Maslow's theory, safety is an important precondition for pleasure. Locations with negative emotions show comparatively weak spatial order, such as weak orientation and undefined open space (No. 4 and No. 13). Although locations No. 8 and No. 10 present high fractal and VE values, the continuity of space there is damaged by intervening roadblocks and junk, which may be among the reasons the subjects experienced negative emotions. Furthermore, positive and negative emotions overlap at some locations, such as No. 3. The photos there show quite strong cityscape contrasts: rich landscape vegetation and rigid buildings appear simultaneously on the left and right sides of the image. Therefore, the emotions of subjects in such locations may change depending on the objects currently holding their attention, causing the sampled emotions at these locations to vary.
Visual analysis of every photo location (Figure 9) shows that VE and the fractal dimension fluctuate in step and are correlated; the Pearson coefficient of the two variables is 0.694 (at the 99% confidence level), indicating a strong linear correlation. Both data sets have comparatively high values at locations No. 7 and No. 11, where lush trees grow; the visible buildings are low and mostly covered by greenery, the sky accounts for only a small proportion of each image, and the landscape dominates both photos. In contrast, locations No. 5 and No. 6 have very low VE and fractal values. No. 6 is located at the end of a bridge across the railway and has a broad view; the landscape has a comparatively flat visual depth because the bridge and sky account for the greater part of the view, there are few trees, and the sense of enclosure is weak. To properly analyze the visual differences under the two valence states, we divided the VE and fractal data into two sets based on valence and conducted an independent t-test to compare the mean values of the two sets. The results show no significant difference overall (p > 0.05).
Additionally, the valence and the variability index are consistent at 9 locations (
Table 3), accounting for about 70% of the 13 locations. To a certain extent, this supports the earlier assumption that changes in emotion cannot be judged by isovist parameters alone. In addition to the comprehensive impact of the various visual factors, changes in the subjects' emotions were also related to the sequence in which they experienced the spaces. Switching nodes (e.g., crossroads and street corners) usually have significant effects, and this influence is also affected by time: when subjects enter the next switching space, they consciously compare it with the former node and its spatial attributes. The comprehensive differences between these two sets of spatial attributes may constitute an important trigger for changes in emotion.